How to unit test applications (Python only)
LangSmith functional tests are assertions and expectations designed to quickly identify obvious bugs and regressions in your AI system. Compared with evaluations, tests are typically fast and cheap to run. They focus on specific functionality and edge cases using binary (pass/fail) assertions. We recommend using LangSmith to track any unit tests, end-to-end integration tests, or other specific assertions that touch an LLM or another non-deterministic part of your AI system. Ideally, these tests run on every commit in your CI pipeline so that regressions are caught early.
The @unit decorator requires langsmith Python SDK version >=0.1.74.
If you are interested in unit testing functionality in TypeScript or other languages, please upvote/comment on this GitHub Issue.
Write a @unit test
To write a LangSmith functional test, decorate your test function with @unit.
If you want to track the full nested trace of the system or component being tested, you can mark those functions with @traceable. For example:
```python
# my_app/main.py
from langsmith import traceable


@traceable  # Optional: records this call as a run in LangSmith
def generate_sql(user_query):
    # Replace with your SQL generation logic
    # e.g., my_llm(my_prompt.format(user_query))
    return "SELECT * FROM customers"
```
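The nesting only shows up when the function under test calls other traced functions. The sketch below assumes generate_sql is split into two steps; build_prompt and call_llm are hypothetical helpers invented for illustration, and call_llm is a stub rather than a real model call. When LangSmith tracing is enabled, each @traceable call made inside generate_sql should appear as a child run of it. The try/except fallback exists only so the sketch runs without langsmith installed.

```python
# my_app/main.py (sketch): nested @traceable functions produce a nested trace.
# build_prompt and call_llm are HYPOTHETICAL helpers used for illustration only.
try:
    from langsmith import traceable
except ImportError:  # fallback no-op decorator so the sketch runs without langsmith
    def traceable(func):
        return func


@traceable
def build_prompt(user_query: str) -> str:
    # Hypothetical prompt template
    return f"Write a SQL query for: {user_query}"


@traceable
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; always returns the same query
    return "SELECT * FROM customers"


@traceable
def generate_sql(user_query: str) -> str:
    # With tracing enabled, these two calls are recorded as child runs of this one
    prompt = build_prompt(user_query)
    return call_llm(prompt)
```

Because the decorator wraps ordinary functions, the same code runs unchanged whether or not tracing is turned on.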
Then define your test:
```python
# tests/test_my_app.py
from langsmith import unit

from my_app.main import generate_sql


@unit
def test_sql_generation_select_all():
    user_query = "Get all users from the customers table"
    sql = generate_sql(user_query)
    # LangSmith logs any exception raised by `assert` / `pytest.fail` / `raise` / etc.
    # as a test failure
    assert sql == "SELECT * FROM customers"
```
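Exact string equality is brittle when the output comes from an LLM, because harmless differences in case, whitespace, or a trailing semicolon make the test fail. A minimal sketch, assuming you would rather assert on properties of the output, is shown below. is_select_from is a hypothetical regex heuristic (not a SQL parser and not part of LangSmith), and generate_sql here is a local stub standing in for my_app.main.generate_sql so the snippet is self-contained. The import fallback is only there so the sketch runs without langsmith installed.

```python
# tests/test_my_app.py (sketch): binary assertions on properties of the output.
import re

try:
    from langsmith import unit
except ImportError:  # fallback no-op decorator so the sketch runs without langsmith
    def unit(func):
        return func


def generate_sql(user_query: str) -> str:
    # HYPOTHETICAL stub standing in for my_app.main.generate_sql
    return "select *  from Customers;"


def is_select_from(sql: str, table: str) -> bool:
    """Heuristic check: a single SELECT statement containing `FROM <table>`."""
    # Collapse whitespace, drop trailing semicolons, and lowercase
    normalized = " ".join(sql.split()).rstrip(";").lower()
    if ";" in normalized:  # more than one statement
        return False
    pattern = rf"^select\b.*\bfrom {re.escape(table.lower())}\b"
    return re.search(pattern, normalized) is not None


@unit
def test_sql_generation_is_select_on_customers():
    sql = generate_sql("Get all users from the customers table")
    # Pass/fail, tolerant of case and whitespace differences
    assert is_select_from(sql, "customers")


@unit
def test_sql_generation_has_no_destructive_keywords():
    sql = generate_sql("Delete every customer")
    # Edge case: the generator should never emit a destructive statement
    assert not re.search(r"\b(drop|delete|truncate)\b", sql.lower())
```

Each test still reduces to a single pass/fail outcome, which keeps it cheap to run on every commit.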
Run tests
You can run the tests with a standard testing framework such as pytest (docs). For example:
```bash
pytest tests/
```
Each time you run this test suite, LangSmith collects the pass/fail results and other traces as a new TestSuiteResult. It scores each applicable test 1 for pass and 0 for fail, and logs the pass rate across all of them.
The test suite syncs to a corresponding dataset named after your package or GitHub repository.
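The pass rate is simply the mean of the per-test binary scores. The sketch below only illustrates that arithmetic. It is not LangSmith's implementation, and the test names in the outcomes dictionary are hypothetical example data.

```python
# Sketch: pass rate as the mean of per-test binary scores (1 = pass, 0 = fail).
# The outcomes below are HYPOTHETICAL example data, not real LangSmith output.
def pass_rate(outcomes: dict[str, bool]) -> float:
    """Return the fraction of tests that passed; raise ValueError if empty."""
    if not outcomes:
        raise ValueError("no test outcomes to aggregate")
    scores = [1 if passed else 0 for passed in outcomes.values()]
    return sum(scores) / len(scores)


outcomes = {
    "test_select_all": True,
    "test_select_columns": True,
    "test_rejects_drop": False,
}
print(pass_rate(outcomes))  # 0.6666666666666666 (2 of 3 tests passed)
```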