
Commit 61da4ab

Merge branch 'main' into fix/traceable-docs

2 parents 5be60d1 + 6a67aa2

363 files changed: +13451 / -1637 lines


.github/workflows/spell-check.yml

Lines changed: 21 additions & 0 deletions
````diff
@@ -0,0 +1,21 @@
+name: Spell Checking
+
+on: [pull_request]
+
+jobs:
+  codespell:
+    name: Check spelling with codespell
+    runs-on: ubuntu-latest
+    steps:
+      - uses: codespell-project/actions-codespell@v2
+        with:
+          check_filenames: true
+  misspell:
+    name: Check spelling with misspell
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Install
+        run: wget -O - -q https://git.io/misspell | sh -s -- -b .
+      - name: Misspell
+        run: ./misspell -error
````

docs/evaluation/faq/custom-evaluators.mdx

Lines changed: 2 additions & 1 deletion
````diff
@@ -288,8 +288,9 @@ The flexibility of the functional interface means you can easly apply evaluators
 ```python
 from evaluate import load
 from langsmith.schemas import Example, Run
+from langsmith.evaluation import RunEvaluator

-class PerplexityEvaluator:
+class PerplexityEvaluator(RunEvaluator):
     def __init__(self, prediction_key: Optional[str] = None, model_id: str = "gpt-2"):
         self.prediction_key = prediction_key
         self.model_id = model_id
````
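
Since the docs snippet above is truncated, here is a hedged sketch of what the full subclass might look like once `RunEvaluator` is implemented. The `evaluate_run` body is illustrative only (assumed: the `EvaluationResult` schema from `langsmith.evaluation` and the Hugging Face `evaluate` package's `perplexity` metric); it is not taken from the docs page.

```python
from typing import Optional

from evaluate import load
from langsmith.evaluation import EvaluationResult, RunEvaluator
from langsmith.schemas import Example, Run


class PerplexityEvaluator(RunEvaluator):
    def __init__(self, prediction_key: Optional[str] = None, model_id: str = "gpt-2"):
        self.prediction_key = prediction_key
        self.model_id = model_id

    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult:
        # Pull the prediction string out of the run's outputs.
        outputs = run.outputs or {}
        if self.prediction_key:
            prediction = outputs.get(self.prediction_key, "")
        else:
            prediction = next(iter(outputs.values()), "")
        # Score it with the Hugging Face perplexity metric (assumed usage).
        metric = load("perplexity", module_type="metric")
        result = metric.compute(predictions=[prediction], model_id=self.model_id)
        return EvaluationResult(key="perplexity", score=result["perplexities"][0])
```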

docs/evaluation/faq/regression-testing.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -37,4 +37,4 @@ Click on the regressions or improvements buttons on the top of each column to fi

 ## Try it out

-To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app) or check out the [Evaluation Quick Start Guide](/evaluation/quickstart) to get started with the SDK.
+To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app) or check out the [Evaluation Quick Start Guide](../quickstart) to get started with the SDK.
````

docs/evaluation/faq/unit-testing.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -135,12 +135,12 @@ def test_embedding_similarity(query, expectation):
     prediction = my_chatbot(query)
     expect.embedding_distance(
         # This step logs the distance as feedback for this run
-        prediction=prediction, expectation=expectation
+        prediction=prediction, reference=expectation
         # Adding a matcher (in this case, 'to_be_*"), logs 'expectation' feedback
     ).to_be_less_than(0.5)  # Optional predicate to assert against
     expect.edit_distance(
         # This computes the normalized Damerau-Levenshtein distance between the two strings
-        prediction=prediction, expectation=expectation
+        prediction=prediction, reference=expectation
         # If no predicate is provided below, 'assert' isn't called, but the score is still logged
     )
 ```
@@ -195,8 +195,8 @@ The following metrics are available off-the-shelf:
 | -------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
 | `pass` | Binary pass/fail score, 1 for pass, 0 for fail | `assert False` # Fails |
 | `expectation` | Binary expectation score, 1 if expectation is met, 0 if not | `expect(prediction).against(lambda x: re.search(r"\b[a-f\d]{8}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{12}\b", x)` ) |
-| `embedding_distance` | Cosine distance between two embeddings | expect.embedding_distance(prediction=prediction, expectation=expectation) |
-| `edit_distance` | Edit distance between two strings | expect.edit_distance(prediction=prediction, expectation=expectation) |
+| `embedding_distance` | Cosine distance between two embeddings | expect.embedding_distance(prediction=prediction, reference=expectation) |
+| `edit_distance` | Edit distance between two strings | expect.edit_distance(prediction=prediction, reference=expectation) |

 You can also log any arbitrary feeback within a unit test manually using the `client`.
````
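
Pulling the two hunks together, a hedged, self-contained version of the test might look like the sketch below. It assumes the `unit` decorator and `expect` helper exported by the `langsmith` SDK of this vintage, and `my_chatbot` is a hypothetical stand-in for the system under test:

```python
from langsmith import expect, unit


def my_chatbot(query: str) -> str:
    # Hypothetical system under test; substitute your real chain or agent.
    return "Hello! How can I help?"


@unit  # Logs this test and its feedback to LangSmith
def test_embedding_similarity() -> None:
    expectation = "Hello! How can I help?"
    prediction = my_chatbot("Say hello")
    # `reference=` is the keyword argument this commit standardizes on.
    expect.embedding_distance(
        prediction=prediction, reference=expectation
    ).to_be_less_than(0.5)
    expect.edit_distance(prediction=prediction, reference=expectation)
```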

docs/evaluation/index.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -129,14 +129,14 @@ The evaluator itself can be any arbitrary function. There are a few different ty
   which would be considered a **ground truth** evaluator because it compares the output to a reference. See [How to create custom evaluators](evaluation/faq/custom-evaluators).
 - **LLM-as-judge**: An LLM-as-judge evaluator uses an LLM to score system output. For example, you might want to check whether your system is outputting
   offensive content. This is **reference-free**, as there is no comparison to an example output. You might also want to check whether the system output has the same
-  meaning as the example output, which would be a **ground truth** evaluator. To get started with LLM-as-a-judge, try out LangSmith's [off-the-shelf evaluators](https://docs.smith.langchain.com/evaluation/faq/evaluator-implementations)!
+  meaning as the example output, which would be a **ground truth** evaluator. To get started with LLM-as-a-judge, try out LangSmith's [off-the-shelf evaluators](evaluation/faq/evaluator-implementations)!
 - **Human**: You can also evaluate your runs manually. This can be done in LangSmith [via the SDK](tracing/faq/logging_feedback#capturing-feedback-programmatically),
-  or [in the LangSmith UI](http://localhost:3000/tracing/faq/logging_feedback#annotating-traces-with-feedback).
+  or [in the LangSmith UI](tracing/faq/logging_feedback#annotating-traces-with-feedback).

 ## Next steps

-To get started with code, check out the [Quick Start Guide](/evaluation/quickstart).
+To get started with code, check out the [Quick Start Guide](evaluation/quickstart).

-If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](/evaluation/faq)
+If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](evaluation/faq)

-For a higher-level set of recommendations on how to think about testing and evaluating your LLM app, check out the [evaluation recommendations](/evaluation/recommendations) page.
+For a higher-level set of recommendations on how to think about testing and evaluating your LLM app, check out the [evaluation recommendations](evaluation/recommendations) page.
````

docs/index.mdx

Lines changed: 12 additions & 16 deletions
````diff
@@ -27,7 +27,7 @@ import DocCardList from "@theme/DocCardList";

 ## Introduction

-**LangSmith** is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence. Use of LangChain is not necessary - LangSmith works on its own!
+[LangSmith](https://smith.langchain.com/) is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence. Use of LangChain is not necessary - LangSmith works on its own!

 ## Install LangSmith

@@ -61,11 +61,9 @@ To create an API key head to the [Settings page](https://smith.langchain.com/set

 ## Log your first trace

-<p>
-  We provide multiple ways to log traces to LangSmith. Below, we'll highlight
-  how to use <code>traceable</code>. See more on the{" "}
-  <a href="./tracing/integrations">Integrations</a> page.
-</p>
+We provide multiple ways to log traces to LangSmith. Below, we'll highlight
+how to use `traceable`. See more on the [Integrations](./tracing/integrations/index.mdx) page.
+
 <CodeTabs
   tabs={[
     {
````
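
For context on the `traceable` decorator this section highlights, a minimal hedged sketch in Python (assuming `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` are set in the environment, as the surrounding page describes):

```python
from langsmith import traceable


@traceable  # Each call is logged as a run in your LangSmith project
def format_prompt(subject: str) -> str:
    return f"Tell me a joke about {subject}."


format_prompt("parrots")  # The trace shows up in LangSmith shortly after
```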

````diff
@@ -85,11 +83,11 @@ To create an API key head to the [Settings page](https://smith.langchain.com/set
 />

 - View a [sample output trace](https://smith.langchain.com/public/b37ca9b1-60cd-4a2a-817e-3c4e4443fdc0/r).
-- Learn more about tracing on the [tracing page](/tracing).
+- Learn more about tracing on the [tracing page](./tracing/index.mdx).

 ## Create your first evaluation

-Evalution requires a system to test, [data](evaluation/faq) to serve as test cases, and optionally evaluators to grade the results. Here we use a built-in accuracy evaluator.
+Evalution requires a system to test, [data](./evaluation/faq/index.mdx) to serve as test cases, and optionally evaluators to grade the results. Here we use a built-in accuracy evaluator.

 <CodeTabs
   tabs={[
````
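
As a companion to the hunk above: the test-case data an evaluation needs can be created programmatically. A hedged sketch using the LangSmith Python client (the dataset name and example contents are illustrative):

```python
from langsmith import Client

client = Client()  # Reads the API key from the environment
dataset = client.create_dataset("Sample Dataset", description="Toy QA test cases.")
client.create_example(
    inputs={"question": "What is LangSmith?"},
    outputs={"answer": "A platform for building production-grade LLM applications."},
    dataset_id=dataset.id,
)
```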

````diff
@@ -176,19 +174,17 @@ await runOnDataset(
   groupId="client-language"
 />

-- See more on the [evaluation quick start page](/evaluation/quickstart).
+- See more on the [evaluation quick start page](./evaluation/quickstart.mdx).

 ## Next Steps

 Check out the following sections to learn more about LangSmith:

-- **[User Guide](/user_guide)**: Learn about the workflows LangSmith supports at each stage of the LLM application lifecycle.
+- **[User Guide](./user_guide.mdx)**: Learn about the workflows LangSmith supports at each stage of the LLM application lifecycle.
 - **[Pricing](/pricing)**: Learn about the pricing model for LangSmith.
-- **[Self-Hosting](/category/self-hosting)**: Learn about self-hosting options for LangSmith.
-- **[Proxy](/category/proxy)**: Learn about the proxy capabilities of LangSmith.
-- **[Tracing](/tracing)**: Learn about the tracing capabilities of LangSmith.
-- **[Evaluation](/evaluation)**: Learn about the evaluation capabilities of LangSmith.
-- **[Prompt Hub](/category/prompt-hub)** Learn about the Prompt Hub, a prompt management tool built into LangSmith.
+- **[Self-Hosting](./self_hosting)**: Learn about self-hosting options for LangSmith.
+- **[Tracing](./tracing/index.mdx)**: Learn about the tracing capabilities of LangSmith.
+- **[Evaluation](./evaluation/index.mdx)**: Learn about the evaluation capabilities of LangSmith.

 ## Additional Resources

@@ -202,7 +198,7 @@ Check out the following sections to learn more about LangSmith:

 ### How do I migrate projects between organizations?

-Currently we do not support project migration betwen organizations. While you can manually imitate this by reading and writing runs and datasets using the SDK (see the querying runs and traces guide [here](/tracing/faq/querying_traces)), it will be fastest to create a new project within your organization and go from there.
+Currently we do not support project migration betwen organizations. While you can manually imitate this by reading and writing runs and datasets using the SDK (see the querying runs and traces guide [here](./tracing/faq/querying_traces.mdx)), it will be fastest to create a new project within your organization and go from there.

 ### Why aren't my runs aren't showing up in my project?

````
docs/monitoring/concepts.mdx

Lines changed: 3 additions & 3 deletions
````diff
@@ -7,7 +7,7 @@ table_of_contents: true
 # Concepts

 In this guide we will go over some of the concepts that are important to understand when thinking about production logging and automations in LangSmith.
-A lot of these concepts build off of tracing concepts - it is recommended to read the [Tracing Concepts](/tracing/concepts) documentation before.
+A lot of these concepts build off of tracing concepts - it is recommended to read the [Tracing Concepts](../tracing/concepts) documentation before.

 ## Runs

@@ -29,7 +29,7 @@ A `Thread` is a sequence of traces representing a conversation. Each response is

 You can track threads by attaching a special metadata key to runs (one of `session_id`, `thread_id` or `conversation_id`).

-See [this documentation](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) for how to add metadata keys to a trace.
+See [this documentation](../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) for how to add metadata keys to a trace.

 ## Monitoring Dashboard

````
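
To make the thread-tracking hunk concrete, a hedged sketch of attaching one of those special metadata keys when logging a run (assuming the `langsmith_extra` keyword that `traceable`-wrapped functions accept; the conversation id is hypothetical):

```python
from langsmith import traceable


@traceable
def chat_turn(message: str) -> str:
    # Hypothetical chatbot turn; substitute your real pipeline.
    return f"You said: {message}"


# Group this run into a thread via one of the special keys named above.
chat_turn(
    "Hello!",
    langsmith_extra={"metadata": {"session_id": "conversation-1234"}},
)
```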

````diff
@@ -43,7 +43,7 @@ An example of a rule could be, in plain English, "Run a 'vagueness' evaluator on

 ## Datasets

-Datasets are a way to collect examples, which are input-output pairs. You can use datasets for evaluation, as well as fine-tuning and few-shot prompting. For more information, see [here](/evaluation)
+Datasets are a way to collect examples, which are input-output pairs. You can use datasets for evaluation, as well as fine-tuning and few-shot prompting. For more information, see [here](../evaluation)

 ## Annotation Queues

````

docs/monitoring/faq/filter.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -22,7 +22,7 @@ You can also define a filter from the `Filter Shortcuts` on the sidebar. This co
 ## How to filter for sub runs

 In order to filter for sub runs, you first need to remove the default filter of `IsRoot` is `true`. After that, you can apply any filter you wish. A common way to do this is to filter by name for sub runs.
-This relies on good naming for all parts of your pipeline - see [here](/tracing/faq/customizing_trace_attributes#customizing-the-run-name) for more details on how to do that.
+This relies on good naming for all parts of your pipeline - see [here](../../tracing/faq/customizing_trace_attributes#customizing-the-run-name) for more details on how to do that.

 ## How to filter for sub runs whose parent traces have some attribute

````
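
The naming advice in this hunk is easiest to follow at instrumentation time. A hedged sketch, assuming `traceable` accepts a `name` argument as the linked run-naming guide describes:

```python
from langsmith import traceable


@traceable(name="RetrieveDocs")  # Explicit name makes this sub run filterable
def retrieve_docs(query: str) -> list:
    # Hypothetical retrieval step in a larger pipeline.
    return ["doc-1", "doc-2"]
```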

docs/monitoring/faq/monitoring.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -16,7 +16,7 @@ You can view monitors over differing time periods. This can be controlled by the
 By default, the monitor tab shows results for all runs. However, you can group runs in order to see how different subsets perform.
 This can be useful to compare how two different prompts or models are performing.

-In order to do this, you first need to make sure you are [attaching appropriate tags or metadata](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) to these runs when logging them.
+In order to do this, you first need to make sure you are [attaching appropriate tags or metadata](../../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) to these runs when logging them.
 After that, you can click the `Tag` or `Metadata` tab at the top to group runs accordingly.

 ![Subsets Monitor](../static/subsets_monitor.png)
````

docs/monitoring/faq/online_evaluation.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -11,7 +11,7 @@ Currently, we provide support for specifying a prompt template, a model, and a s

 ## How to set up online evaluation

-The way to configure online evaluation is to first set up an [automation](/monitoring/faq/automations).
+The way to configure online evaluation is to first set up an [automation](../../monitoring/faq/automations).

 ![Subsets Monitor](../static/filter_rule.png)

````

docs/monitoring/index.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -11,10 +11,10 @@ It's also crucial to get a high-level overview of application performance with r
 In order to facilitate this, LangSmith supports a series of workflows to support production monitoring and automations.
 This includes support for easily exploring and visualizing key production metrics, as well as support for defining automations to process the data.

-To get started, check out the [Quick Start Guide](/monitoring/quickstart).
+To get started, check out the [Quick Start Guide](monitoring/quickstart).

-After that, peruse the [Concepts Section](/monitoring/concepts) to better understand the different components involved with monitoring and automations.
+After that, peruse the [Concepts Section](monitoring/concepts) to better understand the different components involved with monitoring and automations.

-If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](/monitoring/faq)
+If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](monitoring/faq)

-For example use cases, check out the [Use Cases](/monitoring/use_cases) page.
+For example use cases, check out the [Use Cases](monitoring/use_cases) page.
````

docs/monitoring/quickstart.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -6,13 +6,13 @@ table_of_contents: true

 # Quick Start

-Production monitoring starts by configuring tracing for your application. See the [tracing section](/tracing) for details on how to do that.
+Production monitoring starts by configuring tracing for your application. See the [tracing section](../tracing) for details on how to do that.

 Compared to tracing while prototyping applications, you want to pay attention to a few particular points:

-- [Sampling](/tracing/faq/logging_and_viewing#setting-a-sampling-rate-for-tracing): When logging production workloads, you may only want to log a subset of the datapoints flowing through your system.
-- [Adding Metadata](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces): As we'll see with automations, attaching relevant metadata to runs is particularly important to enable filtering and grouping your data.
-- [Feedback](/tracing/faq/logging_feedback): When an application is in production you can't always look at all datapoints. Capturing user feedback is helpful to draw your attention to particular datapoints.
+- [Sampling](../tracing/faq/logging_and_viewing#setting-a-sampling-rate-for-tracing): When logging production workloads, you may only want to log a subset of the datapoints flowing through your system.
+- [Adding Metadata](../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces): As we'll see with automations, attaching relevant metadata to runs is particularly important to enable filtering and grouping your data.
+- [Feedback](../tracing/faq/logging_feedback): When an application is in production you can't always look at all datapoints. Capturing user feedback is helpful to draw your attention to particular datapoints.

 So - now you've got your logs flowing into LangSmith. What can you do with that data?

````
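
On the sampling point above, a hedged one-liner (assuming the `LANGCHAIN_TRACING_SAMPLING_RATE` environment variable from the linked guide; treat the exact name as an assumption of this doc era):

```python
import os

# Keep roughly 10% of production traces; the rest are not logged.
os.environ["LANGCHAIN_TRACING_SAMPLING_RATE"] = "0.1"
```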
