
Commit 61da4ab

Merge branch 'main' into fix/traceable-docs

2 parents 5be60d1 + 6a67aa2

363 files changed: +13451 / -1637 lines


.github/workflows/spell-check.yml

Lines changed: 21 additions & 0 deletions
````diff
@@ -0,0 +1,21 @@
+name: Spell Checking
+
+on: [pull_request]
+
+jobs:
+  codespell:
+    name: Check spelling with codespell
+    runs-on: ubuntu-latest
+    steps:
+      - uses: codespell-project/actions-codespell@v2
+        with:
+          check_filenames: true
+  misspell:
+    name: Check spelling with misspell
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Install
+        run: wget -O - -q https://git.io/misspell | sh -s -- -b .
+      - name: Misspell
+        run: ./misspell -error
````

docs/evaluation/faq/custom-evaluators.mdx

Lines changed: 2 additions & 1 deletion
````diff
@@ -288,8 +288,9 @@ The flexibility of the functional interface means you can easly apply evaluators
 ```python
 from evaluate import load
 from langsmith.schemas import Example, Run
+from langsmith.evaluation import RunEvaluator

-class PerplexityEvaluator:
+class PerplexityEvaluator(RunEvaluator):
     def __init__(self, prediction_key: Optional[str] = None, model_id: str = "gpt-2"):
         self.prediction_key = prediction_key
         self.model_id = model_id
````
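
Since the docs snippet above is truncated, here is a hedged sketch of what the full subclass might look like once `RunEvaluator` is implemented. The `evaluate_run` body is illustrative only (assumed: the `EvaluationResult` schema from `langsmith.evaluation` and the Hugging Face `evaluate` package's `perplexity` metric); it is not taken from the docs page.

```python
from typing import Optional

from evaluate import load
from langsmith.evaluation import EvaluationResult, RunEvaluator
from langsmith.schemas import Example, Run


class PerplexityEvaluator(RunEvaluator):
    def __init__(self, prediction_key: Optional[str] = None, model_id: str = "gpt-2"):
        self.prediction_key = prediction_key
        self.model_id = model_id

    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult:
        # Pull the prediction string out of the run's outputs.
        outputs = run.outputs or {}
        if self.prediction_key:
            prediction = outputs.get(self.prediction_key, "")
        else:
            prediction = next(iter(outputs.values()), "")
        # Score it with the Hugging Face perplexity metric (assumed usage).
        metric = load("perplexity", module_type="metric")
        result = metric.compute(predictions=[prediction], model_id=self.model_id)
        return EvaluationResult(key="perplexity", score=result["perplexities"][0])
```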

docs/evaluation/faq/regression-testing.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -37,4 +37,4 @@ Click on the regressions or improvements buttons on the top of each column to fi

 ## Try it out

-To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app) or check out the [Evaluation Quick Start Guide](/evaluation/quickstart) to get started with the SDK.
+To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app) or check out the [Evaluation Quick Start Guide](../quickstart) to get started with the SDK.
````

docs/evaluation/faq/unit-testing.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -135,12 +135,12 @@ def test_embedding_similarity(query, expectation):
     prediction = my_chatbot(query)
     expect.embedding_distance(
         # This step logs the distance as feedback for this run
-        prediction=prediction, expectation=expectation
+        prediction=prediction, reference=expectation
         # Adding a matcher (in this case, 'to_be_*"), logs 'expectation' feedback
     ).to_be_less_than(0.5)  # Optional predicate to assert against
     expect.edit_distance(
         # This computes the normalized Damerau-Levenshtein distance between the two strings
-        prediction=prediction, expectation=expectation
+        prediction=prediction, reference=expectation
         # If no predicate is provided below, 'assert' isn't called, but the score is still logged
     )
 ```
@@ -195,8 +195,8 @@ The following metrics are available off-the-shelf:
 | -------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
 | `pass` | Binary pass/fail score, 1 for pass, 0 for fail | `assert False` # Fails |
 | `expectation` | Binary expectation score, 1 if expectation is met, 0 if not | `expect(prediction).against(lambda x: re.search(r"\b[a-f\d]{8}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{12}\b", x)` ) |
-| `embedding_distance` | Cosine distance between two embeddings | expect.embedding_distance(prediction=prediction, expectation=expectation) |
-| `edit_distance` | Edit distance between two strings | expect.edit_distance(prediction=prediction, expectation=expectation) |
+| `embedding_distance` | Cosine distance between two embeddings | expect.embedding_distance(prediction=prediction, reference=expectation) |
+| `edit_distance` | Edit distance between two strings | expect.edit_distance(prediction=prediction, reference=expectation) |

 You can also log any arbitrary feeback within a unit test manually using the `client`.
````
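
Pulling the two hunks together, a hedged, self-contained version of the test might look like the sketch below. It assumes the `unit` decorator and `expect` helper exported by the `langsmith` SDK of this vintage, and `my_chatbot` is a hypothetical stand-in for the system under test:

```python
from langsmith import expect, unit


def my_chatbot(query: str) -> str:
    # Hypothetical system under test; substitute your real chain or agent.
    return "Hello! How can I help?"


@unit  # Logs this test and its feedback to LangSmith
def test_embedding_similarity() -> None:
    expectation = "Hello! How can I help?"
    prediction = my_chatbot("Say hello")
    # `reference=` is the keyword argument this commit standardizes on.
    expect.embedding_distance(
        prediction=prediction, reference=expectation
    ).to_be_less_than(0.5)
    expect.edit_distance(prediction=prediction, reference=expectation)
```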

docs/evaluation/index.mdx

Lines changed: 5 additions & 5 deletions
````diff
@@ -129,14 +129,14 @@ The evaluator itself can be any arbitrary function. There are a few different ty
   which would be considered a **ground truth** evaluator because it compares the output to a reference. See [How to create custom evaluators](evaluation/faq/custom-evaluators).
 - **LLM-as-judge**: An LLM-as-judge evaluator uses an LLM to score system output. For example, you might want to check whether your system is outputting
   offensive content. This is **reference-free**, as there is no comparison to an example output. You might also want to check whether the system output has the same
-  meaning as the example output, which would be a **ground truth** evaluator. To get started with LLM-as-a-judge, try out LangSmith's [off-the-shelf evaluators](https://docs.smith.langchain.com/evaluation/faq/evaluator-implementations)!
+  meaning as the example output, which would be a **ground truth** evaluator. To get started with LLM-as-a-judge, try out LangSmith's [off-the-shelf evaluators](evaluation/faq/evaluator-implementations)!
 - **Human**: You can also evaluate your runs manually. This can be done in LangSmith [via the SDK](tracing/faq/logging_feedback#capturing-feedback-programmatically),
-  or [in the LangSmith UI](http://localhost:3000/tracing/faq/logging_feedback#annotating-traces-with-feedback).
+  or [in the LangSmith UI](tracing/faq/logging_feedback#annotating-traces-with-feedback).

 ## Next steps

-To get started with code, check out the [Quick Start Guide](/evaluation/quickstart).
+To get started with code, check out the [Quick Start Guide](evaluation/quickstart).

-If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](/evaluation/faq)
+If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](evaluation/faq)

-For a higher-level set of recommendations on how to think about testing and evaluating your LLM app, check out the [evaluation recommendations](/evaluation/recommendations) page.
+For a higher-level set of recommendations on how to think about testing and evaluating your LLM app, check out the [evaluation recommendations](evaluation/recommendations) page.
````

docs/index.mdx

Lines changed: 12 additions & 16 deletions
````diff
@@ -27,7 +27,7 @@ import DocCardList from "@theme/DocCardList";

 ## Introduction

-**LangSmith** is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence. Use of LangChain is not necessary - LangSmith works on its own!
+[LangSmith](https://smith.langchain.com/) is a platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can ship quickly and with confidence. Use of LangChain is not necessary - LangSmith works on its own!

 ## Install LangSmith

@@ -61,11 +61,9 @@ To create an API key head to the [Settings page](https://smith.langchain.com/set

 ## Log your first trace

-<p>
-  We provide multiple ways to log traces to LangSmith. Below, we'll highlight
-  how to use <code>traceable</code>. See more on the{" "}
-  <a href="./tracing/integrations">Integrations</a> page.
-</p>
+We provide multiple ways to log traces to LangSmith. Below, we'll highlight
+how to use `traceable`. See more on the [Integrations](./tracing/integrations/index.mdx) page.
+
 <CodeTabs
   tabs={[
     {
````
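
For context on the `traceable` decorator this section highlights, a minimal hedged sketch in Python (assuming `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` are set in the environment, as the surrounding page describes):

```python
from langsmith import traceable


@traceable  # Each call is logged as a run in your LangSmith project
def format_prompt(subject: str) -> str:
    return f"Tell me a joke about {subject}."


format_prompt("parrots")  # The trace shows up in LangSmith shortly after
```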

````diff
@@ -85,11 +83,11 @@ To create an API key head to the [Settings page](https://smith.langchain.com/set
 />

 - View a [sample output trace](https://smith.langchain.com/public/b37ca9b1-60cd-4a2a-817e-3c4e4443fdc0/r).
-- Learn more about tracing on the [tracing page](/tracing).
+- Learn more about tracing on the [tracing page](./tracing/index.mdx).

 ## Create your first evaluation

-Evalution requires a system to test, [data](evaluation/faq) to serve as test cases, and optionally evaluators to grade the results. Here we use a built-in accuracy evaluator.
+Evalution requires a system to test, [data](./evaluation/faq/index.mdx) to serve as test cases, and optionally evaluators to grade the results. Here we use a built-in accuracy evaluator.

 <CodeTabs
   tabs={[
````
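
As a companion to the hunk above: the test-case data an evaluation needs can be created programmatically. A hedged sketch using the LangSmith Python client (the dataset name and example contents are illustrative):

```python
from langsmith import Client

client = Client()  # Reads the API key from the environment
dataset = client.create_dataset("Sample Dataset", description="Toy QA test cases.")
client.create_example(
    inputs={"question": "What is LangSmith?"},
    outputs={"answer": "A platform for building production-grade LLM applications."},
    dataset_id=dataset.id,
)
```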

````diff
@@ -176,19 +174,17 @@ await runOnDataset(
   groupId="client-language"
 />

-- See more on the [evaluation quick start page](/evaluation/quickstart).
+- See more on the [evaluation quick start page](./evaluation/quickstart.mdx).

 ## Next Steps

 Check out the following sections to learn more about LangSmith:

-- **[User Guide](/user_guide)**: Learn about the workflows LangSmith supports at each stage of the LLM application lifecycle.
+- **[User Guide](./user_guide.mdx)**: Learn about the workflows LangSmith supports at each stage of the LLM application lifecycle.
 - **[Pricing](/pricing)**: Learn about the pricing model for LangSmith.
-- **[Self-Hosting](/category/self-hosting)**: Learn about self-hosting options for LangSmith.
-- **[Proxy](/category/proxy)**: Learn about the proxy capabilities of LangSmith.
-- **[Tracing](/tracing)**: Learn about the tracing capabilities of LangSmith.
-- **[Evaluation](/evaluation)**: Learn about the evaluation capabilities of LangSmith.
-- **[Prompt Hub](/category/prompt-hub)** Learn about the Prompt Hub, a prompt management tool built into LangSmith.
+- **[Self-Hosting](./self_hosting)**: Learn about self-hosting options for LangSmith.
+- **[Tracing](./tracing/index.mdx)**: Learn about the tracing capabilities of LangSmith.
+- **[Evaluation](./evaluation/index.mdx)**: Learn about the evaluation capabilities of LangSmith.

 ## Additional Resources

@@ -202,7 +198,7 @@ Check out the following sections to learn more about LangSmith:

 ### How do I migrate projects between organizations?

-Currently we do not support project migration betwen organizations. While you can manually imitate this by reading and writing runs and datasets using the SDK (see the querying runs and traces guide [here](/tracing/faq/querying_traces)), it will be fastest to create a new project within your organization and go from there.
+Currently we do not support project migration betwen organizations. While you can manually imitate this by reading and writing runs and datasets using the SDK (see the querying runs and traces guide [here](./tracing/faq/querying_traces.mdx)), it will be fastest to create a new project within your organization and go from there.

 ### Why aren't my runs aren't showing up in my project?

````
docs/monitoring/concepts.mdx

Lines changed: 3 additions & 3 deletions
````diff
@@ -7,7 +7,7 @@ table_of_contents: true
 # Concepts

 In this guide we will go over some of the concepts that are important to understand when thinking about production logging and automations in LangSmith.
-A lot of these concepts build off of tracing concepts - it is recommended to read the [Tracing Concepts](/tracing/concepts) documentation before.
+A lot of these concepts build off of tracing concepts - it is recommended to read the [Tracing Concepts](../tracing/concepts) documentation before.

 ## Runs

@@ -29,7 +29,7 @@ A `Thread` is a sequence of traces representing a conversation. Each response is

 You can track threads by attaching a special metadata key to runs (one of `session_id`, `thread_id` or `conversation_id`).

-See [this documentation](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) for how to add metadata keys to a trace.
+See [this documentation](../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) for how to add metadata keys to a trace.

 ## Monitoring Dashboard

````
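
To make the thread-tracking hunk concrete, a hedged sketch of attaching one of those special metadata keys when logging a run (assuming the `langsmith_extra` keyword that `traceable`-wrapped functions accept; the conversation id is hypothetical):

```python
from langsmith import traceable


@traceable
def chat_turn(message: str) -> str:
    # Hypothetical chatbot turn; substitute your real pipeline.
    return f"You said: {message}"


# Group this run into a thread via one of the special keys named above.
chat_turn(
    "Hello!",
    langsmith_extra={"metadata": {"session_id": "conversation-1234"}},
)
```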

````diff
@@ -43,7 +43,7 @@ An example of a rule could be, in plain English, "Run a 'vagueness' evaluator on

 ## Datasets

-Datasets are a way to collect examples, which are input-output pairs. You can use datasets for evaluation, as well as fine-tuning and few-shot prompting. For more information, see [here](/evaluation)
+Datasets are a way to collect examples, which are input-output pairs. You can use datasets for evaluation, as well as fine-tuning and few-shot prompting. For more information, see [here](../evaluation)

 ## Annotation Queues

````

docs/monitoring/faq/filter.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -22,7 +22,7 @@ You can also define a filter from the `Filter Shortcuts` on the sidebar. This co
 ## How to filter for sub runs

 In order to filter for sub runs, you first need to remove the default filter of `IsRoot` is `true`. After that, you can apply any filter you wish. A common way to do this is to filter by name for sub runs.
-This relies on good naming for all parts of your pipeline - see [here](/tracing/faq/customizing_trace_attributes#customizing-the-run-name) for more details on how to do that.
+This relies on good naming for all parts of your pipeline - see [here](../../tracing/faq/customizing_trace_attributes#customizing-the-run-name) for more details on how to do that.

 ## How to filter for sub runs whose parent traces have some attribute

````
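
The naming advice in this hunk is easiest to follow at instrumentation time. A hedged sketch, assuming `traceable` accepts a `name` argument as the linked run-naming guide describes:

```python
from langsmith import traceable


@traceable(name="RetrieveDocs")  # Explicit name makes this sub run filterable
def retrieve_docs(query: str) -> list:
    # Hypothetical retrieval step in a larger pipeline.
    return ["doc-1", "doc-2"]
```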

docs/monitoring/faq/monitoring.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -16,7 +16,7 @@ You can view monitors over differing time periods. This can be controlled by the
 By default, the monitor tab shows results for all runs. However, you can group runs in order to see how different subsets perform.
 This can be useful to compare how two different prompts or models are performing.

-In order to do this, you first need to make sure you are [attaching appropriate tags or metadata](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) to these runs when logging them.
+In order to do this, you first need to make sure you are [attaching appropriate tags or metadata](../../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces) to these runs when logging them.
 After that, you can click the `Tag` or `Metadata` tab at the top to group runs accordingly.

 ![Subsets Monitor](../static/subsets_monitor.png)
````

docs/monitoring/faq/online_evaluation.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -11,7 +11,7 @@ Currently, we provide support for specifying a prompt template, a model, and a s

 ## How to set up online evaluation

-The way to configure online evaluation is to first set up an [automation](/monitoring/faq/automations).
+The way to configure online evaluation is to first set up an [automation](../../monitoring/faq/automations).

 ![Subsets Monitor](../static/filter_rule.png)

````

docs/monitoring/index.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -11,10 +11,10 @@ It's also crucial to get a high-level overview of application performance with r
 In order to facilitate this, LangSmith supports a series of workflows to support production monitoring and automations.
 This includes support for easily exploring and visualizing key production metrics, as well as support for defining automations to process the data.

-To get started, check out the [Quick Start Guide](/monitoring/quickstart).
+To get started, check out the [Quick Start Guide](monitoring/quickstart).

-After that, peruse the [Concepts Section](/monitoring/concepts) to better understand the different components involved with monitoring and automations.
+After that, peruse the [Concepts Section](monitoring/concepts) to better understand the different components involved with monitoring and automations.

-If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](/monitoring/faq)
+If you want to learn how to accomplish a particular task, check out our comprehensive [How-To Guides](monitoring/faq)

-For example use cases, check out the [Use Cases](/monitoring/use_cases) page.
+For example use cases, check out the [Use Cases](monitoring/use_cases) page.
````

docs/monitoring/quickstart.mdx

Lines changed: 4 additions & 4 deletions
````diff
@@ -6,13 +6,13 @@ table_of_contents: true

 # Quick Start

-Production monitoring starts by configuring tracing for your application. See the [tracing section](/tracing) for details on how to do that.
+Production monitoring starts by configuring tracing for your application. See the [tracing section](../tracing) for details on how to do that.

 Compared to tracing while prototyping applications, you want to pay attention to a few particular points:

-- [Sampling](/tracing/faq/logging_and_viewing#setting-a-sampling-rate-for-tracing): When logging production workloads, you may only want to log a subset of the datapoints flowing through your system.
-- [Adding Metadata](/tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces): As we'll see with automations, attaching relevant metadata to runs is particularly important to enable filtering and grouping your data.
-- [Feedback](/tracing/faq/logging_feedback): When an application is in production you can't always look at all datapoints. Capturing user feedback is helpful to draw your attention to particular datapoints.
+- [Sampling](../tracing/faq/logging_and_viewing#setting-a-sampling-rate-for-tracing): When logging production workloads, you may only want to log a subset of the datapoints flowing through your system.
+- [Adding Metadata](../tracing/faq/customizing_trace_attributes#adding-metadata-and-tags-to-traces): As we'll see with automations, attaching relevant metadata to runs is particularly important to enable filtering and grouping your data.
+- [Feedback](../tracing/faq/logging_feedback): When an application is in production you can't always look at all datapoints. Capturing user feedback is helpful to draw your attention to particular datapoints.

 So - now you've got your logs flowing into LangSmith. What can you do with that data?

````
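
On the sampling point above, a hedged one-liner (assuming the `LANGCHAIN_TRACING_SAMPLING_RATE` environment variable from the linked guide; treat the exact name as an assumption of this doc era):

```python
import os

# Keep roughly 10% of production traces; the rest are not logged.
os.environ["LANGCHAIN_TRACING_SAMPLING_RATE"] = "0.1"
```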
