add readme

angus-langchain · angus-langchain · commit faf75c8ee95c · 2025-05-15T17:42:32.000-07:00
diff --git a/README.md b/README.md
@@ -1,17 +1,127 @@
-# LangSmith Collector Proxy
+# LangSmith Collector-Proxy
 
-HTTP/JSON OTLP collector that batches, compresses (zstd) and uploads traces to LangSmith.
+## Overview
 
+The **LangSmith Collector-Proxy** is a middleware service designed to efficiently aggregate, compress, and bulk-upload tracing data from your applications to LangSmith. It is optimized for large-scale, parallel environments that generate high volumes of tracing data.
+
+## Why Use the LangSmith Collector-Proxy?
 
-Run locally:
+By default, each LangSmith SDK instance pushes traces directly to the LangSmith backend.
+In massively parallel workloads, sending tracing data separately from every application instance can cause significant TLS/HTTP overhead and higher egress costs.
+
+The LangSmith Collector-Proxy addresses these issues by batching multiple spans into fewer, larger, and compressed uploads.
+
+## Key Features
+
+* **Efficient Data Transfer**: Significantly reduces the number of requests by aggregating spans.
+* **Compression**: Utilizes `zstd` compression to minimize data size.
+* **OTLP Support**: Accepts standard OpenTelemetry Protocol (OTLP) data in both JSON and Protocol Buffer formats via HTTP POST requests.
+* **Semantic Translation**: Converts OpenTelemetry GenAI semantic convention attributes into the LangSmith tracing model.
+* **Flexible Batching**: Configurable batching based on either the number of spans or a time interval.
+
+## Getting Started
+
+### Prerequisites
+
+* Kubernetes cluster
+* Helm installed and configured
+
+### Installation
+
+To deploy the LangSmith Collector-Proxy, follow these steps:
+
+1. Add the LangChain Helm repository:
+
+```bash
+helm repo add langchain-ai https://langchain-ai.github.io/helm/
+helm repo update
+```
+
+2. Install the Collector-Proxy using Helm:
+
+```bash
+helm install langsmith-collector-proxy langchain-ai/langsmith-collector-proxy --namespace=<your-namespace> --create-namespace
+```
+
+Replace `<your-namespace>` with your desired Kubernetes namespace.
+
+## Configuration
+
+The Collector-Proxy can be customized via environment variables or Helm chart values:
+
+| Variable             | Description                                    | Default                           |
+| -------------------- | ---------------------------------------------- | --------------------------------- |
+| `HTTP_PORT`          | Port the proxy server listens on               | `4318`                            |
+| `LANGSMITH_ENDPOINT` | LangSmith backend URL                          | `https://api.smith.langchain.com` |
+| `LANGSMITH_API_KEY`  | API key for authenticating with LangSmith      | Required                          |
+| `LANGSMITH_PROJECT`  | Default project for tracing data               | Optional                          |
+| `BATCH_SIZE`         | Number of spans per upload batch               | `100`                             |
+| `FLUSH_INTERVAL_MS`  | Time interval (in milliseconds) to flush spans | `5000` (5 seconds)                |
+| `MAX_BUFFER_BYTES`   | Maximum uncompressed buffer size, in bytes     | `10485760` (10 MiB)               |
+| `MAX_BODY_BYTES`     | Maximum incoming request body size, in bytes   | `209715200` (200 MiB)             |
+| `MAX_RETRIES`        | Number of retry attempts for failed uploads    | `3`                               |
+| `RETRY_BACKOFF_MS`   | Initial backoff duration in milliseconds       | `100`                             |
+
+### Example Helm Configuration
+
+```yaml
+langsmith:
+  endpoint: "https://api.smith.langchain.com"
+  apiKey: "your-api-key"
+  project: "default-project"
+
+batching:
+  size: 200
+  intervalMs: 10000
+  maxBufferBytes: 20971520
+
+retry:
+  maxAttempts: 5
+  backoffInitialMs: 200
+```
+
+Save this as `values.yaml` and deploy with:
+
+```bash
+helm upgrade --install langsmith-collector-proxy langchain-ai/langsmith-collector-proxy --namespace=<your-namespace> -f values.yaml
+```
+
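+## Usage
+
+Point your OTLP-compatible tracing clients, or the OTLP/HTTP exporter of an OpenTelemetry Collector, at the Collector-Proxy endpoint. For OpenTelemetry SDKs configured through environment variables, set the traces-specific endpoint, which SDKs use exactly as given. If you set the generic `OTEL_EXPORTER_OTLP_ENDPOINT` instead, give only the base URL ending in `:4318`, because SDKs append `/v1/traces` to it.
+
+```bash
+export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://langsmith-collector-proxy.<your-namespace>.svc.cluster.local:4318/v1/traces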
+```
+
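+Tracing requests can include these headers:
+
+* `X-API-Key`: Your LangSmith API key (not required if `LANGSMITH_API_KEY` is set on the proxy)
+* `Langsmith-Project`: Optional; the LangSmith project to send the traces to
+
+With OpenTelemetry SDKs, you can set these headers through `OTEL_EXPORTER_OTLP_HEADERS`, for example `x-api-key=<your-api-key>,langsmith-project=<project-name>`.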
+
+## Monitoring and Health Checks
+
+The LangSmith Collector-Proxy exposes simple health-check endpoints:
+
+* Liveness check: `/live`
+* Readiness check: `/ready`
+
+These endpoints return `HTTP 200` when the service is operational.
+
+## Running Locally
+
+To run the Collector-Proxy locally, follow these steps:
+
 ```
-  export LANGSMITH_API_KEY=devkey
-  go run ./cmd/collector
+export LANGSMITH_API_KEY=lsv2...
+go run ./cmd/collector
 ```
-POST OTLP/JSON:
+You can send a minimal OTLP/JSON request (an empty `resourceSpans` array) with curl:
+
 ```
-  curl -X POST -H "X-API-Key: devkey" \
-       -H "Content-Type: application/json" \
-       --data '{"resourceSpans":[]}' \
-       http://localhost:4318/v1/traces
+curl -X POST -H "X-API-Key: $LANGSMITH_API_KEY" \
+     -H "Content-Type: application/json" \
+     --data '{"resourceSpans":[]}' \
+     http://localhost:4318/v1/traces
 ```
+
+---
+
+**License:** Apache License 2.0 © LangChain AI