Commit faf75c8 ("add readme", parent 5583c3e)

README.md: 1 file changed, 120 additions, 10 deletions
# LangSmith Collector-Proxy

## Overview

The **LangSmith Collector-Proxy** is a middleware service designed to efficiently aggregate, compress, and bulk-upload tracing data from your applications to LangSmith. It is specifically optimized for large-scale, parallel environments that generate high volumes of tracing data.

## Why Use LangSmith Collector-Proxy?

Traditionally, each LangSmith SDK instance pushes traces directly to the LangSmith backend. When running massively parallel workloads, sending tracing data directly from each application instance can lead to significant TLS/HTTP overhead and increased egress costs.

The LangSmith Collector-Proxy addresses these issues by batching multiple spans into fewer, larger, compressed uploads.
## Key Features
14+
15+
* **Efficient Data Transfer**: Significantly reduces the number of requests by aggregating spans.
16+
* **Compression**: Utilizes `zstd` compression to minimize data size.
17+
* **OTLP Support**: Accepts standard OpenTelemetry Protocol (OTLP) data in both JSON and Protocol Buffer formats via HTTP POST requests.
18+
* **Semantic Translation**: Converts GenAI semantic convention attributes to the LangSmith tracing model.
19+
* **Flexible Batching**: Configurable batching based on either the number of spans or a time interval.
20+
21+
## Getting Started
22+
23+
### Prerequisites
24+
25+
* Kubernetes cluster
26+
* Helm installed and configured
27+
28+
### Installation
29+
30+
To deploy the LangSmith Collector-Proxy, follow these steps:
31+
32+
1. Add the LangChain Helm repository:
33+
34+
```bash
35+
helm repo add langchain-ai https://github.com/langchain-ai/helm
36+
helm repo update
37+
```
38+
2. Install the Collector-Proxy using Helm:

```bash
helm install langsmith-collector-proxy langchain-ai/langsmith-collector-proxy --namespace=<your-namespace> --create-namespace
```

Replace `<your-namespace>` with your desired Kubernetes namespace.
## Configuration
48+
49+
The Collector-Proxy can be customized via environment variables or Helm chart values:
50+
51+
| Variable | Description | Default |
52+
| -------------------- | ---------------------------------------------- | --------------------------------- |
53+
| `HTTP_PORT` | Port to run the proxy server | `4318` |
54+
| `LANGSMITH_ENDPOINT` | LangSmith backend URL | `https://api.smith.langchain.com` |
55+
| `LANGSMITH_API_KEY` | API key for authenticating with LangSmith | Required |
56+
| `LANGSMITH_PROJECT` | Default project for tracing data | Optional |
57+
| `BATCH_SIZE` | Number of spans per upload batch | `100` |
58+
| `FLUSH_INTERVAL_MS` | Time interval (in milliseconds) to flush spans | `5000` (5 seconds) |
59+
| `MAX_BUFFER_BYTES` | Maximum uncompressed buffer size | `10485760` (10MB) |
60+
| `MAX_BODY_BYTES` | Maximum size of incoming request body | `209715200` (200MB) |
61+
| `MAX_RETRIES` | Number of retry attempts for failed uploads | `3` |
62+
| `RETRY_BACKOFF_MS` | Initial backoff duration in milliseconds | `100` |
63+
64+
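The interplay of `BATCH_SIZE`, `FLUSH_INTERVAL_MS`, and `MAX_BUFFER_BYTES` can be sketched as follows. This is an illustrative model of the flush triggers only, not the proxy's actual implementation:

```python
import time


class BatchBuffer:
    """Illustrative model of the proxy's flush triggers (not the real code)."""

    def __init__(self, batch_size=100, flush_interval_ms=5000,
                 max_buffer_bytes=10 * 1024 * 1024):
        self.batch_size = batch_size
        self.flush_interval_ms = flush_interval_ms
        self.max_buffer_bytes = max_buffer_bytes
        self.spans = []
        self.buffer_bytes = 0
        self.last_flush = time.monotonic()

    def add_span(self, span_bytes):
        """Buffer one encoded span; return True if an upload should happen now."""
        self.spans.append(span_bytes)
        self.buffer_bytes += len(span_bytes)
        return self.should_flush()

    def should_flush(self):
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        # A flush fires on whichever limit is hit first: span count,
        # uncompressed buffer size, or elapsed time.
        return (len(self.spans) >= self.batch_size
                or self.buffer_bytes >= self.max_buffer_bytes
                or elapsed_ms >= self.flush_interval_ms)

    def drain(self):
        batch, self.spans, self.buffer_bytes = self.spans, [], 0
        self.last_flush = time.monotonic()
        return batch


buf = BatchBuffer(batch_size=3, flush_interval_ms=60_000)
buf.add_span(b"span-1")
buf.add_span(b"span-2")
print(buf.add_span(b"span-3"))  # True: size threshold reached
```

Whichever limit is reached first wins, so a mostly idle service still flushes every `FLUSH_INTERVAL_MS`, while a busy one flushes as soon as `BATCH_SIZE` spans accumulate.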
### Example Helm Configuration

```yaml
langsmith:
  endpoint: "https://api.smith.langchain.com"
  apiKey: "your-api-key"
  project: "default-project"

batching:
  size: 200
  intervalMs: 10000
  maxBufferBytes: 20971520

retry:
  maxAttempts: 5
  backoffInitialMs: 200
```

Save this as `values.yaml` and deploy with:

```bash
helm upgrade --install langsmith-collector-proxy langchain-ai/langsmith-collector-proxy -f values.yaml
```
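The `retry.maxAttempts` and `retry.backoffInitialMs` values suggest an exponential backoff schedule for failed uploads. A minimal sketch of how such a schedule is typically computed; the doubling factor is an assumption for illustration, and the proxy may use a different growth curve or add jitter:

```python
def backoff_schedule(max_attempts, initial_ms, factor=2.0):
    """Return the wait in milliseconds before each retry attempt.

    The geometric growth factor is assumed, not documented behavior.
    """
    return [initial_ms * factor ** attempt for attempt in range(max_attempts)]


# With the example Helm values above (5 attempts, 200 ms initial backoff):
print(backoff_schedule(5, 200))  # [200.0, 400.0, 800.0, 1600.0, 3200.0]
```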
## Usage

Point your OTLP-compatible tracing clients or OpenTelemetry Collector exporter at the Collector-Proxy endpoint:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://langsmith-collector-proxy.<your-namespace>.svc.cluster.local:4318/v1/traces
```

Ensure your tracing requests include the necessary headers:

* `X-API-Key`: Your LangSmith API key (not required if set in the environment)
* `Langsmith-Project`: Optional; specifies the project name
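For clients without an OTLP SDK, a request with these headers can be assembled by hand. A sketch using only the Python standard library; the endpoint URL and placeholder key are assumptions you should replace with your own values:

```python
import json
import urllib.request

# Hypothetical values -- substitute your own service address and API key.
ENDPOINT = "http://localhost:4318/v1/traces"
API_KEY = "your-api-key"


def build_trace_request(payload, project=None):
    """Build an OTLP/JSON POST request carrying the headers the proxy expects."""
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": API_KEY,
    }
    if project:
        headers["Langsmith-Project"] = project  # optional project override
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_trace_request({"resourceSpans": []}, project="default-project")
print(req.get_method(), req.full_url)  # POST http://localhost:4318/v1/traces
# Send with urllib.request.urlopen(req) once a proxy is reachable.
```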
## Monitoring and Health Checks

The LangSmith Collector-Proxy exposes simple health-check endpoints:

* Liveness check: `/live`
* Readiness check: `/ready`

These endpoints return `HTTP 200` when the service is operational.
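If you manage the Deployment yourself rather than through the Helm chart, these endpoints map naturally onto Kubernetes container probes. A sketch, assuming the default `HTTP_PORT` of `4318`; the timing values are illustrative, and the chart may already configure probes for you:

```yaml
livenessProbe:
  httpGet:
    path: /live
    port: 4318
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 4318
  periodSeconds: 5
```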
## Running Locally

To run the Collector-Proxy locally:

```bash
export LANGSMITH_API_KEY=lsv2...
go run ./cmd/collector
```

You can send sample OTLP/JSON data using curl:

```bash
curl -X POST -H "X-API-Key: devkey" \
  -H "Content-Type: application/json" \
  --data '{"resourceSpans":[]}' \
  http://localhost:4318/v1/traces
```
---

**License:** Apache License 2.0 © LangChain AI
