A robust Retrieval-Augmented Generation (RAG) system for document processing and querying, built with Flask, Celery, Redis, Weaviate, and Ollama.
- Flask API Server: Handles HTTP requests and document uploads
- Celery Workers: Manages asynchronous document processing
- Redis: Message broker and result backend for Celery
- Weaviate: Vector database for storing document embeddings
- Ollama: Local LLM for generating embeddings
- Document Upload → Flask API
- Async Processing → Celery Worker
- Embedding Generation → Ollama
- Vector Storage → Weaviate
- Query Processing → Weaviate
- Result Delivery → Flask API
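The flow above can be sketched end-to-end with toy stand-ins for each service. This is an illustrative sketch only, not the project's actual code: in the real system chunking runs on a Celery worker, `embed` calls Ollama, and the store is Weaviate.

```python
# Toy stand-ins for the real services (illustrative names, not the project's code).
STORE = {}  # stands in for Weaviate

def extract_text(file_bytes: bytes) -> str:
    return file_bytes.decode("utf-8")

def embed(chunk: str) -> list[float]:
    # Placeholder embedding; the real system asks Ollama for a vector.
    return [float(len(chunk))]

def process_document(file_bytes: bytes, file_name: str, chunk_size: int = 8) -> int:
    """Upload -> chunk -> embed -> store, mirroring the data flow above."""
    text = extract_text(file_bytes)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    STORE[file_name] = [(chunk, embed(chunk)) for chunk in chunks]
    return len(chunks)
```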
- Docker and Docker Compose
- Python 3.11+
- Weaviate Cloud Account (or self-hosted instance)
- 8GB+ RAM recommended
- Clone the repository:
```
git clone <repository-url>
cd document-rag-system
```
- Create a `.env` file with the following configuration:

```
# Weaviate Configuration
WEAVIATE_REST_URL=your-weaviate-url
WEAVIATE_GRPC_URL=your-weaviate-grpc-url
WEAVIATE_CLIENT_NAME=your-client-name
WEAVIATE_ADMIN_API_KEY=your-api-key

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:latest

# Redis Configuration
REDIS_URL=redis://redis:6379/0

# Document Processing
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
```
- Start services:

```
docker-compose up --build
```
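One common way to consume these variables from application code is a small settings loader. A sketch, assuming only the variable names shown above (the function name and defaults-as-fallbacks are illustrative):

```python
import os

def load_settings() -> dict:
    """Read configuration from the environment, falling back to the defaults above."""
    return {
        "weaviate_rest_url": os.environ.get("WEAVIATE_REST_URL", ""),
        "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "ollama_model": os.environ.get("OLLAMA_MODEL", "llama3.1:latest"),
        "redis_url": os.environ.get("REDIS_URL", "redis://redis:6379/0"),
        "chunk_size": int(os.environ.get("CHUNK_SIZE", 1000)),
        "chunk_overlap": int(os.environ.get("CHUNK_OVERLAP", 200)),
    }
```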
```
POST /ingest
Content-Type: multipart/form-data

file: <document_file>
```
- Uploads a new document for processing
- Supports PDF, DOCX, JSON, and TXT formats
- Returns a task ID for tracking processing status
Response:
```json
{
  "task_id": "task-uuid",
  "status": "processing"
}
```
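A stdlib-only client sketch for this endpoint. The URL and `file` field follow the spec above; the helper names are hypothetical:

```python
import io
import json
import urllib.request
import uuid

def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Hand-build a multipart/form-data body for the /ingest upload."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    header = (
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    body.write(header.encode())
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"

def ingest(url: str, filename: str, data: bytes) -> str:
    """POST a document and return the task ID from the JSON response."""
    payload, content_type = build_multipart("file", filename, data)
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]

# Hypothetical usage (requires a running instance):
# task_id = ingest("http://localhost:5000/ingest", "report.pdf",
#                  open("report.pdf", "rb").read())
```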
```
POST /query
Content-Type: application/json
```

```json
{
  "query": "your search query",
  "document_name": "document.pdf",
  "limit": 3,
  "similarity_threshold": 0.7,
  "include_metadata": true
}
```
- Performs semantic search within a specific document using RAG
- Retrieves relevant text chunks based on semantic similarity
- Supports natural language queries
- Optional parameters:
  - `limit`: Maximum number of results to return (default: 3)
  - `similarity_threshold`: Minimum similarity score for results (default: 0.7)
  - `include_metadata`: Include document metadata in response (default: true)
Response:
```json
{
  "results": [
    {
      "content": "matching text chunk",
      "chunk_index": 1,
      "relevance_score": 0.95,
      "page_number": 1,
      "metadata": {
        "source": "document.pdf",
        "chunk_type": "text",
        "created_at": "2025-02-21T07:45:23Z"
      }
    }
  ],
  "document_info": {
    "documentId": "doc-uuid",
    "fileName": "document.pdf",
    "processed_at": "2025-02-20T14:44:41.376936",
    "chunk_count": 5,
    "total_pages": 10,
    "file_type": "pdf"
  },
  "query_stats": {
    "processing_time_ms": 156,
    "total_matches": 3
  }
}
```
Error Responses:
```
// Document not found
{
  "error": "Document 'document.pdf' not found",
  "status": 404
}

// Invalid query
{
  "error": "Query string cannot be empty",
  "status": 400
}
```
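Server-side, the `limit` and `similarity_threshold` semantics above amount to filtering and truncating scored matches. A minimal sketch, with hypothetical function and field names rather than the actual implementation:

```python
def filter_results(scored_chunks, similarity_threshold=0.7, limit=3):
    """Keep chunks at or above the threshold, best-first, at most `limit` of them."""
    kept = [c for c in scored_chunks if c["relevance_score"] >= similarity_threshold]
    kept.sort(key=lambda c: c["relevance_score"], reverse=True)
    return kept[:limit]
```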
```
GET /status/<task_id>
```
- Checks the status of a document processing task
Response:
```json
{
  "task_id": "task-uuid",
  "status": "completed",
  "result": true
}
```
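A client typically polls this endpoint until the task reaches a terminal state. A sketch with the HTTP call injected as `fetch` so the loop is testable offline; the `"failed"` state name is an assumption, not confirmed by the spec above:

```python
import time

def poll_status(fetch, task_id, interval=1.0, timeout=30.0):
    """Poll the status endpoint via `fetch(task_id)` until a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(task_id)
        # "failed" is an assumed terminal state alongside "completed".
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout}s")
        time.sleep(interval)
```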
Choices:
- Asynchronous processing using Celery
- Document chunking with configurable size and overlap
- Unique document IDs for proper isolation
- Metadata storage in Redis for fast retrieval
Trade-offs:
- Added complexity with distributed system
- Additional infrastructure requirements
- Potential for temporary inconsistency during processing
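The configurable chunking above can be sketched as a simple sliding window; this is an illustration of the technique, not the project's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (CHUNK_SIZE / CHUNK_OVERLAP above)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

The overlap preserves context at chunk boundaries, so a sentence split across two chunks is still fully present in at least one of them.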
Choices:
- Weaviate for vector storage and similarity search
- Custom schema with indexed properties
- Document-level filtering for precise results
- Metadata enrichment for better context
Trade-offs:
- Higher resource requirements
- More complex query construction
- Additional system dependency
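Similarity search ranks stored chunk embeddings against the query embedding; cosine similarity is the usual measure. A minimal sketch of the metric itself (Weaviate computes this internally):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```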
Choices:
- Redis for fast metadata access
- Separate storage for document info and vectors
- Filename to ID mapping for user-friendly interface
Trade-offs:
- Data duplication between stores
- Need for consistency management
- Additional memory usage
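The filename-to-ID mapping and metadata storage can be sketched with two keys per document. The key layout and helper names here are hypothetical; a dict-backed stand-in replaces the Redis client so the sketch runs offline:

```python
import json
import uuid

class InMemoryKV:
    """Dict-backed stand-in for a Redis client's get/set (for illustration)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def register_document(kv, file_name: str, metadata: dict) -> str:
    """Store the filename -> ID mapping plus the document metadata."""
    doc_id = str(uuid.uuid4())
    kv.set(f"doc:name:{file_name}", doc_id)          # user-friendly lookup
    kv.set(f"doc:meta:{doc_id}", json.dumps(metadata))
    return doc_id

def lookup_metadata(kv, file_name: str):
    """Resolve a filename to its document ID, then fetch the metadata."""
    doc_id = kv.get(f"doc:name:{file_name}")
    return json.loads(kv.get(f"doc:meta:{doc_id}")) if doc_id else None
```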
- Secure file handling
- Input validation
- Non-root Docker containers
- Environment-based configuration
- API key authentication
- Create a virtual environment:

```
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
```
- Install dependencies:

```
pip install -r requirements.txt
```
- Run services individually:

```
# Terminal 1: Redis
docker run -p 6379:6379 redis:alpine

# Terminal 2: Ollama
docker run -p 11434:11434 ollama/ollama

# Terminal 3: Flask API
python app.py

# Terminal 4: Celery Worker
celery -A tasks worker --pool=solo -l info
```
- Check API health:

```
curl http://localhost:5000/health
```
- Celery Tasks
  - Check task status:

```
curl http://localhost:5000/status/<task_id>
```
- Memory Issues
  - Increase Docker memory limit
  - Adjust chunk size in `.env`
  - Monitor Redis memory usage
- Document Processing

```
CHUNK_SIZE=1000       # Adjust based on document size
CHUNK_OVERLAP=200     # Adjust for context preservation
```
- Query Performance
  - Use the `limit` parameter in queries
  - Enable result streaming for large documents
  - Implement proper indexing in Weaviate
- Scalability
  - Implement batch processing for large documents
  - Add support for distributed vector storage
  - Optimize memory usage during processing
- Reliability
  - Add retry mechanisms for failed tasks
  - Implement better error recovery
  - Add data consistency checks
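A retry mechanism for failed tasks often takes the shape of exponential backoff; a generic sketch (Celery also ships built-in task retries, which would be the idiomatic choice inside a worker):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))
```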
- Features
  - Support more document formats
  - Add document update capabilities
  - Implement cross-document search
  - Add user authentication and authorization
- Performance
  - Optimize chunk size based on content
  - Implement caching for frequent queries
  - Add query result ranking improvements
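Caching for frequent queries could key on everything that affects a `/query` response; a minimal FIFO-evicting sketch under that assumption (class and helper names are hypothetical):

```python
import hashlib
import json

def make_cache_key(query, document_name, limit=3, similarity_threshold=0.7) -> str:
    """Deterministic key covering every input that affects a /query response."""
    raw = json.dumps([query, document_name, limit, similarity_threshold])
    return hashlib.sha256(raw.encode()).hexdigest()

class QueryCache:
    """Tiny bounded cache for query results, evicting the oldest entry."""
    def __init__(self, max_size: int = 256):
        self._data = {}
        self._max_size = max_size
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        if key not in self._data and len(self._data) >= self._max_size:
            self._data.pop(next(iter(self._data)))  # evict oldest insertion (FIFO)
        self._data[key] = value
```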
The system implements comprehensive error handling:
- Input validation for supported file types
- Processing status tracking
- Detailed error messages
- Automatic cleanup of failed processes
- Detailed logging of all operations
- Performance metrics collection
- Processing status tracking
- Error reporting and monitoring