A containerized search and ingest API for MongoDB Atlas with vector search capabilities, built with FastAPI and Docker.
## Features

- Vector search for product data using MongoDB Atlas Search (`knnBeta`)
- Local embedding generation using the `paraphrase-multilingual-MiniLM-L12-v2` model (optimized for Norwegian and Swedish)
- Faceted search results for e-commerce applications
- Product recommendations based on purchase history
- Simple API key authorization
- Dockerized deployment with MongoDB and FastAPI
## Prerequisites

- Docker and Docker Compose
- MongoDB Atlas account (optional; a local MongoDB instance can be used instead)
- Python 3.9+ (for local development)
## Quick Start

1. Clone the repository.
2. Create a `.env` file from the example:

   ```bash
   cp .env.example .env
   ```

3. Update the `.env` file with your MongoDB Atlas connection string (if using Atlas).
4. Build and start the containers:

   ```bash
   docker-compose up -d --build
   ```

5. Access the API documentation at http://localhost:8000/docs
## Environment Variables

- `MONGODB_URI`: MongoDB connection string
  - Local: `mongodb://mongodb:27017/productdb`
  - Atlas: `mongodb+srv://<username>:<password>@<cluster>.mongodb.net/productdb?retryWrites=true&w=majority`
- `API_KEY`: Secret API key for authorization
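For reference, a minimal `.env` could look like the following; the values shown are the local defaults used elsewhere in this README, so substitute your own connection string and key:

```bash
# .env — example values only
MONGODB_URI=mongodb://mongodb:27017/productdb
API_KEY=your_default_api_key
```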
## MongoDB Atlas Setup

For production use with MongoDB Atlas:

1. Create a MongoDB Atlas cluster if you don't have one.
2. Create a vector search index on the `products` collection:
   - Index name: `product_search`
   - Vector fields:
     - `title_embedding`: 384 dimensions, cosine similarity
     - `description_embedding`: 384 dimensions, cosine similarity
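As a sketch, an index definition matching the configuration above can be pasted into the Atlas Search JSON editor. The `knnVector` field type is what the legacy `knnBeta` operator queries; `dynamic: true` is an assumption here and can be tightened to your mapping needs:

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "title_embedding": {
        "type": "knnVector",
        "dimensions": 384,
        "similarity": "cosine"
      },
      "description_embedding": {
        "type": "knnVector",
        "dimensions": 384,
        "similarity": "cosine"
      }
    }
  }
}
```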
## API Endpoints

### Products

- `POST /ingest/products`: Ingest product data with automatic embedding generation
- `GET /doc/{product_id}`: Retrieve a specific product by ID
- `DELETE /remove/product/{product_id}`: Remove a specific product
- `DELETE /remove/products/all`: Remove all products
### Orders

- `POST /ingest/orderlines`: Ingest order line data for recommendations
- `DELETE /remove/order/{order_id}`: Remove a specific order
- `DELETE /remove/orders/all`: Remove all orders
- `DELETE /remove/orders/user/{user_id}`: Remove all orders for a specific user
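As an illustration, an order-line ingest call might look like this; the payload field names are hypothetical, so check the Swagger UI for the actual schema:

```bash
curl -X POST http://localhost:8000/ingest/orderlines \
  -H "Content-Type: application/json" \
  -H "x-apikey: your_default_api_key" \
  -d '[{"orderId":"order1","userId":"user1","productId":"prod1","quantity":1}]'
```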
### Recommendations

- `POST /recommend/content/{product_id}`: Get content-based recommendations
- `POST /recommend/collaborative/{user_id}`: Get collaborative filtering recommendations
- `POST /recommend/hybrid/{user_id}`: Get hybrid recommendations combining multiple strategies
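For example, content-based recommendations for a product can be requested as follows (assuming no request body is required; the Swagger UI documents any optional parameters):

```bash
curl -X POST http://localhost:8000/recommend/content/prod1 \
  -H "x-apikey: your_default_api_key"
```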
### Search

- `POST /search`: Main search endpoint with combined keyword and vector search
- `POST /autosuggest`: Lightweight search for autocomplete functionality
- `POST /query-explain`: Debug endpoint to explain search behavior
- `POST /feedback`: Log user feedback for future improvements
### Health

- `GET /health`: Health check endpoint
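A minimal search call might look like the following; the `query` field is illustrative, and the full request schema (filters, facets, paging) is documented in the Swagger UI:

```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -H "x-apikey: your_default_api_key" \
  -d '{"query": "baby shoes"}'
```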
Detailed API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## Local Development

For local development without Docker:

1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   cd app
   uvicorn main:app --reload
   ```
## Testing

The project includes a comprehensive test script, `test_endpoints.py`, which validates all functionality:

```bash
# Run all tests
python test_endpoints.py
```
You can also use the Swagger UI or `curl` for manual testing:

```bash
# Health check
curl -X GET http://localhost:8000/health

# Ingest a product (requires API key)
curl -X POST http://localhost:8000/ingest/products \
  -H "Content-Type: application/json" \
  -H "x-apikey: your_default_api_key" \
  -d '[{"id":"prod1","title":"Baby Shoes","description":"Comfortable shoes for babies","brand":"BabySteps","priceOriginal":299.99,"priceCurrent":249.99,"isOnSale":true,"ageFrom":1,"ageTo":3,"ageBucket":"1 to 3 years","color":"red","seasons":["winter","spring"],"productType":"main","seasonRelevancyFactor":0.8,"stockLevel":45}]'
```
## Implemented Features

- ✅ Vector search with MongoDB Atlas Search (`knnBeta`)
- ✅ Faceted search results for e-commerce applications
- ✅ Local embedding generation optimized for Norwegian and Swedish
- ✅ Order data ingestion and naive recommender system
- ✅ API key authorization for all secured endpoints
- ✅ Comprehensive data management functionality
- ✅ Environment-aware deployment with Docker
- ✅ Complete test suite
- ✅ Batch processing for large datasets
- ✅ Robust data transformation pipeline
- ✅ Validation framework for data integrity
- ✅ Comprehensive implementation documentation
## Data Processing Pipeline

The project includes a robust data processing pipeline for handling large datasets.
### Data Transformation

The transformation script (`scripts/data_processing/transform_data.py`) efficiently converts products from the source format to the API schema:

```bash
python scripts/data_processing/transform_data.py \
  --input "path/to/source/data.json" \
  --output "transformed_data.json" \
  --batch-size 200
```
Features:
- Memory-efficient batch processing
- Error handling with appropriate fallbacks
- Field normalization for multilingual content
- Detailed logging for each transformation step
### Data Validation

The validation script (`scripts/data_processing/validate_data.py`) ensures data integrity:

```bash
python scripts/data_processing/validate_data.py \
  --input "transformed_data.json" \
  --report "validation_report.json"
```
Features:
- Schema validation against API requirements
- Field coverage statistics
- Detailed validation reporting
### Data Ingestion

The ingestion script (`scripts/data_processing/ingest_data.py`) loads data into MongoDB:

```bash
python scripts/data_processing/ingest_data.py \
  --input "transformed_data.json" \
  --api-url "http://localhost:8000" \
  --api-key "your-api-key" \
  --batch-size 100
```
Features:
- Configurable batch sizes for efficient loading
- Retry logic for failed ingestion attempts
- Progress tracking during ingestion
### Search Testing

The testing script (`scripts/data_processing/test_search.py`) validates search functionality:

```bash
# Online mode (with API)
python scripts/data_processing/test_search.py \
  --api-url "http://localhost:8000" \
  --api-key "your-api-key" \
  --output "search_results.json"

# Offline mode (without API)
python scripts/data_processing/test_search.py \
  --offline-mode \
  --input "transformed_data.json" \
  --output "search_results.json"
```
Features:
- Support for both online and offline testing
- Multiple query types: keyword, brand, category, etc.
- Detailed search result analysis
## Test Results

Our implementation has been thoroughly tested, with the following results:

### Data Processing

| Test | Status | Notes |
| --- | --- | --- |
| Data Transformation | ✅ Success | Processed 1,000 products with batch processing |
| Data Validation | ✅ Success | All 1,000 products passed schema validation |
| Field Coverage | ✅ Success | 100% coverage for required fields |
| Error Handling | ✅ Success | Graceful handling of data inconsistencies |
### Search Queries

| Query Type | Results | Top Result |
| --- | --- | --- |
| Keyword search | 3 | Aktivitetspakke, Solar System glow in the dark |
| Brand search | 9 | Gunghäst Vera AIDEN |
| Book search | 10 | Barnebok – Bluey Sommerfugler |
| Seasonal search | 4 | Summertime |
| Age-specific search | 10 | Cornhole Set |
## Documentation

The project includes comprehensive documentation:

- **Implementation Guide** (`docs/MongoDB_Atlas_Search_Test_Report_and_Implementation_Guide.md`): A complete guide covering system architecture, configuration, deployment, and maintenance
- **Requirements and Progress** (`docs/Project_Requirements_and_Progress.md`): Tracks all client requirements and implementation progress
- **Implementation Report** (`docs/Implementation_Report.md`): Provides details on implementation decisions and technical approach
- **Data Processing Documentation** (`docs/data_processing/`): Detailed guides for each step of the data processing pipeline
## License

MIT