An intelligent chatbot powered by a local LLM that can role-play and answer questions based on PDF content.
- Python 3.10+
- Streamlit – Interface
- PyMuPDF (fitz) – PDF text extraction
- LangChain – Chunking, Retriever, PromptTemplate
- FAISS – Vector database
- Ollama (Qwen2.5:latest) – Embedding and LLM
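As an illustration of how the stack fits together: PyMuPDF extracts the raw text, LangChain splits it into overlapping chunks, Ollama embeds the chunks, and FAISS stores the vectors for retrieval. The chunking step can be sketched in plain Python (a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter`; the size and overlap values are illustrative, not the project's actual settings):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split extracted PDF text into overlapping chunks for retrieval.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter:
    it slices on raw character offsets and ignores sentence boundaries.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# Example: 10 characters, chunks of 4 with 1 character of overlap.
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.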
```text
pdf-chatbot/
│
├── app.py              # Main application (Streamlit interface)
├── pdf_handler.py      # Extracts text from the PDF and chunks it
├── embedder.py         # Embedding + FAISS database operations
├── chatbot.py          # Response generation with Qwen2.5 (via Ollama)
├── prompts.py          # Role-based prompt templates
├── roles.json          # Defines the list of roles
├── vectordb/           # FAISS files are stored here (created when the app runs)
├── data/               # User-uploaded PDFs (created when the app runs)
├── requirements.txt    # Required libraries
└── README.md           # Project description
```
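`roles.json` holds the role definitions the chatbot can adopt. Its actual schema is not shown in this README; a plausible minimal layout (the field names here are illustrative assumptions, not the project's real keys) might look like:

```json
[
  {"name": "Teacher", "system_prompt": "Explain the document patiently, with examples."},
  {"name": "Lawyer", "system_prompt": "Answer formally, citing the relevant passages."}
]
```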
- Set up the Python environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # For Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```
- Ensure Ollama is running: Make sure the `qwen2.5:latest` model is available in Ollama. If it is not installed:

  ```bash
  ollama pull qwen2.5:latest
  ```

  The Ollama service must be running in the background (usually started with the `ollama serve` command or by the Ollama Desktop application). It is not necessary to run the model separately with `ollama run qwen2.5:latest` while the application is running. To list installed models:

  ```bash
  ollama list
  ```

  You should see `qwen2.5:latest` (or a similar Qwen2.5 tag) in the output.

- Start the application: From the project's main directory (`pdf-chatbot/`):

  ```bash
  streamlit run app.py
  ```
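The model check in the step above can also be done programmatically (for example, as a startup check before launching the UI): Ollama's HTTP API lists installed models at `GET /api/tags`. A small stdlib-only sketch, assuming Ollama's default local port 11434 (the helper names are illustrative, not functions from this project):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint

def installed_models(tags_payload: dict) -> list[str]:
    # GET /api/tags responds with {"models": [{"name": "qwen2.5:latest", ...}, ...]}
    return [m["name"] for m in tags_payload.get("models", [])]

def has_model(name: str) -> bool:
    """Return True if `name` appears in the local Ollama model list."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return name in installed_models(json.load(resp))

# The parsing half works on a captured payload, without a running server:
sample = {"models": [{"name": "qwen2.5:latest"}, {"name": "llama3:8b"}]}
print(installed_models(sample))  # → ['qwen2.5:latest', 'llama3:8b']
```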
| Aspect | Description |
|---|---|
| 🎯 Niche | An AI capable of role-playing and providing document-based consultancy. |
| 🧠 LLM Usage | A local model (Qwen2.5 via Ollama) offers privacy and offline operation. |
| 🛠️ RAG Arch. | A data-driven approach using LangChain + FAISS, unlike classic chatbots. |
| 💼 CV Contrib. | Combines LangChain, FAISS, Ollama, and PDF parsing in a real-world use case. |