Voicebook is a web application that turns PDF documents and YouTube videos into podcast-style conversations between two AI personas, Alex and Sarah. It chunks the source content, generates embeddings with Google's Gemini embedding model, stores the vectors in Pinecone, and uses LLMs served by Groq to write the podcast script and answer questions about the content.
## 🌟 Features
### PDF Processing
- Upload and process PDF documents
- Automatic content chunking and embedding
- Vector storage in Pinecone database
- Generate AI-powered podcast conversations
- Interactive chat with document context
### YouTube Integration
- Process YouTube videos via URL
- Automatic caption/transcript extraction
- Content vectorization and storage
- Generate podcast discussions about video content
- Context-aware chat about video content
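
Processing a video by URL starts with pulling the video ID out of whatever link form the user pastes. The sketch below is illustrative only: `extractYouTubeVideoId` is a hypothetical helper, not a function from the Voicebook codebase, and it assumes the common `watch?v=`, `youtu.be/`, `/embed/`, and `/shorts/` link shapes.

```typescript
// Hypothetical helper, not part of the Voicebook source.
// Returns the 11-character video ID, or null if the input is not a recognised YouTube link.
function extractYouTubeVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }

  // Normalise "www." and mobile "m." prefixes.
  const host = url.hostname.replace(/^www\./, "").replace(/^m\./, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // Embed and Shorts links: /embed/<id>, /shorts/<id>
      const match = url.pathname.match(/^\/(embed|shorts)\/([^/]+)/);
      candidate = match ? match[2] : null;
    }
  }

  // YouTube video IDs are 11 characters from [A-Za-z0-9_-].
  return candidate !== null && /^[A-Za-z0-9_-]{11}$/.test(candidate) ? candidate : null;
}
```

Rejecting anything that does not match the ID pattern keeps malformed input away from the transcript-extraction step.
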
### Podcast Generation
- Dynamic conversation generation between Alex and Sarah
- Minimum 55 lines of detailed dialogue
- Structured discussion format:
  - Introduction/Overview
  - Main points analysis
  - Critical discussion
  - Real-world implications
  - Personal perspectives
- Real-time audio playback
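
The dialogue constraints above (two named speakers, at least 55 lines) are easy to check after generation. A minimal sketch, assuming the model returns plain text with one `Speaker: text` turn per line; that output format, and the names `parsePodcastScript` and `meetsMinimumLength`, are assumptions for illustration rather than Voicebook's actual implementation.

```typescript
type Speaker = "Alex" | "Sarah";

interface DialogueLine {
  speaker: Speaker;
  text: string;
}

// ASSUMPTION: the script has one "Alex: ..." or "Sarah: ..." turn per line.
// Lines that do not match (blank lines, stage directions) are skipped.
function parsePodcastScript(raw: string): DialogueLine[] {
  const lines: DialogueLine[] = [];
  for (const row of raw.split(/\r?\n/)) {
    const match = row.trim().match(/^(Alex|Sarah):\s*(.+)$/);
    if (match) {
      lines.push({ speaker: match[1] as Speaker, text: match[2].trim() });
    }
  }
  return lines;
}

// The feature list specifies a minimum of 55 lines of dialogue.
const MIN_DIALOGUE_LINES = 55;

function meetsMinimumLength(
  lines: DialogueLine[],
  minimum: number = MIN_DIALOGUE_LINES,
): boolean {
  return lines.length >= minimum;
}
```

A caller could regenerate or extend the script when `meetsMinimumLength` returns `false`, instead of passing a short script on to audio playback.
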
### Interactive Chat
- Voice-enabled chat interface
- Context-aware responses
- PDF/Video content reference
- Semantic search capabilities
## 🛠️ Technical Stack
### Frontend
- Next.js 15.0.3 - React framework
- React 19 - UI library
- TailwindCSS - Styling
- TypeScript - Type safety
- React Icons - Icon components
### Backend
- Next.js API Routes - Serverless functions
- Pinecone - Vector database
- Google AI (Gemini) - Embeddings generation
- Groq - LLM for conversation generation
- LangChain - Document processing
- PDF Parse - PDF text extraction
### AI Models
- Gemini Embedding Model - Vector embeddings
- Llama 3.2 90B - Podcast generation
- Mixtral 8x7B - Chat responses
- Web Speech API - Voice interface
### Key Dependencies

    {
      "@google/generative-ai": "^0.21.0",
      "@langchain/community": "^0.3.14",
      "@pinecone-database/pinecone": "^4.0.0",
      "groq-sdk": "^0.8.0",
      "langchain": "^0.3.5",
      "next": "15.0.3"
    }

## 🔄 System Architecture
### Processing Pipeline
1. PDF/YouTube content upload
2. Content chunking and preprocessing
3. Embedding generation via Gemini AI
4. Vector storage in Pinecone
5. Podcast script generation via Groq
6. Interactive chat capabilities
### Data Flow
- Content Ingestion → Chunking → Embedding → Storage
- Query Processing → Semantic Search → Context Retrieval → Response Generation
- Chat Interface → Voice Processing → Context-Aware Responses
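
The query path (semantic search, then context retrieval) boils down to ranking stored chunk vectors by similarity to the query vector and passing the best matches to the model as context. The sketch below does this in memory with cosine similarity so the idea is visible without any service calls. In Voicebook the ranking is delegated to Pinecone, and `StoredChunk`, `cosineSimilarity`, `topKChunks`, and `buildContext` are hypothetical names used only for illustration.

```typescript
interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every chunk against the query and keep the k most similar.
function topKChunks(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Join the retrieved chunk texts into one context string for the prompt.
function buildContext(chunks: StoredChunk[]): string {
  return chunks.map((c) => c.text).join("\n---\n");
}
```

A vector database performs the same ranking with an approximate index, which is what keeps retrieval fast once the number of chunks grows.
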
## 🔑 Getting API Keys
### Pinecone
1. Visit the Pinecone Console
2. Sign up or log in to your account
3. Navigate to the API Keys section
4. Create a new API key
5. Copy the key and environment
### Google AI (Gemini)
1. Go to Google AI Studio
2. Create or sign in to your Google Cloud account
3. Enable the Gemini API
4. Create a new API key
5. Copy the key
### Groq
1. Visit the Groq Console
2. Create an account or sign in
3. Go to the API section
4. Generate a new API key
5. Copy the key
## 🚀 Getting Started
1. Clone the repository:

       git clone https://github.com/yourusername/voicebook.git
       cd voicebook

2. Copy the environment example file:

       cp .env.example .env

3. Update the `.env` file with your API keys:

       PINECONE_API_KEY=your_pinecone_api_key
       PINECONE_INDEX_NAME=your_index_name
       PINECONE_ENVIRONMENT=your_environment
       GOOGLE_API_KEY=your_google_api_key
       GROQ_API_KEY=your_groq_api_key

4. Create a new index in Pinecone console with:
   - Dimensions: 768 (Gemini embeddings)
   - Metric: Cosine
   - Pod Type: s1.x1 (recommended)

5. Update your `.env` with the index name:

       PINECONE_INDEX_NAME=your-index-name

6. Install dependencies:

       npm install
       # or
       yarn install

7. Run the development server:

       npm run dev
       # or
       yarn dev

8. Build for production:

       npm run build
       npm start
       # or
       yarn build
       yarn start

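
Because every external service in this setup is configured through `.env`, it can help to fail fast when a variable is missing. A minimal sketch, assuming the five variable names listed in the setup steps; `assertRequiredEnv` is a hypothetical helper, not part of the Voicebook source, and it takes the environment as a parameter so it can be exercised without touching the real `process.env`.

```typescript
// Variable names taken from the .env listing in the setup steps.
const REQUIRED_ENV_VARS = [
  "PINECONE_API_KEY",
  "PINECONE_INDEX_NAME",
  "PINECONE_ENVIRONMENT",
  "GOOGLE_API_KEY",
  "GROQ_API_KEY",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<RequiredEnvVar, string>;

// Hypothetical helper: throws one error naming every missing or blank variable,
// otherwise returns a typed object containing only the required values.
function assertRequiredEnv(env: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const result = {} as AppEnv;
  for (const name of REQUIRED_ENV_VARS) {
    result[name] = env[name] as string;
  }
  return result;
}
```

In a server-side module this would typically be called once at startup, for example `const env = assertRequiredEnv(process.env);`, so a misconfigured deployment fails immediately instead of on the first API call.
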
## 🎯 Key Components
### Content Processing
- PDF document chunking and embedding generation
- YouTube transcript extraction and processing
- Vector storage and retrieval
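
Chunking splits extracted text into overlapping windows so that each embedding covers a manageable span while neighbouring chunks share some context. The sketch below is a plain character-based splitter written for illustration. The stack lists LangChain for document processing, so the real project presumably relies on one of its text splitters, and the `chunkSize` and `overlap` defaults here are assumptions rather than Voicebook's settings.

```typescript
// Illustrative splitter, not the project's implementation.
// Produces windows of up to `chunkSize` characters; consecutive windows share `overlap` characters.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` returns `["abcd", "defg", "ghij"]`: each chunk starts with the last character of the previous one.
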
### Conversation Generation
- Structured podcast script generation
- Context-aware chat responses
- Voice interface integration
### User Interface
- Responsive design with TailwindCSS
- Audio playback controls
- Interactive chat interface
- Voice command support
## 📝 API Routes
### Main Endpoints
- `/api/upload` - PDF processing
- `/api/youtube` - YouTube video processing
- `/api/generate-podcast` - Podcast script generation
- `/api/chat` - Context-aware chat
- `/api/podcast-chat` - Podcast-specific chat
- `/api/podcast-yt-chat` - YouTube podcast chat
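
As a rough idea of how a client might call one of these routes, the sketch below posts JSON to `/api/chat`. Only the route path comes from the list above. The request and response shapes are not documented here, so the `{ message }` payload, the `buildChatRequest` and `postChat` helpers, and the use of `POST` are all assumptions for illustration; check the route handler for the real contract.

```typescript
// ASSUMPTION: placeholder payload shape; the real route may expect different fields.
interface ChatPayload {
  message: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// The route path is taken from the endpoint list; everything else is illustrative.
const CHAT_ROUTE = "/api/chat";

// Builds the URL and fetch options without sending anything, which keeps it easy to test.
function buildChatRequest(payload: ChatPayload, baseUrl = ""): PreparedRequest {
  return {
    url: `${baseUrl.replace(/\/$/, "")}${CHAT_ROUTE}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Sends the request. An empty baseUrl yields a relative URL, which only works in the browser;
// pass an absolute origin when calling from Node.
async function postChat(payload: ChatPayload, baseUrl = ""): Promise<unknown> {
  const { url, init } = buildChatRequest(payload, baseUrl);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.json();
}
```
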
## 🔒 Security Considerations
- Environment variables for API keys
- Server-side processing of sensitive operations
- Rate limiting implementation
- Error handling and validation
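
Rate limiting is listed above without detail. One common approach is a fixed-window counter keyed by client, sketched below as an in-memory class. This illustrates the technique and is not Voicebook's actual limiter; the class name, the limits, and the injectable clock are all assumptions. An in-memory map also does not survive restarts or span multiple serverless instances, so a shared store would be needed in production.

```typescript
interface WindowState {
  windowStart: number; // timestamp (ms) when the current window opened
  count: number;       // requests seen in the current window
}

// Illustrative fixed-window limiter: at most `maxRequests` per `windowMs` per client key.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so the limiter can be tested without waiting on a real clock.
  constructor(maxRequests: number, windowMs: number, now: () => number = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if the client has used up its quota.
  allow(clientKey: string): boolean {
    const current = this.now();
    const entry = this.windows.get(clientKey);
    if (entry === undefined || current - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.windows.set(clientKey, { windowStart: current, count: 1 });
      return true;
    }
    if (entry.count < this.maxRequests) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}
```

An API route would call `allow` with a per-client key such as an IP address or session ID and respond with HTTP 429 when it returns `false`.
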
## 🎨 UI/UX Features
- Clean, modern interface
- Responsive design
- Loading states and animations
- Error handling and user feedback
- Voice interaction capabilities
## 📚 Documentation References
- [Next.js Documentation](https://nextjs.org/docs)
- [Pinecone Documentation](https://docs.pinecone.io/)
- [Google AI Documentation](https://ai.google.dev/docs)
- [Groq Documentation](https://console.groq.com/docs)
- [LangChain Documentation](https://js.langchain.com/docs)
## 🤝 Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests for any enhancements.
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.