# Voicebook - PDF & YouTube to Podcast Converter

Voicebook is a web application that turns PDF documents and YouTube videos into podcast-style conversations between two AI personas, Alex and Sarah. It uses Gemini embeddings, a Pinecone vector database, and Groq-hosted LLMs for content processing, semantic search, and dialogue generation.

## 🚀 Features

### PDF Processing

- Upload and process PDF documents
- Automatic content chunking and embedding
- Vector storage in Pinecone database
- Generate AI-powered podcast conversations
- Interactive chat with document context

### YouTube Integration

- Process YouTube videos via URL
- Automatic caption/transcript extraction
- Content vectorization and storage
- Generate podcast discussions about video content
- Context-aware chat about video content

### Podcast Generation

- Dynamic conversation generation between Alex and Sarah
- Minimum 55 lines of detailed dialogue
- Structured discussion format:
  - Introduction/Overview
  - Main points analysis
  - Critical discussion
  - Real-world implications
  - Personal perspectives
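
The structure and the 55-line minimum above could be encoded in a prompt and checked against the model's reply roughly as follows. This is a minimal sketch, not the project's actual code: the function names, the prompt wording, and the `Alex:` / `Sarah:` line prefix are all assumptions made for illustration.

```typescript
// Section titles and the 55-line minimum come from the README.
// Every identifier below is hypothetical.
const SECTIONS = [
  "Introduction/Overview",
  "Main points analysis",
  "Critical discussion",
  "Real-world implications",
  "Personal perspectives",
] as const;

const MIN_DIALOGUE_LINES = 55;

interface DialogueLine {
  speaker: "Alex" | "Sarah";
  text: string;
}

// Build the instruction text that would be sent to the LLM.
function buildPodcastPrompt(context: string): string {
  const outline = SECTIONS.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return [
    "Write a podcast conversation between Alex and Sarah.",
    `Use at least ${MIN_DIALOGUE_LINES} lines of dialogue.`,
    'Prefix every line with the speaker name, e.g. "Alex: ...".',
    "Follow this structure:",
    outline,
    "Source material:",
    context,
  ].join("\n");
}

// Keep only lines of the form "Alex: ..." or "Sarah: ..."; skip anything else.
function parseScript(raw: string): DialogueLine[] {
  const lines: DialogueLine[] = [];
  for (const line of raw.split("\n")) {
    const match = /^(Alex|Sarah):\s*(.+)$/.exec(line.trim());
    if (match) {
      lines.push({ speaker: match[1] as "Alex" | "Sarah", text: match[2] });
    }
  }
  return lines;
}

// True when the parsed script satisfies the minimum dialogue length.
function meetsMinimumLength(lines: DialogueLine[]): boolean {
  return lines.length >= MIN_DIALOGUE_LINES;
}
```

A caller could regenerate or extend the script when `meetsMinimumLength` returns `false`; whether the application does so is not stated here.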

### Interactive Features

- Real-time audio playback
- Voice-enabled chat interface
- Context-aware responses
- PDF/Video content reference
- Semantic search capabilities

## 🛠 Tech Stack

### Frontend

- Next.js 15.0.3 - React framework
- React 19 - UI library
- TailwindCSS - Styling
- TypeScript - Type safety
- React Icons - Icon components

### Backend (API Routes)

- Next.js API Routes - Serverless functions
- Pinecone - Vector database
- Google AI (Gemini) - Embeddings generation
- Groq - LLM for conversation generation
- LangChain - Document processing
- PDF Parse - PDF text extraction

### AI/ML Components

- Gemini Embedding Model - Vector embeddings
- Llama 3.2 90B - Podcast generation
- Mixtral 8x7B - Chat responses
- Web Speech API - Voice interface

## 📦 Key Dependencies

    {
      "@google/generative-ai": "^0.21.0",
      "@langchain/community": "^0.3.14",
      "@pinecone-database/pinecone": "^4.0.0",
      "groq-sdk": "^0.8.0",
      "langchain": "^0.3.5",
      "next": "15.0.3"
    }

## 🏗 Architecture

### Document Processing Flow

1. PDF/YouTube content upload
2. Content chunking and preprocessing
3. Embedding generation via Gemini AI
4. Vector storage in Pinecone
5. Podcast script generation via Groq
6. Interactive chat capabilities
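
The chunking step in the flow above can be illustrated with a plain fixed-size splitter. This is only a sketch: the tech stack lists LangChain for document processing, so the real splitter is probably a LangChain one, and the function name, chunk size, and overlap used here are assumptions.

```typescript
// Hypothetical helper: splits text into overlapping, fixed-size chunks.
// chunkSize and overlap defaults are illustrative, not taken from the project.
interface Chunk {
  index: number;
  text: string;
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  // Preprocessing: collapse runs of whitespace left over from PDF or caption text.
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < normalized.length; start += step) {
    chunks.push({
      index: chunks.length,
      text: normalized.slice(start, start + chunkSize),
    });
    // Stop once the current chunk already reaches the end of the text.
    if (start + chunkSize >= normalized.length) break;
  }
  return chunks;
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, which tends to help retrieval quality.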

### Data Flow

- Content Ingestion → Chunking → Embedding → Storage
- Query Processing → Semantic Search → Context Retrieval → Response Generation
- Chat Interface → Voice Processing → Context-Aware Responses
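
To make the query path concrete, here is an in-memory sketch of semantic search followed by context retrieval. In the application this lookup is delegated to Pinecone; the types and function names below are hypothetical and only show what a cosine-similarity top-k search computes.

```typescript
// Hypothetical in-memory stand-in for a vector index.
interface StoredVector {
  id: string;
  values: number[];
  text: string; // the chunk text stored alongside the embedding
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Semantic search: rank stored vectors by similarity to the query embedding.
function topK(query: number[], store: StoredVector[], k: number): StoredVector[] {
  return store
    .map((v) => ({ v, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.v);
}

// Context retrieval: join the matched chunk texts for use in the LLM prompt.
function buildContext(matches: StoredVector[]): string {
  return matches.map((m) => m.text).join("\n---\n");
}
```

The joined context would then be placed in the chat or podcast prompt ahead of the user's question.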

## 🔧 Environment Setup

### 1. API Keys Required

#### Pinecone API Key

1. Visit Pinecone Console
2. Sign up or log in to your account
3. Navigate to API Keys section
4. Create a new API key
5. Copy the key and environment

#### Google AI (Gemini) API Key

1. Go to Google AI Studio
2. Create or sign in to your Google Cloud account
3. Enable the Gemini API
4. Create a new API key
5. Copy the key

#### Groq API Key

1. Visit Groq Console
2. Create an account or sign in
3. Go to API section
4. Generate new API key
5. Copy the key

### 2. Environment Configuration

Clone the repository:

    git clone https://github.com/yourusername/voicebook.git
    cd voicebook

Copy the environment example file:

    cp .env.example .env

Update the .env file with your API keys:

    PINECONE_API_KEY=your_pinecone_api_key
    PINECONE_INDEX_NAME=your_index_name
    PINECONE_ENVIRONMENT=your_environment
    GOOGLE_API_KEY=your_google_api_key
    GROQ_API_KEY=your_groq_api_key
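
A missing key usually surfaces only as a failed API call at request time, so it can help to validate the variables above once at startup. The following is a sketch of such a check; `loadEnv` and its types are hypothetical and not part of the project.

```typescript
// Variable names are taken from the .env listing; the helper itself is hypothetical.
const REQUIRED_ENV_VARS = [
  "PINECONE_API_KEY",
  "PINECONE_INDEX_NAME",
  "PINECONE_ENVIRONMENT",
  "GOOGLE_API_KEY",
  "GROQ_API_KEY",
] as const;

type EnvName = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<EnvName, string>;

// Returns the required variables, or throws one error naming every missing one.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => {
    const value = source[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  for (const name of REQUIRED_ENV_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}
```

In a Next.js API route this would typically be called as `loadEnv(process.env)`.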

### 3. Pinecone Index Setup

Create a new index in Pinecone console with:

- Dimensions: 768 (Gemini embeddings)
- Metric: Cosine
- Pod Type: s1.x1 (recommended)

Update your .env with the index name:

    PINECONE_INDEX_NAME=your-index-name
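
Because the index is created with a fixed dimension, an embedding of any other length will be rejected on upsert. A small pre-flight check, sketched below with hypothetical names, can report such mismatches before any request is made.

```typescript
// Values mirror the index settings listed above; the helper is hypothetical.
const INDEX_SETTINGS = {
  dimension: 768,
  metric: "cosine",
  podType: "s1.x1",
} as const;

// Returns a list of problems; an empty list means the vector fits the index.
function checkVector(values: number[]): string[] {
  const problems: string[] = [];
  if (values.length !== INDEX_SETTINGS.dimension) {
    problems.push(
      `expected ${INDEX_SETTINGS.dimension} dimensions, got ${values.length}`
    );
  }
  if (values.some((v) => !Number.isFinite(v))) {
    problems.push("vector contains NaN or infinite values");
  }
  if (values.every((v) => v === 0)) {
    problems.push("all-zero vector has no direction under the cosine metric");
  }
  return problems;
}
```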

### 4. Development Setup

Install dependencies:

    npm install
    # or
    yarn install

Run the development server:

    npm run dev
    # or
    yarn dev

Build for production:

    npm run build
    npm start
    # or
    yarn build
    yarn start


## 🎯 Key Components

### Content Processing
- PDF document chunking and embedding generation
- YouTube transcript extraction and processing
- Vector storage and retrieval

### Conversation Generation
- Structured podcast script generation
- Context-aware chat responses
- Voice interface integration

### User Interface
- Responsive design with TailwindCSS
- Audio playback controls
- Interactive chat interface
- Voice command support

## 📝 API Routes

### Main Endpoints
- `/api/upload` - PDF processing
- `/api/youtube` - YouTube video processing
- `/api/generate-podcast` - Podcast script generation
- `/api/chat` - Context-aware chat
- `/api/podcast-chat` - Podcast-specific chat
- `/api/podcast-yt-chat` - YouTube podcast chat
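
As an illustration of calling these endpoints from the client, the sketch below builds a JSON `POST` request for a named route. The paths come from the list above; the helper, the route keys, and any payload fields are assumptions, and `/api/upload` may well expect multipart form data rather than JSON.

```typescript
// Paths are from the endpoint list; the keys and helper are hypothetical.
const ROUTES = {
  upload: "/api/upload",
  youtube: "/api/youtube",
  generatePodcast: "/api/generate-podcast",
  chat: "/api/chat",
  podcastChat: "/api/podcast-chat",
  podcastYtChat: "/api/podcast-yt-chat",
} as const;

type RouteName = keyof typeof ROUTES;

interface JsonRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Build the URL and fetch options for a JSON POST to one of the routes.
function buildJsonRequest(
  route: RouteName,
  payload: unknown,
  baseUrl = ""
): JsonRequest {
  return {
    // Drop a trailing slash on the base URL so the path is not doubled up.
    url: `${baseUrl.replace(/\/$/, "")}${ROUTES[route]}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

Usage would look like `const { url, init } = buildJsonRequest("chat", { message: "..." }); await fetch(url, init);`, where the `message` field is a placeholder for whatever body the route actually expects.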

## 🔒 Security Considerations

- Environment variables for API keys
- Server-side processing of sensitive operations
- Rate limiting implementation
- Error handling and validation
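
The rate limiting mentioned above is not specified further. One simple approach is a fixed-window counter keyed by client identifier, sketched here with hypothetical names. It keeps state in memory, so in a serverless deployment each instance would count separately; a shared store would be needed for a global limit.

```typescript
// Hypothetical fixed-window limiter; not taken from the project's code.
interface RateLimiterOptions {
  limit: number; // maximum requests per window, at least 1
  windowMs: number; // window length in milliseconds
}

class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(options: RateLimiterOptions) {
    if (options.limit < 1 || options.windowMs <= 0) {
      throw new Error("limit must be >= 1 and windowMs must be > 0");
    }
    this.limit = options.limit;
    this.windowMs = options.windowMs;
  }

  // Returns true if the request identified by `key` may proceed at time `now`.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    // First request for this key, or the previous window has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

An API route could call `allow(clientIp)` at the top of the handler and respond with HTTP 429 when it returns `false`.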

## 🎨 UI/UX Features

- Clean, modern interface
- Responsive design
- Loading states and animations
- Error handling and user feedback
- Voice interaction capabilities

## 📚 Documentation References

- [Next.js Documentation](https://nextjs.org/docs)
- [Pinecone Documentation](https://docs.pinecone.io/)
- [Google AI Documentation](https://ai.google.dev/docs)
- [Groq Documentation](https://console.groq.com/docs)
- [LangChain Documentation](https://js.langchain.com/docs)

## 🤝 Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests for any enhancements.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: samarthpandeydev/voicebook

Folders and files

Latest commit

History

Repository files navigation

Voicebook - PDF & YouTube to Podcast Converter

🚀 Features

PDF Processing

YouTube Integration

Podcast Generation

Interactive Features

🛠 Tech Stack

Frontend

Backend (API Routes)

AI/ML Components

📦 Key Dependencies

🏗 Architecture

Document Processing Flow

Data Flow

🔧 Environment Setup

1. API Keys Required

Pinecone API Key

Google AI (Gemini) API Key

Groq API Key

2. Environment Configuration

3. Pinecone Index Setup

4. Development Setup

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages