Overview
A retrieval-augmented generation (RAG) system that indexes large textbook collections and answers natural-language questions about them.
Key Achievements
- Indexed 1,000+ textbook pages for hybrid retrieval: dense embeddings in a FAISS index plus a BM25 keyword index
- Implemented DocETL-style query planning for efficient document processing
- Built an ensemble re-ranking stage to improve answer quality
- Added structured logging and visualization for system monitoring
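The project does not describe its log format, so the following is only a minimal sketch of structured logging, assuming one JSON object per line built with Python's standard `logging` module. The logger name `rag`, the helper names `make_logger` and `log_event`, and the fields `query`, `top_k` and `latency_ms` are all hypothetical.

```python
import json
import logging
import sys
import time
from typing import IO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": round(record.created, 3),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields arrive via `extra={"fields": {...}}` (see log_event).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream: IO[str], name: str = "rag") -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (replaces existing handlers)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_event(logger: logging.Logger, event: str, **fields) -> None:
    """Log `event` with arbitrary key/value fields attached to the JSON line."""
    logger.info(event, extra={"fields": fields})


if __name__ == "__main__":
    log = make_logger(sys.stdout)
    start = time.perf_counter()
    # ... a retrieval call would run here ...
    latency_ms = (time.perf_counter() - start) * 1000
    log_event(log, "retrieval", query="What is entropy?", top_k=5, latency_ms=round(latency_ms, 2))
```

Because each line is a standalone JSON object, the logs can be loaded directly into a dataframe or dashboard for the kind of visualization mentioned above.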
Technical Implementation
The system combines the following components:
- Dense Retrieval: a FAISS vector index over sentence-transformer embeddings
- Sparse Retrieval: BM25 for keyword-based matching
- Query Planning: DocETL-style processing for complex queries
- Re-ranking: ensemble methods to improve answer quality
- Inference: llama.cpp for LLM inference
- Chunking: splitting documents into passages before indexing
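Chunking settings are not given in the project description. As an illustration only, here is a sketch of fixed-size chunking with overlap, assuming word-based windows; the function name `chunk_words` and the defaults of 200 words with a 50-word overlap are hypothetical, and a real pipeline would more likely count tokens using the embedding model's tokenizer.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    page = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(page, chunk_size=4, overlap=2):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some text twice.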
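To make the dense-retrieval step concrete without FAISS or a model download, here is a pure-Python sketch of exact cosine-similarity search: vectors are L2-normalized so the inner product equals cosine similarity. FAISS's `IndexFlatIP` performs the same exact inner-product search once vectors are normalized (for example with `faiss.normalize_L2`). The class name `FlatIPIndex` is hypothetical, and the toy 3-dimensional vectors stand in for sentence-transformer embeddings.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class FlatIPIndex:
    """Brute-force inner-product search over L2-normalized vectors, i.e. cosine similarity."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dim {self.dim}, got {len(v)}")
            self.vectors.append(normalize(v))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return the top-k (position, cosine score) pairs, best first."""
        q = normalize(query)
        scored = [(i, sum(a * b for a, b in zip(q, v))) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda p: -p[1])
        return scored[:k]


if __name__ == "__main__":
    index = FlatIPIndex(dim=3)
    index.add([[1, 0, 0], [0, 1, 0], [1, 1, 0]])  # toy "chunk embeddings"
    print(index.search([1, 0.1, 0], k=2))
```

Exact search scans every vector, which is fine for a few thousand chunks; at larger scale FAISS offers approximate index types that trade some recall for speed.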
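For the sparse side, the standard Okapi BM25 score of a document $D$ for a query $Q$ is $\sum_{q \in Q} \mathrm{IDF}(q)\,\frac{f(q,D)\,(k_1+1)}{f(q,D) + k_1\,(1 - b + b\,|D|/\mathrm{avgdl})}$. Below is a minimal pure-Python sketch of it, assuming lowercase whitespace tokenization, the common defaults $k_1 = 1.5$ and $b = 0.75$, and the non-negative IDF variant $\ln\big((N - n_q + 0.5)/(n_q + 0.5) + 1\big)$. The project's actual BM25 library and parameters are not stated.

```python
import math
from collections import Counter
from typing import List


class BM25:
    """Okapi BM25 over pre-tokenized documents."""

    def __init__(self, docs: List[List[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]          # term frequencies per document
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs) if docs else 0.0
        df = Counter(term for d in docs for term in set(d))  # document frequency per term
        n = len(docs)
        # Non-negative IDF variant: ln((N - df + 0.5) / (df + 0.5) + 1)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: List[str], i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        s = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent from this document contributes nothing
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def top_k(self, query: List[str], k: int = 3) -> List[int]:
        """Return document positions sorted by descending score (ties by position)."""
        scored = [(self.score(query, i), i) for i in range(len(self.tfs))]
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [i for _, i in scored[:k]]


if __name__ == "__main__":
    corpus = [
        "the cell membrane controls transport",
        "entropy measures disorder in a system",
        "the second law says entropy increases",
    ]
    bm25 = BM25([d.lower().split() for d in corpus])
    print(bm25.top_k("entropy disorder".split(), k=2))
```

BM25 rewards exact term matches (formula names, technical vocabulary) that dense embeddings can blur, which is why the two retrievers complement each other.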
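The description says "ensemble" re-ranking without naming the method. One widely used way to combine rankers is reciprocal rank fusion (RRF), swapped in here purely as an illustration: each ranked list contributes $w / (k + \text{rank})$ to every document it contains, with $k = 60$ as the conventional constant. The function name and the optional per-ranker weights are assumptions, not the project's code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
    weights: Optional[Sequence[float]] = None,
) -> List[Hashable]:
    """Fuse several ranked lists of document ids into one ranking (rank starts at 1)."""
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda d: -scores[d])


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]   # ids from the dense retriever, best first
    bm25_hits = ["b", "d", "a"]    # ids from BM25, best first
    print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

RRF only uses ranks, so it needs no calibration between cosine similarities and BM25 scores, which live on very different scales.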
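DocETL pipelines are built from declarative operators such as map, filter and reduce, usually driven by LLM prompts. The sketch below is loosely modeled on that idea and is not DocETL's API: a hypothetical `Step`/`run_plan` pair executes a plan over chunk records, with plain Python functions standing in for the LLM calls. The record fields `chapter` and `text` are also assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Step:
    """One operator in a hypothetical query plan."""
    kind: str       # "map", "filter", or "reduce"
    fn: Callable    # stands in for an LLM prompt in a real pipeline


def run_plan(records: List[Record], plan: List[Step]) -> List[Record]:
    """Apply each step in order to the current list of records."""
    for step in plan:
        if step.kind == "map":
            records = [step.fn(r) for r in records]
        elif step.kind == "filter":
            records = [r for r in records if step.fn(r)]
        elif step.kind == "reduce":
            records = [step.fn(records)]
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return records


if __name__ == "__main__":
    # Plan for a question like "Which chapters discuss entropy?"
    chunks = [
        {"chapter": 3, "text": "Entropy measures disorder."},
        {"chapter": 1, "text": "Cells divide by mitosis."},
        {"chapter": 5, "text": "The second law: entropy never decreases."},
    ]
    plan = [
        Step("map", lambda r: {**r, "hit": "entropy" in r["text"].lower()}),
        Step("filter", lambda r: r["hit"]),
        Step("reduce", lambda rs: {"chapters": sorted({r["chapter"] for r in rs})}),
    ]
    print(run_plan(chunks, plan))  # [{'chapters': [3, 5]}]
```

Expressing a complex query as an explicit plan makes each step inspectable and lets cheap filters run before expensive model calls.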
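Before generation, the retrieved passages have to be packed into a prompt for the model. The following sketch shows one way to do that. The prompt wording, the `build_prompt` name and the character budget are hypothetical; a real system would budget by tokens using the model's tokenizer. The resulting string could then be sent to a llama.cpp model through its command-line tool or HTTP server, but that invocation is not shown here.

```python
from typing import List, Sequence


def build_prompt(question: str, passages: Sequence[str], max_chars: int = 4000) -> str:
    """Pack numbered passages into a grounded prompt.

    Passages are added in rank order while their combined length stays within
    `max_chars` (newline separators are not counted).
    """
    context: List[str] = []
    used = 0
    for i, p in enumerate(passages, start=1):
        block = f"[{i}] {p.strip()}"
        if used + len(block) > max_chars:
            break  # budget exhausted; drop this and all lower-ranked passages
        context.append(block)
        used += len(block)
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages by number; say so if the answer is not in them.\n\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is entropy?", ["Entropy measures disorder.", "Cells divide by mitosis."]))
```

Numbering the passages lets the model cite its sources, which makes answers easier to check against the textbook.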
Impact
This RAG pipeline enables efficient querying of large document collections, making educational content more accessible and searchable through natural language interfaces.