Overview

A retrieval-augmented generation (RAG) system that indexes large textbook collections and answers natural-language questions over them.

Key Achievements

  • Indexed 1,000+ textbook pages for hybrid retrieval: dense vectors in FAISS plus a BM25 keyword index
  • Implemented DocETL-style query planning for efficient document processing
  • Built an ensemble re-ranking system to improve answer quality
  • Added structured logging and visualization for system monitoring

Technical Implementation

The pipeline combines five components:

  • Dense Retrieval: FAISS indexing with sentence transformers
  • Sparse Retrieval: BM25 for keyword-based matching
  • Query Planning: DocETL-style processing for complex queries
  • Re-ranking: Ensemble methods for answer quality improvement
  • Inference: llama.cpp for local model inference, with document chunking
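
To make the dense + sparse combination concrete, here is a minimal, standard-library-only sketch of how a BM25 ranking and a dense ranking could be merged. It is an illustration under assumptions, not the project's actual code: the fusion method shown is reciprocal rank fusion (the project's ensemble method is not specified), the function names (bm25_scores, ranking, reciprocal_rank_fusion) are hypothetical, and the dense ranking is hard-coded as a stand-in for what a FAISS search would return.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def ranking(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings: each list adds 1 / (k + rank) to a document."""
    fused = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "photosynthesis converts light energy into chemical energy",
    "the mitochondria is the powerhouse of the cell",
    "plants use chlorophyll to absorb light",
    "newton's laws describe motion and force",
]
query = "how do plants absorb light"

sparse = ranking(bm25_scores(query, docs))
# Hypothetical dense ranking, standing in for a FAISS nearest-neighbour search.
dense = [2, 3, 0, 1]

fused = reciprocal_rank_fusion([sparse, dense])
print(fused)
```

Rank-based fusion is used here because BM25 scores and embedding similarities live on different scales, so combining positions avoids having to normalize raw scores. A learned or weighted re-ranker could replace this step.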

Impact

This RAG pipeline enables efficient querying of large document collections, making educational content more accessible and searchable through natural language interfaces.

View on GitHub →