RAG Pipeline for Textbook QA

Overview

A retrieval-augmented generation (RAG) system that indexes large textbook collections and answers natural-language questions over them.

Key Achievements

Indexed 1,000+ textbook pages into FAISS with dense + BM25 retrieval
Implemented DocETL-style query planning for efficient document processing
Built an ensemble re-ranking system for improved answer quality
Added structured logging and visualization for system monitoring
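
Combining dense and BM25 results means merging two ranked lists into one. The source does not say which ensemble method the pipeline uses, so the sketch below assumes reciprocal rank fusion (RRF), one common choice. The function name, the constant k=60, and the page ids are illustrative assumptions, not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each list contributes 1 / (k + rank) per document, so items ranked
    highly by several retrievers rise to the top. k=60 is a conventional
    default, assumed here rather than taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical page ids returned by the two retrievers for one query.
dense_hits = ["p12", "p07", "p31"]  # e.g. from the FAISS search
bm25_hits = ["p07", "p44", "p12"]   # e.g. from the BM25 search

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

RRF uses only rank positions, so the dense similarity scores and BM25 scores never need to be put on a common scale.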

Technical Implementation

The system combines the following techniques:

Dense Retrieval: FAISS indexing with sentence transformers
Sparse Retrieval: BM25 for keyword-based matching
Query Planning: DocETL-style processing for complex queries
Re-ranking: Ensemble methods for answer quality improvement
Inference: Support for llama.cpp inference and document chunking
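
To make the chunking and sparse-retrieval steps above concrete, here is a minimal standard-library sketch of overlapping word-window chunking followed by Okapi BM25 scoring. It is an illustration under assumptions, not the project's code: the real pipeline may rely on a BM25 library and a different chunking strategy, and the function names, chunk sizes, and the k1 and b values are all hypothetical.

```python
import math
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy stand-ins for textbook chunks.
docs = [
    "the cell membrane controls transport",
    "newton laws of motion",
    "mitochondria produce energy in the cell",
]
scores = bm25_scores("cell membrane", docs)
chunks = chunk_words(" ".join(str(i) for i in range(120)))
```

The overlap between neighbouring chunks keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some text twice.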

Impact

This RAG pipeline enables efficient querying of large document collections, making educational content more accessible and searchable through natural language interfaces.

