TokenSmith: Agentic RAG system on Local LLM

Overview
TokenSmith is a retrieval-augmented QA pipeline for technical textbooks. It is designed to reduce “semantic miss” failures by combining sparse keyword matching with dense embedding retrieval, then re-ranking the candidates with an ensemble ranker.

Key Achievements
Hybrid retrieval (dense + sparse) to improve coverage of exact terminology and equation-heavy text.
EnsembleRanker supports Reciprocal Rank Fusion (RRF) and weighted score fusion for configurable ranking behavior.
Stable evaluation harness with structured logging to debug retrieval failures and measure end-to-end answer quality.

Technical Implementation
Retrieval: FAISS for dense similarity search and BM25 for keyword recall, merged via fusion strategies.
Ranking: Config-driven EnsembleRanker stage that combines multiple retrievers and can optionally add lightweight feature-based scoring.
Inference: Pluggable LLM backend (local via llama.cpp, or hosted), with document chunking and provenance attached to retrieved passages.

Impact
This RAG pipeline enables efficient querying of large document collections, making educational content more accessible and searchable through natural language interfaces. ...
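The two fusion strategies named for the EnsembleRanker can be illustrated with a minimal sketch. This is not TokenSmith's actual code: the function names, the RRF constant `k=60` (a commonly used default), and the min-max normalization used for weighted fusion are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids (best first) using RRF.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    k=60 is an assumed default, not a value taken from TokenSmith.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def weighted_score_fusion(score_maps, weights):
    """Fuse raw retriever scores after min-max normalizing each retriever.

    Normalization is an assumption: BM25 and cosine scores live on
    different scales, so they are mapped to [0, 1] before weighting.
    """
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for chunk_id, score in scores.items():
            norm = (score - lo) / span if span > 0 else 1.0
            fused[chunk_id] += weight * norm
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical retriever outputs for one query.
dense_scores = {"c2": 0.9, "c1": 0.8, "c3": 0.1}     # e.g. FAISS similarity
sparse_scores = {"c1": 12.0, "c4": 7.0, "c2": 2.0}   # e.g. BM25

dense_ranking = sorted(dense_scores, key=dense_scores.get, reverse=True)
sparse_ranking = sorted(sparse_scores, key=sparse_scores.get, reverse=True)

rrf_order = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
balanced_order = weighted_score_fusion([dense_scores, sparse_scores], [0.5, 0.5])
dense_heavy_order = weighted_score_fusion([dense_scores, sparse_scores], [0.9, 0.1])
```

RRF uses only rank positions, so it needs no score calibration between retrievers. Weighted fusion keeps score magnitudes and lets the configuration shift the balance: in this toy example, the dense-heavy weights move `c2` ahead of `c1`.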

1 min · 158 words · Raj Shah