Overview
LensDB is a compressed learned index for surveillance and traffic video that decouples ingestion from interactive querying. Instead of running object detection on every frame, it stores a sparse set of representative frames as CLIP embeddings and answers exploratory queries directly in the latent space, avoiding repeated video decoding from network-attached storage.
Key Achievements
- Up to 99.991% storage reduction with sub-second query latency versus exhaustive per-frame YOLO baselines.
- Compresses a 1.4 GB video into ~1.5 MB of embeddings while supporting approximate car-count queries.
- Keyframe filtering with FrameDiff, SSIM, MOG2, and optical-flow heuristics removes 90–99% of redundant frames before embedding (see the sketch after this list).
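A minimal sketch of how two of these gates (FrameDiff and MOG2) might combine, assuming OpenCV. The thresholds, the `is_keyframe`/`keyframes` helpers, and the 30-frame stride are illustrative rather than the report's exact configuration; SSIM and optical-flow checks would slot in as additional gates of the same shape.

```python
# Illustrative keyframe filter: FrameDiff + MOG2 gates (assumed combination).
import cv2
import numpy as np

mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def is_keyframe(prev_gray, gray, diff_thresh=12.0, fg_thresh=0.02):
    """Keep a frame if raw pixel change (FrameDiff) or the MOG2 foreground
    ratio suggests new content; SSIM / optical-flow gates plug in the same way."""
    # FrameDiff: mean absolute pixel difference against the last *kept* frame.
    frame_diff = float(np.mean(cv2.absdiff(prev_gray, gray)))
    # MOG2: fraction of pixels flagged as moving foreground.
    fg_mask = mog2.apply(gray)
    fg_ratio = float(np.count_nonzero(fg_mask)) / fg_mask.size
    return frame_diff > diff_thresh or fg_ratio > fg_thresh

def keyframes(video_path, stride=30):
    """Yield (frame_index, frame) pairs that survive the filter."""
    cap = cv2.VideoCapture(video_path)
    prev_gray, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:  # ~1 FPS sampling for a 30 FPS source
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is None or is_keyframe(prev_gray, gray):
                prev_gray = gray
                yield idx, frame
        idx += 1
    cap.release()
```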
Technical Implementation
- Ingestion: 1 FPS sampling → heuristic keyframe selection → CLIP (ViT-B/32) image embeddings → FAISS index + timestamp metadata map (sketched below).
- Query: the CLIP text encoder embeds the query text, FAISS retrieves the top-k frames, and an MLP predicts per-frame object counts to apply threshold filters (Count ≥ T); matches are expanded temporally via the timestamp metadata (sketched below).
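A sketch of the ingestion path, assuming OpenAI's clip package and a flat inner-product FAISS index; the row-id → timestamp list is one plausible layout for the metadata map, not necessarily LensDB's actual schema.

```python
# Ingestion sketch: keyframes -> CLIP embeddings -> FAISS + timestamp map.
import clip
import faiss
import numpy as np
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

index = faiss.IndexFlatIP(512)   # ViT-B/32 embeds to 512-d; inner product
timestamps = []                  # FAISS row id -> frame timestamp (seconds)

def ingest(keyframe_iter, fps=30):
    for frame_idx, frame_bgr in keyframe_iter:
        img = Image.fromarray(np.ascontiguousarray(frame_bgr[:, :, ::-1]))  # BGR -> RGB
        with torch.no_grad():
            emb = model.encode_image(preprocess(img).unsqueeze(0).to(device))
        emb = emb / emb.norm(dim=-1, keepdim=True)   # unit-normalize
        index.add(emb.cpu().numpy().astype(np.float32))
        timestamps.append(frame_idx / fps)
```

Normalizing both image and text embeddings makes the inner-product search equivalent to cosine similarity, which is the usual choice for CLIP retrieval.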
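Continuing that sketch for the query path: `count_head` is a hypothetical stand-in for the trained MLP count predictor, and the ±`window_s` window is one way the timestamp metadata could drive temporal expansion; neither is confirmed by the report.

```python
# Query sketch, reusing model/index/timestamps from the ingestion sketch.
def query(text, count_threshold=1, k=50, window_s=2.0):
    with torch.no_grad():
        q = model.encode_text(clip.tokenize([text]).to(device))
        q = q / q.norm(dim=-1, keepdim=True)
    _, ids = index.search(q.cpu().numpy().astype(np.float32), k)
    hits = []
    for row in ids[0]:
        if row < 0:                                  # FAISS pads misses with -1
            continue
        emb = index.reconstruct(int(row))            # stored frame embedding
        with torch.no_grad():
            count = count_head(torch.from_numpy(emb)).item()  # assumed MLP head
        if count >= count_threshold:                 # Count >= T filter
            t = timestamps[int(row)]
            hits.append((max(0.0, t - window_s), t + window_s))  # temporal expansion
    return hits
```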
Results
- Achieves F1 = 0.963 for event detection at T ≥ 1, with expected precision trade-offs at higher count thresholds.