LAB 6

Re-ranking

Vector search retrieves candidates quickly but ranks them only roughly. A cross-encoder re-ranker reads the query and each chunk together and scores how relevant the chunk really is. Watch the order change, and notice which chunk jumps to the top.

Stage 1 — Fast Retrieval

🧬

Bi-encoder (Vector Search)

Embeds the query and each chunk separately. Very fast: it can scan thousands of chunks in milliseconds. Retrieves the top-10 candidates, but because the query and chunk are never read together, it lacks fine-grained understanding.
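The retrieval stage can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: `embed` here is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, and the sample chunks are all assumptions made for the example. The point it shows is that chunk vectors are computed once, independently of any query, and ranking is just a similarity lookup.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector.
    A real system would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunk_vectors, k=10):
    """Return the indices of the k chunks most similar to the query."""
    q = embed(query)  # the query is embedded on its own
    scored = [(cosine(q, vec), i) for i, vec in enumerate(chunk_vectors)]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]

chunks = [
    "the cat sat on the mat",
    "vector search embeds text",
    "re-ranking improves search results",
]
# Chunk vectors are built once, before any query arrives.
chunk_vectors = [embed(c) for c in chunks]

top = retrieve("vector search", chunk_vectors, k=2)
print(top)
```

Because the chunk side never sees the query, the per-query cost is one embedding plus cheap vector comparisons, which is why this stage scales to thousands of chunks.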

Stage 2 — Precision Re-ranking

🏆

Cross-encoder (HuggingFace)

Reads query + chunk together in one pass, so it can model the relationship between them. Slower, because every query-chunk pair needs its own model pass, but far more accurate at judging true relevance.
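The re-ranking step can be sketched as a function that scores each (query, chunk) pair jointly and re-sorts the candidates. This is a sketch under stated assumptions: `toy_pair_score` is a hypothetical stand-in for a cross-encoder model call, and `rerank` and the sample candidates are illustrative names, not part of the lab. The toy scorer rewards query words that appear in the chunk in the same order, a signal that is only visible when query and chunk are read together.

```python
def toy_pair_score(query, chunk):
    """Hypothetical stand-in for a cross-encoder.
    It looks at query and chunk together: word overlap,
    plus a bonus when adjacent query words are also adjacent in the chunk."""
    q = query.lower().split()
    c = chunk.lower().split()
    overlap = sum(1 for w in q if w in c)
    chunk_bigrams = set(zip(c, c[1:]))
    ordered = sum(1 for pair in zip(q, q[1:]) if pair in chunk_bigrams)
    return overlap + 2.0 * ordered

def rerank(query, candidates, score_pair, top_n=3):
    """Score every (query, chunk) pair, then sort by score, best first."""
    scored = [(score_pair(query, chunk), chunk) for chunk in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_n]

# Candidates in the order an imagined stage-1 retriever returned them.
candidates = [
    "limits on the rate of change",
    "higher rate limits need a token",
    "no credit card required",
]

reranked = rerank("rate limits", candidates, toy_pair_score)
for score, chunk in reranked:
    print(score, chunk)
```

The scorer runs once per candidate, so the cost grows with the number of pairs. That is the reason to re-rank only the small candidate set from stage 1 rather than the whole corpus.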

🤗 Free HuggingFace model · sentence-transformers/all-MiniLM-L6-v2 · No credit card · Runs via HuggingFace Inference API

Add HF_TOKEN to .env.local for higher rate limits