Vector similarity scores — cosine similarity, inner product, Euclidean distance — are not probabilities. A cosine similarity of 0.85 does not mean an 85% chance of relevance, yet hybrid search systems routinely combine such scores with lexical signals through ad-hoc normalization (min-max, arctangent) or rank-based fusion (RRF) that discards score magnitude information entirely. We present a Bayesian calibration framework that transforms vector similarity scores into calibrated relevance probabilities by exploiting the distributional statistics already computed during approximate nearest neighbor (ANN) index construction and search. Our approach is grounded in a likelihood ratio formulation: the calibrated probability is determined by the ratio of a local distance density (how likely this distance is among relevant documents) to a global background density (how likely this distance is by chance in the corpus), combined with an independent prior. We address the fundamental circularity problem — estimating the local density requires knowing which documents are relevant — through cross-modal conditional independence: any relevance signal conditionally independent of vector distance given true relevance (e.g., lexical matching, alternative embedding models, or index-derived density priors) provides importance weights that break the self-referential loop. We develop two estimation procedures: a nonparametric weighted kernel density estimator for the local distribution, and a parametric Gaussian mixture model with EM optimization initialized by external relevance priors. For pure vector environments where no external signal is available, we present fallback strategies based on distance distribution gap detection and index-derived density priors. Both estimators derive their statistics from ANN index structures — IVF cell populations and intra-cluster distances, HNSW edge distances and search trajectories — at negligible additional cost. 
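The likelihood-ratio formulation above can be sketched in a few lines: the posterior log-odds of relevance are the prior log-odds plus the log of the ratio between a local (relevant-conditioned) density and a global background density at the observed distance. The sketch below is illustrative only; the function names, the toy weighted Gaussian KDE, and the bandwidths are assumptions, not the paper's implementation (which derives its statistics from ANN index structures).

```python
import numpy as np

def weighted_kde(samples, weights, bandwidth):
    """Return a weighted Gaussian kernel density estimator over 1-D distances."""
    samples = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize importance weights (e.g., from a lexical signal)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        diffs = (x[:, None] - samples[None, :]) / bandwidth
        kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2.0 * np.pi))
        return kernels @ w

    return density

def calibrate(distance, local_density, global_density, prior):
    """Likelihood-ratio calibration: posterior log-odds = prior log-odds + log LR."""
    lr = local_density(distance) / global_density(distance)
    log_odds = np.log(prior / (1.0 - prior)) + np.log(lr)
    return 1.0 / (1.0 + np.exp(-log_odds))  # sigmoid -> calibrated probability
```

A distance that is far more likely under the local density than under the background density yields a large likelihood ratio and a calibrated probability well above the prior; a distance that is typical of the corpus at large is pulled back toward the prior.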
The resulting calibrated vector scores integrate seamlessly with other calibrated signals through additive log-odds fusion, yielding a unified hybrid search framework in which every signal contributes independently calibrated Bayesian evidence. Experimental evaluation on five BEIR benchmark datasets demonstrates improved calibration over ad-hoc baselines, validates that cross-modal importance weights outperform structurally independent density priors despite violating conditional independence, and achieves competitive NDCG@10 compared to RRF and convex combination baselines.
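Additive log-odds fusion, as described above, sums per-signal evidence (each signal's log-odds shift relative to the shared prior) on top of the prior log-odds. A minimal sketch, assuming each signal has already been calibrated to a probability against the same prior; the function names are illustrative:

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def fuse_log_odds(calibrated_probs, prior):
    """Combine independently calibrated signals by adding their log-odds evidence.

    Each signal contributes (logit(p_i) - logit(prior)), i.e., how much it
    shifts belief away from the shared prior; under conditional independence
    these shifts add.
    """
    evidence = sum(logit(p) - logit(prior) for p in calibrated_probs)
    fused_log_odds = logit(prior) + evidence
    return 1.0 / (1.0 + math.exp(-fused_log_odds))
```

For example, with a prior of 0.1, two signals each calibrated to 0.5 each multiply the prior odds by 9, so the fused posterior odds are (1/9) x 9 x 9 = 9, i.e., a probability of 0.9. A signal calibrated exactly to the prior contributes zero evidence and leaves the fused estimate unchanged.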