# Toroidal Logit Bias for Hallucination Reduction in Large Language Models

An inference-time intervention that imposes toroidal topological constraints on token selection in large language models.

## v1.1 — Added TruthfulQA evaluation (817 samples)

Key results:

- Custom benchmark (100 prompts): +40% error reduction (Qwen), +15.4% (OLMo)
- TruthfulQA (817 prompts): +6.8% error reduction (Qwen)
- Paired analysis: 46 improvements vs 32 regressions (McNemar p = 0.14)
- Consistent directional improvement (b > c)

Method: inference-time toroidal logit bias. No fine-tuning required; ~5% latency overhead.

Scope: this work focuses narrowly on an inference-time intervention for hallucination reduction. It makes no claims about ontology, training dynamics, or universal representations. The contribution is operational and empirical.

Changelog v1.1:

- Added TruthfulQA evaluation (817 samples) with generation-based matching
- Added paired McNemar's test analysis
- Confirmed directional improvement across both benchmarks

## v7 — Replication Update (March 2026)

This version corrects the empirical claims from v2. A comprehensive independent replication (6 experimental phases, 4 models, 3 benchmarks, n = 100–200 per condition) found that the toroidal logit bias does not produce statistically significant hallucination reduction. The v2 results (+0.2pp to +2.8pp) were within the sampling variance of the LLM judge.

### v2 original results (817 samples, LLM-judged — NOT REPLICATED)

- Qwen 0.5B: 16.9% → 17.1% (+0.2pp)
- Qwen 1.5B: 32.2% → 32.8% (+0.6pp)
- Qwen 7B: 75.6% → 77.7% (+2.1pp)
- Mistral 7B: 74.4% → 77.2% (+2.8pp)

### v7 replication results

Phase 1 — Greedy decoding (4 models × 3 benchmarks): all null. The α = 0.3 bias (~0.9 max logit shift) is too small to change the argmax.

Phase 2 — Sampling with truthful-and-informative (T&I) judge (exact match to v2 methodology, Qwen 7B, n = 200): baseline 76.5% T&I, toroidal 74.5% T&I, delta −2.0pp, McNemar p = 0.22. Our baseline matches v2 (76.5% vs 75.6%), confirming correct methodology.
The toroidal condition shifts in the opposite direction and is not significant.

Phase 3 — Alpha sweep (Qwen 7B, semantic torus mapping, n = 100): α = 0.3 has zero effect; α = 1.0 gives slight degradation (+1pp); α = 5.0 gives severe degradation (+25pp on TriviaQA); α = 10.0 is catastrophic (96% hallucination vs 50% baseline). Higher alpha monotonically degrades output; there is no beneficial sweet spot.

Phase 4 — Factorial decomposition: the hardened system prompt ("Answer concisely and truthfully") accounts for a −14pp hallucination reduction on TriviaQA (p = 0.05), while the toroidal logit bias contributes 0pp (p = 0.88). The prompt, not the bias, was the active ingredient.

Phase 5 — Semantic torus mapping: replacing modular arithmetic with learned embedding alignment gives a consistent −1pp on NQ-Open, not statistically significant.

Phase 6 — Orthogonal projection: G = VVᵀ applied to hidden states. A question-only evidence basis is too restrictive (4% of the space); expanding it with top-k predictions still has zero effect.

### What remains valid

- Theoretical framework (toroidal topology, Lean 4 proofs, spectral gap bounds)
- Toy-model validation (40% drift reduction vs random masking)
- Rust/Python library primitives
- The hardening prompt as an effective hallucination reduction
- PQ attestation and audit trail (Coherence Shield)
- Training-time Karmonic spectral regularization (untested — a promising direction)

### What is corrected

- The v2 TruthfulQA improvements (+0.2pp to +2.8pp) were within LLM-judge sampling variance.
- "Improvement scales with model capacity" is not supported.
- The OLMo +15.4% came from a 100-configuration sweep (overfitting to the test set).
- The custom-benchmark +40% (95% → 97%, i.e. 5 → 3 errors on 100 prompts) was not statistically significant.

### Scope

The theoretical framework remains formally valid. The inference-time logit bias does not produce measurable hallucination reduction at practical bias strengths. Training-time Karmonic spectral regularization is the recommended direction.

Replication scripts: https://github.com/Paraxiom/topological-coherence/tree/main/experiments
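The Phase 1 null result follows from a simple bound: under greedy decoding, a bounded logit bias can only flip the argmax when the top-1/top-2 gap is smaller than the worst-case bias differential. A minimal numpy sketch, assuming a hypothetical 3-angle modular torus mapping (the periods and functional form here are illustrative, chosen only to reproduce the reported ~0.9 maximum shift at α = 0.3; the library's actual mapping may differ):

```python
import numpy as np

def toroidal_bias(vocab_size: int, alpha: float,
                  periods: tuple = (101, 211, 307)) -> np.ndarray:
    """Wrap token ids onto a 3-torus via modular arithmetic and bias each
    logit by a sum of cosines. With three angles the bias is bounded by
    3 * alpha in absolute value (~0.9 at alpha = 0.3)."""
    ids = np.arange(vocab_size)
    return alpha * sum(np.cos(2 * np.pi * (ids % p) / p) for p in periods)

bias = toroidal_bias(32_000, alpha=0.3)
print(round(float(np.abs(bias).max()), 2))  # 0.9

# Greedy decoding: if the top-1/top-2 logit gap exceeds the worst-case
# bias differential (2 * 0.9 = 1.8), the argmax provably cannot change.
rng = np.random.default_rng(0)
logits = rng.normal(0.0, 4.0, size=32_000)
top = int(np.argmax(logits))
logits[top] += 2.0  # force a gap larger than the bias can close
assert int(np.argmax(logits + bias)) == top
```

This bound is why confident predictions are untouched at α = 0.3; only near-ties between the top candidates can be flipped.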
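The McNemar p-value reported for the v1.1 paired analysis (46 improvements vs 32 regressions) can be checked with the standard exact binomial formulation, using only the standard library:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar test on discordant pairs.
    b = pairs the intervention improved, c = pairs it regressed.
    Under H0 the discordant 'wins' follow Binomial(b + c, 0.5)."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# 46 improvements vs 32 regressions, as reported in the v1.1 paired analysis.
print(f"p = {mcnemar_exact(46, 32):.2f}")  # p = 0.14
```

The directional trend (b > c) is real but, as the test shows, consistent with chance at this sample size.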
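Phase 6's projector G = VVᵀ is standard linear algebra; a small sketch of the construction and of the "too restrictive" failure mode (the hidden size 768 and rank-32 evidence basis are assumptions for illustration, giving roughly the 4% coverage mentioned above):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 768, 32  # assumed hidden size and evidence-basis rank (32/768 ~ 4%)

# Orthonormal evidence basis V (columns), e.g. built from question-token
# hidden states; reduced QR orthonormalizes an arbitrary evidence matrix.
V, _ = np.linalg.qr(rng.normal(size=(d, k)))
G = V @ V.T  # orthogonal projector onto the evidence subspace

assert np.allclose(G @ G, G)  # projectors are idempotent

h = rng.normal(size=d)  # a generic hidden state
h_proj = G @ h

# A rank-k projector keeps on average only k/d of a generic state's
# energy, so a question-only basis discards most of the representation.
print(round(float(h_proj @ h_proj) / float(h @ h), 2))
```

Expanding the basis (larger k, e.g. from top-k prediction states) raises the retained fraction, but per the Phase 6 result this still produced no measurable effect.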