The standard explanation for why sigmoid activations fail in hidden layers of deep networks is vanishing gradients: repeated multiplication by sigma'(z) in (0, 0.25] attenuates error signals exponentially with depth. We present experimental evidence that this explanation is incomplete and identify a more fundamental mechanism: answer bandwidth compression. Sigmoid's bounded output range (0, 1) reduces the effective rank of hidden representations, destroying representational capacity independently of gradient flow. We establish this through a six-experiment falsification-refinement arc on CIFAR-10 (VGG-style CNN) and WikiText-2 (GPT-2 Transformer), testing 60+ configurations across 30 hypotheses. We conclude with a five-level hierarchy of bandwidth preservation: (1) the gate function (sigma vs Phi) is freely interchangeable; (2) only multiplicative bypass (the x factor) preserves bandwidth, and additive bypass (skip connections) cannot rescue it; (3) bandwidth must be preserved within the activation (intra-activation), not around the block (inter-block); (4) the penalty is depth-modulated but not depth-dependent in the vanishing-gradient sense; and (5) the mechanism transfers across architectures through differing mediating pathways.
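The effective-rank claim above can be illustrated in a few lines. The sketch below is not the paper's experimental setup: it pushes a Gaussian batch through a small stack of random linear layers and compares a plain sigmoid against SiLU, x*sigma(x), taken here as an example of an activation with a multiplicative bypass. Effective rank is measured with the entropy-based definition of Roy and Vetterli (exp of the entropy of normalized singular values); the abstract does not specify which definition the paper uses, so this is an assumption. Layer widths, depth, and weight scaling are arbitrary illustrative choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def effective_rank(h):
    # Entropy-based effective rank (Roy & Vetterli, 2007):
    # exp(H(p)) with p_i = s_i / sum_j s_j over singular values s.
    s = np.linalg.svd(h, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


def forward(act, depth=4, batch=512, width=128, seed=0):
    # Push a Gaussian batch through `depth` random linear layers,
    # applying the activation `act` after each one. A fixed seed
    # gives both activations the same weights.
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = act(h @ w)
    return h


er_sigmoid = effective_rank(forward(sigmoid))
er_silu = effective_rank(forward(lambda z: z * sigmoid(z)))
print(f"sigmoid: {er_sigmoid:.1f}  silu: {er_silu:.1f}")
```

Intuitively, sigmoid's output sits around a large constant (mean near 0.5) with small fluctuations, so each layer's activations acquire a dominant rank-one component and the representation collapses as depth grows; the multiplicative x factor in SiLU keeps fluctuations on the same scale as the mean, so the spectrum stays spread out. Gradient flow plays no role in this toy example, consistent with the abstract's claim.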