Ensembles improve prediction performance and uncertainty quantification by aggregating predictions from multiple models. In deep ensembling, these individual models can be black-box neural networks or more interpretable models, such as neural additive models. However, the interpretability of the ensemble members is generally lost when computing ensemble predictions. This is a crucial drawback of deep ensembles in high-stakes decision fields, in which interpretable models are desired. We propose novel transformation ensembles, which aggregate probabilistic predictions with the guarantee of preserving interpretability and yielding predictions that are, on average, uniformly better than those of the individual ensemble members. Transformation ensembles are tailored towards a class of normalizing flows called deep transformation models, but are applicable to a wider range of probabilistic neural networks. In experiments on several publicly available data sets, we demonstrate that transformation ensembles perform on par with classical deep ensembles in terms of prediction performance, discrimination, and calibration. In addition, we demonstrate how transformation ensembles capture both aleatoric and epistemic uncertainty, and produce minimax optimal predictions under certain conditions.
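A minimal sketch of the aggregation contrast described above, under the assumption that each ensemble member is a transformation model mapping an outcome y to a latent value h_m(y) under a standard-normal base distribution, so that F_m(y) = Φ(h_m(y)). The latent values and member count below are hypothetical toy numbers, not from the paper; the point is only that averaging on the transformation scale yields a single transformation model again, whereas averaging predicted CDFs does not.

```python
import math

def phi(z):
    # Standard-normal CDF, the assumed base distribution of the transformation model
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical latent values h_m(y) from three ensemble members at one outcome y
h = [0.2, 0.4, 0.1]

# Classical deep ensemble: average the members' predicted CDFs F_m(y) = phi(h_m(y));
# the mixture is generally no longer a transformation model
deep_ens = sum(phi(hm) for hm in h) / len(h)

# Transformation ensemble: average on the transformation (latent) scale first, then
# apply the base CDF once -- the aggregate is again a single transformation model,
# so the interpretability of its components is preserved
trafo_ens = phi(sum(h) / len(h))
```

The two aggregates generally differ; only the transformation-scale average stays within the model class of the members.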