Standard approaches in Markov Decision Processes (MDPs) typically focus on maximizing the expected return. Yet many real-world applications require a consideration of risk that extends beyond the average outcome. This thesis investigates risk-sensitive sequential decision-making, aiming to optimize functionals of the return distribution other than its expectation.

The distributional approach has raised considerable hope in this domain because it captures the full return distribution. In theory, this framework makes it possible to handle complex risk metrics such as Value-at-Risk (VaR), Conditional Value-at-Risk (CVaR), and the Entropic Risk Measure (EntRM). This thesis rigorously investigates the capabilities and limitations of this approach, specifically studying which risk measures can be effectively optimized using dynamic programming.

Despite the promise of the distributional perspective, we uncover fundamental theoretical barriers. We characterize the set of risk measures amenable to dynamic programming and demonstrate that it is much narrower than previously assumed. In particular, we show that only one specific class of risk measures, the Entropic Risk Measure family, can be exactly optimized using a standard dynamic programming recursion.

This family nevertheless proves to be crucial, as it appears naturally in the approximation of other important risk measures. Building on this insight, we propose a unified planning framework. This method leverages the full spectrum of risk-sensitive behaviors offered by the entire EntRM family (the Optimality Front), for which we prove key structural properties. Inspired by these properties, we develop an algorithm called DOLFIN (Distributional Optimality Front Iteration) to approximately solve otherwise intractable objectives (VaR, CVaR, threshold probabilities) via Generalized Policy Improvement.

Finally, we investigate the problem of learning the EntRM under uncertainty to ensure reliable decision-making in environments with unknown dynamics. We derive statistical concentration bounds for its estimation and provide the first analysis of learning the EntRM for a full range of risk parameters simultaneously.
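
For concreteness, the EntRM of a return X with risk parameter β is commonly written as (1/β) log E[exp(βX)], with the expectation recovered as β → 0. The abstract does not state a convention, so this definition and its sign convention (β < 0 read as risk-averse for returns) are assumptions. Under that assumption, the sketch below estimates the EntRM from sampled returns over a small grid of risk parameters; the function name `entropic_risk` and the sample values are hypothetical illustrations, not part of the thesis or of DOLFIN.

```python
import numpy as np

def entropic_risk(returns, beta):
    """Empirical EntRM, assuming the convention (1/beta) * log(mean(exp(beta * X)))."""
    x = np.asarray(returns, dtype=float)
    if beta == 0.0:
        # Limit beta -> 0: the EntRM reduces to the plain expectation.
        return float(x.mean())
    z = beta * x
    m = z.max()
    # Log-mean-exp, shifted by the maximum for numerical stability.
    return float((m + np.log(np.mean(np.exp(z - m)))) / beta)

# Hypothetical sampled returns and a small grid of risk parameters.
returns = np.array([0.0, 1.0, 2.0, 10.0])
betas = [-2.0, -0.5, 0.0, 0.5, 2.0]
values = [entropic_risk(returns, b) for b in betas]

for b, v in zip(betas, values):
    print(f"beta={b:+.1f}  EntRM={v:.4f}")
```

With this convention the estimate is non-decreasing in β and always lies between the smallest and largest sampled return, so sweeping β moves from pessimistic to optimistic evaluations of the same return distribution.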