Learning Robotic Manipulation through Vision, Touch, and Spatially Grounded Representations

2026 · 0 citations · Dissertation · Green Open Access

Authors

Niklas Wilhelm Funk · Fraunhofer Institute for Intelligent Analysis and Information Systems

Abstract

Achieving dexterous robotic manipulation remains one of the grand challenges in robotics. Progress toward this goal is essential for physically integrating intelligent autonomous systems into everyday life, enabling robots to assist with or fully perform the diverse range of manipulation tasks currently carried out by humans. Despite remarkable advances in robotic manipulation within structured industrial environments, where tasks are well-defined and predictable, achieving robust manipulation in unstructured, versatile settings such as homes, logistics, and field robotics continues to be an open challenge. A promising direction toward more flexible and generalizable robotic manipulation lies in leveraging data-driven, machine learning–based approaches. While such approaches have achieved remarkable success in domains like natural language processing, their direct application to robotic manipulation remains challenging. Key difficulties include data scarcity, satisfying real-time constraints, the tight coupling among sensing, policy inference, control, and contact dynamics, as well as the need for long-horizon planning capabilities. One way to address these challenges would be to collect vast amounts of data covering all possible task variations and apply standard large-scale training pipelines. However, considering practical constraints such as cost and time, such approaches are often infeasible. Instead, this thesis adopts a more structured approach, aiming to leverage domain knowledge about robotic manipulation to enhance the performance and efficiency of learning-based methods. 
From this perspective, several key capabilities emerge as essential for advancing robotic manipulation: advanced tactile sensing to complement visual perception during contact-rich interactions; efficient scene representations that enable learning from few demonstrations and generalization to novel scenarios; policy learning approaches that yield robust and reactive behavior; and flexible long-horizon skill sequencing that accounts for the capabilities of low-level skills to reliably accomplish multi-step manipulation. In line with these insights, this thesis focuses on the four core topics of tactile sensing, scene representation, policy learning, and skill sequencing. (...)

Topics & Keywords

Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications

Publication Details

Published in: TUprints

DOI: 10.26083/tuda-7785
