We present a new method for multimodal conditional 3D face geometry generation that allows user-friendly control over the output identity and expression via a number of different conditioning signals. Within a single model, we demonstrate 3D faces generated from artistic sketches, portrait photos, Canny edges, FLAME face model parameters, 2D face landmarks, or text prompts. Our approach is based on a diffusion process that generates 3D geometry in a 2D parameterized UV domain. During generation, each user-defined conditioning signal is passed through its own set of IP-Adapter cross-attention layers (a minimal sketch of such a layer follows below). The result is an easy-to-use 3D face generation tool that produces topology-consistent, high-quality geometry with fine-grained user control.

• We present a new method for 3D face geometry generation from six different types of conditioning signals (prompts) within a single model.
• We propose a comprehensive solution for training such a method from scratch, using 3D geometry data augmentations and representing 3D geometry as position maps to better fit existing diffusion pipelines (see the position-map sketch after this list).
• We show that our method supports face generation with expressions, sketch-based editing for 3D face design, stochastic variation of details conditioned on low-resolution FLAME faces, generalization to in-the-wild data, and dynamic face generation from videos.
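To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of an IP-Adapter-style decoupled cross-attention layer with one key/value projection pair per conditioning signal; the per-condition outputs are summed into the UNet features. The class name, dimensions, and per-condition scales are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiConditionCrossAttention(nn.Module):
    """Sketch of IP-Adapter-style decoupled cross-attention (assumed layout,
    not the authors' exact layer). The query projection is shared; each
    conditioning signal gets its own key/value projections."""

    def __init__(self, dim, cond_dims, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.to_q = nn.Linear(dim, dim)
        # One K/V projection pair per conditioning signal (e.g. sketch,
        # photo, Canny edges, FLAME params, landmarks, text embeddings).
        self.to_k = nn.ModuleList([nn.Linear(d, dim) for d in cond_dims])
        self.to_v = nn.ModuleList([nn.Linear(d, dim) for d in cond_dims])
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, conds, scales=None):
        # x:     (B, N, dim) UNet tokens from the position-map branch
        # conds: list of (B, M_i, cond_dims[i]) condition token sequences
        B, N, _ = x.shape
        q = self.to_q(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        out = torch.zeros_like(x)
        for i, c in enumerate(conds):
            k = self.to_k[i](c).view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
            v = self.to_v[i](c).view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
            attn = F.scaled_dot_product_attention(q, k, v)  # (B, H, N, head_dim)
            w = 1.0 if scales is None else scales[i]
            out = out + w * attn.transpose(1, 2).reshape(B, N, -1)
        return self.to_out(out)
```

Because each signal owns its K/V projections, individual conditions can be dropped or re-weighted at inference time simply by omitting them from `conds` or adjusting `scales`.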
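The position-map representation can likewise be sketched: each texel of a UV-space image stores the 3D surface point it parameterizes, so a fixed-topology mesh round-trips to and from an image that standard 2D diffusion backbones can process. The NumPy helpers below are a hypothetical nearest-texel splat; a production pipeline would rasterize triangles with barycentric interpolation so every texel inside the UV layout is filled.

```python
import numpy as np

def mesh_to_position_map(vertices, uv_coords, resolution=256):
    """Splat per-vertex XYZ positions into a UV-space image.

    vertices:  (V, 3) float array of 3D vertex positions.
    uv_coords: (V, 2) float array of per-vertex UVs in [0, 1].
    Returns an (H, W, 3) position map and a boolean coverage mask.
    """
    H = W = resolution
    pos_map = np.zeros((H, W, 3), dtype=np.float32)
    mask = np.zeros((H, W), dtype=bool)
    # Map UVs to pixel indices (v axis flipped: UV origin bottom-left).
    px = np.clip((uv_coords[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    py = np.clip(((1.0 - uv_coords[:, 1]) * (H - 1)).round().astype(int), 0, H - 1)
    pos_map[py, px] = vertices
    mask[py, px] = True
    return pos_map, mask

def position_map_to_vertices(pos_map, uv_coords):
    """Invert the mapping: read each vertex's 3D position back from its
    texel. With fixed topology the texel lookup is precomputable."""
    H, W, _ = pos_map.shape
    px = np.clip((uv_coords[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    py = np.clip(((1.0 - uv_coords[:, 1]) * (H - 1)).round().astype(int), 0, H - 1)
    return pos_map[py, px]
```

Since every generated position map shares the same UV layout, decoded meshes are automatically topology-consistent across samples.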