SuperHead: Improving Realism in Animatable 3D Head Avatars
By Ashlyn Lacovara
Researchers: Ding-Jiun Huang, Yuanhao Wang, Shao-Ji Yuan, Albert Mosella-Montoro, Francisco Vicente Carrasco, and Fernando De la Torre (Carnegie Mellon University); Cheng Zhang (Texas A&M University).
Researchers at Carnegie Mellon University have introduced a new framework designed to significantly improve the realism of animatable 3D head avatars. Their approach, called SuperHead, addresses a persistent challenge in immersive digital media: transforming low-quality visual inputs into believable, high-fidelity 3D avatars capable of realistic motion.

High-quality digital humans are becoming increasingly important across a wide range of applications, including augmented and virtual reality, telepresence, gaming, and digital entertainment. In these environments, realism plays a critical role in creating convincing experiences. However, generating detailed 3D avatars typically requires expensive scanning equipment and specialized capture pipelines. While these systems can produce highly detailed geometry and textures, their cost and complexity limit accessibility.
To overcome these barriers, many recent approaches attempt to reconstruct 3D heads from ordinary photos or videos captured with consumer devices. While this makes avatar creation more accessible, it introduces new challenges. Smartphone or webcam footage often suffers from low resolution, motion blur, and inconsistent lighting, resulting in avatars with blurry textures and visual artifacts.
To ensure realistic results, SuperHead is trained on a sparse set of upscaled 2D face renderings paired with depth maps, captured across multiple viewpoints and facial expressions. This multi-view supervision helps maintain cross-view consistency, geometric accuracy, and stable identity preservation even as the avatar moves and speaks.
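For readers who want a concrete picture of multi-view supervision, the idea can be sketched as an error averaged over several views, with one term for the rendered image and one for the depth map. This is only an illustrative sketch: the L1 error, the `depth_weight` value, and the function name `multiview_loss` are assumptions made for this example, not the objective used by the SuperHead authors.

```python
import numpy as np

def multiview_loss(pred_rgb, target_rgb, pred_depth, target_depth, depth_weight=0.1):
    """Hypothetical multi-view loss: per-view image error plus weighted depth error.

    pred_rgb, target_rgb:     arrays of shape (V, H, W, 3), one image per view
    pred_depth, target_depth: arrays of shape (V, H, W), one depth map per view
    depth_weight:             assumed balance between the two terms
    """
    pred_rgb = np.asarray(pred_rgb, dtype=np.float64)
    target_rgb = np.asarray(target_rgb, dtype=np.float64)
    pred_depth = np.asarray(pred_depth, dtype=np.float64)
    target_depth = np.asarray(target_depth, dtype=np.float64)

    # Mean absolute error per view for the rendered images.
    rgb_term = np.abs(pred_rgb - target_rgb).mean(axis=(1, 2, 3))
    # Mean absolute error per view for the depth maps.
    depth_term = np.abs(pred_depth - target_depth).mean(axis=(1, 2))

    per_view = rgb_term + depth_weight * depth_term
    # Averaging over views penalizes an avatar that looks right from
    # one angle but drifts in another.
    return per_view.mean(), per_view

# Toy usage: 4 views of 8x8 renderings with random targets.
rng = np.random.default_rng(0)
target_rgb = rng.random((4, 8, 8, 3))
target_depth = rng.random((4, 8, 8))
total, per_view = multiview_loss(target_rgb, target_rgb, target_depth, target_depth)
print(total, per_view.shape)
```

Because every view contributes to the same average, an error visible from only one viewpoint or expression still raises the total, which is the intuition behind cross-view consistency.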
The framework was evaluated on benchmarks such as NeRSemble and INSTA, where it showed strong performance across multiple avatar representations, including GaussianAvatar and SplattingAvatar models.
Beyond improving quality, the approach is also efficient. Because the method builds on GAN inversion techniques, it remains computationally lightweight while still delivering substantial improvements in visual realism. This efficiency enables faster inference while maintaining consistent results across dynamic facial motion, an area where many existing methods struggle.
