Repaint123: Fast and High-quality One Image to 3D Generation
with Progressive Controllable 2D Repainting

arXiv 2023


Junwu Zhang1,*, Zhenyu Tang1,*, Yatian Pang3, Xinhua Cheng1, Peng Jin1, Yida Wei4, Munan Ning1,2, Li Yuan1,2

*Equal contributions   
1Peking University    2Pengcheng Laboratory    3National University of Singapore    4Wuhan University

Abstract


Repaint123 crafts 3D content from a single image, matching 2D generation quality in just 2 minutes.

Recent one-image-to-3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, these methods suffer from several deficiencies, including multi-view inconsistency, over-saturated and over-smoothed textures, and slow generation. To address these issues, we present Repaint123, which alleviates multi-view bias and texture degradation while speeding up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model with the texture alignment ability of the repainting strategy to generate high-quality, multi-view consistent images. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance image quality during the repainting process. The resulting high-quality, multi-view consistent images enable the use of a simple Mean Squared Error (MSE) loss for fast 3D content generation. Extensive experiments show that our method generates high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch.


Image-to-3D generation pipeline


In the coarse stage, we adopt a Gaussian Splatting representation optimized by SDS loss at novel views. In the fine stage, we export a mesh representation and sample novel views bidirectionally and progressively for controllable progressive repainting. The refined novel-view images are then compared with the rendered novel-view images via an MSE loss for efficient generation. Cameras in red are the bidirectional neighbor cameras used for obtaining the visibility map.
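As a concrete illustration, the bidirectional progressive view schedule can be sketched as alternating left/right azimuth steps, starting from the reference view and meeting at the back view. The step size and the exact schedule below are assumptions for illustration, not the paper's precise settings:

```python
def bidirectional_azimuths(step_deg=40.0):
    """Progressively sample novel-view azimuth angles, alternating
    between the two rotation directions (+/-), starting at the
    reference view (0 deg) and ending at the back view (180 deg)."""
    views = [0.0]
    angle = step_deg
    while angle < 180.0:
        views.extend([angle, -angle])  # one step clockwise, one counter-clockwise
        angle += step_deg
    views.append(180.0)  # the back view is shared by both directions
    return views
```

With a 40° step this yields 0°, ±40°, ±80°, ±120°, ±160°, 180°, so every new view overlaps substantially with an already-repainted neighbor, which is what makes the visibility-map-based repainting well posed.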


Controllable repainting scheme


Our scheme employs DDIM inversion to generate deterministic noisy latents from coarse images, which are then refined via a diffusion model controlled by depth-guided geometry, reference-image semantics, and attention-driven reference texture. We binarize the visibility map into an overlap mask with a timestep-aware binarization operation; overlap regions are selectively repainted at each denoising step, yielding high-quality novel-view images.
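A minimal sketch of the masked repainting step, in the spirit of RePaint-style selective denoising. The linear threshold schedule, the parameter names, and the NumPy arrays standing in for latents are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def binarize_visibility(visibility, t, t_max=1000):
    """Timestep-aware binarization of a visibility map into an overlap
    mask (1 = overlap).  Illustrative schedule: the threshold tracks
    t / t_max, so early (noisy) steps mark little as overlap and repaint
    broadly, while later steps preserve more of the existing texture."""
    thresh = t / t_max  # assumed linear schedule
    return (np.asarray(visibility) >= thresh).astype(np.float32)

def repaint_step(x_denoised, x_known_t, overlap_mask):
    """One selective repainting step: inside overlap regions keep the
    already-painted content (noised to the current level), elsewhere
    take the freshly denoised content."""
    return overlap_mask * x_known_t + (1.0 - overlap_mask) * x_denoised
```

Each denoising step would then blend the diffusion model's prediction with the noised coarse latent under this mask, so repainting strength adapts to both visibility and timestep.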



Comparison


Repaint123 generates high-quality and view-consistent 3D objects from a single unposed image.

Input vs. Repaint123 (GS) vs. Repaint123 (NeRF), shown for: cherries, donuts, hamburger, horse, stone dragon statue.

More results


Additional novel-view results for: cherries, donuts, hamburger, horse, stone dragon statue.

Citation


@misc{zhang2023repaint123,
    title={Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting},
    author={Junwu Zhang and Zhenyu Tang and Yatian Pang and Xinhua Cheng and Peng Jin and Yida Wei and Wangbo Yu and Munan Ning and Li Yuan},
    year={2023},
    eprint={2312.13271},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}