PanoDiffusion: 360-degree Panorama Outpainting via Diffusion (ICLR 2024)


SCSE, Nanyang Technological University; VGG, University of Oxford;

[Teaser figure: masked input → PanoDiffusion output → reconstructed 3D scenes]

The PanoDiffusion model not only effectively generates semantically meaningful content and plausible appearances with many objects, such as beds, sofas, and TVs, but also provides multiple, diverse solutions to this ill-posed problem.

Abstract

Generating complete 360° panoramas from narrow-field-of-view images remains an open research problem, as omnidirectional RGB data is not readily available. Existing GAN-based approaches face barriers to achieving higher-quality output and generalize poorly across different mask types. In this paper, we present PanoDiffusion, our 360° indoor RGB panorama outpainting model based on latent diffusion models (LDM). We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, which works surprisingly well for outpainting depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in panorama wraparound consistency. Results show that our PanoDiffusion not only significantly outperforms state-of-the-art methods on RGB-D panorama outpainting by producing diverse, well-structured results for different types of masks, but can also synthesize high-quality depth panoramas that provide realistic 3D indoor models.

Model Designs

The overall pipeline of our proposed PanoDiffusion method

  • PanoDiffusion is fine-tuned from existing pretrained diffusion models. Note that the VQ-based encoder-decoders for RGB-D images are pre-trained in advance and kept fixed for the rest of our framework (marked as “locked”); see the sketch after this list.
  • During training, no masks are used, and depth information is applied to aid RGB panorama synthesis.
  • During inference, the depth information is no longer needed for masked RGB panorama outpainting.
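
As a minimal sketch of what “locked” means in practice (PyTorch-style; the two modules below are hypothetical stand-ins for the pretrained RGB and depth VQ autoencoders, which would really be loaded from checkpoints):

```python
import torch
import torch.nn as nn

def freeze(module: nn.Module) -> nn.Module:
    """Lock a pretrained module: disable gradients and switch to eval mode."""
    for p in module.parameters():
        p.requires_grad_(False)
    return module.eval()

# Hypothetical stand-ins for the pretrained RGB and depth VQ autoencoders.
vq_rgb = freeze(nn.Sequential(nn.Conv2d(3, 4, 3, padding=1)))
vq_depth = freeze(nn.Sequential(nn.Conv2d(1, 4, 3, padding=1)))

assert not any(p.requires_grad for p in vq_rgb.parameters())
```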

I. Latent Diffusion Outpainting

As the partially visible regions are not changed during perceptual image compression, we extend RePaint to latent-space outpainting in order to perform our task on 512×1024 panoramas. Note that the 360° wraparound consistency is still preserved in both the pixel and latent domains, which is important for our setting.
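
A minimal sketch of one such latent outpainting step, assuming a DDPM-style scheduler exposing `step` and `add_noise` (as in diffusers); `unet`, `mask`, and the function itself are illustrative, not the released PanoDiffusion API:

```python
import torch

@torch.no_grad()
def repaint_latent_step(z_t, z_known, mask, t, unet, scheduler):
    """One RePaint-style step in latent space: denoise the full latent,
    then re-impose the visible region at the matching noise level.

    z_t:     current noisy latent, e.g. (B, C, 64, 128) for a 512x1024 input
    z_known: clean latent of the encoded (partially visible) panorama
    mask:    1 on known/visible latent pixels, 0 where content is outpainted
    t:       current timestep, in the scheduler's expected format
    """
    # Reverse step: predict noise and take one denoising step everywhere.
    eps = unet(z_t, t)
    z_prev = scheduler.step(eps, t, z_t).prev_sample

    # Forward step: noise the known latent to the same level (t - 1).
    if t > 0:
        z_vis = scheduler.add_noise(z_known, torch.randn_like(z_known), t - 1)
    else:
        z_vis = z_known

    # Blend: visible region from the input, outpainted region from the model.
    return mask * z_vis + (1 - mask) * z_prev
```

Because the blend is re-imposed at every timestep, the outpainted content is repeatedly re-harmonized with the visible region as denoising proceeds.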


II. Two-end Alignment Mechanism

Since 360° panoramas are meant to be wraparound-consistent, we apply a circular-shift data augmentation, called camera-rotation, to the panorama dataset to improve the model's performance. During inference, we propose a novel two-end alignment mechanism that combines naturally with our latent diffusion outpainting process: at each iteration, we apply the camera-rotation operation to rotate both the latent vectors and masks by 90° before performing an outpainting step (we choose 90° so that the latents return to their initial position after four rotations; other angles are also possible).
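
Since an equirectangular panorama wraps around horizontally, camera-rotation amounts to a circular shift along the width axis. A minimal sketch (names are illustrative):

```python
import torch

def camera_rotate(x: torch.Tensor, degrees: float = 90.0) -> torch.Tensor:
    """Circularly shift a panorama (or its latent/mask) along the width axis;
    a yaw rotation of `degrees` corresponds to degrees/360 of the width."""
    shift = int(x.shape[-1] * degrees / 360.0)
    return torch.roll(x, shifts=shift, dims=-1)

# At each inference iteration, rotate the latent and mask together by 90°
# before the outpainting step; four such rotations return to the start.
z = torch.randn(1, 8, 64, 128)
mask = torch.ones(1, 1, 64, 128)
z, mask = camera_rotate(z), camera_rotate(mask)
```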


III. Bi-modal Latent Diffusion Model

We design a bi-modal latent diffusion structure that introduces depth information while generating high-quality RGB output, with depth needed only during training. We train two separate VQ models for RGB and depth images and concatenate their latents channel-wise into $$z_{rgbd}$$. Reconstructed RGB-D images can be obtained by decoupling $$z_{rgbd}$$ and decoding each modality separately.
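
A minimal sketch of forming and decoupling the joint latent (the 4-channel latent size is an assumption for illustration; `c_rgb` marks where the RGB channels end):

```python
import torch

def make_rgbd_latent(z_rgb: torch.Tensor, z_depth: torch.Tensor) -> torch.Tensor:
    """Concatenate RGB and depth latents along the channel axis."""
    return torch.cat([z_rgb, z_depth], dim=1)

def split_rgbd_latent(z_rgbd: torch.Tensor, c_rgb: int):
    """Decouple z_rgbd back into per-modality latents for decoding."""
    return z_rgbd[:, :c_rgb], z_rgbd[:, c_rgb:]

# e.g. two 4-channel latents -> one 8-channel z_rgbd for the diffusion UNet
z_rgb, z_depth = torch.randn(1, 4, 64, 128), torch.randn(1, 4, 64, 128)
z_rgbd = make_rgbd_latent(z_rgb, z_depth)
z_rgb2, z_depth2 = split_rgbd_latent(z_rgbd, c_rgb=4)
assert torch.equal(z_rgb2, z_rgb) and torch.equal(z_depth2, z_depth)
```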

Results Showcase

We show the capacity of PanoDiffusion on three challenging tasks: 1) RGB panorama outpainting, 2) depth estimation, and 3) RGB-D panorama synthesis.


RGB Panorama Outpainting

PanoDiffusion effectively generates semantically meaningful content and plausible appearances for various mask types, offering multiple, diverse solutions.



Depth Estimation




[Figure: RGB input · ground-truth depth · PanoDiffusion output]


Given complete RGB images, our PanoDiffusion can generate accurate absolute depth maps.



Synthesized RGB-D Panorama Outpainting Results

We provide some synthesized RGB-D panorama examples where the RGB is partially visible and the depth is fully masked. The results show that our PanoDiffusion can outpaint plausible and consistent RGB-D panoramas simultaneously.



Coming Soon

Code will be available soon.

Acknowledgement

This website is adapted from GLIGEN.

BibTeX


@misc{wu2023ipoldm,
  title={PanoDiffusion: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model}, 
  author={Tianhao Wu and Chuanxia Zheng and Tat-Jen Cham},
  year={2023},
  eprint={2307.03177},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}