IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model

SCSE, Nanyang Technological University; VGG, University of Oxford;

Masked input

IPO-LDM output

3D scenes (interactive)

Our IPO-LDM model not only effectively generates semantically meaningful content and plausible appearances containing many objects, such as beds, sofas, and TVs, but also provides multiple and diverse solutions to this ill-posed problem. (Feel free to play with them!)


Generating complete 360° panoramas from narrow-field-of-view images is an active research topic, as omnidirectional RGB data is not readily available. Existing GAN-based approaches struggle to produce high-quality output and generalize poorly across different mask types. In this paper, we present our 360° indoor RGB panorama outpainting model using latent diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, but works surprisingly well to outpaint normal depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in achieving panorama wraparound consistency. Results show that our IPO-LDM not only significantly outperforms state-of-the-art methods on RGB panorama outpainting, but can also produce multiple and diverse well-structured results for different types of masks.

Model Designs

The overall pipeline of our proposed IPO-LDM method

  • IPO-LDM is fine-tuned from existing pretrained diffusion models. Note that the VQ-based encoder-decoders for RGB-D images are pre-trained in advance and kept fixed in the rest of our framework (marked as “locked”).
  • During training, no masks are used, and depth information aids the completion of RGB panorama synthesis.
  • During inference, depth information is no longer needed for masked RGB panorama outpainting.

I. Latent Diffusion Outpainting

Since the partially visible regions are unchanged during perceptual image compression, we extend RePaint to latent-space outpainting in order to perform our task on 512×1024 panoramas. Note that the 360° wraparound consistency is still preserved in both the pixel and latent domains, which is important for our setting.
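The RePaint-style step above can be sketched as follows. This is a minimal illustration, not the released implementation: `denoise_fn` stands in for one reverse-diffusion step of the LDM, and the mask convention (1 = known latent region, 0 = region to outpaint) is an assumption for the sketch.

```python
import numpy as np

def repaint_latent_step(z_t, z0_known, mask, t, denoise_fn, alphas_cumprod, rng):
    """One RePaint-style outpainting step, performed in latent space.

    mask == 1 marks known (visible) latent positions; 0 marks regions to outpaint.
    """
    a_t = alphas_cumprod[t]
    # Known region: re-noise the encoded visible latents forward to step t,
    # so they match the noise level of the current sample.
    noise = rng.standard_normal(z0_known.shape)
    z_known = np.sqrt(a_t) * z0_known + np.sqrt(1.0 - a_t) * noise
    # Unknown region: one reverse-diffusion step from the model.
    z_unknown = denoise_fn(z_t, t)
    # Composite: keep the known content, fill the masked content from the model.
    return mask * z_known + (1.0 - mask) * z_unknown
```

Because the compositing happens on latents rather than pixels, the visible regions pass through the fixed VQ encoder-decoder unchanged, which is what makes the extension of RePaint to latent space valid here.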

II. Two-end Alignment Mechanism

Since 360° panoramas are meant to be wraparound consistent, we apply a circular-shift data augmentation, called camera-rotation, to the panorama image dataset to enhance the model's performance. During inference, we propose a novel two-end alignment mechanism that combines naturally with our latent diffusion outpainting process. At each iteration, we apply the camera-rotation operation to rotate both the latent vectors and the masks by 90° before performing an outpainting step (we choose 90° so that the panorama returns to its initial position after four steps; many other angles are possible).
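A minimal sketch of the camera-rotation operation and the two-end alignment loop, assuming width-major panorama tensors; the function names and the `outpaint_step` callback are illustrative, not the authors' actual API.

```python
import numpy as np

def camera_rotate(x, deg):
    """Circularly shift a panorama (or its latent) along the width axis.

    Shifting by W * deg / 360 columns corresponds to rotating the camera by
    `deg` degrees about the vertical axis; wraparound is preserved exactly.
    """
    w = x.shape[-1]
    shift = int(round(w * deg / 360.0))
    return np.roll(x, shift, axis=-1)

def aligned_outpaint(z, mask, outpaint_step, n_iters):
    """Two-end alignment: rotate latents and masks by 90° before each
    outpainting step, so the two ends of the panorama are periodically
    brought together as interior content."""
    for _ in range(n_iters):
        z, mask = camera_rotate(z, 90), camera_rotate(mask, 90)
        z = outpaint_step(z, mask)
    return z
```

After any multiple of four iterations the accumulated rotation is 360°, so the panorama is back in its original orientation, which is why 90° is a convenient choice.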

III. Bi-modal Latent Diffusion Model

We design a bi-modal latent diffusion structure that introduces depth information while generating high-quality RGB output, with depth needed only during training. We train two separate VQ models for RGB and depth images, then concatenate their latents. Reconstructed RGB-D images can be obtained by decoupling $$z_{rgbd}$$ and decoding each part.
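The coupling and decoupling of the two latents can be sketched as below. This is an illustrative arrangement (channel-wise concatenation, with a channels-first layout assumed), not the released code.

```python
import numpy as np

def couple(z_rgb, z_depth):
    """Form z_rgbd by concatenating the RGB and depth VQ latents
    along the channel axis; diffusion then runs on the joint latent."""
    return np.concatenate([z_rgb, z_depth], axis=0)

def decouple(z_rgbd, c_rgb):
    """Split z_rgbd back into RGB and depth latents, each of which is
    passed to its own fixed VQ decoder."""
    return z_rgbd[:c_rgb], z_rgbd[c_rgb:]
```

Because the two VQ autoencoders are trained separately and kept fixed, dropping the depth branch at inference only changes which latents are decoded, not how the diffusion model was trained on the joint latent.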

Results Showcase

We show the capacity of IPO-LDM on two challenging image completion tasks: 1) RGB panorama outpainting and 2) depth estimation.

RGB Panorama Outpainting

IPO-LDM effectively generates semantically meaningful content and plausible appearances on various masks with multiple and diverse solutions.

Depth Estimation

RGB input

Depth GT

IPO-LDM output

Given complete RGB images, our IPO-LDM can correspondingly generate accurate absolute depth maps.

Coming Soon

Code will be available soon.


This website is adapted from GLIGEN.


@article{ipoldm,
  title={IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model},
  author={Tianhao Wu and Chuanxia Zheng and Tat-Jen Cham},
}