Compared with a pipeline of 2D amodal completion followed by 3D reconstruction, Amodal3R achieves better 3D reconstruction quality from occluded objects. The target objects and occluders are marked with red and green outlines, respectively.
Overview: Given an image and point prompts in the regions of interest, Amodal3R first extracts the partially visible target object, together with its visibility and occlusion masks, using an off-the-shelf 2D segmenter. It then applies DINOv2 to extract features \(c_{dino}\) as additional conditioning for the 3D reconstructor. To enhance occlusion reasoning, each transformer block incorporates a mask-weighted cross-attention layer (via \(c_{vis}\)) and an occlusion-aware attention layer (via \(c_{occ}\)), so that the 3D reconstructor accurately perceives the visible information while effectively inferring the occluded parts.
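The caption does not spell out how the visibility mask enters the cross-attention, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the pixel-level visibility mask is average-pooled to one weight per image patch, and that the log of each weight is added to the attention logits, which is equivalent to multiplying the unnormalized attention by the weight before renormalizing. The names `patch_visibility` and `mask_weighted_cross_attention`, the `eps` constant, and the pooling scheme are all hypothetical, and learned projections and multiple heads are omitted.

```python
import math


def patch_visibility(mask, patch):
    """Average-pool a binary H x W mask into one weight per patch (row-major).

    Assumption: patch-level weights are the fraction of visible pixels.
    H and W are assumed to be multiples of `patch`.
    """
    height, width = len(mask), len(mask[0])
    weights = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            total = sum(
                mask[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            weights.append(total / (patch * patch))
    return weights


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def mask_weighted_cross_attention(queries, keys, values, vis_weights, eps=1e-6):
    """Single-head cross-attention with a visibility bias on the keys.

    queries:     list of d-dimensional vectors (e.g. 3D latent tokens)
    keys/values: one vector per image patch (e.g. patch features)
    vis_weights: one weight in [0, 1] per image patch
    Assumption: the bias is log(weight + eps), so patches with weight 0
    receive almost no attention.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        logits = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) + math.log(w + eps)
            for k, w in zip(keys, vis_weights)
        ]
        attn = softmax(logits)
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        )
    return outputs
```

With all weights equal to 1 the bias is nearly zero and the function reduces to ordinary scaled dot-product cross-attention. As a weight approaches 0, the corresponding patch is progressively suppressed, which is one way to make the reconstructor rely on visible evidence only.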