SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation

Alec Reed
Brendan Crowe
Doncey Albin
Lorin Achey
Bradley Hayes
Chris Heckman
University of Colorado - Boulder
Code | ArXiv


We present SceneSense, a novel generative 3D diffusion model for synthesizing 3D occupancy information from partial observations. SceneSense uses a running occupancy map and a single RGB-D camera to generate predicted geometry around the platform, even when that geometry is occluded or out of view. The architecture of our framework ensures that the generative model never overwrites observed free or occupied space, making SceneSense a low-risk addition to any robotic planning stack.


Example results


Method

Our occupancy inpainting method ensures that observed space remains intact while integrating SceneSense predictions. Drawing inspiration from inpainting techniques in image diffusion and guided image synthesis, our approach continuously incorporates known occupancy information during inference. To perform occupancy inpainting, we select a region of the occupancy map for diffusion and generate masks for its occupied and unoccupied voxels. These masks guide the diffusion process so that only unobserved voxels are modified, while the observed voxels are re-noised to the appropriate level at each step. This iterative process, depicted below, improves the accuracy of scene predictions while preventing the model from altering observed geometry.

SceneSense Framework
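
To make the inpainting loop concrete, below is a minimal sketch of how known occupancy can be re-injected at every denoising step. It assumes a diffusers-style scheduler interface (`timesteps`, `add_noise`, `step`) and a generic 3D denoising network; the function and variable names are illustrative placeholders, not the released SceneSense code.

```python
import torch

@torch.no_grad()
def inpaint_occupancy(model, scheduler, known_occ, known_mask, shape, device="cuda"):
    """
    known_occ : (B, 1, X, Y, Z) observed occupancy values for measured voxels.
    known_mask: (B, 1, X, Y, Z) 1 where a voxel was observed (free or occupied),
                0 where it is unknown and should be synthesized.
    """
    x = torch.randn(shape, device=device)  # start the reverse process from pure noise

    for t in scheduler.timesteps:
        # Noise the observed voxels to the current diffusion level so they are
        # statistically consistent with the partially denoised sample.
        noise = torch.randn_like(known_occ)
        known_noisy = scheduler.add_noise(known_occ, noise, t)

        # Overwrite observed voxels with their (noised) known values; only the
        # unknown region is ever produced by the model.
        x = known_mask * known_noisy + (1.0 - known_mask) * x

        # One reverse-diffusion step over the full voxel grid.
        eps = model(x, t)
        x = scheduler.step(eps, t, x).prev_sample

    # Final composite: observed voxels are returned exactly as measured.
    return known_mask * known_occ + (1.0 - known_mask) * x
```

In this sketch the masking is applied before every model call, so the final map keeps measured free and occupied space untouched regardless of how many denoising steps are run.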


Presentation Video


Citation

@misc{reed2024scenesense,
      title={SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation}, 
      author={Alec Reed and Brendan Crowe and Doncey Albin and Lorin Achey and Bradley Hayes and Christoffer Heckman},
      year={2024},
      eprint={2403.11985},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}