CutScene: Active Vision for Next Best View Planning

3 minute read

Overview

CutScene is an academic group project developed for CMU's 16-824 Visual Learning and Recognition course, focusing on active vision for next best view (NBV) planning in outdoor scenes. The project extends the NeU-NBV framework to large-scale outdoor navigation environments using a novel data augmentation technique.

Team Members: Mukul Ganwal, Aditya Rauniyar, Omar Alama, Yuechuan Hou

Institution: Carnegie Mellon University

Duration: January 2023 - May 2023


Abstract

This project addresses autonomous robotic tasks in large-scale outdoor scenes by extending next best view planning capabilities. We introduce a cutscene augmentation method that semantically divides large outdoor scenes into smaller components, significantly improving training efficiency and prediction accuracy. Our model is trained to predict uncertainty and RGB values for novel poses within the segmented scenes, enabling more effective data collection strategies for outdoor navigation scenarios.

Key Contributions

1. Training on Larger Outdoor Scenes

  • Utilized the UrbanScene3D dataset for large-scale outdoor environments
  • Imported 3D scene representations into Blender
  • Generated custom datasets emulating DTU dataset configuration with camera poses and renders
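
To make the dataset generation concrete, here is a minimal sketch of sampling DTU-style camera poses on a dome around the scene center and building look-at extrinsics that a Blender rendering script could consume. The radius, angular bounds, and axis conventions are illustrative assumptions rather than our exact pipeline; the max_polar_deg argument corresponds to the 30°/60°/90° viewing ranges analyzed next.

import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Build a 3x4 camera-to-world matrix pointing the camera at `target`."""
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    # Columns: right, up, -forward (OpenGL-style camera axes).
    rotation = np.stack([right, true_up, -forward], axis=-1)
    return np.concatenate([rotation, eye[:, None]], axis=1)

def sample_dome_poses(n_views, radius=100.0, max_polar_deg=60.0,
                      center=np.zeros(3), seed=0):
    """Sample camera poses on a dome; max_polar_deg sets the viewing range."""
    rng = np.random.default_rng(seed)
    # Stay a few degrees off the zenith to avoid the look-at singularity.
    polar = np.deg2rad(rng.uniform(5.0, max_polar_deg, n_views))
    azimuth = rng.uniform(0.0, 2.0 * np.pi, n_views)
    offsets = radius * np.stack([np.sin(polar) * np.cos(azimuth),
                                 np.sin(polar) * np.sin(azimuth),
                                 np.cos(polar)], axis=-1)
    return [look_at(center + o, center) for o in offsets]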

2. Viewing Range Sensitivity Analysis

  • Conducted comprehensive analysis across multiple viewing ranges (30°, 60°, 90°)
  • Created datasets with double dome configurations
  • Evaluated the impact of viewing range on reconstruction quality and uncertainty estimation
  • Demonstrated high sensitivity of the framework to viewing range parameters

3. Cutscene Augmentation (Novel Technique)

Inspired by 2D augmentation methods such as Cutout and CutMix, we developed cutscene augmentation (sketched after this list), which:

  • Semantically divides large outdoor scenes into smaller subscenes
  • Increases dataset size and variation significantly
  • Reduces overfitting when training data is scarce
  • Substantially improves novel view prediction accuracy
  • Enhances the network’s generalization capabilities
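
In our pipeline the division is performed on the 3D scene itself before rendering, but the effect on the training data can be illustrated by grouping existing renders: each semantic region (say, one building and its surroundings) becomes a subscene together with the cameras that observe it. The region list and pose format below are hypothetical, purely to sketch the idea.

import numpy as np

def cutscene_split(poses, images, regions, min_views=3):
    """Group (pose, image) pairs into subscenes by semantic region.

    `poses` are 3x4 camera-to-world matrices, `images` the matching
    renders, and `regions` a list of (center, radius) tuples marking
    semantic components of the large scene -- all illustrative inputs.
    """
    subscenes = []
    for center, radius in regions:
        # A view belongs to a region if its camera sits within `radius`.
        member = [i for i, pose in enumerate(poses)
                  if np.linalg.norm(pose[:, 3] - center) < radius]
        if len(member) >= min_views:  # keep only trainable subscenes
            subscenes.append({"poses": [poses[i] for i in member],
                              "images": [images[i] for i in member]})
    return subscenes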

Technical Approach

Foundation

Built upon the NeU-NBV framework by Jin et al., which adapts PixelNeRF for next best view planning using:

  • LSTM-based ray tracing optimization for faster volumetric rendering
  • Uncertainty estimation through log-normal distribution sampling
  • Information gain maximization for view selection

Architecture

  • Network: Modified PixelNeRF with uncertainty prediction
  • Loss Function: Uncertainty-guided loss for RGB and variance prediction (sketched below)
  • Planning Framework: Iterative next best view selection based on pixel uncertainty
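
A common instantiation of such an uncertainty-guided loss is a heteroscedastic negative log-likelihood: the RGB error is down-weighted where the network predicts high variance, while a log-variance penalty keeps the network from inflating uncertainty everywhere. The PyTorch sketch below assumes the network outputs a per-pixel mean and log-variance; it illustrates the loss family rather than reproducing the exact NeU-NBV objective.

import torch

def uncertainty_guided_loss(pred_rgb, pred_log_var, gt_rgb):
    """Heteroscedastic NLL over per-pixel RGB and predicted variance."""
    inv_var = torch.exp(-pred_log_var)      # confidence weight per pixel
    sq_err = (pred_rgb - gt_rgb) ** 2
    # High variance lowers the error term but pays the log-variance penalty.
    return (inv_var * sq_err + pred_log_var).mean()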

Training Setup

  • Hardware: NVIDIA RTX 3090 Ti GPU
  • Epochs: 200
  • Dataset: UrbanScene3D with custom Blender rendering pipeline
  • Metrics: PSNR and SSIM for reconstruction quality evaluation
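
For reference, PSNR follows directly from the mean squared error between a rendered image and its ground truth, while SSIM is best taken from an existing implementation such as skimage.metrics.structural_similarity. A generic helper, assuming images scaled to [0, 1]:

import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((np.asarray(pred, dtype=np.float64)
                   - np.asarray(gt, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)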

Results

Viewing Range Analysis

  • Demonstrated clear trade-off between viewing range and reconstruction quality
  • Smaller viewing ranges (30°) achieved superior PSNR and SSIM scores
  • Uncertainty predictions correlated well with error across all viewing ranges
  • Validated the framework’s sensitivity to viewing range parameters

Cutscene Augmentation Impact

  • Significant performance improvements over baseline models
  • Successfully generalized to broader viewing ranges (90°) despite training on 60° data
  • Surpassed DTU pretrained baseline on outdoor scenes
  • Reduced uncertainty noise, improving NBV planning reliability
  • Clear visual improvements in both RGB reconstruction and uncertainty estimation

Next Best View Planning

Compared three policies:

  1. Our Method (Cutscene + Uncertainty-guided)
  2. DTU Baseline (Pretrained model)
  3. Maximum View Distance (Geometric baseline)

Our approach outperformed both baselines in reconstruction quality as measured by PSNR and SSIM metrics across iterative view collection.
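
A minimal version of the selection step behind these policies might look like the loop below: score every candidate pose and capture the argmax. Under the uncertainty-guided policy the score is the mean predicted per-pixel uncertainty (an information-gain proxy); under the geometric baseline it is the distance to the nearest already-captured view. The render_fn interface and pose format are assumptions for illustration.

import numpy as np

def next_best_view(render_fn, candidates, captured, policy="uncertainty"):
    """Select the next camera pose to capture under a given policy.

    `render_fn(pose)` is assumed to return (rgb, uncertainty) maps;
    poses are 3x4 or 4x4 camera-to-world matrices (illustrative).
    """
    if policy == "uncertainty":
        # Information-gain proxy: mean predicted per-pixel uncertainty.
        scores = [float(render_fn(pose)[1].mean()) for pose in candidates]
    elif policy == "max_distance":
        # Geometric baseline: prefer candidates far from every captured view.
        scores = [min(np.linalg.norm(c[:3, 3] - p[:3, 3]) for p in captured)
                  for c in candidates]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return candidates[int(np.argmax(scores))]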

Future Directions

  1. Automated Semantic Division: Utilize 2D bird’s eye view detectors or geometric clustering for automatic scene segmentation
  2. Scene Rearrangement: Extend augmentation by creating novel configurations from cutout scenes
  3. Scheduled Viewing Range: Implement gradual viewing range increase during training for better stability and generalization (a toy schedule is sketched after this list)
  4. Multi-UAV Integration: Explore collaborative data collection strategies
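
As a sketch of what the scheduled viewing range in item 3 might look like, one simple option is a linear ramp of the pose-sampling dome's angular extent over training; the endpoints and schedule shape here are arbitrary placeholders.

def viewing_range_at(epoch, total_epochs=200, start_deg=30.0, end_deg=90.0):
    """Linearly widen the pose-sampling dome as training progresses."""
    t = min(max(epoch / total_epochs, 0.0), 1.0)
    return start_deg + t * (end_deg - start_deg)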

Technologies & Methods

  • Deep Learning: Neural Radiance Fields (NeRF), PixelNeRF
  • Computer Vision: Novel view synthesis, uncertainty estimation, active vision
  • 3D Reconstruction: Volumetric rendering, ray tracing optimization
  • Tools: Blender, Python, PyTorch
  • Datasets: UrbanScene3D, DTU (for comparison)

Impact

This work successfully demonstrates the extension of indoor-focused next best view planning methods to large-scale outdoor environments, addressing key challenges in autonomous navigation and robotic exploration. The cutscene augmentation technique provides a practical solution for dealing with scarce large-scale outdoor training data.


Project Website: github.com/FanFeast/cutscene.github.io

Citation:

@article{2023cutscene,
  title={Cutscene: Active vision for Next Best View Planning in outdoor scenes},
  author={Rauniyar, Aditya and Alama, Omar and Hou, Yuechuan and Ganwal, Mukul},
  url={},
  year={2023}
}