SurgCUT3R: Surgical Scene-Aware Continuous Understanding of Temporal 3D Representation
Summary
SurgCUT3R adapts a state-of-the-art unified online reconstruction model to monocular surgical endoscopic video, addressing (i) the lack of supervised training data and (ii) accumulated pose drift on long sequences.
- Metric-scale pseudo-GT depth from public stereo surgical datasets (SCARED, StereoMIS); the disparity-to-depth conversion is sketched after this list.
- Hybrid supervision combining pseudo-GT with geometric self-correction to resist label noise.
- Hierarchical long-sequence inference using global stability + local accuracy models to suppress drift.
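The pseudo-GT generation in the first bullet boils down to classical stereo triangulation: with rectified image pairs, a calibrated focal length f (in pixels) and baseline B (in metres), a disparity d converts to metric depth Z = f·B/d. The snippet below is a minimal sketch of that conversion using OpenCV's semi-global block matching, assuming the rectified pairs and calibration that SCARED/StereoMIS provide; it is illustrative only, not our actual data-generation pipeline, and the function name and thresholds are placeholders.

```python
# Minimal sketch: metric pseudo-ground-truth depth from a rectified stereo pair.
# Assumes calibrated focal length (pixels) and baseline (metres); StereoSGBM
# stands in for whatever stereo matcher the real pipeline uses.
import cv2
import numpy as np

def pseudo_gt_depth(left_gray, right_gray, focal_px, baseline_m,
                    num_disparities=128, block_size=5):
    """Return a metric depth map (metres) and a validity mask."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disparities,   # must be a multiple of 16
        blockSize=block_size,
        P1=8 * block_size ** 2,
        P2=32 * block_size ** 2,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # SGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0

    valid = disparity > 0.5                  # drop unmatched / near-zero disparities
    depth = np.zeros_like(disparity)
    depth[valid] = focal_px * baseline_m / disparity[valid]   # Z = f * B / d
    return depth, valid
```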
4D visualization
Depth maps and pointmaps.
Abstract
Reconstructing surgical scenes from monocular endoscopic video is critical for advancing robotic-assisted surgery. However, the application of state-of-the-art general-purpose reconstruction models is constrained by two key challenges: the lack of supervised training data and performance degradation over long video sequences. To overcome these limitations, we propose SurgCUT3R, a systematic framework that adapts unified 3D reconstruction models to the surgical domain. Our contributions are threefold. First, we develop a data generation pipeline that exploits public stereo surgical datasets to produce large-scale, metric-scale pseudo-ground-truth depth maps, effectively bridging the data gap. Second, we propose a hybrid supervision strategy that couples our pseudo-ground-truth labels with geometric self-correction to enhance robustness against inherent data imperfections. Third, we introduce a hierarchical inference framework that employs two specialized models to effectively mitigate accumulated pose drift over long surgical videos: one for global stability and one for local accuracy. Experiments on the SCARED and StereoMIS datasets demonstrate that our method achieves a competitive balance between accuracy and efficiency, delivering near-state-of-the-art pose estimation at substantially higher speed and offering a practical and effective solution for robust reconstruction in surgical environments.
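The hybrid supervision described above pairs pseudo-ground-truth depth with geometric self-correction. The abstract does not spell out the loss, so the following is only a hedged sketch of one common way to combine the two in PyTorch: a log-depth L1 term on the pseudo labels, trusted only where a geometric consistency residual (e.g. a reprojection or left-right error computed from the prediction) is small, plus a direct penalty on that residual. The function, residual definition, and weights are all hypothetical, not our published loss.

```python
# Hedged sketch of a hybrid supervision loss: pseudo-GT depth supervision
# gated by a geometric consistency mask. `geo_residual` and the weights
# are placeholders, not the paper's actual formulation.
import torch

def hybrid_loss(pred_depth, pseudo_gt_depth, geo_residual,
                valid_mask, tau=0.05, w_pgt=1.0, w_geo=0.1):
    """All tensors have shape (B, 1, H, W); valid_mask is 0/1."""
    # Trust the pseudo label only where the geometric residual is small.
    trust = (geo_residual < tau).float() * valid_mask

    # Pseudo-GT term: log-depth L1, restricted to trusted pixels.
    l_pgt = (torch.log(pred_depth.clamp(min=1e-6)) -
             torch.log(pseudo_gt_depth.clamp(min=1e-6))).abs()
    l_pgt = (l_pgt * trust).sum() / trust.sum().clamp(min=1.0)

    # Self-correction term: penalise the geometric residual wherever the
    # prediction is defined, so noisy labels do not dominate training.
    l_geo = (geo_residual * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)

    return w_pgt * l_pgt + w_geo * l_geo
```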
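The hierarchical inference framework uses one model for global stability and one for local accuracy. One way to read this, sketched below under our own assumptions rather than as the published procedure, is trajectory stitching: relative poses estimated by the locally accurate model inside each short chunk are re-anchored onto keyframe poses from the globally stable model, so drift cannot accumulate beyond a chunk. Poses are 4x4 camera-to-world matrices; the chunking and variable names are hypothetical.

```python
# Hedged sketch: re-anchor locally estimated chunk poses onto globally
# stable keyframe poses so that drift is bounded by the chunk length.
import numpy as np

def stitch_chunks(global_keyframe_poses, local_chunks):
    """
    global_keyframe_poses: list of 4x4 poses, one per chunk start frame,
                           from the globally stable model.
    local_chunks:          list of lists of 4x4 poses from the locally
                           accurate model, each in its own chunk frame.
    Returns a single trajectory expressed in the global frame.
    """
    stitched = []
    for anchor, chunk in zip(global_keyframe_poses, local_chunks):
        # Express every pose relative to the chunk's first frame, then map
        # that relative motion into the global frame via the keyframe anchor.
        first_inv = np.linalg.inv(chunk[0])
        for pose in chunk:
            stitched.append(anchor @ first_inv @ pose)
    return stitched
```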
Method
Results
BibTeX
@inproceedings{xu2026surgcut3r,
  title     = {SurgCUT3R: Surgical Scene-Aware Continuous Understanding of Temporal 3D Representation},
  author    = {Xu, Kaiyuan and Hong, Fangzhou and Elson, Daniel and Huang, Baoru},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026},
  url       = {https://YOUR_DOMAIN.com/SurgCUT3R}
}