
Official implementation of On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images. A. Meuleman, I. Shah, A. Lanvin, B. Kerbl, G. Drettakis, ACM TOG (proc. SIGGRAPH) 2025


On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images

Andreas Meuleman, Ishaan Shah, Alexandre Lanvin, Bernhard Kerbl, George Drettakis

Table of contents: Setup | Data Guidelines | Optimization | Evaluation | Viewers | Capture Guidelines | Video Stream | Acknowledgments

We propose a fast, on-the-fly 3D Gaussian Splatting method that jointly estimates poses and reconstructs scenes. Through fast pose initialization, direct primitive sampling, and scalable clustering and merging, it efficiently handles diverse ordered image sequences of arbitrary length.


Citation

If you find this code useful in a publication, please use the following citation:

@article{meuleman2025onthefly,
  title={On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images},
  author={Meuleman, Andreas and Shah, Ishaan and Lanvin, Alexandre and Kerbl, Bernhard and Drettakis, George},
  journal={ACM Transactions on Graphics},
  volume={44},
  number={4},
  year={2025}
}

Setup

Tested on Ubuntu 22.04 and Windows 11 with PyTorch 2.7.0, and CUDA 11.8 and 12.8.
Create the environment:

git clone --recursive https://github.com/graphdeco-inria/on-the-fly-nvs.git
cd on-the-fly-nvs
conda create -n onthefly_nvs python=3.12 -y
conda activate onthefly_nvs

Default setup with CUDA 12.8 (check your compute platform with nvcc --version):

# Windows only. If you use cmd.exe:
SET DISTUTILS_USE_SDK=1
# If you use PowerShell:
$env:DISTUTILS_USE_SDK=1
# Get the versions corresponding to your compute platform at https://pytorch.org/
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu128
pip install cupy-cuda12x
pip install -r requirements.txt
Setup with CUDA 11.8
Note that xformers will not be installed with CUDA 11.8, because it requires a version of PyTorch that is incompatible with our codebase.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install cupy-cuda11x
pip install -r requirements.txt
Installing CUDA within a Conda Environment
If nvcc --version returns an error, you can install CUDA within your Conda environment. After activating your environment and before installing PyTorch, run:
conda install nvidia/label/cuda-12.8.0::cuda-nvcc
Make sure to replace 12.8.0 with a version supported by your driver (check the maximum supported version with nvidia-smi). A list of the available versions can be found on the nvidia channel on anaconda.org.
Specifying Environment Path
You can specify paths for Conda to save space on your system drive:
conda config --add pkgs_dirs <pkg_path>
conda create python=3.12 -y --prefix <env_path>/onthefly_nvs
conda activate <env_path>/onthefly_nvs
Where <pkg_path> is the desired package download location and <env_path>/onthefly_nvs is the desired environment location.
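To confirm the environment before moving on, a quick check like the one below can be run from the activated environment (a minimal sketch; the script name check_env.py is hypothetical, and the exact version strings depend on the wheels you installed):

# check_env.py -- sanity check for the freshly created environment
import torch

print("PyTorch:", torch.__version__)               # e.g. 2.7.0+cu128
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))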

Data Guidelines

Please note that our method is not a drop-in replacement for COLMAP + 3DGS, as it does not reorder images. It requires sequential capture, which imposes several constraints on the kind of data that can be handled. Please follow the Capture Guidelines for best results on your own data.

The dataloader will look for images in ${SOURCE_PATH}/images by default. The images should be ordered alphabetically and have a .png, .jpg, or .jpeg extension. It will also optionally look for COLMAP files in ${SOURCE_PATH}/sparse/0 for ground-truth pose visualization.
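To preview the frames exactly as the dataloader will see them, a short check such as the following can help (a minimal sketch using only the standard library; the script name and printout are illustrative, not part of the codebase):

# check_images.py -- list the images in the (alphabetical) order the dataloader will use
from pathlib import Path

source_path = Path("data/MipNeRF360/garden")   # replace with your ${SOURCE_PATH}
exts = {".png", ".jpg", ".jpeg"}
images = sorted(p.name for p in (source_path / "images").iterdir()
                if p.suffix.lower() in exts)
assert images, "no images found"
print(len(images), "images found")
print("first:", images[0], "| last:", images[-1])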

To download the datasets used in Table 1 of the paper, run:

# All datasets will be downloaded in data/
python scripts/download_datasets.py --out_dir data/

# Or download a specific dataset
python scripts/download_datasets.py --out_dir data/ --datasets MipNeRF360 # or TUM, or StaticHikes

For best results, we recommend using a high-quality camera and providing still photographs to the method. We provide an experimental prototype for reconstruction from a Video Stream that will not provide the same level of quality.

Optimization

The following command runs the reconstruction and saves the model. If -m is not provided, the model will be saved in results/xxxxxx/.

python train.py -s ${SOURCE_PATH} -m ${MODEL_PATH}

This command uses all input images, which provides the best result. Metrics in the paper are computed with the evaluation protocol below (see Evaluation). See also the Interactive Viewers section below for direct feedback on your training.
Example basic training command (see Data Guidelines for downloading the dataset):

python train.py -s data/MipNeRF360/garden -m results/MipNeRF360/garden
Main Command Line Arguments for train.py

--source_path / -s

Path to the data folder (should have sparse/0/ if using COLMAP or evaluating poses).

--model_path / -m

Directory to store the renders from the test views and checkpoints after training. If not set, defaults to results/xxxxxx.

--images_dir / -i

source_path/images_dir/ is the path to the images (with extensions jpg, png or jpeg). If not set, will use source_path/images.

--masks_dir

If set, source_path/masks_dir/ is the path to optional masks to apply to the images before computing the loss (the format expected is png).

--viewer_mode

Running mode for the viewer, choices are [local, server, web, none], none by default. See Interactive Viewers for more details.

--ip

IP address of the viewer client, if using server viewer_mode, 0.0.0.0 by default.

--port

Port of the viewer client, if using "server" viewer_mode, 6009 by default.

--downsampling

Downsampling ratio for input images.

--save_every

Frequency of exporting renders w.r.t. input frames.

--save_at_finetune_epoch

Enable fine-tuning after the initial on-the-fly reconstruction and save the scene at the end of the specified epochs.

--eval_poses

Compare poses to COLMAP.

--use_colmap_poses

Load COLMAP data for pose and intrinsics initialization.

--test_hold

Holdout for the test set: every test_hold-th image is excluded from the Gaussian optimization and used for testing (see the short sketch after this argument list). The test frames are still used for training the poses. If set to -1, no keyframes are excluded from training.

--test_frequency

Test and get metrics every test_frequency keyframes.

--display_runtimes

Display runtimes for each step in the tqdm bar.

Run python train.py -h for a complete list of available options.
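As a point of reference for --test_hold, the holdout roughly corresponds to the following selection (a minimal sketch; the exact indexing convention used by the codebase may differ):

# Hypothetical illustration of the --test_hold split (indexing convention may differ).
test_hold = 8                                   # -1 disables the holdout entirely
keyframe_ids = list(range(200))                 # placeholder keyframe indices
test_ids = [] if test_hold == -1 else [i for i in keyframe_ids if i % test_hold == 0]
train_ids = [i for i in keyframe_ids if i not in test_ids]
print(f"{len(train_ids)} training keyframes, {len(test_ids)} test keyframes")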

Evaluation

The following command runs the reconstruction while excluding every ${TEST_HOLD}-th image from the Gaussian optimization. It evaluates and saves the test images to ${MODEL_PATH}/test_images at the end of training.

python train.py -s ${SOURCE_PATH} -m ${MODEL_PATH} --test_hold ${TEST_HOLD}

Example (see Data Guidelines for downloading the dataset):

python train.py -s data/MipNeRF360/garden -m results/MipNeRF360/garden --test_hold 8 --test_frequency 20

To evaluate all scenes reported in Table 1 of the paper, run:

python scripts/train_eval_all.py --base_dir data/ --base_out_dir results/
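If you want to recompute an image metric yourself from the exported test renders, PSNR can be computed as below (a minimal sketch; the file names and the pairing of renders with ground truth under ${MODEL_PATH}/test_images are assumptions, not the actual export layout):

# psnr_example.py -- hypothetical PSNR computation on one exported render
import numpy as np
from PIL import Image

def psnr(a, b):
    # Peak signal-to-noise ratio between two images normalized to [0, 1]
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(1.0 / mse)

render = np.asarray(Image.open("render.png"), dtype=np.float32) / 255.0   # hypothetical paths
gt = np.asarray(Image.open("gt.png"), dtype=np.float32) / 255.0
print(f"PSNR: {psnr(render, gt):.2f} dB")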

Interactive Viewers

The viewers allow navigation of the scene during and after optimization, and visualization of both optimized and ground truth poses. W, A, S, D, Q, E control camera translation and I, K, J, L, U, O control rotation. We release the base viewer components in a separate repository so that they can be used in other projects. If you find it useful, please consider citing it.

viewer_demo.mp4

Live Optimization Viewer

To open an interactive viewer window during the optimization process, use the following command:

python train.py -s ${SOURCE_PATH} --viewer_mode local

Example (see Data Guidelines for downloading the dataset):

python train.py -s data/MipNeRF360/garden --viewer_mode local

This viewer runs concurrently with the optimization process. You can enable throttling by clicking the Throttling checkbox and adjusting the Max FPS slider in the viewer to balance resource allocation between the viewer and the optimization task. With the live optimization viewer enabled, the process keeps running after training is complete.

Visualizing a Reconstructed Scene

After optimization, you can visualize the reconstructed scene using the following command:

python gaussianviewer.py local ${MODEL_PATH}

Example:

python gaussianviewer.py local results/MipNeRF360/garden

Network Viewer

The network viewer allows you to visualize a scene and monitor the optimization process from a different machine. The client waits for the server program to connect and stream rendered images.

To run the client, use the following command:

python gaussianviewer.py client

On the server side, run one of the following commands:

# live optimization visualization
python train.py -s ${SOURCE_PATH} --viewer_mode server

# or
# visualize a reconstructed scene
python gaussianviewer.py server ${MODEL_PATH}

When using different machines, ensure the specified port is forwarded or set --ip and --port appropriately for both the client and server.

Lightweight Remote Viewer Environment
The remote viewer has fewer dependencies, making it convenient to run on a different machine than the one performing the optimization. Since rendering occurs on the host machine, the client machine does not need a CUDA-compatible GPU.

To set up the remote viewer on a different machine, follow these steps:

conda create -n onthefly_nvs_remoteviewer python=3.12 -y
conda activate onthefly_nvs_remoteviewer
pip install submodules/graphdecoviewer

Note: On the first run, the settings window might be hidden behind the Point View window. Move the window to reveal it. The updated layout is saved when the viewer is closed and restored on future runs.

Render a Video from an Optimized Scene

The following command renders the reconstruction saved in ${MODEL_PATH} along the path ${RENDER_PATH} and exports the frames and video in ${VIDEO_DIR}. The camera trajectory files in ${RENDER_PATH} must follow the COLMAP format (images.[bin/txt] and cameras.[bin/txt]).

python scripts/render_path.py -m ${MODEL_PATH} --render_path ${RENDER_PATH} --out_dir ${VIDEO_DIR}

Here, we render the reconstruction of the garden scene along the optimized poses (that train.py saves in ${MODEL_PATH}/colmap):

python scripts/render_path.py -m results/MipNeRF360/garden --render_path results/MipNeRF360/garden/colmap --out_dir results/MipNeRF360/garden/video
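If you want to render a hand-made trajectory instead, ${RENDER_PATH} needs cameras and poses in COLMAP's convention. The sketch below writes minimal cameras.txt and images.txt files by hand (the focal length, image size, and pose values are placeholders, and the script name is hypothetical):

# make_path.py -- hypothetical helper writing a COLMAP-format camera path
from pathlib import Path

out = Path("my_render_path")
out.mkdir(exist_ok=True)

# One PINHOLE camera: CAMERA_ID MODEL WIDTH HEIGHT fx fy cx cy
with open(out / "cameras.txt", "w") as f:
    f.write("1 PINHOLE 1920 1080 1600 1600 960 540\n")

# One pose per frame: IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME, followed by an
# (empty) 2D-points line. COLMAP stores world-to-camera rotations and translations.
with open(out / "images.txt", "w") as f:
    for i in range(120):
        qw, qx, qy, qz = 1.0, 0.0, 0.0, 0.0    # identity rotation (placeholder)
        tx, ty, tz = 0.01 * i, 0.0, 0.0        # slow sideways dolly (placeholder)
        f.write(f"{i + 1} {qw} {qx} {qy} {qz} {tx} {ty} {tz} 1 frame_{i:04d}.png\n")
        f.write("\n")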
Aligning the Render Path
The poses in ${RENDER_PATH} may be in a different coordinate system than the optimized scene, so we provide an optional --alignment_path argument to align them to the scene. Specifically, we find a transformation between the cameras in <alignment_path> and the scene keyframes, and apply this transformation to the cameras in ${RENDER_PATH}. Note that the image names corresponding to the poses in <alignment_path> should match the image names used to optimize the scene. This is useful for rendering a camera path captured with a different viewer or method (e.g., 3DGS with SIBR) and ensuring the rendered video paths match.

Capture Guidelines

As mentioned above, our method is not a drop-in replacement for COLMAP and 3DGS. In particular, for efficiency and to allow live feedback, we do not perform exhaustive matching or reorder the input images like SfM approaches such as COLMAP. Many standard datasets used in previous work thus cannot be handled by our approach (e.g., Zip-NeRF scenes, or most MipNeRF360 scenes such as bicycle).

It is important to carefully follow the guidelines below for the method to work and to achieve good results:

  • Ordered Sequences: Capture images sequentially with sufficient overlap. Ideally, consecutive frames should share more than two-thirds of their content. This typically involves walking slowly around a scene and taking pictures sequentially. This works very well for outdoor scenes where one naturally moves forward; for indoor scenes, restrictions in space often lead to errors. See the failure cases below for what to avoid. Again, unordered datasets (e.g., Zip-NeRF-style, many DeepBlending scenes, etc.) are not supported.
  • Translation: Ensure sufficient translation between consecutive frames for accurate triangulation. Avoid rotation without translation: taking a step sideways between pictures when turning helps maintain a sufficient baseline, especially indoors. This is critical for bootstrapping, as insufficient translation can lead to incorrect focal length estimation, but it is also important throughout the capture.
  • Resolution: We found that the matcher performs best in the 1-2 MP range.
  • Pinhole Camera Model: We optimize only for focal length, so ensure your images follow a pinhole projection with a centered principal point (no fisheye or distortion); see the short projection sketch after this list.
  • No Loop Closure: Drift compensation is not performed when the trajectory revisits a previously reconstructed region. This can cause misalignments due to accumulated pose errors, especially after registering a long sequence between the start and end of the loop. For small loops (e.g., the Truck scene from the Tanks and Temples dataset), our method works well.
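As a reference for the camera model assumed in the Pinhole bullet above, the projection boils down to the following (a minimal sketch; a single focal length f is assumed for both axes, matching the focal-length-only optimization, and the principal point is the image center):

# Assumed pinhole projection with a centered principal point (no distortion).
def project(x, y, z, f, width, height):
    u = f * x / z + width / 2.0
    v = f * y / z + height / 2.0
    return u, v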

Example Failure Cases:

The reconstruction fails when the sequence is not ordered or if there is not enough overlap between consecutive frames. Bootstrapping fails when there is high rotation but small translation. Drift is not corrected when the trajectory loops (estimated poses in white, ground truth in red).

Video Stream

We provide a proof-of-concept setup for reconstructing a scene from a video stream with live feedback. Due to lower image quality and limited control over capture conditions, reconstructions may be less satisfactory than with images carefully captured following the guidelines above using a high-quality camera. Reconstruction quality also depends heavily on camera quality and network speed, with poor connectivity causing motion blur and compression artifacts.

1. Setting up a Stream

To set up a stream from a phone, we found IP Webcam (for Android) and IP Camera Lite (for iOS / iPadOS) to be simple solutions. The method works best with a wide-angle camera at 1080p resolution.

Settings for IP Webcam (Android)

Using "Video Preferences", select the widest camera of your device. Set the resolution to be 1920x1080 and video orientation to be "Landscape". You could also enable background streaming in the "Optional Permissions" menu. Finally start the server using "Start Server". You can optionally select "Run in Background" in "Actions" menu which will keep the stream running when the application is put into background.

Settings for IP Camera Lite (iOS / iPadOS)

In the "Settings" menu go the the "Video overlay" menu and uncheck the "Display Server Timestamp", "Display Camera Name" and "Display Battery Info" options. Delete the username and password in the RTSP settings to disable authentication. Then start the the server by clicking "Turn on IP Camera Server". Before starting the reconstruction, disable the "Multi-Cam" option in the "Settings" menu. Set the video resolution to 1920x1080 and choose the widest camera.

General Instructions

You can visit the stream URL at http://<ip>:<port>/video (replace <ip> and <port> with the values shown in the app) in your browser to check the stream and modify any settings if needed. IP Camera Lite adds a watermark to the stream, which might affect reconstruction quality.
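Before launching the reconstruction, it can be worth checking from the reconstruction machine that the stream is actually reachable, for instance with OpenCV (a minimal sketch; opencv-python is not necessarily part of requirements.txt, and the URL is a placeholder):

# probe_stream.py -- hypothetical check that the phone stream is reachable
import cv2

url = "http://192.168.1.23:8080/video"   # replace with your stream's <ip>:<port>
cap = cv2.VideoCapture(url)
ok, frame = cap.read()
print("Stream reachable:", ok, "| frame shape:", frame.shape if ok else None)
cap.release()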

2. Setting up the Web Viewer

We provide a simple web viewer for live feedback on the mobile device running the video stream app. To access the web viewer from a machine other than the one running the reconstruction, you will need to allow access to TCP ports 6009 and 8000:

Ubuntu Firewall Settings
The following commands allow access to TCP ports 6009 and 8000. Note that different Linux distributions have different firewall management software.
sudo ufw allow 8000/tcp
sudo ufw allow 6009/tcp
sudo ufw enable
sudo ufw status
If successful, the output should include the following lines:
8000/tcp                   ALLOW       Anywhere                  
6009/tcp                   ALLOW       Anywhere                  
Fedora Firewall Settings
The following commands allow access to TCP ports 6009 and 8000. Note that different Linux distributions have different firewall management software.
sudo firewall-cmd --add-port 8000/tcp --permanent
sudo firewall-cmd --add-port 6009/tcp --permanent
sudo firewall-cmd --reload
Windows Firewall Settings
  1. Press Windows + R, type wf.msc, and hit Enter.
  2. In the left pane, click Inbound Rules.
  3. In the right pane, click New Rule.
  4. Select Port, then click Next.
  5. Select TCP, enter 6009, 8000, click Next.
  6. Choose Allow the connection, then click Next.
  7. Check all profiles (Domain, Private, Public), click Next.
  8. Name the rule (e.g., WebViewer 6009, 8000), then click Finish.

On the phone or tablet, open a browser side-by-side with the stream app in split-screen mode (only supported on Android and iPadOS). Enter http://<ip>:8000/webviewer in the browser, where <ip> is the IP address of the computer running the reconstruction server (the page will only load after running the train script). Find the IPv4 address by running ifconfig (Ubuntu) or ipconfig (Windows) in a terminal.

3. Starting the Reconstruction Server

Run the reconstruction script with the webviewer mode:

python train.py -s ${STREAM_URL} --downsampling=1.5 --viewer_mode web

Note: We found that setting the stream to 1080p resolution and then downsampling by a factor of 1.5 (i.e., reconstructing at 1280x720) mitigates the impact of compression artifacts.

4. Starting the Reconstruction from the Mobile Device

After the previous steps, the browser on the mobile device should show the web viewer. Click the "Start" button to start the reconstruction.

Live reconstruction on Android using IP Webcam in split-screen mode with the browser connected to the web viewer.
Live reconstruction on iPad using IP Camera Lite in split-screen mode with the browser connected to the web viewer.

Live 3D reconstruction on Android: the input video stream (left) and the browser-based preview (right) shown in split-screen mode:

stream_demo_interface.mp4

Live reconstruction monitored with the remote viewer:

stream_demo.mp4

Acknowledgments

This work was funded by the European Research Council (ERC) Advanced Grant NERPHYS, number 101141721 https://project.inria.fr/nerphys/. The authors are grateful to the OPAL infrastructure of the Université Côte d'Azur for providing resources and support, as well as Adobe and NVIDIA for software and hardware donations. This work was granted access to the HPC resources of IDRIS under the allocation AD011015561 made by GENCI. Bernhard Kerbl has received funding by WWTF (project ICT22-055 - Instant Visualization and Interaction for Large Point Clouds). Thanks to Peter Hedman for early comments and suggestions, George Kopanas for proofreading a draft, and Jeffrey Hu for setting up the phone-based capture.
