Official PyTorch implementation of pose auto-encoders for camera pose regression - see our paper for more details.
This code implements:
- Training and testing of single and multi-scene APRs: PoseNet with different backbones and MS-Transformer. The MS-Transformer code, including its training and testing, was cloned from: https://github.com/yolish/multi-scene-pose-transformer
- Training and testing of pose auto-encoders
- Test-time optimization for position regression with camera pose encoding
- Image reconstruction from camera pose encoding
In order to run this repository you will need:
- Python3 (tested with Python 3.7.7)
- PyTorch deep learning framework (tested with version 1.0.0)
- Use torch==1.4.0, torchvision==0.5.0
- Download the Cambridge Landmarks dataset and the 7Scenes dataset
- You can also download pre-trained models to reproduce the reported results (see below)
- Note: All experiments reported in our paper were performed on an NVIDIA GeForce GTX 1080 GPU with 8GB of memory
- For a quick set up you can run: pip install -r requirments.txt
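After installing, it can help to confirm that the environment matches the `torch==1.4.0` and `torchvision==0.5.0` pins listed above. The following is a minimal sketch and not part of this repository: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and it assumes each package exposes a `__version__` attribute, possibly with a local build suffix such as `+cu100`.

```python
import importlib

# Pinned versions taken from the requirements list above.
REQUIRED = {"torch": "1.4.0", "torchvision": "0.5.0"}


def base_version(version):
    """Drop a local build suffix such as '+cu100' from a version string."""
    return version.split("+")[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (found, expected)} for missing or mismatched packages."""
    mismatches = {}
    for name, expected in required.items():
        found = installed.get(name)
        if found is None or base_version(found) != expected:
            mismatches[name] = (found, expected)
    return mismatches


def installed_versions(names=REQUIRED):
    """Import each package if it is available and read its __version__."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # a missing package is reported by find_mismatches
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    problems = find_mismatches(installed_versions())
    for name, (found, expected) in problems.items():
        print(f"{name}: found {found}, expected {expected}")
    if not problems:
        print("Environment matches the pinned versions.")
```

Running the snippet as a script prints one line per missing or mismatched package, or a confirmation message when both pins are satisfied.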
The entry point for training and testing APRs is the main.py script in the root directory.
See example_cmd/example_cmd_train_test_aprs.md for example command lines.
The entry points for training and testing camera pose auto-encoders are the main_learn_pose_encoding.py and main_learn_multiscene_pose_encoding.py scripts in the root directory, corresponding to auto-encoders for single-scene and multi-scene APRs, respectively.
See example_cmd/example_cmd_train_test_pose_auto_encoders.md for example command lines.
The entry point for training and testing an image decoder that reconstructs images from camera pose encodings is the main_reconstruct_img.py script.
See example_cmd/example_cmd_reconstruct_img.md for example command lines.
You can download pre-trained models to easily reproduce our results:
Model (Linked) | Description |
---|---|
APR models | |
PoseNet+MobileNet | Single-scene APR, KingsCollege scene |
PoseNet+ResNet50 | Single-scene APR, KingsCollege scene |
PoseNet+EfficientB0 | Single-scene APR, KingsCollege scene |
MS-Transformer | Multi-scene APR, CambridgeLandmarks dataset |
MS-Transformer | Multi-scene APR, 7Scenes dataset |
Camera Pose Auto-Encoders | |
Auto-Encoder for PoseNet+MobileNet | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for PoseNet+ResNet50 | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for PoseNet+EfficientB0 | Auto-Encoder for a single-scene APR, KingsCollege scene |
Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, CambridgeLandmarks dataset |
Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, 7Scenes dataset |
Decoders for Image Reconstruction | |
Decoder for MS-Transformer | Decoder trained for reconstructing images from the Shop Facade scene |
If you find this repository useful, please consider giving it a star and citing our paper:
@article{ShavitandKeller22,
title={Camera Pose Auto-Encoders for Improving Pose Regression},
author={Shavit, Yoli and Keller, Yosi},
journal={arXiv preprint arXiv:2207.05530},
year={2022}
}