# Monoloco

> We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images.
> Driven by the limitation of neural networks outputting point estimates,
> we address the ambiguity in the task with a new neural network predicting confidence intervals through a
> loss function based on the Laplace distribution.
> Our architecture is a lightweight feed-forward neural network which predicts the 3D coordinates given a 2D human pose.
> The design is particularly well suited for small training data and cross-dataset generalization.
> Our experiments show that (i) we outperform state-of-the-art results on the KITTI and nuScenes datasets,
> (ii) we even outperform stereo methods for far-away pedestrians, and (iii) we estimate meaningful confidence intervals.
> We further share insights on our model of uncertainty in case of limited observations and out-of-distribution samples.

```
@article{bertoni2019monoloco,
  title={MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation},
  author={Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
  journal={arXiv preprint arXiv:xxxx},
  year={2019}
}
```

Add link paper

![overview](docs/pull.png)

# Setup

### Install
Python 3.6 or 3.7 is required for the nuScenes development kit. Python 3 is required for openpifpaf.
All the details for the PifPaf pose detector can be found at [openpifpaf](https://github.com/vita-epfl/openpifpaf).

```
pip install nuscenes-devkit openpifpaf
```

### Data structure

```
Data
├── arrays
├── models
├── kitti
├── nuscenes
├── logs
```

Run the following to create the folders:
```
mkdir data
cd data
mkdir arrays models kitti nuscenes logs
```


### Pre-trained Models
* Download a MonoLoco pre-trained model from Google Drive: ADD LINK and save it in `data/models`
* Download a PifPaf pre-trained model from the [openpifpaf](https://github.com/vita-epfl/openpifpaf) project and save it into `data/models`

# Interfaces

All the commands are run through a main file called `main.py` using subparsers.
To check all the commands for the parser and the subparsers, run:

* `python3 src/main.py --help`
* `python3 src/main.py prep --help`
* `python3 src/main.py predict --help`
* `python3 src/main.py train --help`
* `python3 src/main.py eval --help`

# Predict

The predict script receives an image (or an entire folder using glob expressions),
calls PifPaf for 2D human pose detection over the image
and runs MonoLoco for 3D localization of the detected poses.
The argument `--networks` defines whether to save the PifPaf outputs, the MonoLoco outputs or both.
You can check all the commands for PifPaf at [openpifpaf](https://github.com/vita-epfl/openpifpaf).

Output options include JSON files and/or visualization of the predictions on the image in *frontal mode*,
*birds-eye-view mode* or *combined mode*, and can be specified with `--output_types`.

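A minimal invocation could look like the sketch below (the image path and model filename are taken from the example in the next section; the extra flags shown there, such as `--scale` and `--n_dropout`, are assumed to be optional):

```
python3 src/main.py predict --networks monoloco --glob docs/002282.png \
  --output_types combined --model data/models/base_model.pickle
```
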
### Ground truth matching

* In case you provide a ground-truth json file to compare the predictions of MonoLoco,
the script will match every detection using the Intersection over Union metric.
The ground-truth file can be generated using the subparser `prep` and passed with the argument `--path_gt`.
Check the preprocess section for more details.

* In case you don't provide a ground-truth file, the script will look for a predefined path.
If it does not find the file, it will generate images
with all the predictions without ground-truth matching.

Below is an example with and without ground-truth matching. The images have been created (adding or removing `--path_gt`) with:
`python src/main.py predict --networks monoloco --glob docs/002282.png --output_types combined --scale 2 --y_scale 1.85
--model data/models/base_model.pickle --n_dropout 100 --z_max 30`

With ground-truth matching (only matching people):
![predict](docs/out_002282.png.combined_1.png)

Without ground-truth matching (all the detected people):
![predict_no_matching](docs/out_002282.png.combined_2.png)

# Preprocess

### Datasets

#### 1) KITTI dataset
Download KITTI ground-truth files and camera calibration matrices for training
from [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and
save them respectively into `data/kitti/gt` and `data/kitti/calib`.
To extract PifPaf joints, you also need to download the training images, save them in any folder and soft-link the folder to
`data/kitti/images`.

#### 2) nuScenes dataset
Download the nuScenes dataset (any version: Mini, Teaser or TrainVal) from [nuScenes](https://www.nuscenes.org/download),
save it anywhere and soft-link it to `data/nuscenes`.

### Annotations to preprocess

MonoLoco is trained using 2D human pose joints. To create them, run PifPaf over the KITTI or nuScenes training images.
You can create them by running the predict script with `--networks pifpaf`.

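For instance, a hedged sketch for KITTI (assuming the training images have been soft-linked to `data/kitti/images` as described above; check `python3 src/main.py predict --help` for the exact output options):

```
python3 src/main.py predict --networks pifpaf --glob "data/kitti/images/*.png"
```
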
### Inputs joints for training

MonoLoco is trained using 2D human pose joints matched with the ground-truth locations provided by
the nuScenes or KITTI datasets. To create the joints, run `python src/main.py prep` specifying:
1. `--dir_ann`: the annotation directory containing the PifPaf joints of KITTI or nuScenes.

2. `--dataset`: which dataset to preprocess. For nuScenes, all three versions of the
dataset are supported: nuscenes_mini, nuscenes, nuscenes_teaser.

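For example, a hedged sketch of the preprocessing call for KITTI (assuming `kitti` is the accepted value for `--dataset`; the placeholder should point to the folder with the PifPaf joints from the previous step):

```
python3 src/main.py prep --dataset kitti --dir_ann <directory with pifpaf annotations>
```
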
### Ground truth file for evaluation

The preprocessing script also outputs a second json file called **names.json**, which provides a dictionary indexed
by the image name to easily access the ground-truth files for evaluation and prediction purposes.

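As a sketch, assuming the preprocessing output is saved in `data/arrays`, this file can then be passed to the predict script:

```
python3 src/main.py predict --networks monoloco --glob docs/002282.png \
  --output_types combined --model data/models/base_model.pickle \
  --path_gt data/arrays/names.json
```
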
# Train

Provide the json file containing the preprocessed joints (**joints.json**) as argument.

As simple as `python3 src/main.py train --joints data/arrays/joints.json`

All the hyperparameter options can be checked with `python3 src/main.py train --help`.

### Hyperparameters tuning
Random search in log space is provided. An example: `python3 src/main.py train --hyp --multiplier 10 --r_seed 1`.
One iteration of the multiplier includes 6 runs.

# Evaluation

Evaluate the performance of the trained model on the KITTI or nuScenes dataset.
### 1) nuScenes
Evaluation on nuScenes is already provided during training. It is also possible to evaluate an existing model by running
`python src/main.py eval --dataset nuscenes --model <model to evaluate>`

### 2) KITTI

### Baselines
We provide evaluation on KITTI for models trained on nuScenes or KITTI. We compare them with other monocular
and stereo baselines:

[Mono3D](https://www.cs.toronto.edu/~urtasun/publications/chen_etal_cvpr16.pdf),
[3DOP](https://xiaozhichen.github.io/papers/nips15chen.pdf),
[MonoDepth](https://arxiv.org/abs/1609.03677) and our
[Geometrical Baseline](src/eval/geom_baseline.py).

* **Mono3D**: download the validation files from [here](http://3dimage.ee.tsinghua.edu.cn/cxz/mono3d)
and save them into `data/kitti/m3d`
* **3DOP**: download the validation files from [here](https://xiaozhichen.github.io/)
and save them into `data/kitti/3dop`
* **MonoDepth**: compute an average depth for every instance using the script
[here](https://github.com/Parrotlife/pedestrianDepth-baseline/tree/master/MonoDepth-PyTorch)
and save the results into `data/kitti/monodepth`
* **Geometrical Baseline**: a geometrical baseline comparison is provided.
The best average value for comparison can be created by running `python src/main.py eval --geometric`

#### Evaluation

First, the script preprocesses the joints starting from the json annotations predicted by PifPaf, runs the model and saves the results
in txt files with a format comparable to the other baselines.
Then it performs the evaluation.

The following graph is obtained by running:
`python3 src/main.py eval --dataset kitti --model data/models/base_model.pickle`
![evaluation](docs/results.png)