update license
This commit is contained in:
parent c40fe6bf89
commit be6a5e6734
LICENSE

@@ -1,4 +1,4 @@
-Copyright 2020-2021 by EPFL/VITA. All rights reserved.
+Copyright 2018-2021 by EPFL/VITA. All rights reserved.
 This project and all its files are licensed under
 GNU AGPLv3 or later version.
@@ -355,7 +355,7 @@ When using this library in your research, we will be happy if you cite us!
 @InProceedings{bertoni_2021_icra,
 author = {Bertoni, Lorenzo and Kreiss, Sven and Mordan, Taylor and Alahi, Alexandre},
 title = {MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization},
-booktitle = {International Conference on Robotics and Automation},
+booktitle = {International Conference on Robotics and Automation (ICRA)},
 year = {2021}
 }
 ```
@@ -1,151 +0,0 @@

# MonStereo

> Monocular and stereo vision are cost-effective solutions for 3D human localization
in the context of self-driving cars or social robots. However, they are usually developed independently
and have their respective strengths and limitations. We propose a novel unified learning framework that
leverages the strengths of both monocular and stereo cues for 3D human localization.
Our method jointly (i) associates humans in left-right images,
(ii) deals with occluded and distant cases in stereo settings by relying on the robustness of monocular cues,
and (iii) tackles the intrinsic ambiguity of monocular perspective projection by exploiting prior knowledge
of the human height distribution.
We achieve state-of-the-art quantitative results for the 3D localization task on the KITTI dataset
and estimate confidence intervals that account for challenging instances.
We show qualitative examples for long-tail challenges such as occluded,
far-away, and child instances.
```
@InProceedings{bertoni_monstereo,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Mordan, Taylor and Alahi, Alexandre},
    title = {MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization},
    booktitle = {arXiv:2008.10913},
    month = {August},
    year = {2020}
}
```
# Prediction
The predict script receives an image (or an entire folder using glob expressions),
calls PifPaf for 2D human pose detection over the image,
and runs MonStereo for 3D localization of the detected poses.

Output options include json files and/or visualization of the predictions on the image in *frontal mode*,
*birds-eye-view mode* or *multi mode*, and can be specified with `--output_types`.
### Pre-trained Models
* Download the MonStereo pre-trained model from
[Google Drive](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing)
and save it in `data/models`
(default), or save it in any folder and pass it through the command line option `--model <model path>`.
* The Pifpaf pre-trained model will be automatically downloaded at the first run.
Three standard pre-trained models are available with the command line options
`--checkpoint resnet50`, `--checkpoint resnet101` and `--checkpoint resnet152`.
Alternatively, you can download a Pifpaf pre-trained model from [openpifpaf](https://github.com/vita-epfl/openpifpaf)
and call it with `--checkpoint <pifpaf model path>`. All experiments have been run with v0.8 of pifpaf.
If you'd like to use an updated version, we suggest re-training the MonStereo model as well.
* The model used for the experiments is provided in *data/models/ms-200710-1511.pkl*.
### Ground truth matching
* If you provide a ground-truth json file to compare the predictions of MonStereo against,
the script matches every detection using the Intersection over Union (IoU) metric.
The ground-truth file can be generated with the subparser `prep` and passed with the option `--path_gt`.
As this step requires running the pose detector over all the training images and saving the annotations, we
provide the resulting json file for the category *pedestrians* on
[Google Drive](https://drive.google.com/file/d/1e-wXTO460ip_Je2NdXojxrOrJ-Oirlgh/view?usp=sharing);
save it into `data/arrays`.

* If the ground-truth json file is not available, the option `--show_all` shows
all the predictions for the image.
After downloading the model and the ground-truth file, a demo can be tested with the following commands:

`python3 -m monstereo.run predict --glob docs/000840*.png --output_types multi --scale 2
--model data/models/ms-200710-1511.pkl --z_max 30 --checkpoint resnet152 --path_gt data/arrays/names-kitti-200615-1022.json
-o data/output`



`python3 -m monstereo.run predict --glob docs/005523*.png --output_types multi --scale 2
--model data/models/ms-200710-1511.pkl --z_max 30 --checkpoint resnet152 --path_gt data/arrays/names-kitti-200615-1022.json
-o data/output`


# Preprocessing
Preprocessing and training steps are fully supported by the provided code,
but they first require running a pose detector over
all the training images and collecting the annotations.
The code supports this option (run the predict script with `--mode pifpaf`).
### Data structure

    Data
    ├── arrays
    ├── models
    ├── kitti
    ├── logs
    ├── output

Run the following to create the folders:
```
mkdir data
cd data
mkdir arrays models kitti logs output
```
### Datasets
Download KITTI ground truth files and camera calibration matrices for training
from [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and
save them respectively into `data/kitti/gt` and `data/kitti/calib`.
To extract pifpaf joints, you also need to download the training images and soft-link the folder to
`data/kitti/images`, as sketched below.
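A minimal sketch of the soft link, assuming the KITTI left color images were extracted to a hypothetical `<kitti download directory>/image_2` folder (adjust to your download):

`ln -s <kitti download directory>/image_2 data/kitti/images`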
### Annotations to preprocess
MonStereo is trained using 2D human pose joints. To obtain the joints, the first step is to run
pifpaf over the KITTI training images, either by running the predict script with `--mode pifpaf`
or by using the pifpaf code directly.
The MonStereo preprocess script expects annotations from left and right images in two different folders
with the same path apart from the suffix `_right` for the "right" folder,
for example `data/annotations` and `data/annotations_right`.
Do not change the names of the json files created by pifpaf. For each left annotation,
the code will look for the corresponding right annotation.
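A hedged sketch of this step using openpifpaf directly. The flags mirror the openpifpaf commands used elsewhere in these docs; the output folders and the hypothetical `data/kitti/images_right` folder for the right images are assumptions, and the pifpaf version may need to match the v0.8 used for the MonStereo experiments:

```
python -m openpifpaf.predict \
--glob "data/kitti/images/*.png" \
--json-output data/annotations \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose

python -m openpifpaf.predict \
--glob "data/kitti/images_right/*.png" \
--json-output data/annotations_right \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose
```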
### Inputs joints for training
MonStereo is trained using 2D human pose joints matched with the ground-truth locations provided by
the KITTI dataset. To create the joints, run `python3 -m monstereo.run prep`, specifying:

`--dir_ann`: the annotation directory containing the Pifpaf joints of KITTI for the left images
(a concrete call is sketched below).
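For instance, assuming the left annotations were saved into the hypothetical `data/annotations` folder used above:

`python3 -m monstereo.run prep --dir_ann data/annotations`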
### Ground truth file for evaluation
The preprocessing script also outputs a second json file, called **names-<date-time>.json**, which provides a dictionary indexed
by image name to easily access the ground-truth files for evaluation and prediction purposes.
# Training
Provide the json file containing the preprocessed joints as an argument.
It is as simple as `python3 -m monstereo.run train --joints <json file path>`.
All the hyperparameter options can be checked with `python3 -m monstereo.run train --help`.
# Evaluation (KITTI Dataset)
### Average Localization Error (ALE)
We provide evaluation on KITTI in the eval section. The txt files for MonStereo are generated with the command:

`python -m monstereo.run eval --dir_ann <directory of pifpaf annotations> --model data/models/ms-200710-1511.pkl --generate`

<img src="quantitative_mono.png" width="600"/>
### Relative Average Precision Localization (RALP-5%)
We modified the original C++ evaluation of KITTI to make it relative to distance. We use **cmake**.
To run the evaluation, first generate the txt files with:

`python -m monstereo.run eval --dir_ann <directory of pifpaf annotations> --model data/models/ms-200710-1511.pkl --generate`

Then follow the instructions of this [repository](https://github.com/cguindel/eval_kitti)
to prepare the folders accordingly (or follow the KITTI guidelines) and run the evaluation.
The modified file is called *evaluate_object.cpp* and runs exactly like the original KITTI evaluation.
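As a rough sketch, a standard out-of-source cmake build of the evaluation code could look like the following; the directory layout is an assumption, so refer to the eval_kitti instructions for the exact steps and for the arguments of the resulting binary:

```
git clone https://github.com/cguindel/eval_kitti.git
cd eval_kitti
mkdir build && cd build
cmake .. && make
```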
@@ -1,230 +0,0 @@

# Perceiving Humans: from Monocular 3D Localization to Social Distancing

> Perceiving humans in the context of Intelligent Transportation Systems (ITS)
often relies on multiple cameras or expensive LiDAR sensors.
In this work, we present a new cost-effective vision-based method that perceives humans' locations in 3D
and their body orientation from a single image.
We address the challenges related to the ill-posed monocular 3D tasks by proposing a deep learning method
that predicts confidence intervals in contrast to point estimates. Our neural network architecture estimates
humans' 3D body locations and their orientation with a measure of uncertainty.
Our vision-based system (i) is privacy-safe, (ii) works with any fixed or moving cameras,
and (iii) does not rely on ground plane estimation.
We demonstrate the performance of our method with respect to three applications:
locating humans in 3D, detecting social interactions,
and verifying the compliance of recent safety measures due to the COVID-19 outbreak.
Indeed, we show that we can rethink the concept of "social distancing" as a form of social interaction,
in contrast to a simple location-based rule. We publicly share the source code towards an open science mission.
```
@InProceedings{bertoni_social,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
    title = {Perceiving Humans: from Monocular 3D Localization to Social Distancing},
    booktitle = {arXiv:2009.00984},
    month = {September},
    year = {2020}
}
```

## Predictions
For a quick setup, download a pifpaf and a MonoLoco++ model from
[here](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing) and save them into `data/models`.
### 3D Localization
The predict script receives an image (or an entire folder using glob expressions),
calls PifPaf for 2D human pose detection over the image
and runs MonoLoco++ for 3D localization of the detected poses.
The argument `--net` selects whether to save pifpaf, MonoLoco++ or MonStereo outputs.
You can check all commands for Pifpaf at [openpifpaf](https://github.com/vita-epfl/openpifpaf).

Output options include json files and/or visualization of the predictions on the image in *frontal mode*,
*birds-eye-view mode* or *combined mode*, and can be specified with `--output_types`.

Ground-truth KITTI files for comparing results can be downloaded from
[here](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing)
(file called *names-kitti*) and should be saved into `data/arrays`.
Ground-truth files can also be generated; more info is in the preprocessing section.

For an example image, run the following command:
```
python -m monstereo.run predict \
docs/002282.png \
--net monoloco_pp \
--output_types multi \
--model data/models/monoloco_pp-201203-1424.pkl \
--path_gt data/arrays/names-kitti-200615-1022.json \
-o <output directory> \
--long-edge <rescale the image by providing dimension of long side; if None, original resolution> \
--n_dropout <50 to include epistemic uncertainty, 0 otherwise>
```

|
||||
|
||||
To show all the instances estimated by MonoLoco add the argument `show_all` to the above command.
|
||||
|
||||

|
||||
|
||||
It is also possible to run [openpifpaf](https://github.com/vita-epfl/openpifpaf) directly
|
||||
by specifying the network with the argument `--net pifpaf`. All the other pifpaf arguments are also supported
|
||||
and can be checked with `python -m monstereo.run predict --help`.
|
||||
|
||||

|
||||
|
||||
### Focal Length and Camera Parameters
Absolute distances are affected by the camera intrinsic parameters.
When processing KITTI images, the network uses the intrinsic matrix provided with the dataset.
In all the other cases, we use the parameters of the nuScenes cameras, with 1/1.8" CMOS sensors of size 7.2 x 5.4 mm.
The default focal length is 5.7 mm and this parameter can be modified using the argument `--focal`.
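For example, to run the prediction above for a different camera, the focal length can be overridden as follows (the 8 mm value is only an illustrative assumption):

`python -m monstereo.run predict docs/002282.png --net monoloco_pp --focal 8 --output_types multi --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`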
### Social Distancing
To visualize social distancing compliance, simply add the argument `--social_distance` to the predict command.

An example from the Collective Activity Dataset is provided below.

<img src="frame0038.jpg" width="500"/>

To visualize social distancing, run the command below:
```
python -m monstereo.run predict \
docs/frame0038.jpg \
--net monoloco_pp \
--social_distance \
--output_types front bird --show_all \
--model data/models/monoloco_pp-201203-1424.pkl -o <output directory>
```
<img src="out_frame0038.jpg.front.png" width="400"/>
|
||||
|
||||
|
||||
<img src="out_frame0038.jpg.bird.png" width="400"/>
|
||||
|
||||
Threshold distance and radii (for F-formations) can be set using `--threshold-dist` and `--radii`, respectively.
|
||||
|
||||
For more info, run:
|
||||
|
||||
`python -m monstereo.run predict --help`
|
||||
|
||||
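For instance, the social-distancing command above could be tightened or relaxed like this; the numeric values and the multi-value format for `--radii` are illustrative assumptions, so check `--help` for the exact expected format:

`python -m monstereo.run predict docs/frame0038.jpg --net monoloco_pp --social_distance --threshold-dist 2.5 --radii 0.3 0.5 1 --output_types front bird --show_all --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`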
### Orientation and Bounding Box dimensions
MonoLoco++ also estimates orientation and box dimensions. Results are saved in a json file when using the argument
`--output_types json`. At the moment, the only visualization that includes orientation is the social distancing one.
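For example, reusing the prediction command above with the json output type (paths as in that example):

`python -m monstereo.run predict docs/002282.png --net monoloco_pp --output_types json --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`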
## Preprocessing

### Kitti
Annotations from a pose detector need to be stored in a folder,
for example by using [openpifpaf](https://github.com/vita-epfl/openpifpaf):
```
python -m openpifpaf.predict \
--glob "<kitti images directory>/*.png" \
--json-output <directory to contain predictions> \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose
```
Once the step is complete:

`python -m monstereo.run prep --dir_ann <directory that contains predictions> --monocular`
### Collective Activity Dataset
To evaluate on the [collective activity dataset](http://vhosts.eecs.umich.edu/vision//activity-dataset.html)
(without any training), we selected 6 scenes that contain people talking to each other.
This allows for a balanced dataset, but any other configuration will work.

The expected structure for the dataset is the following:

    collective_activity
    ├── images
    ├── annotations

where the images and annotations inside follow the naming convention:

    IMAGES: seq<sequence_name>_frame<frame_name>.jpg
    ANNOTATIONS: seq<sequence_name>_annotations.txt

With respect to the original dataset, the images and annotations are moved to a single folder
and the sequence is added to their names. One command to do this is:

`rename -v -n 's/frame/seq14_frame/' f*.jpg`

which, for example, changes the names of all the jpg images in that folder by adding the sequence number
(remove `-n` after checking it works). A sketch that applies the same reorganization to all sequences is shown below.
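A hedged shell sketch of that reorganization for every sequence at once, assuming the original dataset layout of one folder per sequence (e.g. `seq14/frame0001.jpg` and `seq14/annotations.txt`); adjust the paths to your download:

```
mkdir -p data/collective_activity/images data/collective_activity/annotations
for dir in <original dataset directory>/seq*/; do
    seq=$(basename "$dir")
    # copy and rename every frame, e.g. seq14/frame0001.jpg -> seq14_frame0001.jpg
    for img in "$dir"frame*.jpg; do
        cp "$img" "data/collective_activity/images/${seq}_$(basename "$img")"
    done
    # copy and rename the per-sequence annotation file
    cp "${dir}annotations.txt" "data/collective_activity/annotations/${seq}_annotations.txt"
done
```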
Pifpaf annotations should also be saved in a single folder and can be created with:

```
python -m openpifpaf.predict \
--glob "data/collective_activity/images/*.jpg" \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose \
--json-output /data/lorenzo-data/annotations/collective_activity/v012
```

Finally, to evaluate activity using a MonoLoco++ pre-trained model trained either on nuScenes or KITTI:
```
python -m monstereo.run eval --activity \
--net monoloco_pp --dataset collective \
--model <MonoLoco++ model path> --dir_ann <pifpaf annotations directory>
```
## Training
We train on the KITTI or nuScenes dataset, specifying the path of the input joints.

Our results are obtained with:

`python -m monstereo.run train --lr 0.001 --joints data/arrays/joints-kitti-201202-1743.json --save --monocular`

For a more extensive list of available parameters, run:

`python -m monstereo.run train --help`
## Evaluation

### 3D Localization
We provide evaluation on KITTI for models trained on nuScenes or KITTI. We compare them with other monocular
and stereo baselines:
[MonoLoco](https://github.com/vita-epfl/monoloco),
[Mono3D](https://www.cs.toronto.edu/~urtasun/publications/chen_etal_cvpr16.pdf),
[3DOP](https://xiaozhichen.github.io/papers/nips15chen.pdf),
[MonoDepth](https://arxiv.org/abs/1609.03677),
[MonoPSR](https://github.com/kujason/monopsr),
[MonoDIS](https://research.mapillary.com/img/publications/MonoDIS.pdf), and our
[Geometrical Baseline](monoloco/eval/geom_baseline.py).

* **Mono3D**: download the validation files from [here](http://3dimage.ee.tsinghua.edu.cn/cxz/mono3d)
and save them into `data/kitti/m3d`
* **3DOP**: download the validation files from [here](https://xiaozhichen.github.io/)
and save them into `data/kitti/3dop`
* **MonoDepth**: compute an average depth for every instance using the script
[here](https://github.com/Parrotlife/pedestrianDepth-baseline/tree/master/MonoDepth-PyTorch)
and save the results into `data/kitti/monodepth`
* **GeometricalBaseline**: a geometrical baseline comparison is provided.

The average geometrical value for comparison can be obtained by running:
```
python -m monstereo.run eval \
--dir_ann <annotation directory> \
--model <model path> \
--net monoloco_pp \
--generate
```

To also include the geometric baselines and MonoLoco, add the flag `--baselines`.

<img src="quantitative_mono.png" width="550"/>

Adding the argument `--save`, a few plots will be produced, including the 3D localization error as a function of distance:

<img src="results.png" width="600"/>
### Activity Estimation (Talking)
Please follow the preprocessing steps for the Collective Activity dataset and run pifpaf over the dataset images.
Evaluation on this dataset is done with models trained on either KITTI or nuScenes.
For optimal performance, we suggest the model trained on the nuScenes teaser (TODO add link).
```
python -m monstereo.run eval \
--activity \
--dataset collective \
--net monoloco_pp \
--model <path to the model> \
--dir_ann <annotation directory>
```