update license
This commit is contained in:
parent c40fe6bf89
commit be6a5e6734
LICENSE

@@ -1,4 +1,4 @@
-Copyright 2020-2021 by EPFL/VITA. All rights reserved.
+Copyright 2018-2021 by EPFL/VITA. All rights reserved.
 This project and all its files are licensed under
 GNU AGPLv3 or later version.
@@ -355,7 +355,7 @@ When using this library in your research, we will be happy if you cite us!
 @InProceedings{bertoni_2021_icra,
 author = {Bertoni, Lorenzo and Kreiss, Sven and Mordan, Taylor and Alahi, Alexandre},
 title = {MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization},
-booktitle = {International Conference on Robotics and Automation},
+booktitle = {International Conference on Robotics and Automation (ICRA)},
 year = {2021}
 }
 ```
@@ -1,151 +0,0 @@

# MonStereo

> Monocular and stereo vision are cost-effective solutions for 3D human localization
in the context of self-driving cars or social robots. However, they are usually developed independently
and have their respective strengths and limitations. We propose a novel unified learning framework that
leverages the strengths of both monocular and stereo cues for 3D human localization.
Our method jointly (i) associates humans in left-right images,
(ii) deals with occluded and distant cases in stereo settings by relying on the robustness of monocular cues,
and (iii) tackles the intrinsic ambiguity of monocular perspective projection by exploiting prior knowledge
of the human height distribution.
We achieve state-of-the-art quantitative results for the 3D localization task on the KITTI dataset
and estimate confidence intervals that account for challenging instances.
We show qualitative examples for long-tail challenges such as occluded,
far-away, and child instances.
```
@InProceedings{bertoni_monstereo,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Mordan, Taylor and Alahi, Alexandre},
    title = {MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization},
    booktitle = {arXiv:2008.10913},
    month = {August},
    year = {2020}
}
```
# Prediction
The predict script receives an image (or an entire folder using glob expressions),
calls PifPaf for 2D human pose detection over the image,
and runs MonStereo for 3D localization of the detected poses.

Output options include json files and/or visualization of the predictions on the image in *frontal mode*,
*birds-eye-view mode* or *multi mode*, and can be specified with `--output_types`.
### Pre-trained Models
* Download the MonStereo pre-trained model from
[Google Drive](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing)
and save it in `data/models`
(default), or save it in any folder and pass it through the command line option `--model <model path>`.
* The Pifpaf pre-trained model will be automatically downloaded at the first run.
Three standard pre-trained models are available with the command line options
`--checkpoint resnet50`, `--checkpoint resnet101` and `--checkpoint resnet152`.
Alternatively, you can download a Pifpaf pre-trained model from [openpifpaf](https://github.com/vita-epfl/openpifpaf)
and call it with `--checkpoint <pifpaf model path>`. All experiments have been run with v0.8 of pifpaf.
If you'd like to use an updated version, we suggest re-training the MonStereo model as well.
* The model used for the experiments is provided in *data/models/ms-200710-1511.pkl*.
### Ground truth matching
* If you provide a ground-truth json file to compare the predictions of MonStereo against,
the script matches every detection using the Intersection over Union (IoU) metric.
The ground-truth file can be generated with the subparser `prep` and passed with the option `--path_gt`.
As this step requires running the pose detector over all the training images and saving the annotations, we
provide the resulting json file for the category *pedestrians* on
[Google Drive](https://drive.google.com/file/d/1e-wXTO460ip_Je2NdXojxrOrJ-Oirlgh/view?usp=sharing);
save it into `data/arrays`.

* If the ground-truth json file is not available, the option `--show_all` shows
all the predictions for the image.
After downloading the model and the ground-truth file, a demo can be tested with the following commands:

`python3 -m monstereo.run predict --glob docs/000840*.png --output_types multi --scale 2
--model data/models/ms-200710-1511.pkl --z_max 30 --checkpoint resnet152 --path_gt data/arrays/names-kitti-200615-1022.json
-o data/output`



`python3 -m monstereo.run predict --glob docs/005523*.png --output_types multi --scale 2
--model data/models/ms-200710-1511.pkl --z_max 30 --checkpoint resnet152 --path_gt data/arrays/names-kitti-200615-1022.json
-o data/output`


# Preprocessing
Preprocessing and training steps are fully supported by the provided code,
but they first require running a pose detector over
all the training images and collecting the annotations.
The code supports this option (run the predict script with `--mode pifpaf`).
### Data structure

    Data
    ├── arrays
    ├── models
    ├── kitti
    ├── logs
    ├── output

Run the following to create the folders:
```
mkdir data
cd data
mkdir arrays models kitti logs output
```
### Datasets
Download KITTI ground truth files and camera calibration matrices for training
from [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and
save them respectively into `data/kitti/gt` and `data/kitti/calib`.
To extract pifpaf joints, you also need to download the training images and soft-link the folder to
`data/kitti/images`, as sketched below.
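A minimal sketch of the soft link, assuming the KITTI left color images were extracted to a hypothetical `<kitti download directory>/image_2` folder (adjust to your download):

`ln -s <kitti download directory>/image_2 data/kitti/images`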
### Annotations to preprocess
MonStereo is trained using 2D human pose joints. To obtain the joints, the first step is to run
pifpaf over the KITTI training images, either by running the predict script with `--mode pifpaf`
or by using the pifpaf code directly.
The MonStereo preprocess script expects annotations from left and right images in two different folders
with the same path apart from the suffix `_right` for the "right" folder,
for example `data/annotations` and `data/annotations_right`.
Do not change the names of the json files created by pifpaf. For each left annotation,
the code will look for the corresponding right annotation.
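A hedged sketch of this step using openpifpaf directly. The flags mirror the openpifpaf commands used elsewhere in these docs; the output folders and the hypothetical `data/kitti/images_right` folder for the right images are assumptions, and the pifpaf version may need to match the v0.8 used for the MonStereo experiments:

```
python -m openpifpaf.predict \
--glob "data/kitti/images/*.png" \
--json-output data/annotations \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose

python -m openpifpaf.predict \
--glob "data/kitti/images_right/*.png" \
--json-output data/annotations_right \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose
```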
### Inputs joints for training
MonStereo is trained using 2D human pose joints matched with the ground-truth locations provided by
the KITTI dataset. To create the joints, run `python3 -m monstereo.run prep`, specifying:

`--dir_ann`: the annotation directory containing the Pifpaf joints of KITTI for the left images
(a concrete call is sketched below).
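For instance, assuming the left annotations were saved into the hypothetical `data/annotations` folder used above:

`python3 -m monstereo.run prep --dir_ann data/annotations`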
### Ground truth file for evaluation
The preprocessing script also outputs a second json file, called **names-<date-time>.json**, which provides a dictionary indexed
by image name to easily access the ground-truth files for evaluation and prediction purposes.
# Training
Provide the json file containing the preprocessed joints as an argument.
It is as simple as `python3 -m monstereo.run train --joints <json file path>`.
All the hyperparameter options can be checked with `python3 -m monstereo.run train --help`.
# Evaluation (KITTI Dataset)
### Average Localization Error (ALE)
We provide evaluation on KITTI in the eval section. The txt files for MonStereo are generated with the command:

`python -m monstereo.run eval --dir_ann <directory of pifpaf annotations> --model data/models/ms-200710-1511.pkl --generate`

<img src="quantitative_mono.png" width="600"/>
### Relative Average Precision Localization (RALP-5%)
We modified the original C++ evaluation of KITTI to make it relative to distance. We use **cmake**.
To run the evaluation, first generate the txt files with:

`python -m monstereo.run eval --dir_ann <directory of pifpaf annotations> --model data/models/ms-200710-1511.pkl --generate`

Then follow the instructions of this [repository](https://github.com/cguindel/eval_kitti)
to prepare the folders accordingly (or follow the KITTI guidelines) and run the evaluation.
The modified file is called *evaluate_object.cpp* and runs exactly like the original KITTI evaluation.
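As a rough sketch, a standard out-of-source cmake build of the evaluation code could look like the following; the directory layout is an assumption, so refer to the eval_kitti instructions for the exact steps and for the arguments of the resulting binary:

```
git clone https://github.com/cguindel/eval_kitti.git
cd eval_kitti
mkdir build && cd build
cmake .. && make
```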
@@ -1,230 +0,0 @@

# Perceiving Humans: from Monocular 3D Localization to Social Distancing

> Perceiving humans in the context of Intelligent Transportation Systems (ITS)
often relies on multiple cameras or expensive LiDAR sensors.
In this work, we present a new cost-effective vision-based method that perceives humans' locations in 3D
and their body orientation from a single image.
We address the challenges related to the ill-posed monocular 3D tasks by proposing a deep learning method
that predicts confidence intervals in contrast to point estimates. Our neural network architecture estimates
humans' 3D body locations and their orientation with a measure of uncertainty.
Our vision-based system (i) is privacy-safe, (ii) works with any fixed or moving cameras,
and (iii) does not rely on ground plane estimation.
We demonstrate the performance of our method with respect to three applications:
locating humans in 3D, detecting social interactions,
and verifying the compliance of recent safety measures due to the COVID-19 outbreak.
Indeed, we show that we can rethink the concept of "social distancing" as a form of social interaction,
in contrast to a simple location-based rule. We publicly share the source code towards an open science mission.
```
@InProceedings{bertoni_social,
    author = {Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
    title = {Perceiving Humans: from Monocular 3D Localization to Social Distancing},
    booktitle = {arXiv:2009.00984},
    month = {September},
    year = {2020}
}
```

## Predictions
For a quick setup, download a pifpaf and a MonoLoco++ model from
[here](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing) and save them into `data/models`.
### 3D Localization
The predict script receives an image (or an entire folder using glob expressions),
calls PifPaf for 2D human pose detection over the image
and runs MonoLoco++ for 3D localization of the detected poses.
The argument `--net` selects whether to save pifpaf, MonoLoco++ or MonStereo outputs.
You can check all commands for Pifpaf at [openpifpaf](https://github.com/vita-epfl/openpifpaf).

Output options include json files and/or visualization of the predictions on the image in *frontal mode*,
*birds-eye-view mode* or *combined mode*, and can be specified with `--output_types`.

Ground-truth KITTI files for comparing results can be downloaded from
[here](https://drive.google.com/drive/folders/1jZToVMBEZQMdLB5BAIq2CdCLP5kzNo9t?usp=sharing)
(file called *names-kitti*) and should be saved into `data/arrays`.
Ground-truth files can also be generated; more info is in the preprocessing section.

For an example image, run the following command:
```
python -m monstereo.run predict \
docs/002282.png \
--net monoloco_pp \
--output_types multi \
--model data/models/monoloco_pp-201203-1424.pkl \
--path_gt data/arrays/names-kitti-200615-1022.json \
-o <output directory> \
--long-edge <rescale the image by providing dimension of long side; if None, original resolution> \
--n_dropout <50 to include epistemic uncertainty, 0 otherwise>
```

|
||||
|
||||
To show all the instances estimated by MonoLoco add the argument `show_all` to the above command.
|
||||
|
||||

|
||||
|
||||
It is also possible to run [openpifpaf](https://github.com/vita-epfl/openpifpaf) directly
|
||||
by specifying the network with the argument `--net pifpaf`. All the other pifpaf arguments are also supported
|
||||
and can be checked with `python -m monstereo.run predict --help`.
|
||||
|
||||

|
||||
|
||||
### Focal Length and Camera Parameters
Absolute distances are affected by the camera intrinsic parameters.
When processing KITTI images, the network uses the intrinsic matrix provided with the dataset.
In all the other cases, we use the parameters of the nuScenes cameras, with 1/1.8" CMOS sensors of size 7.2 x 5.4 mm.
The default focal length is 5.7 mm and this parameter can be modified using the argument `--focal`.
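For example, to run the prediction above for a different camera, the focal length can be overridden as follows (the 8 mm value is only an illustrative assumption):

`python -m monstereo.run predict docs/002282.png --net monoloco_pp --focal 8 --output_types multi --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`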
### Social Distancing
To visualize social distancing compliance, simply add the argument `--social_distance` to the predict command.

An example from the Collective Activity Dataset is provided below.

<img src="frame0038.jpg" width="500"/>

To visualize social distancing, run the command below:
```
python -m monstereo.run predict \
docs/frame0038.jpg \
--net monoloco_pp \
--social_distance \
--output_types front bird --show_all \
--model data/models/monoloco_pp-201203-1424.pkl -o <output directory>
```
<img src="out_frame0038.jpg.front.png" width="400"/>
|
||||
|
||||
|
||||
<img src="out_frame0038.jpg.bird.png" width="400"/>
|
||||
|
||||
Threshold distance and radii (for F-formations) can be set using `--threshold-dist` and `--radii`, respectively.
|
||||
|
||||
For more info, run:
|
||||
|
||||
`python -m monstereo.run predict --help`
|
||||
|
||||
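For instance, the social-distancing command above could be tightened or relaxed like this; the numeric values and the multi-value format for `--radii` are illustrative assumptions, so check `--help` for the exact expected format:

`python -m monstereo.run predict docs/frame0038.jpg --net monoloco_pp --social_distance --threshold-dist 2.5 --radii 0.3 0.5 1 --output_types front bird --show_all --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`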
### Orientation and Bounding Box dimensions
MonoLoco++ also estimates orientation and box dimensions. Results are saved in a json file when using the argument
`--output_types json`. At the moment, the only visualization that includes orientation is the social distancing one.
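For example, reusing the prediction command above with the json output type (paths as in that example):

`python -m monstereo.run predict docs/002282.png --net monoloco_pp --output_types json --model data/models/monoloco_pp-201203-1424.pkl -o <output directory>`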
## Preprocessing

### Kitti
Annotations from a pose detector need to be stored in a folder,
for example by using [openpifpaf](https://github.com/vita-epfl/openpifpaf):
```
python -m openpifpaf.predict \
--glob "<kitti images directory>/*.png" \
--json-output <directory to contain predictions> \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose
```
Once the step is complete:

`python -m monstereo.run prep --dir_ann <directory that contains predictions> --monocular`
### Collective Activity Dataset
To evaluate on the [collective activity dataset](http://vhosts.eecs.umich.edu/vision//activity-dataset.html)
(without any training), we selected 6 scenes that contain people talking to each other.
This allows for a balanced dataset, but any other configuration will work.

The expected structure for the dataset is the following:

    collective_activity
    ├── images
    ├── annotations

where the images and annotations inside follow the naming convention:

    IMAGES: seq<sequence_name>_frame<frame_name>.jpg
    ANNOTATIONS: seq<sequence_name>_annotations.txt

With respect to the original dataset, the images and annotations are moved to a single folder
and the sequence is added to their names. One command to do this is:

`rename -v -n 's/frame/seq14_frame/' f*.jpg`

which, for example, changes the names of all the jpg images in that folder by adding the sequence number
(remove `-n` after checking it works). A sketch that applies the same reorganization to all sequences is shown below.
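A hedged shell sketch of that reorganization for every sequence at once, assuming the original dataset layout of one folder per sequence (e.g. `seq14/frame0001.jpg` and `seq14/annotations.txt`); adjust the paths to your download:

```
mkdir -p data/collective_activity/images data/collective_activity/annotations
for dir in <original dataset directory>/seq*/; do
    seq=$(basename "$dir")
    # copy and rename every frame, e.g. seq14/frame0001.jpg -> seq14_frame0001.jpg
    for img in "$dir"frame*.jpg; do
        cp "$img" "data/collective_activity/images/${seq}_$(basename "$img")"
    done
    # copy and rename the per-sequence annotation file
    cp "${dir}annotations.txt" "data/collective_activity/annotations/${seq}_annotations.txt"
done
```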
Pifpaf annotations should also be saved in a single folder and can be created with:

```
python -m openpifpaf.predict \
--glob "data/collective_activity/images/*.jpg" \
--checkpoint=shufflenetv2k30 \
--instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose \
--json-output /data/lorenzo-data/annotations/collective_activity/v012
```

Finally, to evaluate activity using a MonoLoco++ pre-trained model trained either on nuScenes or KITTI:
```
python -m monstereo.run eval --activity \
--net monoloco_pp --dataset collective \
--model <MonoLoco++ model path> --dir_ann <pifpaf annotations directory>
```
## Training
We train on the KITTI or nuScenes dataset, specifying the path of the input joints.

Our results are obtained with:

`python -m monstereo.run train --lr 0.001 --joints data/arrays/joints-kitti-201202-1743.json --save --monocular`

For a more extensive list of available parameters, run:

`python -m monstereo.run train --help`
## Evaluation

### 3D Localization
We provide evaluation on KITTI for models trained on nuScenes or KITTI. We compare them with other monocular
and stereo baselines:
[MonoLoco](https://github.com/vita-epfl/monoloco),
[Mono3D](https://www.cs.toronto.edu/~urtasun/publications/chen_etal_cvpr16.pdf),
[3DOP](https://xiaozhichen.github.io/papers/nips15chen.pdf),
[MonoDepth](https://arxiv.org/abs/1609.03677),
[MonoPSR](https://github.com/kujason/monopsr),
[MonoDIS](https://research.mapillary.com/img/publications/MonoDIS.pdf), and our
[Geometrical Baseline](monoloco/eval/geom_baseline.py).

* **Mono3D**: download the validation files from [here](http://3dimage.ee.tsinghua.edu.cn/cxz/mono3d)
and save them into `data/kitti/m3d`
* **3DOP**: download the validation files from [here](https://xiaozhichen.github.io/)
and save them into `data/kitti/3dop`
* **MonoDepth**: compute an average depth for every instance using the script
[here](https://github.com/Parrotlife/pedestrianDepth-baseline/tree/master/MonoDepth-PyTorch)
and save the results into `data/kitti/monodepth`
* **GeometricalBaseline**: a geometrical baseline comparison is provided.

The average geometrical value for comparison can be obtained by running:
```
python -m monstereo.run eval \
--dir_ann <annotation directory> \
--model <model path> \
--net monoloco_pp \
--generate
```

To also include the geometric baselines and MonoLoco, add the flag `--baselines`.

<img src="quantitative_mono.png" width="550"/>

Adding the argument `--save`, a few plots will be produced, including the 3D localization error as a function of distance:

<img src="results.png" width="600"/>
### Activity Estimation (Talking)
Please follow the preprocessing steps for the Collective Activity dataset and run pifpaf over the dataset images.
Evaluation on this dataset is done with models trained on either KITTI or nuScenes.
For optimal performance, we suggest the model trained on the nuScenes teaser (TODO add link).
```
python -m monstereo.run eval \
--activity \
--dataset collective \
--net monoloco_pp \
--model <path to the model> \
--dir_ann <annotation directory>
```