Fixed merge conflicts

.github/workflows/tests.yml (17 changes)

@@ -5,7 +5,22 @@
 name: Tests

-on: [push, pull_request]
+on:
+  push:
+    paths:
+      - 'monoloco/**'
+      - 'test/**'
+      - 'docs/00*.png'
+      - 'docs/frame0032.jpg'
+      - '.github/workflows/tests.yml'
+
+  pull_request:
+    paths:
+      - 'monoloco/**'
+      - 'test/**'
+      - 'docs/00*.png'
+      - 'docs/frame0032.jpg'
+      - '.github/workflows/tests.yml'

 jobs:
   build:
README.md (130 changes)

@@ -2,12 +2,14 @@
 Continuously tested on Linux, MacOS and Windows: [](https://github.com/vita-epfl/monoloco/actions?query=workflow%3ATests)

+<img src="docs/webcam.gif" width="700" alt="gif" />
+
 <img src="docs/monoloco.gif" alt="gif" />

 <br />
 <br />

 This library is based on three research projects for monocular/stereo 3D human localization (detection), body orientation, and social distancing. Check the __video teaser__ of the library on [__YouTube__](https://www.youtube.com/watch?v=O5zhzi8mwJ4).

 ---

 > __MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization__<br />
@@ -34,7 +36,12 @@ __[Article](https://arxiv.org/abs/2009.00984)__ &nbsp;
 __[Article](https://arxiv.org/abs/1906.06059)__  __[Citation](#Citation)__  __[Video](https://www.youtube.com/watch?v=ii0fqerQrec)__

 <img src="docs/surf.jpg" width="700"/>

+## Library Overview
+Visual illustration of the library components:
+
+<img src="docs/monoloco.gif" width="700" alt="gif" />
+
 ## License
 All projects are built upon [Openpifpaf](https://github.com/vita-epfl/openpifpaf) for the 2D keypoints and share the AGPL Licence.
@@ -52,6 +59,7 @@ For quick installation, do not clone this repository, make sure there is no fold

 ```
 pip3 install monoloco
+pip3 install matplotlib
 ```

 For development of the source code itself, you need to clone this repository and then:
@@ -102,27 +110,6 @@ When processing KITTI images, the network uses the provided intrinsic matrix of
 In all the other cases, we use the parameters of nuScenes cameras, with "1/1.8'' CMOS sensors of size 7.2 x 5.4 mm.
 The default focal length is 5.7mm and this parameter can be modified using the argument `--focal`.

-## Webcam
-
-You can use the webcam as input by using the `--webcam` argument. By default the `--z_max` is set to 10 while using the webcam and the `--long-edge` is set to 144. If multiple webcams are plugged in you can choose between them using `--camera`, for instance to use the second camera you can add `--camera 1`.
-we can see a few examples below, obtained we the following commands :
-
-For the first and last visualization:
-```
-python -m monoloco.run predict \
---webcam \
---activities raise_hand
-```
-For the second one :
-```
-python -m monoloco.run predict \
---webcam \
---activities raise_hand social_distance
-```
-
-
-With `social_distance` in `--activities`, only the keypoints will be shown, with no image, allowing total anonimity.
-
 ## A) 3D Localization

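As a quick sketch of the `--focal` argument documented above: a custom camera can be handled by overriding the default 5.7mm focal length. The numeric value and output folder below are illustrative only, not project defaults:

```sh
# Override the default nuScenes-style focal length for a custom camera
python3 -m monoloco.run predict docs/002282.png \
--focal 8.0 \
-o data/output
```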
@@ -138,7 +125,7 @@ If you provide a ground-truth json file to compare the predictions of the networ
 For an example image, run the following command:

 ```sh
-python -m monoloco.run predict docs/002282.png \
+python3 -m monoloco.run predict docs/002282.png \
 --path_gt names-kitti-200615-1022.json \
 -o <output directory> \
 --long-edge <rescale the image by providing dimension of long side>
@@ -153,7 +140,7 @@ To show all the instances estimated by MonoLoco add the argument `--show_all` to

 It is also possible to run [openpifpaf](https://github.com/vita-epfl/openpifpaf) directly
 by using `--mode keypoints`. All the other pifpaf arguments are also supported
-and can be checked with `python -m monoloco.run predict --help`.
+and can be checked with `python3 -m monoloco.run predict --help`.

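A minimal sketch of the `--mode keypoints` pass-through described above, using the sample image shipped with the repository:

```sh
# Run only the openpifpaf 2D pose detector, skipping the 3D localization step
python3 -m monoloco.run predict docs/002282.png --mode keypoints
```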
@@ -191,7 +178,7 @@ To visualize social distancing compliance, simply add the argument `social_dista
 Threshold distance and radii (for F-formations) can be set using `--threshold-dist` and `--radii`, respectively.

 For more info, run:
-`python -m monoloco.run predict --help`
+`python3 -m monoloco.run predict --help`

 **Examples** <br>
 An example from the Collective Activity Dataset is provided below.
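For concreteness, the two flags above could be combined as follows; the numeric values are placeholders, not the defaults shipped with the code:

```sh
# Illustrative threshold distance (metres) and F-formation radii
python3 -m monoloco.run predict docs/frame0032.jpg \
--activities social_distance \
--threshold-dist 2.0 --radii 0.3 0.5 1.0
```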
@@ -201,66 +188,79 @@ An example from the Collective Activity Dataset is provided below.
 To visualize social distancing, run the command below:

 ```sh
-python -m monoloco.run predict docs/frame0032.jpg \
+pip3 install scipy
+```
+
+```sh
+python3 -m monoloco.run predict docs/frame0032.jpg \
 --activities social_distance --output_types front bird
 ```

 <img src="docs/out_frame0032_front_bird.jpg" width="700"/>

-## C) Raise hand detection
-To detect a risen hand, you can add `raise_hand` to `--activities`.
+## C) Hand-raising detection
+To detect a raised hand, you can add the argument `--activities raise_hand` to the prediction command.
+
+For example, the image below is obtained with:
+```sh
+python3 -m monoloco.run predict docs/raising_hand.jpg \
+--activities raise_hand social_distance --output_types front
+```
+
+<img src="docs/out_raising_hand.jpg.front.jpg" width="500"/>
+
 For more info, run:
-`python -m monoloco.run predict --help`
-
-**Examples** <br>
-
-The command below:
-```
-python -m monoloco.run predict .\docs\raising_hand.jpg \
---output_types front \
---activities raise_hand
-```
-Yields the following:
-
-
+`python3 -m monoloco.run predict --help`

 ## D) Orientation and Bounding Box dimensions
 The network estimates orientation and box dimensions as well. Results are saved in a json file when using the argument
 `--output_types json`. At the moment, the only visualization including orientation is the social distancing one.
 <br />

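An illustrative way to obtain the json described above, reusing the sample image from section A (the output folder is an arbitrary choice):

```sh
# Save orientation and bounding-box dimensions to a json file
python3 -m monoloco.run predict docs/002282.png \
--output_types json -o data/output
```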
-## Training
+## E) Webcam
+You can use the webcam as input by using the `--webcam` argument. By default, `--z_max` is set to 10 and `--long-edge` to 144 when using the webcam. If multiple webcams are plugged in, you can choose between them using `--camera`; for instance, to use the second camera add `--camera 1`.
+You also need to install `opencv-python` to use this feature:
+```sh
+pip3 install opencv-python
+```
+Example command:
+
+```sh
+python3 -m monoloco.run predict --webcam \
+--activities raise_hand social_distance
+```
+
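Combining the flags documented above, a second webcam would be selected like this (sketch only):

```sh
# Use the second attached camera (index 1)
python3 -m monoloco.run predict --webcam --camera 1 \
--activities raise_hand
```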
+# Training
 We train on the KITTI dataset (MonoLoco/Monoloco++/MonStereo) or the nuScenes dataset (MonoLoco) specifying the path of the json file containing the input joints. Please download them [here](https://drive.google.com/drive/folders/1j0riwbS9zuEKQ_3oIs_dWlYBnfuN2WVN?usp=sharing) or follow [preprocessing instructions](#Preprocessing).

 Results for [MonoLoco++](###Tables) are obtained with:

-```
-python -m monoloco.run train --joints data/arrays/joints-kitti-mono-210422-1600.json
+```sh
+python3 -m monoloco.run train --joints data/arrays/joints-kitti-mono-210422-1600.json
 ```

 While for the [MonStereo](###Tables) results run:

 ```sh
-python -m monoloco.run train --joints data/arrays/joints-kitti-stereo-210422-1601.json --lr 0.003 --mode stereo
+python3 -m monoloco.run train --joints data/arrays/joints-kitti-stereo-210422-1601.json \
+--lr 0.003 --mode stereo
 ```

 If you are interested in the original results of the MonoLoco ICCV article (now improved with MonoLoco++), please refer to the tag v0.4.9 in this repository.

 Finally, for a more extensive list of available parameters, run:

-`python -m monstereo.run train --help`
+`python3 -m monstereo.run train --help`

 <br />

-## Preprocessing
+# Preprocessing
 Preprocessing and training steps are already fully supported by the code provided,
 but require first running a pose detector over
 all the training images and collecting the annotations.
 The code supports this option (by running the predict script and using `--mode keypoints`).

-### Data structure
+## Data structure

     data
     ├── outputs
@@ -275,7 +275,7 @@ mkdir outputs arrays kitti
 ```


-### Kitti Dataset
+## Kitti Dataset
 Download kitti images (from left and right cameras), ground-truth files (labels), and calibration files from their [website](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) and save them inside the `data` folder as shown below.

     data
@@ -289,7 +289,7 @@ Download kitti images (from left and right cameras), ground-truth files (labels)
 The network takes 2D keypoint annotations as input. To create them, run PifPaf over the saved images:

 ```sh
-python -m openpifpaf.predict \
+python3 -m openpifpaf.predict \
 --glob "data/kitti/images/*.png" \
 --json-output <directory to contain predictions> \
 --checkpoint=shufflenetv2k30 \
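A filled-in sketch of the command above, assuming the annotations go to `data/annotations/kitti` (any path works; the remaining pifpaf flags are truncated in the hunk and omitted here):

```sh
python3 -m openpifpaf.predict \
--glob "data/kitti/images/*.png" \
--json-output data/annotations/kitti \
--checkpoint=shufflenetv2k30
```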
@@ -311,15 +311,15 @@ Once this step is complete, the below commands transform all the annotations int

 For MonoLoco++:
 ```sh
-python -m monoloco.run prep --dir_ann <directory that contains annotations>
+python3 -m monoloco.run prep --dir_ann <directory that contains annotations>
 ```

 For MonStereo:
 ```sh
-python -m monoloco.run prep --mode stereo --dir_ann <directory that contains left annotations>
+python3 -m monoloco.run prep --mode stereo --dir_ann <directory that contains left annotations>
 ```

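Assuming the annotations were written to `data/annotations/kitti` as in the earlier sketch, the MonoLoco++ prep step becomes:

```sh
python3 -m monoloco.run prep --dir_ann data/annotations/kitti
```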
-### Collective Activity Dataset
+## Collective Activity Dataset
 To evaluate on the [collective activity dataset](http://vhosts.eecs.umich.edu/vision//activity-dataset.html)
 (without any training) we selected 6 scenes that contain people talking to each other.
 This allows for a balanced dataset, but any other configuration will work.

@@ -346,7 +346,7 @@ which for example change the name of all the jpg images in that folder adding th
 Pifpaf annotations should also be saved in a single folder and can be created with:

 ```sh
-python -m openpifpaf.predict \
+python3 -m openpifpaf.predict \
 --glob "data/collective_activity/images/*.jpg" \
 --checkpoint=shufflenetv2k30 \
 --instance-threshold=0.05 --seed-threshold 0.05 --force-complete-pose \
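The renaming mentioned in the hunk context above ("change the name of all the jpg images in that folder adding th…") can be scripted; a sketch assuming a single image folder and a hypothetical scene tag `seq14`:

```sh
# Prefix every jpg in the folder with an (assumed) scene tag
for f in data/collective_activity/images/*.jpg; do
    mv "$f" "${f%/*}/seq14_${f##*/}"
done
```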
@@ -354,9 +354,9 @@ python -m openpifpaf.predict \
 ```


-## Evaluation
+# Evaluation

-### 3D Localization
+## 3D Localization
 We provide evaluation on KITTI for models trained on nuScenes or KITTI. Download the ground-truths of KITTI dataset and the calibration files from their [website](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Save the training labels (one .txt file for each image) into the folder `data/kitti/gt` and the camera calibration matrices (one .txt file for each image) into `data/kitti/calib`.
 To evaluate a pre-trained model, download the latest models from [here](https://drive.google.com/drive/u/0/folders/1kQpaTcDsiNyY6eh1kUurcpptfAXkBjAJ) and save them into `data/outputs`.

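A sketch of the folder preparation described above (target paths exactly as stated in the text; the source locations depend on where the KITTI downloads were unpacked):

```sh
mkdir -p data/kitti/gt data/kitti/calib data/outputs
# then copy one .txt per image into gt/ and calib/,
# and the downloaded .pkl models into data/outputs/
```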
@@ -386,7 +386,7 @@ To include also geometric baselines and MonoLoco, download a monoloco model, sav
 The evaluation file will run the model over all the annotations and compare the results with KITTI ground-truth and the downloaded baselines. For this run:

 ```sh
-python -m monoloco.run eval \
+python3 -m monoloco.run eval \
 --dir_ann <annotation directory> \
 --model data/outputs/monoloco_pp-210422-1601.pkl \
 --generate \
@@ -395,14 +395,14 @@ python -m monoloco.run eval \

 For stereo results add `--mode stereo` and select `--model=monstereo-210422-1620.pkl`. Below, the resulting table of results and an example of the saved figures.

-### Tables
+## Tables

 <img src="docs/quantitative.jpg" width="700"/>

 <img src="docs/results_monstereo.jpg" width="700"/>

-### Relative Average Precision Localization: RALP-5% (MonStereo)
+## Relative Average Precision Localization: RALP-5% (MonStereo)

 We modified the original C++ evaluation of KITTI to make it relative to distance. We use **cmake**.
 To run the evaluation, first generate the txt file with the standard command for evaluation (above).
@@ -410,20 +410,20 @@ Then follow the instructions of this [repository](https://github.com/cguindel/ev
 to prepare the folders accordingly (or follow kitti guidelines) and run evaluation.
 The modified file is called *evaluate_object.cpp* and runs exactly as the original kitti evaluation.
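A minimal cmake build sketch for the modified *evaluate_object.cpp* (the directory layout and binary name follow common cmake practice and may differ in the linked repository):

```sh
mkdir build && cd build
cmake ..
make
./evaluate_object
```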
-### Activity Estimation (Talking)
+## Activity Estimation (Talking)
 Please follow the preprocessing steps for the Collective Activity dataset and run pifpaf over the dataset images.
 Evaluation on this dataset is done with models trained on either KITTI or nuScenes.
 For optimal performance, we suggest the model trained on the nuScenes teaser.

 ```sh
-python -m monstereo.run eval \
+python3 -m monstereo.run eval \
 --activity \
 --dataset collective \
 --model <path to the model> \
 --dir_ann <annotation directory>
 ```

-## Citation
+# Citation
 When using this library in your research, we will be happy if you cite us!

 ```
docs/000840.png (0 changes, executable file → normal file, 736 KiB)

docs/000840_right.png (0 changes, executable file → normal file, 732 KiB)

docs/002282.png (0 changes, executable file → normal file, 831 KiB)

docs/out_raising_hand.jpg.front.jpg (new binary file, 544 KiB)
@@ -248,7 +248,7 @@ def is_raising_hand(kp):
     if is_right_risen:
         return 'right'

-    return 'none'
+    return None


 def check_f_formations(idx, idx_t, centers, angles, radii, social_distance=False):
@@ -308,8 +308,6 @@ def show_activities(args, image_t, output_path, annotations, dic_out):
     if 'social_distance' in args.activities:
         colors = social_distance_colors(colors, dic_out)

-    print("Size of the image :", image_t.size)
-
     angles = dic_out['angles']
     stds = dic_out['stds_ale']
     xz_centers = [[xx[0], xx[2]] for xx in dic_out['xyz_pred']]
@@ -18,7 +18,6 @@ import torch
 import PIL
 import openpifpaf
 import openpifpaf.datasets as datasets
-from openpifpaf.predict import processor_factory, preprocess_factory
 from openpifpaf import decoder, network, visualizer, show, logger
 try:
     import gdown
@@ -53,14 +52,17 @@ def get_torch_checkpoints_dir():

 def download_checkpoints(args):
     torch_dir = get_torch_checkpoints_dir()
+    os.makedirs(torch_dir, exist_ok=True)
     if args.checkpoint is None:
-        os.makedirs(torch_dir, exist_ok=True)
         pifpaf_model = os.path.join(torch_dir, 'shufflenetv2k30-201104-224654-cocokp-d75ed641.pkl')
         print(pifpaf_model)
     else:
         pifpaf_model = args.checkpoint
     dic_models = {'keypoints': pifpaf_model}
     if not os.path.exists(pifpaf_model):
-        assert DOWNLOAD is not None, "pip install gdown to download pifpaf model, or pass it as --checkpoint"
+        assert DOWNLOAD is not None, \
+            "pip install gdown to download a pifpaf model, or pass the model path as --checkpoint"
         LOG.info('Downloading OpenPifPaf model in %s', torch_dir)
         DOWNLOAD(OPENPIFPAF_MODEL, pifpaf_model, quiet=False)

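As the assertion in the hunk above states, the automatic checkpoint download relies on gdown; installing it is one line (alternatively, pass a local model path via `--checkpoint`):

```sh
pip3 install gdown
```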
@@ -74,7 +76,7 @@ def download_checkpoints(args):
         assert not args.social_distance, "Social distance not supported in stereo modality"
         path = MONSTEREO_MODEL
         name = 'monstereo-201202-1212.pkl'
-    elif (args.activities and 'social_distance' in args.activities) or args.webcam:
+    elif ('social_distance' in args.activities) or args.webcam:
         path = MONOLOCO_MODEL_NU
         name = 'monoloco_pp-201207-1350.pkl'
     else:
@@ -85,7 +87,9 @@ def download_checkpoints(args):
     print(name)
     dic_models[args.mode] = model
     if not os.path.exists(model):
-        assert DOWNLOAD is not None, "pip install gdown to download monoloco model, or pass it as --model"
+        os.makedirs(torch_dir, exist_ok=True)
+        assert DOWNLOAD is not None, \
+            "pip install gdown to download a monoloco model, or pass the model path as --model"
         LOG.info('Downloading model in %s', torch_dir)
         DOWNLOAD(path, model, quiet=False)
     return dic_models
@@ -166,12 +170,11 @@ def predict(args):
         casr=args.casr,
         casr_model=args.casr_model)

-    # data
-    processor, pifpaf_model = processor_factory(args)
-    preprocess = preprocess_factory(args)
+    # for openpifpaf predictions
+    predictor = openpifpaf.Predictor(checkpoint=args.checkpoint)

-    data = datasets.ImageList(args.images, preprocess=preprocess)
+    # data
+    data = datasets.ImageList(args.images, preprocess=predictor.preprocess)
     if args.mode == 'stereo':
         assert len(
             data.image_paths) % 2 == 0, "Odd number of images in a stereo setting"
@@ -180,22 +183,19 @@ def predict(args):
         data, batch_size=args.batch_size, shuffle=False,
         pin_memory=False, collate_fn=datasets.collate_images_anns_meta)

-    for batch_i, (image_tensors_batch, _, meta_batch) in enumerate(data_loader):
-        pred_batch = processor.batch(
-            pifpaf_model, image_tensors_batch, device=args.device)
+    for batch_i, (_, _, meta_batch) in enumerate(data_loader):

         # unbatch (only for MonStereo)
-        for idx, (pred, meta) in enumerate(zip(pred_batch, meta_batch)):
+        for idx, (preds, _, meta) in enumerate(predictor.dataset(data)):
             LOG.info('batch %d: %s', batch_i, meta['file_name'])
-            pred = [ann.inverse_transform(meta) for ann in pred]

             # Load image and collect pifpaf results
             if idx == 0:
                 with open(meta_batch[0]['file_name'], 'rb') as f:
                     cpu_image = PIL.Image.open(f).convert('RGB')
                 pifpaf_outs = {
-                    'pred': pred,
-                    'left': [ann.json_data() for ann in pred],
+                    'pred': preds,
+                    'left': [ann.json_data() for ann in preds],
                     'image': cpu_image}

             # Set output image name
@@ -212,7 +212,7 @@ def predict(args):

             # Only for MonStereo
             else:
-                pifpaf_outs['right'] = [ann.json_data() for ann in pred]
+                pifpaf_outs['right'] = [ann.json_data() for ann in preds]

     # 3D Predictions
     if args.mode != 'keypoints':
@@ -229,15 +229,14 @@ def predict(args):
             dic_out = net.forward(keypoints, kk)
             dic_out = net.post_process(
                 dic_out, boxes, keypoints, kk, dic_gt)
-            if args.activities:
-                if 'social_distance' in args.activities:
-                    dic_out = net.social_distance(dic_out, args)
-                if 'raise_hand' in args.activities:
-                    dic_out = net.raising_hand(dic_out, keypoints)
-                if 'using_phone' in args.activities:
-                    dic_out = net.using_phone(dic_out, keypoints)
-                if 'is_turning' in args.activities:
-                    dic_out = net.turning_forward(dic_out, keypoints)
+            if 'social_distance' in args.activities:
+                dic_out = net.social_distance(dic_out, args)
+            if 'raise_hand' in args.activities:
+                dic_out = net.raising_hand(dic_out, keypoints)
+            if 'using_phone' in args.activities:
+                dic_out = net.using_phone(dic_out, keypoints)
+            if 'is_turning' in args.activities:
+                dic_out = net.turning_forward(dic_out, keypoints)

         else:
             LOG.info("Prediction with MonStereo")
             _, keypoints_r = preprocess_pifpaf(pifpaf_outs['right'], im_size)
@@ -264,7 +263,7 @@ def factory_outputs(args, pifpaf_outs, dic_out, output_path, kk=None):
     else:
         assert 'json' in args.output_types or args.mode == 'keypoints', \
             "No output saved, please select one among front, bird, multi, json, or pifpaf arguments"
-        if args.activities and 'social_distance' in args.activities:
+        if 'social_distance' in args.activities:
            assert args.mode == 'mono', "Social distancing only works with monocular network"

     if args.mode == 'keypoints':
@@ -51,7 +51,7 @@ def cli():

     # Monoloco
     predict_parser.add_argument('--activities', nargs='+', choices=['raise_hand', 'social_distance', 'using_phone', 'is_turning'],
-                                help='Choose activities to show: social_distance, raise_hand')
+                                help='Choose activities to show: social_distance, raise_hand', default=[])
     predict_parser.add_argument('--mode', help='keypoints, mono, stereo', default='mono')
     predict_parser.add_argument('--model', help='path of MonoLoco/MonStereo model to load')
     predict_parser.add_argument('--casr_model', help='path of casr model to load')
@@ -16,7 +16,11 @@ import sys
 import time
 from itertools import chain

-import matplotlib.pyplot as plt
+try:
+    import matplotlib.pyplot as plt
+except ImportError:
+    plt = None

 import torch
 from torch.utils.data import DataLoader
 from torch.optim import lr_scheduler
@@ -328,6 +332,10 @@ class Trainer:
         if not self.print_loss:
             return
         os.makedirs(self.dir_figures, exist_ok=True)

+        if plt is None:
+            raise Exception('please install matplotlib')
+
         for idx, phase in enumerate(epoch_losses):
             for idx_2, el in enumerate(epoch_losses['train']):
                 plt.figure(idx + idx_2)
@@ -125,7 +125,7 @@ def show_spread(dic_stats, clusters, net, dir_fig, show=False, save=False):
 def show_task_error(dir_fig, show, save):
     """Task error figure"""
     plt.figure(3, figsize=FIGSIZE)
-    xx = np.linspace(0.1, 50, 100)
+    xx = np.linspace(0.1, 40, 100)
     mu_men = 178
     mu_women = 165
     mu_child_m = 164
@@ -145,8 +145,9 @@ def show_task_error(dir_fig, show, save):
     plt.plot(xx, yy_gender, '--', color='lightgreen', linewidth=2.8, label='Generic adult (task error)')
     plt.plot(xx, yy_female, '-.', linewidth=1.7, color='darkorange', label='Adult female')
     plt.plot(xx, yy_male, '-.', linewidth=1.7, color='b', label='Adult male')
-    plt.plot(xx, yy_stereo, linewidth=1.7, color='k', label='Pixel error')
+    plt.plot(xx, yy_stereo, linewidth=2.5, color='k', label='Pixel error')
+    plt.xlim(np.min(xx), np.max(xx))
     plt.ylim(0, 5)
     plt.xlabel("Ground-truth distance from the camera $d_{gt}$ [m]")
     plt.ylabel("Localization error $\hat{e}$ due to human height variation [m]")  # pylint: disable=W1401
     plt.legend(loc=(0.01, 0.55))  # Location from 0 to 1 from lower left
@@ -11,14 +11,10 @@ import math
 import numpy as np
 from PIL import Image

-try:
-    import matplotlib
-    import matplotlib.pyplot as plt
-    from matplotlib.patches import Circle, FancyArrow
-    import scipy.ndimage as ndimage
-except ImportError:
-    ndimage = None
-    plt = None
+import matplotlib
+import matplotlib.pyplot as plt
+from matplotlib.patches import Circle, FancyArrow
+import scipy.ndimage as ndimage


 COCO_PERSON_SKELETON = [
@@ -49,6 +45,10 @@ def image_canvas(image, fig_file=None, show=True, dpi_factor=1.0, fig_width=10.0
     if 'figsize' not in kwargs:
         kwargs['figsize'] = (fig_width, fig_width * image.size[1] / image.size[0])

+    if plt is None:
+        raise Exception('please install matplotlib')
+    if ndimage is None:
+        raise Exception('please install scipy')
     fig = plt.figure(**kwargs)
     ax = plt.Axes(fig, [0.0, 0.0, 1.0, 1.0])
     ax.set_axis_off()
@@ -128,13 +128,12 @@ class KeypointPainter:
             c = color
             linewidth = self.linewidth

-            if activities:
-                if 'raise_hand' in activities:
-                    c, linewidth = highlighted_arm(x, y, connection, c, linewidth,
-                                                   dic_out['raising_hand'][:][i], size=size)
-                if 'is_turning' in activities:
-                    c, linewidth = highlighted_arm(x, y, connection, c, linewidth,
-                                                   dic_out['turning'][:][i], size=size)
+            if 'raise_hand' in activities:
+                c, linewidth = highlighted_arm(x, y, connection, c, linewidth,
+                                               dic_out['raising_hand'][:][i], size=size)
+            if 'is_turning' in activities:
+                c, linewidth = highlighted_arm(x, y, connection, c, linewidth,
+                                               dic_out['turning'][:][i], size=size)

             if self.color_connections:
                 c = matplotlib.cm.get_cmap('tab20')(ci / len(self.skeleton))
@@ -37,7 +37,7 @@ def image_attributes(dpi, output_types):
         mono=dict(color='red',
                   numcolor='firebrick',
                   linewidth=2 * c)
-        )
+    )


 class Printer:
@@ -113,6 +113,10 @@ class Printer:

     def factory_axes(self, dic_out):
         """Create axes for figures: front bird multi"""

+        if self.webcam:
+            plt.style.use('dark_background')
+
         axes = []
         figures = []
@@ -190,14 +194,9 @@ class Printer:
         else:
             scores=None

-        if activities:
-            keypoint_painter.keypoints(
-                axis, keypoint_sets, size=self.im.size,
-                scores=scores, colors=colors, activities=activities, dic_out=dic_out)
-
-        else:
-            keypoint_painter.keypoints(
-                axis, keypoint_sets, size=self.im.size, colors=colors, scores=scores)
+        keypoint_painter.keypoints(
+            axis, keypoint_sets, size=self.im.size,
+            scores=scores, colors=colors, activities=activities, dic_out=dic_out)

         draw_orientation(axis, self.centers,
                          sizes, self.angles, colors, mode='front')
@@ -219,7 +218,8 @@ class Printer:
     def _bird_loop(self, iterator, axes, colors, number):
         for idx in iterator:
             if any(xx in self.output_types for xx in ['bird', 'multi']) and self.zz_pred[idx] > 0:
-                draw_orientation(axes[1], self.xz_centers, [], self.angles, colors, mode='bird')
+                draw_orientation(axes[1], self.xz_centers[:len(iterator)], [],
+                                 self.angles[:len(iterator)], colors, mode='bird')
                 # Draw ground truth and uncertainty
                 self._draw_uncertainty(axes, idx)
@@ -232,9 +232,8 @@ class Printer:
     def draw(self, figures, axes, image, dic_out=None, annotations=None):

         colors = ['deepskyblue' for _ in self.uv_heads]
-        if self.activities:
-            if 'social_distance' in self.activities:
-                colors = social_distance_colors(colors, dic_out)
+        if 'social_distance' in self.activities:
+            colors = social_distance_colors(colors, dic_out)

         # whether to include instances that don't match the ground-truth
         iterator = range(len(self.zz_pred)) if self.show_all else range(len(self.zz_gt))
@@ -246,7 +245,7 @@ class Printer:
         if any(xx in self.output_types for xx in ['front', 'multi']):
             number['flag'] = True  # add numbers
             # Remove image if social distance is activated
-            if not self.activities or 'social_distance' not in self.activities:
+            if 'social_distance' not in self.activities:
                 self.mpl_im0.set_data(image)

             self._front_loop(iterator, axes, number, colors, annotations, dic_out)
@@ -427,17 +426,19 @@ class Printer:
             ax.get_yaxis().set_visible(False)

         else:
+            line_style = 'w--' if self.webcam else 'k--'
             uv_max = [0., float(self.height)]
             xyz_max = pixel_to_camera(uv_max, self.kk, self.z_max)
             x_max = abs(xyz_max[0])  # shortcut to avoid oval circles in case of different kk
             corr = round(float(x_max / 3))
-            ax.plot([0, x_max], [0, self.z_max], 'k--')
-            ax.plot([0, -x_max], [0, self.z_max], 'k--')
+            ax.plot([0, x_max], [0, self.z_max], line_style)
+            ax.plot([0, -x_max], [0, self.z_max], line_style)
             ax.set_xlim(-x_max + corr, x_max - corr)
             ax.set_ylim(0, self.z_max + 1)
             ax.set_xlabel("X [m]")
-            ax.set_box_aspect(.8)
-            plt.xlim((-x_max, x_max))
+            if self.webcam:
+                ax.set_box_aspect(.8)
+                plt.xlim((-x_max, x_max))
         plt.xticks(fontsize=self.attr['fontsize_ax'])
         plt.yticks(fontsize=self.attr['fontsize_ax'])
         return ax
@@ -17,9 +17,9 @@ try:
 except ImportError:
     cv2 = None

+import openpifpaf
 from openpifpaf import decoder, network, visualizer, show, logger
 import openpifpaf.datasets as datasets
-from openpifpaf.predict import processor_factory, preprocess_factory

 from ..visuals import Printer
 from ..network import Loco
@@ -73,6 +73,7 @@ def factory_from_args(args):

 def webcam(args):

     assert args.mode in 'mono'
+    assert cv2

     args, dic_models = factory_from_args(args)
@@ -80,8 +81,8 @@ def webcam(args):
     net = Loco(model=dic_models[args.mode], mode=args.mode, device=args.device,
                n_dropout=args.n_dropout, p_dropout=args.dropout)

-    processor, pifpaf_model = processor_factory(args)
-    preprocess = preprocess_factory(args)
+    # for openpifpaf predictions
+    predictor = openpifpaf.Predictor(checkpoint=args.checkpoint)

     # Start recording
     cam = cv2.VideoCapture(args.camera)
@@ -93,28 +94,25 @@ def webcam(args):
         scale = (args.long_edge)/frame.shape[0]
         image = cv2.resize(frame, None, fx=scale, fy=scale)
         height, width, _ = image.shape
-        print('resized image size: {}'.format(image.shape))
+        LOG.debug('resized image size: {}'.format(image.shape))
         image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        pil_image = Image.fromarray(image)

         data = datasets.PilImageList(
-            [pil_image], preprocess=preprocess)
+            [pil_image], preprocess=predictor.preprocess)

         data_loader = torch.utils.data.DataLoader(
             data, batch_size=1, shuffle=False,
             pin_memory=False, collate_fn=datasets.collate_images_anns_meta)

-        for (image_tensors_batch, _, meta_batch) in data_loader:
-            pred_batch = processor.batch(
-                pifpaf_model, image_tensors_batch, device=args.device)
+        for (_, _, _) in data_loader:

-            for idx, (pred, meta) in enumerate(zip(pred_batch, meta_batch)):
-                pred = [ann.inverse_transform(meta) for ann in pred]
+            for idx, (preds, _, _) in enumerate(predictor.dataset(data)):

                 if idx == 0:
                     pifpaf_outs = {
-                        'pred': pred,
-                        'left': [ann.json_data() for ann in pred],
+                        'pred': preds,
+                        'left': [ann.json_data() for ann in preds],
                         'image': image}

         if not ret:
@@ -122,7 +120,7 @@ def webcam(args):
         key = cv2.waitKey(1)
         if key % 256 == 27:
             # ESC pressed
-            print("Escape hit, closing...")
+            LOG.info("Escape hit, closing...")
             break

         kk, dic_gt = factory_for_gt(pil_image.size, focal_length=args.focal)
@@ -132,20 +130,19 @@ def webcam(args):
         dic_out = net.forward(keypoints, kk)
         dic_out = net.post_process(dic_out, boxes, keypoints, kk, dic_gt)

-        if args.activities:
-            if 'social_distance' in args.activities:
-                dic_out = net.social_distance(dic_out, args)
-            if 'raise_hand' in args.activities:
-                dic_out = net.raising_hand(dic_out, keypoints)
+        if 'social_distance' in args.activities:
+            dic_out = net.social_distance(dic_out, args)
+        if 'raise_hand' in args.activities:
+            dic_out = net.raising_hand(dic_out, keypoints)

         if visualizer_mono is None:  # it is, at the beginning
             visualizer_mono = Visualizer(kk, args)(pil_image)  # create it with the first image
             visualizer_mono.send(None)

-        print(dic_out)
+        LOG.debug(dic_out)
         visualizer_mono.send((pil_image, dic_out, pifpaf_outs))

     end = time.time()
-    print("run-time: {:.2f} ms".format((end-start)*1000))
+    LOG.info("run-time: {:.2f} ms".format((end-start)*1000))

     cam.release()