Welcome to the MMPose documentation!¶
You can change the documentation language at the lower-left corner of the page.
Installation¶
This document describes how to install MMPose.
Requirements
Prepare environment
Install MMPose
Installation in CPU-only environments
Install MMPose with Docker
Install MMPose from source
Develop with multiple MMPose versions
Requirements¶
Linux (Windows is not officially supported yet)
Python 3.6+
PyTorch 1.3+
CUDA 9.2+ (CUDA 9.0 is also compatible if you build PyTorch from source)
GCC 5+
mmcv (please install the latest version of mmcv-full)
Numpy
cv2
json_tricks
Optional:
Prepare environment¶
a. Create a conda virtual environment and activate it, e.g.:
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
b. Install PyTorch and torchvision following the official instructions, e.g.:
conda install pytorch torchvision -c pytorch
Note: Make sure that the compilation CUDA version and the runtime CUDA version match. You can check the CUDA versions supported by the pre-built packages on the PyTorch website.
Example 1: If you have CUDA 10.2 installed under /usr/local/cuda and want to install PyTorch 1.8.0, you need to install the PyTorch pre-built with CUDA 10.2:
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch
Example 2: If you have CUDA 9.2 installed under /usr/local/cuda and want to install PyTorch 1.7.0, you need to install the PyTorch pre-built with CUDA 9.2:
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=9.2 -c pytorch
If you build PyTorch from source instead of installing a pre-built package, you can use more CUDA versions, such as 9.0.
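A quick way to check the match from Python is to compare the CUDA version that the installed PyTorch was compiled against with what the driver exposes; a minimal sketch:

import torch

# CUDA version the installed PyTorch wheel was compiled against, e.g. '10.2'
print(torch.version.cuda)
# True only if a compatible CUDA runtime/driver is visible to PyTorch
print(torch.cuda.is_available())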
Install MMPose¶
a. Install the latest mmcv-full. MMPose recommends installing the pre-built mmcv with the following command:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
Here {cu_version} and {torch_version} in the URL should be replaced with your desired versions. For example, to install the latest mmcv-full with CUDA 10.2 and PyTorch 1.8.0, use the following command:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
See here for the PyTorch and CUDA versions compatible with each MMCV release.
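If you are unsure which values to fill in, they can be derived from the installed PyTorch; a small sketch (the URL pattern follows the example above, and a CUDA build of PyTorch is assumed):

import torch

# derive {cu_version} and {torch_version} for the mmcv-full index URL
cu_version = 'cu' + torch.version.cuda.replace('.', '')    # e.g. 'cu102'
torch_version = 'torch' + torch.__version__.split('+')[0]  # e.g. 'torch1.8.0'
print(f'https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html')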
Alternatively, you can compile mmcv from source:
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # this installs mmcv-full, which contains CUDA ops (instead of mmcv)
# OR pip install -e .  # the installed mmcv will not contain CUDA ops; this usually suits CPU-only (no GPU) environments
cd ..
Note: If mmcv has already been installed, you need to uninstall it first with pip uninstall mmcv. If mmcv and mmcv-full are both installed, a ModuleNotFoundError will be raised.
b. Clone the MMPose repository.
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
c. Install the dependencies and MMPose.
pip install -r requirements.txt
pip install -v -e . # or "python setup.py develop"
To install MMPose on macOS, use the following command instead:
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
d. Install other optional dependencies.
You can skip this step if you do not need the related tasks.
Optional:
Note:
- In step c, the git commit id will be written into the version number, e.g. 0.6.0+2e7045c. This version number is also saved in trained models. We recommend syncing your local code with the GitHub source each time in step b; if any C++/CUDA code has been modified, this step is compulsory.
- Following the steps above, MMPose is installed in dev mode: any local code modification takes effect immediately without re-installation (unless you submit commits and want to update the version number).
- If you want to use opencv-python-headless instead of opencv-python, install it before installing MMCV.
- If mmcv has already been installed, uninstall it first with pip uninstall mmcv. If mmcv and mmcv-full are both installed, a ModuleNotFoundError will be raised.
- Some dependencies are optional. Running python setup.py develop only installs the minimum runtime requirements. To use optional dependencies such as smplx, install them with pip install -r requirements/optional.txt, or specify the desired extras when calling pip, e.g. pip install -v -e .[optional] (where [optional] can be replaced with all, tests, build or optional), such as pip install -v -e .[tests,build].
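To verify the installation (and the commit-suffixed version number mentioned above), you can query the package from Python; a minimal check:

import mmpose

# in dev mode this prints something like 0.6.0+2e7045c
print(mmpose.__version__)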
Install MMPose from source¶
Here is a full script for installing MMPose with conda and linking the COCO dataset path (supposing that your COCO data is located at $COCO_ROOT):
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install the latest PyTorch, pre-built with the default CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
# install mmcv-full; {cu_version} and {torch_version} in the URL should be replaced with your versions
# See [here](https://github.com/open-mmlab/mmcv#installation) for the PyTorch and CUDA versions compatible with each MMCV release.
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
# install mmpose
git clone git@github.com:open-mmlab/mmpose.git
cd mmpose
pip install -r requirements.txt
python setup.py develop
mkdir data
ln -s $COCO_ROOT data/coco
Install MMPose with Docker¶
MMPose provides a Dockerfile to build a docker image.
# build a docker image with PyTorch 1.6.0, CUDA 10.1 and CUDNN 7
docker build -f ./docker/Dockerfile --rm -t mmpose .
Note: Make sure you have installed nvidia-container-toolkit.
Run the image with the following command:
docker run --gpus all \
--shm-size=8g \
-it -v {DATA_DIR}:/mmpose/data mmpose
Develop with multiple MMPose versions¶
The train and test scripts already modify the PYTHONPATH variable to ensure that the MMPose in the current directory is used.
To use the default MMPose installed in the environment instead, remove the following line from those scripts:
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
Getting Started¶
This document provides basic tutorials on the usage of MMPose. Please refer to the installation guide to install MMPose first.
Prepare datasets
Inference with pre-trained models
Test a dataset
Run demos
How to train a model
Train with a single GPU
Train with CPU
Train with multiple GPUs
Train with multiple machines
Launch multiple jobs on a single machine
Benchmark
Advanced tutorials
Prepare datasets¶
MMPose supports a variety of tasks. Please refer to the corresponding data preparation guide for the datasets you need.
Inference with pre-trained models¶
MMPose provides testing scripts to evaluate models on standard datasets (e.g. COCO, MPII), as well as high-level APIs that make it easy to use MMPose in your own code.
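As a quick illustration of those high-level APIs, the following sketch runs a top-down model on a single image with one given bounding box. The config/checkpoint pair is taken from the examples later on this page, while the box coordinates are hypothetical; API signatures are as in MMPose 0.x:

from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)

config_file = 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py'
checkpoint_file = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth'

# build the model from a config file and a checkpoint file
pose_model = init_pose_model(config_file, checkpoint_file, device='cuda:0')

image = 'tests/data/coco/000000196141.jpg'
# one person bounding box in xyxy format (hypothetical coordinates)
person_results = [{'bbox': [50, 50, 350, 450]}]

# run top-down pose estimation on the image
pose_results, _ = inference_top_down_pose_model(
    pose_model, image, person_results, format='xyxy')

# draw the predicted keypoints and save the visualization
vis_pose_result(pose_model, image, pose_results, out_file='vis_result.jpg')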
Test a dataset¶
[x] single GPU
[x] CPU
[x] single node, multiple GPUs
[x] multiple nodes
You can use the following commands to test a dataset:
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--fuse-conv-bn] \
[--eval ${EVAL_METRICS}] [--gpu_collect] [--tmpdir ${TMPDIR}] [--cfg-options ${CFG_OPTIONS}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
# CPU testing: disable GPUs and run the test script
export CUDA_VISIBLE_DEVICES=-1
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] \
[--eval ${EVAL_METRICS}]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] \
[--gpu-collect] [--tmpdir ${TMPDIR}] [--options ${OPTIONS}] [--average-clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
Here CHECKPOINT_FILE can either be the local path of a checkpoint file or its download URL.
Optional arguments:
- RESULT_FILE: Filename of the output results. If not specified, the results will not be saved to a file.
- --fuse-conv-bn: Whether to fuse Conv and BN layers, which slightly increases the inference speed.
- EVAL_METRICS: Items to be evaluated on the results. The allowed values depend on the dataset, e.g. mAP is available for COCO-style datasets, while PCK, AUC and EPE are available for datasets such as OneHand10K.
- --gpu-collect: If specified, pose estimation results will be collected via GPU communication. Otherwise, results from different GPUs will be stored under TMPDIR and collected by the rank-0 process.
- TMPDIR: Temporary directory used for collecting results from multiple processes; only valid when --gpu-collect is not specified.
- CFG_OPTIONS: Override some settings in the config file on the fly, e.g. --cfg-options model.backbone.depth=18 model.backbone.with_cp=True.
- JOB_LAUNCHER: Launcher for distributed jobs. The allowed values are none, pytorch, slurm and mpi. In particular, if set to none, the test will run in non-distributed mode.
- LOCAL_RANK: ID of the local rank. Defaults to 0 if not specified.
Examples:
Assume that you have already downloaded the checkpoints to the directory checkpoints/.

Test ResNet50 on COCO (without saving the test results) and evaluate the mAP:

./tools/dist_test.sh configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py \
    checkpoints/SOME_CHECKPOINT.pth 1 \
    --eval mAP

Test ResNet50 on COCO with 8 GPUs, downloading the checkpoint on the fly, and evaluate the mAP:

./tools/dist_test.sh configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py \
    https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth 8 \
    --eval mAP

Test ResNet50 on COCO in a slurm environment and evaluate the mAP:

./tools/slurm_test.sh slurm_partition test_job \
    configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py \
    checkpoints/SOME_CHECKPOINT.pth \
    --eval mAP
Run demos¶
We provide a variety of demo scripts for quick trials. Below is an example of multi-person human pose estimation, using manually annotated human bounding boxes as input.
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID}] \
[--kpt-thr ${KPT_SCORE_THR}]
Example:
python demo/top_down_img_demo.py \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
--img-root tests/data/coco/ --json-file tests/data/coco/test_coco.json \
--out-img-root vis_results
How to train a model¶
MMPose uses MMDistributedDataParallel for distributed training and MMDataParallel for non-distributed training.
MMPose uses distributed training for both single-machine multi-GPU and multi-machine setups. Supposing the server has 8 GPUs, 8 processes will be started, one per GPU.
Each process keeps its own model, data loader and optimizer. Model parameters are synchronized only once at the beginning. After every forward and backward pass, the gradients on all GPUs are all-reduced, and the optimizer then updates the model parameters. Since the gradients are all-reduced, the model parameters on all GPUs stay identical.
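For illustration, the all-reduce step boils down to the snippet below (a single-process sketch with the gloo backend; in real training MMDistributedDataParallel performs this on every gradient tensor across all ranks):

import os

import torch
import torch.distributed as dist

# minimal single-process setup so the snippet runs standalone
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
dist.init_process_group('gloo', rank=0, world_size=1)

grad = torch.tensor([1.0, 2.0])              # this process's local gradient
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # sum the gradients of all ranks
grad /= dist.get_world_size()                # average, so all replicas stay identical
print(grad)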
Training setting¶
All outputs (log files and checkpoints) are saved to the working directory, which is specified by work_dir in the config file.
By default, MMPose evaluates the model on the validation set after each epoch. You can change the evaluation interval by modifying the interval argument in the training config:
evaluation = dict(interval=5)  # evaluate the model every 5 epochs
According to the Linear Scaling Rule, you need to scale the learning rate proportionally to the total batch size when the number of GPUs or the number of samples per GPU changes, e.g. lr=0.01 for 4 GPUs x 2 samples/gpu and lr=0.08 for 16 GPUs x 4 samples/gpu.
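The rule amounts to keeping lr / total_batch_size constant; a minimal sketch using the numbers above:

def scale_lr(base_lr, base_batch, num_gpus, samples_per_gpu):
    """Linearly rescale the learning rate with the total batch size."""
    return base_lr * num_gpus * samples_per_gpu / base_batch

# reference setting: 4 GPUs x 2 samples/gpu at lr=0.01
print(scale_lr(0.01, base_batch=4 * 2, num_gpus=16, samples_per_gpu=4))  # 0.08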
Train with a single GPU¶
python tools/train.py ${CONFIG_FILE} [optional arguments]
If you want to specify the working directory in the command, add the argument --work-dir ${YOUR_WORK_DIR}.
Train with CPU¶
The process of training with CPU is the same as single-GPU training; you only need to disable GPUs before the training process starts:
export CUDA_VISIBLE_DEVICES=-1
Then run the single-GPU training script.
Note:
We do not recommend training with CPU, since it is extremely slow. This feature is supported to allow debugging on machines without GPU.
Train with multiple GPUs¶
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Optional arguments are:
- --work-dir ${WORK_DIR}: Override the working directory specified in the config file.
- --resume-from ${CHECKPOINT_FILE}: Resume training from a previous checkpoint.
- --no-validate: Skip validation during training.
- --gpus ${GPU_NUM}: Number of GPUs to use; only applicable to non-distributed training.
- --gpu-ids ${GPU_IDS}: IDs of the GPUs to use; only applicable to non-distributed training.
- --seed ${SEED}: Seed for the python, numpy and pytorch random number generators.
- --deterministic: If specified, set deterministic options for the CUDNN backend.
- --cfg-options CFG_OPTIONS: Override some settings in the config file on the fly, e.g. --cfg-options model.backbone.depth=18 model.backbone.with_cp=True.
- --launcher ${JOB_LAUNCHER}: Launcher for distributed jobs. The allowed values are none, pytorch, slurm and mpi. In particular, if set to none, training will run in non-distributed mode.
- --autoscale-lr: If specified, automatically scale the learning rate with the batch size according to the Linear Scaling Rule.
- LOCAL_RANK: ID of the local rank. Defaults to 0 if not specified.
Difference between resume-from and load-from:
- resume-from loads both the model weights and the optimizer state, and the training epoch is inherited from the specified checkpoint. It is usually used to resume a training process that was interrupted accidentally.
- load-from only loads the model weights, and the training epoch starts from 0. It is usually used for fine-tuning.
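In a config file these two behaviors correspond to the resume_from and load_from fields; a minimal sketch (the paths are placeholders):

# resume an interrupted run: weights + optimizer state + epoch counter
resume_from = 'work_dirs/res50_coco_256x192/latest.pth'
# set load_from to a checkpoint path instead to fine-tune from epoch 0
load_from = None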
Here is an example of using 8 GPUs to resume training from a ResNet50 checkpoint:
./tools/dist_train.sh configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py 8 --resume-from work_dirs/res50_coco_256x192/latest.pth
Train with multiple machines¶
If you run MMPose on a cluster managed with slurm, you can use the script slurm_train.sh (which also supports single-machine training):
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [--work-dir ${WORK_DIR}]
Here is an example of training ResNet50 with 16 GPUs on a slurm cluster partition named Test.
GPUS_PER_NODE=8 specifies slurm nodes with 8 GPUs each, and CPUS_PER_TASK=2 allocates 2 CPUs per task:
GPUS=16 GPUS_PER_NODE=8 CPUS_PER_TASK=2 ./tools/slurm_train.sh Test res50 configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py work_dirs/res50_coco_256x192
You can check slurm_train.sh for the full list of arguments and environment variables.
If your machines are connected with Ethernet only, you can refer to the PyTorch launch utility. Training can be very slow without a high-speed network such as InfiniBand.
Launch multiple jobs on a single machine¶
If you launch multiple jobs on a single machine, e.g. 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify a different port for each job to avoid communication conflicts.
If you use dist_train.sh to launch training jobs, you can specify the ports with environment variables:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
If you launch training jobs on a slurm cluster, you need to modify the dist_params variable in the config files (usually on line 4 of the config) to use different communication ports.
In config1.py:
dist_params = dict(backend='nccl', port=29500)
In config2.py:
dist_params = dict(backend='nccl', port=29501)
Then you can launch the two jobs with config1.py and config2.py:
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [--work-dir ${WORK_DIR}]
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [--work-dir ${WORK_DIR}]
Advanced tutorials¶
Currently, MMPose provides the following more detailed tutorials:
示例¶
2D Animal Pose Demo¶
2D Animal Pose Image Demo¶
Using gt bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
Pose Model Preparation: The pre-trained pose estimation model can be downloaded from the model zoo. Take the macaque model as an example:
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo.py \
configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/macaque/res50_macaque_256x192.py \
https://download.openmmlab.com/mmpose/animal/resnet/res50_macaque_256x192-98f1dd3a_20210407.pth \
--img-root tests/data/macaque/ --json-file tests/data/macaque/test_macaque.json \
--out-img-root vis_results
To run demos on CPU:
python demo/top_down_img_demo.py \
configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/macaque/res50_macaque_256x192.py \
https://download.openmmlab.com/mmpose/animal/resnet/res50_macaque_256x192-98f1dd3a_20210407.pth \
--img-root tests/data/macaque/ --json-file tests/data/macaque/test_macaque.json \
--out-img-root vis_results \
--device=cpu
2D Animal Pose Video Demo¶
We also provide video demos to illustrate the results.
Using the full image as input¶
If the video is cropped with the object centered in the screen, we can simply use the full image as the model input (without object detection).
python demo/top_down_video_demo_full_frame_without_det.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_full_frame_without_det.py \
configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/fly/res152_fly_192x192.py \
https://download.openmmlab.com/mmpose/animal/resnet/res152_fly_192x192-fcafbd5a_20210407.pth \
--video-path demo/resources/<demo_fly_video.avi> \
--out-video-root vis_results
Using MMDetection to detect animals¶
Assume that you have already installed mmdet.
COCO-animals
In COCO dataset, there are 80 object categories, including 10 common animal
categories (15: ‘bird’, 16: ‘cat’, 17: ‘dog’, 18: ‘horse’, 19: ‘sheep’, 20: ‘cow’, 21: ‘elephant’, 22: ‘bear’, 23: ‘zebra’, 24: ‘giraffe’)
For these COCO-animals, please download the COCO pre-trained detection model from MMDetection Model Zoo.
python demo/top_down_video_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
--det-cat-id ${CATEGORY_ID} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth \
configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/horse10/res50_horse10_256x256-split1.py \
https://download.openmmlab.com/mmpose/animal/resnet/res50_horse10_256x256_split1-3a3dc37e_20210405.pth \
--video-path demo/resources/<demo_horse.mp4> \
--out-video-root vis_results \
--bbox-thr 0.1 \
--kpt-thr 0.4 \
--det-cat-id 18
Other Animals
For other animals, we have also provided some pre-trained animal detection models (1-class models). Supported models can be found in det model zoo. The pre-trained animal pose estimation model can be found in pose model zoo.
python demo/top_down_video_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--det-cat-id ${CATEGORY_ID}] \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_with_mmdet.py \
demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py \
https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmpose/mmdet_pretrained/cascade_rcnn_x101_64x4d_fpn_20e_macaque-e45e36f5_20210409.pth \
configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/macaque/res152_macaque_256x192.py \
https://download.openmmlab.com/mmpose/animal/resnet/res152_macaque_256x192-c42abc02_20210407.pth \
--video-path demo/resources/<demo_macaque.mp4> \
--out-video-root vis_results \
--bbox-thr 0.5 \
--kpt-thr 0.3
Speed Up Inference¶
Some tips to speed up MMPose inference:
For 2D animal pose estimation models, try to edit the config file. For example:
- set flip_test=False in macaque-res50.
- set post_process='default' in macaque-res50.
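Both options live in the model's test_cfg in MMPose 0.x config files; a minimal sketch of the relevant fragment (other fields omitted, default values assumed from typical top-down configs):

# e.g. in res50_macaque_256x192.py
model = dict(
    # ... type / backbone / keypoint_head omitted ...
    test_cfg=dict(
        flip_test=False,         # skip the horizontal-flip ensemble
        post_process='default',  # plain heatmap decoding without extra refinement
        shift_heatmap=True,
        modulate_kernel=11))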
2D Face Keypoint Demo¶
2D Face Image Demo¶
Using gt face bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
Face Keypoint Model Preparation: The pre-trained face keypoint estimation model can be downloaded from the model zoo. Take the aflw model as an example:
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo.py \
configs/face/2d_kpt_sview_rgb_img/topdown_heatmap/aflw/hrnetv2_w18_aflw_256x256.py \
https://download.openmmlab.com/mmpose/face/hrnetv2/hrnetv2_w18_aflw_256x256-f2bbc62b_20210125.pth \
--img-root tests/data/aflw/ --json-file tests/data/aflw/test_aflw.json \
--out-img-root vis_results
To run demos on CPU:
python demo/top_down_img_demo.py \
configs/face/2d_kpt_sview_rgb_img/topdown_heatmap/aflw/hrnetv2_w18_aflw_256x256.py \
https://download.openmmlab.com/mmpose/face/hrnetv2/hrnetv2_w18_aflw_256x256-f2bbc62b_20210125.pth \
--img-root tests/data/aflw/ --json-file tests/data/aflw/test_aflw.json \
--out-img-root vis_results \
--device=cpu
Using face bounding box detectors¶
We provide a demo script to run face detection and face keypoint estimation.
Please install face_recognition before running the demo: pip install face_recognition. For more details, please refer to https://github.com/ageitgey/face_recognition.
python demo/face_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --img ${IMG_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
python demo/face_img_demo.py \
configs/face/2d_kpt_sview_rgb_img/topdown_heatmap/aflw/hrnetv2_w18_aflw_256x256.py \
https://download.openmmlab.com/mmpose/face/hrnetv2/hrnetv2_w18_aflw_256x256-f2bbc62b_20210125.pth \
--img-root tests/data/aflw/ \
--img image04476.jpg \
--out-img-root vis_results
2D Face Video Demo¶
We also provide a video demo to illustrate the results.
Please install face_recognition before running the demo: pip install face_recognition. For more details, please refer to https://github.com/ageitgey/face_recognition.
python demo/face_video_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/face_video_demo.py \
configs/face/2d_kpt_sview_rgb_img/topdown_heatmap/aflw/hrnetv2_w18_aflw_256x256.py \
https://download.openmmlab.com/mmpose/face/hrnetv2/hrnetv2_w18_aflw_256x256-f2bbc62b_20210125.pth \
--video-path https://user-images.githubusercontent.com/87690686/137441355-ec4da09c-3a8f-421b-bee9-b8b26f8c2dd0.mp4 \
--out-video-root vis_results
Speed Up Inference¶
Some tips to speed up MMPose inference:
For 2D face keypoint estimation models, try to edit the config file. For example:
- set flip_test=False in face-hrnetv2_w18.
- set post_process='default' in face-hrnetv2_w18.
2D Hand Keypoint Demo¶
2D Hand Image Demo¶
Using gt hand bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
Hand Pose Model Preparation: The pre-trained hand pose estimation model can be downloaded from the model zoo. Take the onehand10k model as an example:
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo.py \
configs/hand/2d_kpt_sview_rgb_img/topdown_heatmap/onehand10k/res50_onehand10k_256x256.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_onehand10k_256x256-e67998f6_20200813.pth \
--img-root tests/data/onehand10k/ --json-file tests/data/onehand10k/test_onehand10k.json \
--out-img-root vis_results
To run demos on CPU:
python demo/top_down_img_demo.py \
configs/hand/2d_kpt_sview_rgb_img/topdown_heatmap/onehand10k/res50_onehand10k_256x256.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_onehand10k_256x256-e67998f6_20200813.pth \
--img-root tests/data/onehand10k/ --json-file tests/data/onehand10k/test_onehand10k.json \
--out-img-root vis_results \
--device=cpu
Using mmdet for hand bounding box detection¶
We provide a demo script to run mmdet for hand detection, and mmpose for hand pose estimation.
Assume that you have already installed mmdet.
Hand Box Model Preparation: The pre-trained hand box estimation model can be found in det model zoo.
Hand Pose Model Preparation: The pre-trained hand pose estimation model can be downloaded from pose model zoo.
python demo/top_down_img_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --img ${IMG_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
python demo/top_down_img_demo_with_mmdet.py demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py \
https://download.openmmlab.com/mmpose/mmdet_pretrained/cascade_rcnn_x101_64x4d_fpn_20e_onehand10k-dac19597_20201030.pth \
configs/hand/2d_kpt_sview_rgb_img/topdown_heatmap/onehand10k/res50_onehand10k_256x256.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_onehand10k_256x256-e67998f6_20200813.pth \
--img-root tests/data/onehand10k/ \
--img 9.jpg \
--out-img-root vis_results
2D Hand Video Demo¶
We also provide a video demo to illustrate the results.
Assume that you have already installed mmdet.
Hand Box Model Preparation: The pre-trained hand box estimation model can be found in det model zoo.
Hand Pose Model Preparation: The pre-trained hand pose estimation model can be found in pose model zoo.
python demo/top_down_video_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_with_mmdet.py demo/mmdetection_cfg/cascade_rcnn_x101_64x4d_fpn_1class.py \
https://download.openmmlab.com/mmpose/mmdet_pretrained/cascade_rcnn_x101_64x4d_fpn_20e_onehand10k-dac19597_20201030.pth \
configs/hand/2d_kpt_sview_rgb_img/topdown_heatmap/onehand10k/res50_onehand10k_256x256.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_onehand10k_256x256-e67998f6_20200813.pth \
--video-path https://user-images.githubusercontent.com/87690686/137441388-3ea93d26-5445-4184-829e-bf7011def9e4.mp4 \
--out-video-root vis_results
Speed Up Inference¶
Some tips to speed up MMPose inference:
For 2D hand pose estimation models, try to edit the config file. For example:
- set flip_test=False in hand-res50.
- set post_process='default' in hand-res50.
2D Human Pose Demo¶
2D Human Pose Top-Down Image Demo¶
Using gt human bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo.py \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
--img-root tests/data/coco/ --json-file tests/data/coco/test_coco.json \
--out-img-root vis_results
To run demos on CPU:
python demo/top_down_img_demo.py \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
--img-root tests/data/coco/ --json-file tests/data/coco/test_coco.json \
--out-img-root vis_results \
--device=cpu
Using mmdet for human bounding box detection¶
We provide a demo script to run mmdet for human detection, and mmpose for pose estimation.
Assume that you have already installed mmdet.
python demo/top_down_img_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --img ${IMG_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
--img-root tests/data/coco/ \
--img 000000196141.jpg \
--out-img-root vis_results
2D Human Pose Top-Down Video Demo¶
We also provide a video demo to illustrate the results.
Assume that you have already installed mmdet.
python demo/top_down_video_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
--video-path demo/resources/demo.mp4 \
--out-video-root vis_results
2D Human Pose Bottom-Up Image Demo¶
We provide a demo script to test a single image.
python demo/bottom_up_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-path ${IMG_PATH}\
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR} --pose-nms-thr ${POSE_NMS_THR}]
Examples:
python demo/bottom_up_img_demo.py \
configs/body/2d_kpt_sview_rgb_img/associative_embedding/coco/hrnet_w32_coco_512x512.py \
https://download.openmmlab.com/mmpose/bottom_up/hrnet_w32_coco_512x512-bcb8c247_20200816.pth \
--img-path tests/data/coco/ \
--out-img-root vis_results
2D Human Pose Bottom-Up Video Demo¶
We also provide a video demo to illustrate the results.
python demo/bottom_up_video_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR} --pose-nms-thr ${POSE_NMS_THR}]
Examples:
python demo/bottom_up_video_demo.py \
configs/body/2d_kpt_sview_rgb_img/associative_embedding/coco/hrnet_w32_coco_512x512.py \
https://download.openmmlab.com/mmpose/bottom_up/hrnet_w32_coco_512x512-bcb8c247_20200816.pth \
--video-path demo/resources/demo.mp4 \
--out-video-root vis_results
Speed Up Inference¶
Some tips to speed up MMPose inference:
For top-down models, try to edit the config file. For example:
- set flip_test=False in topdown-res50.
- set post_process='default' in topdown-res50.
- use a faster human bounding box detector; see MMDetection.
For bottom-up models, similar config edits apply, e.g. setting flip_test=False.
2D Pose Tracking Demo¶
2D Top-Down Video Human Pose Tracking Demo¶
We provide a video demo to illustrate the pose tracking results.
Assume that you have already installed mmdet.
python demo/top_down_pose_tracking_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}] \
[--use-oks-tracking --tracking-thr ${TRACKING_THR} --euro]
Examples:
python demo/top_down_pose_tracking_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth \
--video-path demo/resources/demo.mp4 \
--out-video-root vis_results
2D Top-Down Video Human Pose Tracking Demo with MMTracking¶
MMTracking is an open source video perception toolbox based on PyTorch for tracking related tasks. Here we show how to utilize MMTracking and MMPose to achieve human pose tracking.
Assume that you have already installed mmtracking.
python demo/top_down_video_demo_with_mmtracking.py \
${MMTRACKING_CONFIG_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_pose_tracking_demo_with_mmtracking.py \
demo/mmtracking_cfg/tracktor_faster-rcnn_r50_fpn_4e_mot17-private.py \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/res50_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/resnet/res50_coco_256x192-ec54d7f3_20200709.pth \
--video-path demo/resources/demo.mp4 \
--out-video-root vis_results
2D Bottom-Up Video Human Pose Tracking Demo¶
We also provide a pose tracking demo with bottom-up pose estimation methods.
python demo/bottom_up_pose_tracking_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR} --pose-nms-thr ${POSE_NMS_THR}] \
[--use-oks-tracking --tracking-thr ${TRACKING_THR} --euro]
Examples:
python demo/bottom_up_pose_tracking_demo.py \
configs/body/2d_kpt_sview_rgb_img/associative_embedding/coco/hrnet_w32_coco_512x512.py \
https://download.openmmlab.com/mmpose/bottom_up/hrnet_w32_coco_512x512-bcb8c247_20200816.pth \
--video-path demo/resources/demo.mp4 \
--out-video-root vis_results
Speed Up Inference¶
Some tips to speed up MMPose inference:
For top-down models, try to edit the config file. For example:
- set flip_test=False in topdown-res50.
- set post_process='default' in topdown-res50.
- use a faster human detector or human tracker; see MMDetection or MMTracking.
For bottom-up models, similar config edits apply, e.g. setting flip_test=False.
2D Human Whole-Body Pose Demo¶
2D Human Whole-Body Pose Top-Down Image Demo¶
Using gt human bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
python demo/top_down_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --json-file ${JSON_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo.py \
configs/wholebody/2d_kpt_sview_rgb_img/topdown_heatmap/coco-wholebody/hrnet_w48_coco_wholebody_384x288_dark_plus.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_wholebody_384x288_dark-f5726563_20200918.pth \
--img-root tests/data/coco/ --json-file tests/data/coco/test_coco.json \
--out-img-root vis_results
To run demos on CPU:
python demo/top_down_img_demo.py \
configs/wholebody/2d_kpt_sview_rgb_img/topdown_heatmap/coco-wholebody/hrnet_w48_coco_wholebody_384x288_dark_plus.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_wholebody_384x288_dark-f5726563_20200918.pth \
--img-root tests/data/coco/ --json-file tests/data/coco/test_coco.json \
--out-img-root vis_results \
--device=cpu
Using mmdet for human bounding box detection¶
We provide a demo script to run mmdet for human detection, and mmpose for pose estimation.
Assume that you have already installed mmdet.
python demo/top_down_img_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--img-root ${IMG_ROOT} --img ${IMG_FILE} \
--out-img-root ${OUTPUT_DIR} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_img_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/wholebody/2d_kpt_sview_rgb_img/topdown_heatmap/coco-wholebody/hrnet_w48_coco_wholebody_384x288_dark_plus.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_wholebody_384x288_dark-f5726563_20200918.pth \
--img-root tests/data/coco/ \
--img 000000196141.jpg \
--out-img-root vis_results
2D Human Whole-Body Pose Top-Down Video Demo¶
We also provide a video demo to illustrate the results.
Assume that you have already installed mmdet.
python demo/top_down_video_demo_with_mmdet.py \
${MMDET_CONFIG_FILE} ${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--video-path ${VIDEO_FILE} \
--out-video-root ${OUTPUT_VIDEO_ROOT} \
[--show --device ${GPU_ID or CPU}] \
[--bbox-thr ${BBOX_SCORE_THR} --kpt-thr ${KPT_SCORE_THR}]
Examples:
python demo/top_down_video_demo_with_mmdet.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/wholebody/2d_kpt_sview_rgb_img/topdown_heatmap/coco-wholebody/hrnet_w48_coco_wholebody_384x288_dark_plus.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_wholebody_384x288_dark-f5726563_20200918.pth \
--video-path https://user-images.githubusercontent.com/87690686/137440639-fb08603d-9a35-474e-b65f-46b5c06b68d6.mp4 \
--out-video-root vis_results
Speed Up Inference¶
Some tips to speed up MMPose inference:
For top-down models, try to edit the config file. For example:
- set flip_test=False in pose_hrnet_w48_dark+.
- set post_process='default' in pose_hrnet_w48_dark+.
- use a faster human bounding box detector; see MMDetection.
3D Mesh Demo¶
3D Mesh Recovery Demo¶
We provide a demo script to recover human 3D mesh from a single image.
python demo/mesh_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--json-file ${JSON_FILE} \
--img-root ${IMG_ROOT} \
[--show] \
[--device ${GPU_ID or CPU}] \
[--out-img-root ${OUTPUT_DIR}]
Example:
python demo/mesh_img_demo.py \
configs/body/3d_mesh_sview_rgb_img/hmr/mixed/res50_mixed_224x224.py \
https://download.openmmlab.com/mmpose/mesh/hmr/hmr_mesh_224x224-c21e8229_20201015.pth \
--json-file tests/data/h36m/h36m_coco.json \
--img-root tests/data/h36m \
--out-img-root vis_results
3D Hand Demo¶
3D Hand Estimation Image Demo¶
Using gt hand bounding boxes as input¶
We provide a demo script to test a single image, given a gt json file.
python demo/interhand3d_img_demo.py \
${MMPOSE_CONFIG_FILE} ${MMPOSE_CHECKPOINT_FILE} \
--json-file ${JSON_FILE} \
--img-root ${IMG_ROOT} \
[--camera-param-file ${CAMERA_PARAM_FILE}] \
[--gt-joints-file ${GT_JOINTS_FILE}] \
[--show] \
[--device ${GPU_ID or CPU}] \
[--out-img-root ${OUTPUT_DIR}] \
[--rebase-keypoint-height] \
[--show-ground-truth]
Example with gt keypoints and camera parameters:
python demo/interhand3d_img_demo.py \
configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py \
https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3d_all_256x256-b9c1cf4c_20210506.pth \
--json-file tests/data/interhand2.6m/test_interhand2.6m_data.json \
--img-root tests/data/interhand2.6m \
--camera-param-file tests/data/interhand2.6m/test_interhand2.6m_camera.json \
--gt-joints-file tests/data/interhand2.6m/test_interhand2.6m_joint_3d.json \
--out-img-root vis_results \
--rebase-keypoint-height \
--show-ground-truth
Example without gt keypoints and camera parameters:
python demo/interhand3d_img_demo.py \
configs/hand/3d_kpt_sview_rgb_img/internet/interhand3d/res50_interhand3d_all_256x256.py \
https://download.openmmlab.com/mmpose/hand3d/internet/res50_intehand3d_all_256x256-b9c1cf4c_20210506.pth \
--json-file tests/data/interhand2.6m/test_interhand2.6m_data.json \
--img-root tests/data/interhand2.6m \
--out-img-root vis_results \
--rebase-keypoint-height
3D Human Pose Demo¶
3D Human Pose Two-stage Estimation Image Demo¶
Using ground truth 2D poses as the 1st-stage (pose detection) result, and inferring the 2nd stage (2D-to-3D lifting)¶
We provide a demo script to test on single images with a given ground-truth json file.
python demo/body3d_two_stage_img_demo.py \
${MMPOSE_CONFIG_FILE_3D} \
${MMPOSE_CHECKPOINT_FILE_3D} \
--json-file ${JSON_FILE} \
--img-root ${IMG_ROOT} \
--only-second-stage \
[--show] \
[--device ${GPU_ID or CPU}] \
[--out-img-root ${OUTPUT_DIR}] \
[--rebase-keypoint-height] \
[--show-ground-truth]
Example:
python demo/body3d_two_stage_img_demo.py \
configs/body/3d_kpt_sview_rgb_img/pose_lift/h36m/simplebaseline3d_h36m.py \
https://download.openmmlab.com/mmpose/body3d/simple_baseline/simple3Dbaseline_h36m-f0ad73a4_20210419.pth \
--json-file tests/data/h36m/h36m_coco.json \
--img-root tests/data/h36m \
--camera-param-file tests/data/h36m/cameras.pkl \
--only-second-stage \
--out-img-root vis_results \
--rebase-keypoint-height \
--show-ground-truth
3D Human Pose Two-stage Estimation Video Demo¶
Using mmdet for human bounding box detection and a top-down model for the 1st stage (2D pose detection), and inferring the 2nd stage (2D-to-3D lifting)¶
Assume that you have already installed mmdet.
python demo/body3d_two_stage_video_demo.py \
${MMDET_CONFIG_FILE} \
${MMDET_CHECKPOINT_FILE} \
${MMPOSE_CONFIG_FILE_2D} \
${MMPOSE_CHECKPOINT_FILE_2D} \
${MMPOSE_CONFIG_FILE_3D} \
${MMPOSE_CHECKPOINT_FILE_3D} \
--video-path ${VIDEO_PATH} \
[--rebase-keypoint-height] \
[--norm-pose-2d] \
[--num-poses-vis NUM_POSES_VIS] \
[--show] \
[--out-video-root ${OUT_VIDEO_ROOT}] \
[--device ${GPU_ID or CPU}] \
[--det-cat-id DET_CAT_ID] \
[--bbox-thr BBOX_THR] \
[--kpt-thr KPT_THR] \
[--use-oks-tracking] \
[--tracking-thr TRACKING_THR] \
[--euro] \
[--radius RADIUS] \
[--thickness THICKNESS]
Example:
python demo/body3d_two_stage_video_demo.py \
demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/h36m/videopose3d_h36m_243frames_fullconv_supervised_cpn_ft.py \
https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_243frames_fullconv_supervised_cpn_ft-88f5abbb_20210527.pth \
--video-path demo/resources/<demo_body3d>.mp4 \
--out-video-root vis_results \
--rebase-keypoint-height
Webcam Demo¶
We provide a webcam demo tool which integrates detection and 2D pose estimation for humans and animals. You can simply run the following command:
python demo/webcam_demo.py
It will launch a window to display the webcam video stream with detection and pose estimation results.

Usage Tips¶
Which model is used in the demo tool?
Please check the following default arguments in the script. You can also choose other models from the MMDetection Model Zoo and MMPose Model Zoo, or use your own models.

| Model | Arguments |
| --- | --- |
| Detection | --det-config, --det-checkpoint |
| Human Pose | --human-pose-config, --human-pose-checkpoint |
| Animal Pose | --animal-pose-config, --animal-pose-checkpoint |
Can this tool run without GPU?
Yes, you can set --device=cpu and the model inference will be performed on CPU. Of course, this may cause a low inference FPS compared to using GPU devices.

Why is there a time delay between the pose visualization and the video?
The video I/O and model inference run asynchronously, and the latter usually takes more time for a single frame. To alleviate the time delay, you can:
- set --display-delay=MILLISECONDS to defer the video stream, according to the inference delay shown at the top-left corner, or
- set --synchronous-mode to force the video stream to be aligned with inference results. This may reduce the video display FPS.

Can this tool process video files?
Yes. You can set --cam-id=VIDEO_FILE_PATH to run the demo tool in offline mode on a video file. Note that --synchronous-mode should be set in this case.

How to enable/disable the special effects?
The special effects can be enabled/disabled at launch time with arguments like --bugeye, --sunglasses, etc. You can also toggle the effects with keyboard shortcuts like b, s while the tool is running.

What if my computer doesn't have a camera?
You can use a smart phone as a webcam with apps like Camo or DroidCam.
Benchmark¶
Under construction…
Inference speed summary¶
Here we summarize the model complexity and inference speed of major models in MMPose, including FLOPs, parameter counts, and inference speed on CPU and GPU with different batch sizes. We also compare the mAP of different models on the COCO human keypoint dataset, showing the trade-off between model performance and model complexity.
Comparison rules¶
To ensure fairness of the comparison, the experiments were conducted on the same dataset under the same hardware and software environment. We also list the mAP (mean average precision) on the COCO human keypoint dataset together with the corresponding config files.
For model complexity, we compute the FLOPs and parameter count of each model with the corresponding input shape. Note that some layers or operators, e.g. DeformConv2d, are not supported yet, so you may need to check whether all operations are supported and verify that the FLOPs and parameter counts are computed correctly.
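Complexity numbers of this kind can be obtained with mmcv's complexity tool (MMPose also ships a wrapper script, tools/analysis/get_flops.py); a minimal sketch with a placeholder model:

import torch
from mmcv.cnn import get_model_complexity_info

# placeholder model; in practice, build the pose model from its config
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU())

# input shape (C, H, W), as listed in the table below
flops, params = get_model_complexity_info(model, (3, 256, 192))
print(flops, params)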
For inference speed, we omit the data pre-processing time and measure only the model forward pass and data post-processing. For each model setting, we keep the same data pre-processing to ensure identical feature inputs. We measure the speed on both CPU and GPU devices. For top-down heatmap models, we also test with a larger batch size (e.g. 10) to assess model performance in crowded scenes.
Inference speed is measured in frames per second (FPS), i.e. the average number of model iterations per second, which indicates how fast the model processes inputs. The higher the value, the faster the inference.
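A minimal sketch of how an FPS number of this form (mean ± std over repeated runs) can be measured for a bare forward pass; the model here is a placeholder:

import time

import numpy as np
import torch

@torch.no_grad()
def measure_fps(model, inputs, warmup=10, iters=50):
    """Mean and std of forward-pass iterations per second."""
    model.eval()
    for _ in range(warmup):  # warm up caches and clocks first
        model(inputs)
    if inputs.is_cuda:
        torch.cuda.synchronize()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        model(inputs)
        if inputs.is_cuda:
            torch.cuda.synchronize()  # wait for asynchronous CUDA kernels
        times.append(time.perf_counter() - start)
    fps = 1.0 / np.array(times)
    return fps.mean(), fps.std()

model = torch.nn.Conv2d(3, 17, 3, padding=1)  # placeholder for a pose model
mean, std = measure_fps(model, torch.randn(1, 3, 256, 192))
print(f'{mean:.2f} ± {std:.2f} FPS')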
Hardware¶
GPU: GeForce GTX 1660 SUPER
CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Software environment¶
Ubuntu 16.04
Python 3.8
PyTorch 1.10
CUDA 10.2
mmcv-full 1.3.17
mmpose 0.20.0
Summary of model complexity and inference speed of major MMPose models¶
Algorithm | Model | config | Input size | mAP | Flops (GFLOPs) | Params (M) | GPU Inference Speed (FPS)1 | GPU Inference Speed (FPS, bs=10)2 | CPU Inference Speed (FPS) | CPU Inference Speed (FPS, bs=10) |
---|---|---|---|---|---|---|---|---|---|---|
topdown_heatmap | Alexnet | config | (3, 192, 256) | 0.397 | 1.42 | 5.62 | 229.21 ± 16.91 | 33.52 ± 1.14 | 13.92 ± 0.60 | 1.38 ± 0.02 |
topdown_heatmap | CPM | config | (3, 192, 256) | 0.623 | 63.81 | 31.3 | 11.35 ± 0.22 | 3.87 ± 0.07 | 0.31 ± 0.01 | 0.03 ± 0.00 |
topdown_heatmap | CPM | config | (3, 288, 384) | 0.65 | 143.57 | 31.3 | 7.09 ± 0.14 | 2.10 ± 0.05 | 0.14 ± 0.00 | 0.01 ± 0.00 |
topdown_heatmap | Hourglass-52 | config | (3, 256, 256) | 0.726 | 28.67 | 94.85 | 25.50 ± 1.68 | 3.99 ± 0.07 | 0.92 ± 0.03 | 0.09 ± 0.00 |
topdown_heatmap | Hourglass-52 | config | (3, 384, 384) | 0.746 | 64.5 | 94.85 | 14.74 ± 0.8 | 1.86 ± 0.06 | 0.43 ± 0.03 | 0.04 ± 0.00 |
topdown_heatmap | HRNet-W32 | config | (3, 192, 256) | 0.746 | 7.7 | 28.54 | 22.73 ± 1.12 | 6.60 ± 0.14 | 2.73 ± 0.11 | 0.32 ± 0.00 |
topdown_heatmap | HRNet-W32 | config | (3, 288, 384) | 0.76 | 17.33 | 28.54 | 22.78 ± 1.21 | 3.28 ± 0.08 | 1.35 ± 0.05 | 0.14 ± 0.00 |
topdown_heatmap | HRNet-W48 | config | (3, 192, 256) | 0.756 | 15.77 | 63.6 | 22.01 ± 1.10 | 3.74 ± 0.10 | 1.46 ± 0.05 | 0.16 ± 0.00 |
topdown_heatmap | HRNet-W48 | config | (3, 288, 384) | 0.767 | 35.48 | 63.6 | 15.03 ± 1.03 | 1.80 ± 0.03 | 0.68 ± 0.02 | 0.07 ± 0.00 |
topdown_heatmap | LiteHRNet-30 | config | (3, 192, 256) | 0.675 | 0.42 | 1.76 | 11.86 ± 0.38 | 9.77 ± 0.23 | 5.84 ± 0.39 | 0.80 ± 0.00 |
topdown_heatmap | LiteHRNet-30 | config | (3, 288, 384) | 0.7 | 0.95 | 1.76 | 11.52 ± 0.39 | 5.18 ± 0.11 | 3.45 ± 0.22 | 0.37 ± 0.00 |
topdown_heatmap | MobilenetV2 | config | (3, 192, 256) | 0.646 | 1.59 | 9.57 | 91.82 ± 10.98 | 17.85 ± 0.32 | 10.44 ± 0.80 | 1.05 ± 0.01 |
topdown_heatmap | MobilenetV2 | config | (3, 288, 384) | 0.673 | 3.57 | 9.57 | 71.27 ± 6.82 | 8.00 ± 0.15 | 5.01 ± 0.32 | 0.46 ± 0.00 |
topdown_heatmap | MSPN-50 | config | (3, 192, 256) | 0.723 | 5.11 | 25.11 | 59.65 ± 3.74 | 9.51 ± 0.15 | 3.98 ± 0.21 | 0.43 ± 0.00 |
topdown_heatmap | 2xMSPN-50 | config | (3, 192, 256) | 0.754 | 11.35 | 56.8 | 30.64 ± 2.61 | 4.74 ± 0.12 | 1.85 ± 0.08 | 0.20 ± 0.00 |
topdown_heatmap | 3xMSPN-50 | config | (3, 192, 256) | 0.758 | 17.59 | 88.49 | 20.90 ± 1.82 | 3.22 ± 0.08 | 1.23 ± 0.04 | 0.13 ± 0.00 |
topdown_heatmap | 4xMSPN-50 | config | (3, 192, 256) | 0.764 | 23.82 | 120.18 | 15.79 ± 1.14 | 2.45 ± 0.05 | 0.90 ± 0.03 | 0.10 ± 0.00 |
topdown_heatmap | ResNest-50 | config | (3, 192, 256) | 0.721 | 6.73 | 35.93 | 48.36 ± 4.12 | 7.48 ± 0.13 | 3.00 ± 0.13 | 0.33 ± 0.00 |
topdown_heatmap | ResNest-50 | config | (3, 288, 384) | 0.737 | 15.14 | 35.93 | 30.30 ± 2.30 | 3.62 ± 0.09 | 1.43 ± 0.05 | 0.13 ± 0.00 |
topdown_heatmap | ResNest-101 | config | (3, 192, 256) | 0.725 | 10.38 | 56.61 | 29.21 ± 1.98 | 5.30 ± 0.12 | 2.01 ± 0.08 | 0.22 ± 0.00 |
topdown_heatmap | ResNest-101 | config | (3, 288, 384) | 0.746 | 23.36 | 56.61 | 19.02 ± 1.40 | 2.59 ± 0.05 | 0.97 ± 0.03 | 0.09 ± 0.00 |
topdown_heatmap | ResNest-200 | config | (3, 192, 256) | 0.732 | 17.5 | 78.54 | 16.11 ± 0.71 | 3.29 ± 0.07 | 1.33 ± 0.02 | 0.14 ± 0.00 |
topdown_heatmap | ResNest-200 | config | (3, 288, 384) | 0.754 | 39.37 | 78.54 | 11.48 ± 0.68 | 1.58 ± 0.02 | 0.63 ± 0.01 | 0.06 ± 0.00 |
topdown_heatmap | ResNest-269 | config | (3, 192, 256) | 0.738 | 22.45 | 119.27 | 12.02 ± 0.47 | 2.60 ± 0.05 | 1.03 ± 0.01 | 0.11 ± 0.00 |
topdown_heatmap | ResNest-269 | config | (3, 288, 384) | 0.755 | 50.5 | 119.27 | 8.82 ± 0.42 | 1.24 ± 0.02 | 0.49 ± 0.01 | 0.05 ± 0.00 |
topdown_heatmap | ResNet-50 | config | (3, 192, 256) | 0.718 | 5.46 | 34 | 64.23 ± 6.05 | 9.33 ± 0.21 | 4.00 ± 0.10 | 0.41 ± 0.00 |
topdown_heatmap | ResNet-50 | config | (3, 288, 384) | 0.731 | 12.29 | 34 | 36.78 ± 3.05 | 4.48 ± 0.12 | 1.92 ± 0.04 | 0.19 ± 0.00 |
topdown_heatmap | ResNet-101 | config | (3, 192, 256) | 0.726 | 9.11 | 52.99 | 43.35 ± 4.36 | 6.44 ± 0.14 | 2.57 ± 0.05 | 0.27 ± 0.00 |
topdown_heatmap | ResNet-101 | config | (3, 288, 384) | 0.748 | 20.5 | 52.99 | 23.29 ± 1.83 | 3.12 ± 0.09 | 1.23 ± 0.03 | 0.11 ± 0.00 |
topdown_heatmap | ResNet-152 | config | (3, 192, 256) | 0.735 | 12.77 | 68.64 | 32.31 ± 2.84 | 4.88 ± 0.17 | 1.89 ± 0.03 | 0.20 ± 0.00 |
topdown_heatmap | ResNet-152 | config | (3, 288, 384) | 0.75 | 28.73 | 68.64 | 17.32 ± 1.17 | 2.40 ± 0.04 | 0.91 ± 0.01 | 0.08 ± 0.00 |
topdown_heatmap | ResNetV1d-50 | config | (3, 192, 256) | 0.722 | 5.7 | 34.02 | 63.44 ± 6.09 | 9.09 ± 0.10 | 3.82 ± 0.10 | 0.39 ± 0.00 |
topdown_heatmap | ResNetV1d-50 | config | (3, 288, 384) | 0.73 | 12.82 | 34.02 | 36.21 ± 3.10 | 4.30 ± 0.12 | 1.82 ± 0.04 | 0.16 ± 0.00 |
topdown_heatmap | ResNetV1d-101 | config | (3, 192, 256) | 0.731 | 9.35 | 53.01 | 41.48 ± 3.76 | 6.33 ± 0.15 | 2.48 ± 0.05 | 0.26 ± 0.00 |
topdown_heatmap | ResNetV1d-101 | config | (3, 288, 384) | 0.748 | 21.04 | 53.01 | 23.49 ± 1.76 | 3.07 ± 0.07 | 1.19 ± 0.02 | 0.11 ± 0.00 |
topdown_heatmap | ResNetV1d-152 | config | (3, 192, 256) | 0.737 | 13.01 | 68.65 | 31.96 ± 2.87 | 4.69 ± 0.18 | 1.87 ± 0.02 | 0.19 ± 0.00 |
topdown_heatmap | ResNetV1d-152 | config | (3, 288, 384) | 0.752 | 29.26 | 68.65 | 17.31 ± 1.13 | 2.32 ± 0.04 | 0.88 ± 0.01 | 0.08 ± 0.00 |
topdown_heatmap | ResNext-50 | config | (3, 192, 256) | 0.714 | 5.61 | 33.47 | 48.34 ± 3.85 | 7.66 ± 0.13 | 3.71 ± 0.10 | 0.37 ± 0.00 |
topdown_heatmap | ResNext-50 | config | (3, 288, 384) | 0.724 | 12.62 | 33.47 | 30.66 ± 2.38 | 3.64 ± 0.11 | 1.73 ± 0.03 | 0.15 ± 0.00 |
topdown_heatmap | ResNext-101 | config | (3, 192, 256) | 0.726 | 9.29 | 52.62 | 27.33 ± 2.35 | 5.09 ± 0.13 | 2.45 ± 0.04 | 0.25 ± 0.00 |
topdown_heatmap | ResNext-101 | config | (3, 288, 384) | 0.743 | 20.91 | 52.62 | 18.19 ± 1.38 | 2.42 ± 0.04 | 1.15 ± 0.01 | 0.10 ± 0.00 |
topdown_heatmap | ResNext-152 | config | (3, 192, 256) | 0.73 | 12.98 | 68.39 | 19.61 ± 1.61 | 3.80 ± 0.13 | 1.83 ± 0.02 | 0.18 ± 0.00 |
topdown_heatmap | ResNext-152 | config | (3, 288, 384) | 0.742 | 29.21 | 68.39 | 13.14 ± 0.75 | 1.82 ± 0.03 | 0.85 ± 0.01 | 0.08 ± 0.00 |
topdown_heatmap | RSN-18 | config | (3, 192, 256) | 0.704 | 2.27 | 9.14 | 47.80 ± 4.50 | 13.68 ± 0.25 | 6.70 ± 0.28 | 0.70 ± 0.00 |
topdown_heatmap | RSN-50 | config | (3, 192, 256) | 0.723 | 4.11 | 19.33 | 27.22 ± 1.61 | 8.81 ± 0.13 | 3.98 ± 0.12 | 0.45 ± 0.00 |
topdown_heatmap | 2xRSN-50 | config | (3, 192, 256) | 0.745 | 8.29 | 39.26 | 13.88 ± 0.64 | 4.78 ± 0.13 | 2.02 ± 0.04 | 0.23 ± 0.00 |
topdown_heatmap | 3xRSN-50 | config | (3, 192, 256) | 0.75 | 12.47 | 59.2 | 9.40 ± 0.32 | 3.37 ± 0.09 | 1.34 ± 0.03 | 0.15 ± 0.00 |
topdown_heatmap | SCNet-50 | config | (3, 192, 256) | 0.728 | 5.31 | 34.01 | 40.76 ± 3.08 | 8.35 ± 0.19 | 3.82 ± 0.08 | 0.40 ± 0.00 |
topdown_heatmap | SCNet-50 | config | (3, 288, 384) | 0.751 | 11.94 | 34.01 | 32.61 ± 2.97 | 4.19 ± 0.10 | 1.85 ± 0.03 | 0.17 ± 0.00 |
topdown_heatmap | SCNet-101 | config | (3, 192, 256) | 0.733 | 8.51 | 53.01 | 24.28 ± 1.19 | 5.80 ± 0.13 | 2.49 ± 0.05 | 0.27 ± 0.00 |
topdown_heatmap | SCNet-101 | config | (3, 288, 384) | 0.752 | 19.14 | 53.01 | 20.43 ± 1.76 | 2.91 ± 0.06 | 1.23 ± 0.02 | 0.12 ± 0.00 |
topdown_heatmap | SeresNet-50 | config | (3, 192, 256) | 0.728 | 5.47 | 36.53 | 54.83 ± 4.94 | 8.80 ± 0.12 | 3.85 ± 0.10 | 0.40 ± 0.00 |
topdown_heatmap | SeresNet-50 | config | (3, 288, 384) | 0.748 | 12.3 | 36.53 | 33.00 ± 2.67 | 4.26 ± 0.12 | 1.86 ± 0.04 | 0.17 ± 0.00 |
topdown_heatmap | SeresNet-101 | config | (3, 192, 256) | 0.734 | 9.13 | 57.77 | 33.90 ± 2.65 | 6.01 ± 0.13 | 2.48 ± 0.05 | 0.26 ± 0.00 |
topdown_heatmap | SeresNet-101 | config | (3, 288, 384) | 0.753 | 20.53 | 57.77 | 20.57 ± 1.57 | 2.96 ± 0.07 | 1.20 ± 0.02 | 0.11 ± 0.00 |
topdown_heatmap | SeresNet-152 | config | (3, 192, 256) | 0.73 | 12.79 | 75.26 | 24.25 ± 1.95 | 4.45 ± 0.10 | 1.82 ± 0.02 | 0.19 ± 0.00 |
topdown_heatmap | SeresNet-152 | config | (3, 288, 384) | 0.753 | 28.76 | 75.26 | 15.11 ± 0.99 | 2.25 ± 0.04 | 0.88 ± 0.01 | 0.08 ± 0.00 |
topdown_heatmap | ShuffleNetV1 | config | (3, 192, 256) | 0.585 | 1.35 | 6.94 | 80.79 ± 8.95 | 21.91 ± 0.46 | 11.84 ± 0.59 | 1.25 ± 0.01 |
topdown_heatmap | ShuffleNetV1 | config | (3, 288, 384) | 0.622 | 3.05 | 6.94 | 63.45 ± 5.21 | 9.84 ± 0.10 | 6.01 ± 0.31 | 0.57 ± 0.00 |
topdown_heatmap | ShuffleNetV2 | config | (3, 192, 256) | 0.599 | 1.37 | 7.55 | 82.36 ± 7.30 | 22.68 ± 0.53 | 12.40 ± 0.66 | 1.34 ± 0.02 |
topdown_heatmap | ShuffleNetV2 | config | (3, 288, 384) | 0.636 | 3.08 | 7.55 | 63.63 ± 5.72 | 10.47 ± 0.16 | 6.32 ± 0.28 | 0.63 ± 0.01 |
topdown_heatmap | VGG16 | config | (3, 192, 256) | 0.698 | 16.22 | 18.92 | 51.91 ± 2.98 | 6.18 ± 0.13 | 1.64 ± 0.03 | 0.15 ± 0.00 |
topdown_heatmap | VIPNAS + ResNet-50 | config | (3, 192, 256) | 0.711 | 1.49 | 7.29 | 34.88 ± 2.45 | 10.29 ± 0.13 | 6.51 ± 0.17 | 0.65 ± 0.00 |
topdown_heatmap | VIPNAS + MobileNetV3 | config | (3, 192, 256) | 0.7 | 0.76 | 5.9 | 53.62 ± 6.59 | 11.54 ± 0.18 | 1.26 ± 0.02 | 0.13 ± 0.00 |
Associative Embedding | HigherHRNet-W32 | config | (3, 512, 512) | 0.677 | 46.58 | 28.65 | 7.80 ± 0.67 | / | 0.28 ± 0.02 | / |
Associative Embedding | HigherHRNet-W32 | config | (3, 640, 640) | 0.686 | 72.77 | 28.65 | 5.30 ± 0.37 | / | 0.17 ± 0.01 | / |
Associative Embedding | HigherHRNet-W48 | config | (3, 512, 512) | 0.686 | 96.17 | 63.83 | 4.55 ± 0.35 | / | 0.15 ± 0.01 | / |
Associative Embedding | Hourglass-AE | config | (3, 512, 512) | 0.613 | 221.58 | 138.86 | 3.55 ± 0.24 | / | 0.08 ± 0.00 | / |
Associative Embedding | HRNet-W32 | config | (3, 512, 512) | 0.654 | 41.1 | 28.54 | 8.93 ± 0.76 | / | 0.33 ± 0.02 | / |
Associative Embedding | HRNet-W48 | config | (3, 512, 512) | 0.665 | 84.12 | 63.6 | 5.27 ± 0.43 | / | 0.18 ± 0.01 | / |
Associative Embedding | MobilenetV2 | config | (3, 512, 512) | 0.38 | 8.54 | 9.57 | 21.24 ± 1.34 | / | 0.81 ± 0.06 | / |
Associative Embedding | ResNet-50 | config | (3, 512, 512) | 0.466 | 29.2 | 34 | 11.71 ± 0.97 | / | 0.41 ± 0.02 | / |
Associative Embedding | ResNet-50 | config | (3, 640, 640) | 0.479 | 45.62 | 34 | 8.20 ± 0.58 | / | 0.26 ± 0.02 | / |
Associative Embedding | ResNet-101 | config | (3, 512, 512) | 0.554 | 48.67 | 53 | 8.26 ± 0.68 | / | 0.28 ± 0.02 | / |
Associative Embedding | ResNet-101 | config | (3, 512, 512) | 0.595 | 68.17 | 68.64 | 6.25 ± 0.53 | / | 0.21 ± 0.01 | / |
DeepPose | ResNet-50 | config | (3, 192, 256) | 0.526 | 4.04 | 23.58 | 82.20 ± 7.54 | / | 5.50 ± 0.18 | / |
DeepPose | ResNet-101 | config | (3, 192, 256) | 0.56 | 7.69 | 42.57 | 48.93 ± 4.02 | / | 3.10 ± 0.07 | / |
DeepPose | ResNet-152 | config | (3, 192, 256) | 0.583 | 11.34 | 58.21 | 35.06 ± 3.50 | / | 2.19 ± 0.04 | / |
1 Note that we run multiple iterations and record the time of each iteration; the mean and standard deviation of the FPS values are reported.
2 FPS is defined as the average number of iterations per second, regardless of the batch size in each iteration.
Overview¶
Number of papers: 9
DATASET: 9
Please refer to the model zoo for details of the supported algorithms.
2D Animal Keypoint Datasets¶
Number of papers: 0
2D Body Keypoint Datasets¶
Number of papers: 9
[DATASET] 2d Human Pose Estimation: New Benchmark and State of the Art Analysis (MPII ⇨)
[DATASET] Ai Challenger: A Large-Scale Dataset for Going Deeper in Image Understanding (AIC ⇨)
[DATASET] Crowdpose: Efficient Crowded Scenes Pose Estimation and a New Benchmark (CrowdPose ⇨)
[DATASET] Learning Delicate Local Representations for Multi-Person Pose Estimation (sub-JHMDB dataset ⇨)
[DATASET] Microsoft Coco: Common Objects in Context (COCO ⇨)
[DATASET] Pose2seg: Detection Free Human Instance Segmentation (OCHuman ⇨)
[DATASET] Posetrack: A Benchmark for Human Pose Estimation and Tracking (PoseTrack18 ⇨)
[DATASET] Trb: A Novel Triplet Representation for Understanding 2d Human Body (MPII-TRB ⇨)
[DATASET] Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and a New Benchmark for Multi-Human Parsing (MHP ⇨)
2D Face Keypoint Datasets¶
Number of papers: 0
2D Fashion Landmark Datasets¶
Number of papers: 0
2D Hand Keypoint Datasets¶
Number of papers: 0
2D Wholebody Keypoint Datasets¶
Number of papers: 0
3D Body Keypoint Datasets¶
Number of papers: 0
3D Body Mesh Datasets¶
Number of papers: 0
3D Hand Keypoint Datasets¶
Number of papers: 0
2D Body Keypoint Datasets¶
We recommend placing the dataset root under $MMPOSE/data. If your folder structure is different, you may need to change the corresponding paths in the config files.
The datasets supported by MMPose are listed below:
Images
Videos
COCO¶
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Please download the dataset from COCO download. COCO 2017 Train/Val is needed for COCO keypoint training and validation. HRNet-Human-Pose-Estimation provides person detection results on COCO val2017 to reproduce our multi-person pose estimation results; please download them from OneDrive or GoogleDrive. Optionally, to evaluate on COCO'2017 test-dev, please download the image-info. Download and extract everything under $MMPOSE/data, and organize it as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
`── data
│── coco
│-- annotations
│ │-- person_keypoints_train2017.json
│ |-- person_keypoints_val2017.json
│ |-- person_keypoints_test-dev-2017.json
|-- person_detection_results
| |-- COCO_val2017_detections_AP_H_56_person.json
| |-- COCO_test-dev2017_detections_AP_H_609_person.json
│-- train2017
│ │-- 000000000009.jpg
│ │-- 000000000025.jpg
│ │-- 000000000030.jpg
│ │-- ...
`-- val2017
│-- 000000000139.jpg
│-- 000000000285.jpg
│-- 000000000632.jpg
│-- ...
MPII¶
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Please download the dataset from the MPII Human Pose Dataset. We have converted the original annotation files into json format; please download them from mpii_annotations. Extract them under $MMPOSE/data, and organize them as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
`── data
│── mpii
|── annotations
| |── mpii_gt_val.mat
| |── mpii_test.json
| |── mpii_train.json
| |── mpii_trainval.json
| `── mpii_val.json
`── images
|── 000001163.jpg
|── 000003072.jpg
During training and inference, the prediction results are saved in '.mat' format by default. We provide a tool to convert these '.mat' files into the more readable '.json' format:
python tools/dataset/mat2json ${PRED_MAT_FILE} ${GT_JSON_FILE} ${OUTPUT_PRED_JSON_FILE}
For example:
python tools/dataset/mat2json work_dirs/res50_mpii_256x256/pred.mat data/mpii/annotations/mpii_val.json pred.json
MPII-TRB¶
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Please download the dataset from the MPII Human Pose Dataset and the annotation files from mpii_trb_annotations. Extract them under $MMPOSE/data, and organize them as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
`── data
│── mpii
|── annotations
| |── mpii_trb_train.json
| |── mpii_trb_val.json
`── images
|── 000001163.jpg
|── 000003072.jpg
AIC¶
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Please download the AIC dataset from AI Challenger 2017. The 2017 Train/Val split is needed for keypoint training and validation. Please download the annotation files from aic_annotations. Download and extract them under $MMPOSE/data, and organize them as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
`── data
│── aic
│-- annotations
│ │-- aic_train.json
│ |-- aic_val.json
│-- ai_challenger_keypoint_train_20170902
│ │-- keypoint_train_images_20170902
│ │ │-- 0000252aea98840a550dac9a78c476ecb9f47ffa.jpg
│ │ │-- 000050f770985ac9653198495ef9b5c82435d49c.jpg
│ │ │-- ...
`-- ai_challenger_keypoint_validation_20170911
│-- keypoint_validation_images_20170911
│-- 0002605c53fb92109a3f2de4fc3ce06425c3b61f.jpg
│-- 0003b55a2c991223e6d8b4b820045bd49507bf6d.jpg
│-- ...
CrowdPose¶
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Please download the dataset from CrowdPose, and download the annotation files and human detection results from crowdpose_annotations. For top-down approaches, we follow CrowdPose and use the pre-trained weights of YOLOv3 to generate the human bounding boxes. For model training, we follow HigherHRNet: models are trained on the CrowdPose train/val sets and evaluated on the CrowdPose test set. Extract everything under $MMPOSE/data and organize it as follows (the expected layout of the detection-result file is sketched after the directory tree):
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── crowdpose
        ├── annotations
        │   ├── mmpose_crowdpose_train.json
        │   ├── mmpose_crowdpose_val.json
        │   ├── mmpose_crowdpose_trainval.json
        │   ├── mmpose_crowdpose_test.json
        │   └── det_for_crowd_test_0.1_0.5.json
        └── images
            ├── 100000.jpg
            ├── 100001.jpg
            ├── 100002.jpg
            └── ...
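The det_for_crowd_test_0.1_0.5.json file is expected to follow the standard COCO detection-result format, i.e. a flat list of per-box records. The sketch below loads it and prints a few entries; the exact field set shown in `example` is an assumption based on that format:

# Inspect the human detection results (assumed COCO detection-result format).
import json

with open('data/crowdpose/annotations/det_for_crowd_test_0.1_0.5.json') as f:
    detections = json.load(f)

example = {
    'image_id': 100000,                    # image the box belongs to
    'category_id': 1,                      # 1 = person
    'bbox': [290.0, 120.0, 110.0, 250.0],  # [x, y, width, height]
    'score': 0.99,                         # detector confidence
}
for det in detections[:3]:
    assert set(example) <= set(det)        # check the assumed fields exist
    print(det['image_id'], det['bbox'], det['score'])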
OCHuman¶
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Please download the images and annotation files from OCHuman. Extract everything under $MMPOSE/data and organize it as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── ochuman
        ├── annotations
        │   ├── ochuman_coco_format_val_range_0.00_1.00.json
        │   └── ochuman_coco_format_test_range_0.00_1.00.json
        └── images
            ├── 000001.jpg
            ├── 000002.jpg
            ├── 000003.jpg
            └── ...
MHP¶
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Please download the data from MHP, and download the annotation files from mhp_annotations. Extract everything under $MMPOSE/data and organize it as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── mhp
        ├── annotations
        │   ├── mhp_train.json
        │   └── mhp_val.json
        ├── train
        │   └── images
        │       ├── 1004.jpg
        │       ├── 10050.jpg
        │       └── ...
        ├── val
        │   └── images
        │       ├── 10059.jpg
        │       ├── 10068.jpg
        │       └── ...
        └── test
            └── images
                ├── 1005.jpg
                ├── 10052.jpg
                └── ...
PoseTrack18¶
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Please download the data from PoseTrack18, and download the annotation files from posetrack18_annotations. We have merged all the official per-video annotation files into two json files (posetrack18_train.json & posetrack18_val.json), and generated mask files to speed up training; a merging sketch is given after the evaluation-tool installation step below. For top-down approaches, we use a Cascade R-CNN (X-101-64x4d-FPN) pre-trained with MMDetection to generate the human bounding boxes. Extract everything under $MMPOSE/data and organize it as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── posetrack18
        ├── annotations
        │   ├── posetrack18_train.json
        │   ├── posetrack18_val.json
        │   ├── posetrack18_val_human_detections.json
        │   ├── train
        │   │   ├── 000001_bonn_train.json
        │   │   ├── 000002_bonn_train.json
        │   │   └── ...
        │   ├── val
        │   │   ├── 000342_mpii_test.json
        │   │   ├── 000522_mpii_test.json
        │   │   └── ...
        │   └── test
        │       ├── 000001_mpiinew_test.json
        │       ├── 000002_mpiinew_test.json
        │       └── ...
        ├── images
        │   ├── train
        │   │   ├── 000001_bonn_train
        │   │   │   ├── 000000.jpg
        │   │   │   ├── 000001.jpg
        │   │   │   └── ...
        │   │   └── ...
        │   ├── val
        │   │   ├── 000342_mpii_test
        │   │   │   ├── 000000.jpg
        │   │   │   ├── 000001.jpg
        │   │   │   └── ...
        │   │   └── ...
        │   └── test
        │       ├── 000001_mpiinew_test
        │       │   ├── 000000.jpg
        │       │   ├── 000001.jpg
        │       │   └── ...
        │       └── ...
        └── mask
            ├── train
            │   ├── 000002_bonn_train
            │   │   ├── 000000.jpg
            │   │   ├── 000001.jpg
            │   │   └── ...
            │   └── ...
            └── val
                ├── 000522_mpii_test
                │   ├── 000000.jpg
                │   ├── 000001.jpg
                │   └── ...
                └── ...
Please install the official PoseTrack evaluation tool from GitHub:
pip install git+https://github.com/svenkreiss/poseval.git
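For reference, merging per-video COCO-style annotation files into a single json can be done as below. This is only a sketch of the idea (the merged posetrack18_train.json / posetrack18_val.json are already provided); it assumes each per-video file carries images, annotations and categories fields:

# Sketch: merge per-video COCO-style annotation files into one json.
import glob
import json

def merge_coco_style(pattern, out_file):
    merged = {'images': [], 'annotations': [], 'categories': None}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            video = json.load(f)
        merged['images'].extend(video['images'])
        merged['annotations'].extend(video['annotations'])
        merged['categories'] = merged['categories'] or video.get('categories')
    with open(out_file, 'w') as f:
        json.dump(merged, f)

merge_coco_style('data/posetrack18/annotations/train/*.json',
                 'posetrack18_train_merged.json')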
sub-JHMDB dataset¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
For the sub-JHMDB dataset, please download the images from images (taken from JHMDB), and download the annotation files from jhmdb_annotations. Move them into the $MMPOSE/data directory and organize them as follows:
mmpose
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── jhmdb
        ├── annotations
        │   ├── Sub1_train.json
        │   ├── Sub1_test.json
        │   ├── Sub2_train.json
        │   ├── Sub2_test.json
        │   ├── Sub3_train.json
        │   └── Sub3_test.json
        └── Rename_Images
            ├── brush_hair
            │   └── April_09_brush_hair_u_nm_np1_ba_goo_0
            │       ├── 00001.png
            │       └── 00002.png
            ├── catch
            └── ...
2D Wholebody Keypoint Datasets¶
Under construction…
2D Face Keypoint Datasets¶
Under construction…
2D Hand Keypoint Datasets¶
Under construction…
2D Fashion Landmark Datasets¶
Under construction…
2D Animal Keypoint Datasets¶
Under construction…
3D Body Keypoint Datasets¶
Under construction…
3D Body Mesh Model Datasets¶
Under construction…
3D Hand Keypoint Datasets¶
Under construction…
Overview¶
Number of checkpoints: 291
Number of configs: 307
Number of papers: 70
ALGORITHM: 24
BACKBONE: 12
DATASET: 32
OTHERS: 2
See Datasets for detailed information about the supported datasets.
Animal¶
Number of checkpoints: 43
Number of configs: 43
Number of papers: 9
[ALGORITHM] Deep High-Resolution Representation Learning for Human Pose Estimation (Topdown Heatmap + Hrnet on Macaque ⇨, Topdown Heatmap + Hrnet on Horse10 ⇨, Topdown Heatmap + Hrnet on Atrw ⇨, Topdown Heatmap + Hrnet on Ap10k ⇨, Topdown Heatmap + Hrnet on Animalpose ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Zebra ⇨, Topdown Heatmap + Resnet on Macaque ⇨, Topdown Heatmap + Resnet on Locust ⇨, Topdown Heatmap + Resnet on Horse10 ⇨, Topdown Heatmap + Resnet on Fly ⇨, Topdown Heatmap + Resnet on Atrw ⇨, Topdown Heatmap + Resnet on Ap10k ⇨, Topdown Heatmap + Resnet on Animalpose ⇨)
[DATASET] Ap-10k: A Benchmark for Animal Pose Estimation in the Wild (Topdown Heatmap + Resnet on Ap10k ⇨, Topdown Heatmap + Hrnet on Ap10k ⇨)
[DATASET] Atrw: A Benchmark for Amur Tiger Re-Identification in the Wild (Topdown Heatmap + Hrnet on Atrw ⇨, Topdown Heatmap + Resnet on Atrw ⇨)
[DATASET] Cross-Domain Adaptation for Animal Pose Estimation (Topdown Heatmap + Resnet on Animalpose ⇨, Topdown Heatmap + Hrnet on Animalpose ⇨)
[DATASET] Deepposekit, a Software Toolkit for Fast and Robust Animal Pose Estimation Using Deep Learning (Topdown Heatmap + Resnet on Zebra ⇨, Topdown Heatmap + Resnet on Locust ⇨)
[DATASET] Fast Animal Pose Estimation Using Deep Neural Networks (Topdown Heatmap + Resnet on Fly ⇨)
[DATASET] Macaquepose: A Novel ‘In the Wild’ Macaque Monkey Pose Dataset for Markerless Motion Capture (Topdown Heatmap + Hrnet on Macaque ⇨, Topdown Heatmap + Resnet on Macaque ⇨)
[DATASET] Pretraining Boosts Out-of-Domain Robustness for Pose Estimation (Topdown Heatmap + Hrnet on Horse10 ⇨, Topdown Heatmap + Resnet on Horse10 ⇨)
Body(2D,Kpt,Sview,Img)¶
Number of checkpoints: 161
Number of configs: 166
Number of papers: 38
[ALGORITHM] Associative Embedding: End-to-End Learning for Joint Detection and Grouping (Associative Embedding + Hrnet on MHP ⇨, Associative Embedding + Higherhrnet on Crowdpose ⇨, Associative Embedding + Hourglass + Ae on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Higherhrnet on Coco ⇨, Associative Embedding + Hrnet on Coco ⇨, Associative Embedding + Higherhrnet + Udp on Coco ⇨, Associative Embedding + Resnet on Coco ⇨, Associative Embedding + Mobilenetv2 on Coco ⇨, Associative Embedding + Higherhrnet on Aic ⇨, Associative Embedding + Hrnet on Aic ⇨)
[ALGORITHM] Convolutional Pose Machines (Topdown Heatmap + CPM on Mpii ⇨, Topdown Heatmap + CPM on JHMDB ⇨, Topdown Heatmap + CPM on Coco ⇨)
[ALGORITHM] Deep High-Resolution Representation Learning for Human Pose Estimation (Topdown Heatmap + Hrnet on Posetrack18 ⇨, Topdown Heatmap + Hrnet on Ochuman ⇨, Topdown Heatmap + Hrnet on Mpii ⇨, Topdown Heatmap + Hrnet + Dark on Mpii ⇨, Associative Embedding + Hrnet on MHP ⇨, Topdown Heatmap + Hrnet on H36m ⇨, Topdown Heatmap + Hrnet on Crowdpose ⇨, Topdown Heatmap + Hrnet + Dark on Coco ⇨, Topdown Heatmap + Hrnet + Udp on Coco ⇨, Topdown Heatmap + Hrnet on Coco ⇨, Topdown Heatmap + Hrnet + Fp16 on Coco ⇨, Topdown Heatmap + Hrnet + Augmentation on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Hrnet on Coco ⇨, Topdown Heatmap + Hrnet on Aic ⇨, Associative Embedding + Hrnet on Aic ⇨)
[ALGORITHM] Deeppose: Human Pose Estimation via Deep Neural Networks (Deeppose + Resnet on Mpii ⇨, Deeppose + Resnet on Coco ⇨)
[ALGORITHM] Distribution-Aware Coordinate Representation for Human Pose Estimation (Topdown Heatmap + Hrnet + Dark on Mpii ⇨, Topdown Heatmap + Hrnet + Dark on Coco ⇨, Topdown Heatmap + Resnet + Dark on Coco ⇨)
[ALGORITHM] Higherhrnet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation (Associative Embedding + Higherhrnet on Crowdpose ⇨, Associative Embedding + Higherhrnet on Coco ⇨, Associative Embedding + Higherhrnet + Udp on Coco ⇨, Associative Embedding + Higherhrnet on Aic ⇨)
[ALGORITHM] Improving Convolutional Networks With Self-Calibrated Convolutions (Topdown Heatmap + Scnet on Mpii ⇨, Topdown Heatmap + Scnet on Coco ⇨)
[ALGORITHM] Learning Delicate Local Representations for Multi-Person Pose Estimation (Topdown Heatmap + RSN on Coco ⇨)
[ALGORITHM] Lite-Hrnet: A Lightweight High-Resolution Network (Topdown Heatmap + Litehrnet on Mpii ⇨, Topdown Heatmap + Litehrnet on Coco ⇨)
[ALGORITHM] Rethinking on Multi-Stage Networks for Human Pose Estimation (Topdown Heatmap + MSPN on Coco ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Posetrack18 ⇨, Topdown Heatmap + Resnet on Ochuman ⇨, Topdown Heatmap + Resnet + Mpii on Mpii_trb ⇨, Topdown Heatmap + Resnet on Mpii ⇨, Topdown Heatmap + Resnet on MHP ⇨, Topdown Heatmap + Resnet on JHMDB ⇨, Topdown Heatmap + Resnet on Crowdpose ⇨, Topdown Heatmap + Resnet + Dark on Coco ⇨, Topdown Heatmap + Resnet on Coco ⇨, Topdown Heatmap + Resnet + Fp16 on Coco ⇨, Topdown Heatmap + Resnet on Aic ⇨)
[ALGORITHM] Stacked Hourglass Networks for Human Pose Estimation (Topdown Heatmap + Hourglass on Mpii ⇨, Topdown Heatmap + Hourglass on Coco ⇨)
[ALGORITHM] The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation (Topdown Heatmap + Hrnet + Udp on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Higherhrnet + Udp on Coco ⇨)
[ALGORITHM] Vipnas: Efficient Video Pose Estimation via Neural Architecture Search (Topdown Heatmap + Vipnas on Coco ⇨)
[BACKBONE] Aggregated Residual Transformations for Deep Neural Networks (Topdown Heatmap + Resnext on Mpii ⇨, Topdown Heatmap + Resnext on Coco ⇨)
[BACKBONE] Associative Embedding: End-to-End Learning for Joint Detection and Grouping (Associative Embedding + Hrnet on MHP ⇨, Associative Embedding + Higherhrnet on Crowdpose ⇨, Associative Embedding + Hourglass + Ae on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Higherhrnet on Coco ⇨, Associative Embedding + Hrnet on Coco ⇨, Associative Embedding + Higherhrnet + Udp on Coco ⇨, Associative Embedding + Resnet on Coco ⇨, Associative Embedding + Mobilenetv2 on Coco ⇨, Associative Embedding + Higherhrnet on Aic ⇨, Associative Embedding + Hrnet on Aic ⇨)
[BACKBONE] Bag of Tricks for Image Classification With Convolutional Neural Networks (Topdown Heatmap + Resnetv1d on Mpii ⇨, Topdown Heatmap + Resnetv1d on Coco ⇨)
[BACKBONE] Deep High-Resolution Representation Learning for Human Pose Estimation (Topdown Heatmap + Hrnet on Posetrack18 ⇨, Topdown Heatmap + Hrnet on Ochuman ⇨, Topdown Heatmap + Hrnet on Mpii ⇨, Topdown Heatmap + Hrnet + Dark on Mpii ⇨, Associative Embedding + Hrnet on MHP ⇨, Topdown Heatmap + Hrnet on H36m ⇨, Topdown Heatmap + Hrnet on Crowdpose ⇨, Topdown Heatmap + Hrnet + Dark on Coco ⇨, Topdown Heatmap + Hrnet + Udp on Coco ⇨, Topdown Heatmap + Hrnet on Coco ⇨, Topdown Heatmap + Hrnet + Fp16 on Coco ⇨, Topdown Heatmap + Hrnet + Augmentation on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Hrnet on Coco ⇨, Topdown Heatmap + Hrnet on Aic ⇨, Associative Embedding + Hrnet on Aic ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (Topdown Heatmap + Resnet on Posetrack18 ⇨, Topdown Heatmap + Resnet on Ochuman ⇨, Topdown Heatmap + Resnet + Mpii on Mpii_trb ⇨, Topdown Heatmap + Resnet on Mpii ⇨, Deeppose + Resnet on Mpii ⇨, Topdown Heatmap + Resnet on MHP ⇨, Topdown Heatmap + Resnet on JHMDB ⇨, Topdown Heatmap + Resnet on Crowdpose ⇨, Topdown Heatmap + Resnet + Dark on Coco ⇨, Topdown Heatmap + Resnet on Coco ⇨, Topdown Heatmap + Resnet + Fp16 on Coco ⇨, Deeppose + Resnet on Coco ⇨, Associative Embedding + Resnet on Coco ⇨, Topdown Heatmap + Resnet on Aic ⇨)
[BACKBONE] Imagenet Classification With Deep Convolutional Neural Networks (Topdown Heatmap + Alexnet on Coco ⇨)
[BACKBONE] Mobilenetv2: Inverted Residuals and Linear Bottlenecks (Topdown Heatmap + Mobilenetv2 on Mpii ⇨, Topdown Heatmap + Mobilenetv2 on Coco ⇨, Associative Embedding + Mobilenetv2 on Coco ⇨)
[BACKBONE] Resnest: Split-Attention Networks (Topdown Heatmap + Resnest on Coco ⇨)
[BACKBONE] Shufflenet V2: Practical Guidelines for Efficient CNN Architecture Design (Topdown Heatmap + Shufflenetv2 on Mpii ⇨, Topdown Heatmap + Shufflenetv2 on Coco ⇨)
[BACKBONE] Shufflenet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (Topdown Heatmap + Shufflenetv1 on Mpii ⇨, Topdown Heatmap + Shufflenetv1 on Coco ⇨)
[BACKBONE] Squeeze-and-Excitation Networks (Topdown Heatmap + Seresnet on Mpii ⇨, Topdown Heatmap + Seresnet on Coco ⇨)
[BACKBONE] Very Deep Convolutional Networks for Large-Scale Image Recognition (Topdown Heatmap + VGG on Coco ⇨)
[DATASET] 2d Human Pose Estimation: New Benchmark and State of the Art Analysis (Topdown Heatmap + Hrnet on Mpii ⇨, Topdown Heatmap + Shufflenetv2 on Mpii ⇨, Topdown Heatmap + Litehrnet on Mpii ⇨, Topdown Heatmap + Resnext on Mpii ⇨, Topdown Heatmap + Hrnet + Dark on Mpii ⇨, Topdown Heatmap + Hourglass on Mpii ⇨, Topdown Heatmap + CPM on Mpii ⇨, Topdown Heatmap + Mobilenetv2 on Mpii ⇨, Topdown Heatmap + Shufflenetv1 on Mpii ⇨, Topdown Heatmap + Seresnet on Mpii ⇨, Topdown Heatmap + Resnetv1d on Mpii ⇨, Topdown Heatmap + Scnet on Mpii ⇨, Topdown Heatmap + Resnet on Mpii ⇨, Deeppose + Resnet on Mpii ⇨)
[DATASET] Ai Challenger: A Large-Scale Dataset for Going Deeper in Image Understanding (Topdown Heatmap + Resnet on Aic ⇨, Topdown Heatmap + Hrnet on Aic ⇨, Associative Embedding + Higherhrnet on Aic ⇨, Associative Embedding + Hrnet on Aic ⇨)
[DATASET] Crowdpose: Efficient Crowded Scenes Pose Estimation and a New Benchmark (Topdown Heatmap + Resnet on Crowdpose ⇨, Topdown Heatmap + Hrnet on Crowdpose ⇨, Associative Embedding + Higherhrnet on Crowdpose ⇨)
[DATASET] Human3.6m: Large Scale Datasets and Predictive Methods for 3d Human Sensing in Natural Environments (Topdown Heatmap + Hrnet on H36m ⇨)
[DATASET] Microsoft Coco: Common Objects in Context (Topdown Heatmap + Vipnas on Coco ⇨, Topdown Heatmap + Resnetv1d on Coco ⇨, Topdown Heatmap + Scnet on Coco ⇨, Topdown Heatmap + Hrnet + Dark on Coco ⇨, Topdown Heatmap + CPM on Coco ⇨, Topdown Heatmap + Shufflenetv1 on Coco ⇨, Topdown Heatmap + Seresnet on Coco ⇨, Topdown Heatmap + Alexnet on Coco ⇨, Topdown Heatmap + Hrnet + Udp on Coco ⇨, Topdown Heatmap + Resnet + Dark on Coco ⇨, Topdown Heatmap + VGG on Coco ⇨, Topdown Heatmap + MSPN on Coco ⇨, Topdown Heatmap + Resnext on Coco ⇨, Topdown Heatmap + Resnest on Coco ⇨, Topdown Heatmap + RSN on Coco ⇨, Topdown Heatmap + Hrnet on Coco ⇨, Topdown Heatmap + Resnet on Coco ⇨, Topdown Heatmap + Mobilenetv2 on Coco ⇨, Topdown Heatmap + Hrnet + Fp16 on Coco ⇨, Topdown Heatmap + Resnet + Fp16 on Coco ⇨, Topdown Heatmap + Hrnet + Augmentation on Coco ⇨, Topdown Heatmap + Hourglass on Coco ⇨, Topdown Heatmap + Litehrnet on Coco ⇨, Topdown Heatmap + Shufflenetv2 on Coco ⇨, Deeppose + Resnet on Coco ⇨, Associative Embedding + Hourglass + Ae on Coco ⇨, Associative Embedding + Hrnet + Udp on Coco ⇨, Associative Embedding + Higherhrnet on Coco ⇨, Associative Embedding + Hrnet on Coco ⇨, Associative Embedding + Higherhrnet + Udp on Coco ⇨, Associative Embedding + Resnet on Coco ⇨, Associative Embedding + Mobilenetv2 on Coco ⇨)
[DATASET] Pose2seg: Detection Free Human Instance Segmentation (Topdown Heatmap + Hrnet on Ochuman ⇨, Topdown Heatmap + Resnet on Ochuman ⇨)
[DATASET] Posetrack: A Benchmark for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Posetrack18 ⇨, Topdown Heatmap + Hrnet on Posetrack18 ⇨)
[DATASET] Towards Understanding Action Recognition (Topdown Heatmap + Resnet on JHMDB ⇨, Topdown Heatmap + CPM on JHMDB ⇨)
[DATASET] Trb: A Novel Triplet Representation for Understanding 2d Human Body (Topdown Heatmap + Resnet + Mpii on Mpii_trb ⇨)
[DATASET] Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and a New Benchmark for Multi-Human Parsing (Topdown Heatmap + Resnet on MHP ⇨, Associative Embedding + Hrnet on MHP ⇨)
[OTHERS] Albumentations: Fast and Flexible Image Augmentations (Topdown Heatmap + Hrnet + Augmentation on Coco ⇨)
[OTHERS] Mixed Precision Training (Topdown Heatmap + Hrnet + Fp16 on Coco ⇨, Topdown Heatmap + Resnet + Fp16 on Coco ⇨)
Body(2D,Kpt,Sview,Vid)¶
Number of checkpoints: 3
Number of configs: 2
Number of papers: 4
[ALGORITHM] Deep High-Resolution Representation Learning for Human Pose Estimation (Posewarper + Hrnet + Posetrack18 on Posetrack18 ⇨)
[ALGORITHM] Learning Temporal Pose Estimation From Sparsely Labeled Videos (Posewarper + Hrnet + Posetrack18 on Posetrack18 ⇨)
[DATASET] Microsoft Coco: Common Objects in Context (Posewarper + Hrnet + Posetrack18 on Posetrack18 ⇨)
[DATASET] Posetrack: A Benchmark for Human Pose Estimation and Tracking (Posewarper + Hrnet + Posetrack18 on Posetrack18 ⇨)
Body(3D,Kpt,Mview,Img)¶
Number of checkpoints: 1
Number of configs: 1
Number of papers: 2
[ALGORITHM] Voxelpose: Towards Multi-Camera 3d Human Pose Estimation in Wild Environment (Voxelpose + Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic ⇨)
[DATASET] Panoptic Studio: A Massively Multiview System for Social Motion Capture (Voxelpose + Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic ⇨)
Body(3D,Kpt,Sview,Img)¶
Number of checkpoints: 2
Number of configs: 2
Number of papers: 3
[ALGORITHM] A Simple Yet Effective Baseline for 3d Human Pose Estimation (Pose Lift + Simplebaseline3d on Mpi_inf_3dhp ⇨, Pose Lift + Simplebaseline3d on H36m ⇨)
[DATASET] Human3.6m: Large Scale Datasets and Predictive Methods for 3d Human Sensing in Natural Environments (Pose Lift + Simplebaseline3d on H36m ⇨)
[DATASET] Monocular 3d Human Pose Estimation in the Wild Using Improved CNN Supervision (Pose Lift + Simplebaseline3d on Mpi_inf_3dhp ⇨)
Body(3D,Kpt,Sview,Vid)¶
Number of checkpoints: 8
Number of configs: 8
Number of papers: 3
[ALGORITHM] 3d Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training (Video Pose Lift + Videopose3d on Mpi_inf_3dhp ⇨, Video Pose Lift + Videopose3d on H36m ⇨)
[DATASET] Human3.6m: Large Scale Datasets and Predictive Methods for 3d Human Sensing in Natural Environments (Video Pose Lift + Videopose3d on H36m ⇨)
[DATASET] Monocular 3d Human Pose Estimation in the Wild Using Improved CNN Supervision (Video Pose Lift + Videopose3d on Mpi_inf_3dhp ⇨)
Body(3D,Mesh,Sview,Img)¶
Number of checkpoints: 1
Number of configs: 1
Number of papers: 3
[ALGORITHM] End-to-End Recovery of Human Shape and Pose (HMR + Resnet on Mixed ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (HMR + Resnet on Mixed ⇨)
[DATASET] Human3.6m: Large Scale Datasets and Predictive Methods for 3d Human Sensing in Natural Environments (HMR + Resnet on Mixed ⇨)
Face¶
Number of checkpoints: 16
Number of configs: 16
Number of papers: 16
[ALGORITHM] Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression (Topdown Heatmap + Hrnetv2 + Awing on WFLW ⇨)
[ALGORITHM] Deep High-Resolution Representation Learning for Visual Recognition (Topdown Heatmap + Hrnetv2 on WFLW ⇨, Topdown Heatmap + Hrnetv2 + Awing on WFLW ⇨, Topdown Heatmap + Hrnetv2 + Dark on WFLW ⇨, Topdown Heatmap + Hrnetv2 on Cofw ⇨, Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hrnetv2 on Aflw ⇨, Topdown Heatmap + Hrnetv2 + Dark on Aflw ⇨, Topdown Heatmap + Hrnetv2 on 300w ⇨)
[ALGORITHM] Deeppose: Human Pose Estimation via Deep Neural Networks (Deeppose + Resnet + Wingloss on WFLW ⇨, Deeppose + Resnet on WFLW ⇨, Deeppose + Resnet + Softwingloss on WFLW ⇨)
[ALGORITHM] Distribution-Aware Coordinate Representation for Human Pose Estimation (Topdown Heatmap + Hrnetv2 + Dark on WFLW ⇨, Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hrnetv2 + Dark on Aflw ⇨)
[ALGORITHM] Improving Convolutional Networks With Self-Calibrated Convolutions (Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face ⇨)
[ALGORITHM] Stacked Hourglass Networks for Human Pose Estimation (Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face ⇨)
[ALGORITHM] Structure-Coherent Deep Feature Learning for Robust Face Alignment (Deeppose + Resnet + Softwingloss on WFLW ⇨)
[ALGORITHM] Wing Loss for Robust Facial Landmark Localisation With Convolutional Neural Networks (Deeppose + Resnet + Wingloss on WFLW ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (Deeppose + Resnet + Wingloss on WFLW ⇨, Deeppose + Resnet on WFLW ⇨, Deeppose + Resnet + Softwingloss on WFLW ⇨, Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face ⇨)
[BACKBONE] Mobilenetv2: Inverted Residuals and Linear Bottlenecks (Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_face ⇨)
[DATASET] 300 Faces in-the-Wild Challenge: Database and Results (Topdown Heatmap + Hrnetv2 on 300w ⇨)
[DATASET] Annotated Facial Landmarks in the Wild: A Large-Scale, Real-World Database for Facial Landmark Localization (Topdown Heatmap + Hrnetv2 on Aflw ⇨, Topdown Heatmap + Hrnetv2 + Dark on Aflw ⇨)
[DATASET] Look at Boundary: A Boundary-Aware Face Alignment Algorithm (Topdown Heatmap + Hrnetv2 on WFLW ⇨, Topdown Heatmap + Hrnetv2 + Awing on WFLW ⇨, Topdown Heatmap + Hrnetv2 + Dark on WFLW ⇨, Deeppose + Resnet + Wingloss on WFLW ⇨, Deeppose + Resnet on WFLW ⇨, Deeppose + Resnet + Softwingloss on WFLW ⇨)
[DATASET] Robust Face Landmark Estimation Under Occlusion (Topdown Heatmap + Hrnetv2 on Cofw ⇨)
[DATASET] Whole-Body Human Pose Estimation in the Wild (Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_face ⇨, Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face ⇨)
Fashion¶
Number of checkpoints: 6
Number of configs: 6
Number of papers: 5
[ALGORITHM] Deeppose: Human Pose Estimation via Deep Neural Networks (Deeppose + Resnet on Deepfashion ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Deepfashion ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (Topdown Heatmap + Resnet on Deepfashion ⇨, Deeppose + Resnet on Deepfashion ⇨)
[DATASET] Deepfashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations (Topdown Heatmap + Resnet on Deepfashion ⇨, Deeppose + Resnet on Deepfashion ⇨)
[DATASET] Fashion Landmark Detection in the Wild (Topdown Heatmap + Resnet on Deepfashion ⇨, Deeppose + Resnet on Deepfashion ⇨)
Hand(2D)¶
Number of checkpoints: 29
Number of configs: 39
Number of papers: 16
[ALGORITHM] Deep High-Resolution Representation Learning for Visual Recognition (Topdown Heatmap + Hrnetv2 + Dark on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 + Dark on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] Deeppose: Human Pose Estimation via Deep Neural Networks (Deeppose + Resnet on Rhd2d ⇨, Deeppose + Resnet on Panoptic2d ⇨, Deeppose + Resnet on Onehand10k ⇨)
[ALGORITHM] Distribution-Aware Coordinate Representation for Human Pose Estimation (Topdown Heatmap + Hrnetv2 + Dark on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 + Dark on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] Improving Convolutional Networks With Self-Calibrated Convolutions (Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] Lite-Hrnet: A Lightweight High-Resolution Network (Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Rhd2d ⇨, Topdown Heatmap + Resnet on Panoptic2d ⇨, Topdown Heatmap + Resnet on Onehand10k ⇨, Topdown Heatmap + Resnet on Interhand2d ⇨, Topdown Heatmap + Resnet on Freihand2d ⇨, Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] Stacked Hourglass Networks for Human Pose Estimation (Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand ⇨)
[ALGORITHM] The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation (Topdown Heatmap + Hrnetv2 + Udp on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Onehand10k ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (Topdown Heatmap + Resnet on Rhd2d ⇨, Deeppose + Resnet on Rhd2d ⇨, Topdown Heatmap + Resnet on Panoptic2d ⇨, Deeppose + Resnet on Panoptic2d ⇨, Topdown Heatmap + Resnet on Onehand10k ⇨, Deeppose + Resnet on Onehand10k ⇨, Topdown Heatmap + Resnet on Interhand2d ⇨, Topdown Heatmap + Resnet on Freihand2d ⇨, Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand ⇨)
[BACKBONE] Mobilenetv2: Inverted Residuals and Linear Bottlenecks (Topdown Heatmap + Mobilenetv2 on Rhd2d ⇨, Topdown Heatmap + Mobilenetv2 on Panoptic2d ⇨, Topdown Heatmap + Mobilenetv2 on Onehand10k ⇨, Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_hand ⇨)
[DATASET] Freihand: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images (Topdown Heatmap + Resnet on Freihand2d ⇨)
[DATASET] Hand Keypoint Detection in Single Images Using Multiview Bootstrapping (Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d ⇨, Topdown Heatmap + Resnet on Panoptic2d ⇨, Topdown Heatmap + Mobilenetv2 on Panoptic2d ⇨, Topdown Heatmap + Hrnetv2 on Panoptic2d ⇨, Deeppose + Resnet on Panoptic2d ⇨)
[DATASET] Interhand2.6m: A Dataset and Baseline for 3d Interacting Hand Pose Estimation From a Single RGB Image (Topdown Heatmap + Resnet on Interhand2d ⇨)
[DATASET] Learning to Estimate 3d Hand Pose From Single RGB Images (Topdown Heatmap + Hrnetv2 + Dark on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 + Udp on Rhd2d ⇨, Topdown Heatmap + Hrnetv2 on Rhd2d ⇨, Topdown Heatmap + Resnet on Rhd2d ⇨, Topdown Heatmap + Mobilenetv2 on Rhd2d ⇨, Deeppose + Resnet on Rhd2d ⇨)
[DATASET] Mask-Pose Cascaded CNN for 2d Hand Pose Estimation From Single Color Image (Topdown Heatmap + Resnet on Onehand10k ⇨, Topdown Heatmap + Mobilenetv2 on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 + Udp on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 on Onehand10k ⇨, Topdown Heatmap + Hrnetv2 + Dark on Onehand10k ⇨, Deeppose + Resnet on Onehand10k ⇨)
[DATASET] Whole-Body Human Pose Estimation in the Wild (Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand ⇨, Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand ⇨)
Hand(3D)¶
Number of checkpoints: 1
Number of configs: 2
Number of papers: 3
[ALGORITHM] Interhand2.6m: A Dataset and Baseline for 3d Interacting Hand Pose Estimation From a Single RGB Image (Internet + Internet on Interhand3d ⇨)
[BACKBONE] Deep Residual Learning for Image Recognition (Internet + Internet on Interhand3d ⇨)
[DATASET] Interhand2.6m: A Dataset and Baseline for 3d Interacting Hand Pose Estimation From a Single RGB Image (Internet + Internet on Interhand3d ⇨)
Wholebody¶
Number of checkpoints: 21
Number of configs: 21
Number of papers: 8
[ALGORITHM] Associative Embedding: End-to-End Learning for Joint Detection and Grouping (Associative Embedding + Higherhrnet on Coco-Wholebody ⇨, Associative Embedding + Hrnet on Coco-Wholebody ⇨)
[ALGORITHM] Deep High-Resolution Representation Learning for Human Pose Estimation (Topdown Heatmap + Hrnet + Dark on Halpe ⇨, Topdown Heatmap + Hrnet on Coco-Wholebody ⇨, Topdown Heatmap + Hrnet + Dark on Coco-Wholebody ⇨, Associative Embedding + Hrnet on Coco-Wholebody ⇨)
[ALGORITHM] Distribution-Aware Coordinate Representation for Human Pose Estimation (Topdown Heatmap + Hrnet + Dark on Halpe ⇨, Topdown Heatmap + Hrnet + Dark on Coco-Wholebody ⇨, Topdown Heatmap + Vipnas + Dark on Coco-Wholebody ⇨)
[ALGORITHM] Higherhrnet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation (Associative Embedding + Higherhrnet on Coco-Wholebody ⇨)
[ALGORITHM] Simple Baselines for Human Pose Estimation and Tracking (Topdown Heatmap + Resnet on Coco-Wholebody ⇨)
[ALGORITHM] Vipnas: Efficient Video Pose Estimation via Neural Architecture Search (Topdown Heatmap + Vipnas on Coco-Wholebody ⇨, Topdown Heatmap + Vipnas + Dark on Coco-Wholebody ⇨)
[DATASET] Pastanet: Toward Human Activity Knowledge Engine (Topdown Heatmap + Hrnet + Dark on Halpe ⇨)
[DATASET] Whole-Body Human Pose Estimation in the Wild (Topdown Heatmap + Vipnas on Coco-Wholebody ⇨, Topdown Heatmap + Resnet on Coco-Wholebody ⇨, Topdown Heatmap + Hrnet on Coco-Wholebody ⇨, Topdown Heatmap + Hrnet + Dark on Coco-Wholebody ⇨, Topdown Heatmap + Vipnas + Dark on Coco-Wholebody ⇨, Associative Embedding + Higherhrnet on Coco-Wholebody ⇨, Associative Embedding + Hrnet on Coco-Wholebody ⇨)
Animal¶
Animalpose Dataset¶
Topdown Heatmap + Hrnet on Animalpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.736 | 0.959 | 0.832 | 0.775 | 0.966 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.737 | 0.959 | 0.823 | 0.778 | 0.962 | ckpt | log |
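To try one of these checkpoints on your own image, the high-level inference API of MMPose (0.x) can be used. A minimal sketch follows; the config/checkpoint paths are placeholders for the files linked in the table above, and the bounding box is assumed to be known (in practice it would come from a detector or the ground truth):

# Minimal top-down inference sketch with the MMPose 0.x API.
from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)

config = 'configs/animal/.../hrnet_w32_animalpose_256x256.py'  # placeholder path
checkpoint = 'hrnet_w32_animalpose_256x256.pth'  # ckpt downloaded from the table

model = init_pose_model(config, checkpoint, device='cpu')

# One animal instance, given as an xywh bounding box (assumed known).
person_results = [{'bbox': [50, 50, 200, 200]}]
pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xywh')

vis_pose_result(model, 'demo.jpg', pose_results, out_file='vis_demo.jpg')

Note: pass the dataset/dataset_info matching your config if flip test is enabled, so that the correct flip pairs are used.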
Topdown Heatmap + Resnet on Animalpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.688 | 0.945 | 0.772 | 0.733 | 0.952 | ckpt | log |
pose_resnet_101 | 256x256 | 0.696 | 0.948 | 0.785 | 0.737 | 0.954 | ckpt | log |
pose_resnet_152 | 256x256 | 0.709 | 0.948 | 0.797 | 0.749 | 0.951 | ckpt | log |
Ap10k Dataset¶
Topdown Heatmap + Hrnet on Ap10k¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.738 | 0.958 | 0.808 | 0.592 | 0.743 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.744 | 0.959 | 0.807 | 0.589 | 0.748 | ckpt | log |
Topdown Heatmap + Resnet on Ap10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.699 | 0.940 | 0.760 | 0.570 | 0.703 | ckpt | log |
pose_resnet_101 | 256x256 | 0.698 | 0.943 | 0.754 | 0.543 | 0.702 | ckpt | log |
Atrw Dataset¶
Topdown Heatmap + Resnet on Atrw¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.900 | 0.973 | 0.932 | 0.929 | 0.985 | ckpt | log |
pose_resnet_101 | 256x256 | 0.898 | 0.973 | 0.936 | 0.927 | 0.985 | ckpt | log |
pose_resnet_152 | 256x256 | 0.896 | 0.973 | 0.931 | 0.927 | 0.985 | ckpt | log |
Topdown Heatmap + Hrnet on Atrw¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.912 | 0.973 | 0.959 | 0.938 | 0.985 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.911 | 0.972 | 0.946 | 0.937 | 0.985 | ckpt | log |
Fly Dataset¶
Topdown Heatmap + Resnet on Fly¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Vinegar Fly (Nature Methods'2019)
@article{pereira2019fast,
title={Fast animal pose estimation using deep neural networks},
author={Pereira, Talmo D and Aldarondo, Diego E and Willmore, Lindsay and Kislin, Mikhail and Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W},
journal={Nature methods},
volume={16},
number={1},
pages={117--125},
year={2019},
publisher={Nature Publishing Group}
}
Results on Vinegar Fly test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 192x192 | 0.996 | 0.910 | 2.00 | ckpt | log |
pose_resnet_101 | 192x192 | 0.996 | 0.912 | 1.95 | ckpt | log |
pose_resnet_152 | 192x192 | 0.997 | 0.917 | 1.78 | ckpt | log |
Horse10 Dataset¶
Topdown Heatmap + Resnet on Horse10¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_resnet_50 | 256x256 | 0.956 | 0.113 | ckpt | log |
split2 | pose_resnet_50 | 256x256 | 0.954 | 0.111 | ckpt | log |
split3 | pose_resnet_50 | 256x256 | 0.946 | 0.129 | ckpt | log |
split1 | pose_resnet_101 | 256x256 | 0.958 | 0.115 | ckpt | log |
split2 | pose_resnet_101 | 256x256 | 0.955 | 0.115 | ckpt | log |
split3 | pose_resnet_101 | 256x256 | 0.946 | 0.126 | ckpt | log |
split1 | pose_resnet_152 | 256x256 | 0.969 | 0.105 | ckpt | log |
split2 | pose_resnet_152 | 256x256 | 0.970 | 0.103 | ckpt | log |
split3 | pose_resnet_152 | 256x256 | 0.957 | 0.131 | ckpt | log |
Topdown Heatmap + Hrnet on Horse10¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_hrnet_w32 | 256x256 | 0.951 | 0.122 | ckpt | log |
split2 | pose_hrnet_w32 | 256x256 | 0.949 | 0.116 | ckpt | log |
split3 | pose_hrnet_w32 | 256x256 | 0.939 | 0.153 | ckpt | log |
split1 | pose_hrnet_w48 | 256x256 | 0.973 | 0.095 | ckpt | log |
split2 | pose_hrnet_w48 | 256x256 | 0.969 | 0.101 | ckpt | log |
split3 | pose_hrnet_w48 | 256x256 | 0.961 | 0.128 | ckpt | log |
Locust Dataset¶
Topdown Heatmap + Resnet on Locust¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Desert Locust (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Desert Locust test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 0.999 | 0.899 | 2.27 | ckpt | log |
pose_resnet_101 | 160x160 | 0.999 | 0.907 | 2.03 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.926 | 1.48 | ckpt | log |
Macaque Dataset¶
Topdown Heatmap + Resnet on Macaque¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.799 | 0.952 | 0.919 | 0.837 | 0.964 | ckpt | log |
pose_resnet_101 | 256x192 | 0.790 | 0.953 | 0.908 | 0.828 | 0.967 | ckpt | log |
pose_resnet_152 | 256x192 | 0.794 | 0.951 | 0.915 | 0.834 | 0.968 | ckpt | log |
Topdown Heatmap + Hrnet on Macaque¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.814 | 0.953 | 0.918 | 0.851 | 0.969 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.818 | 0.963 | 0.917 | 0.855 | 0.971 | ckpt | log |
Zebra Dataset¶
Topdown Heatmap + Resnet on Zebra¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Grévy’s Zebra (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Grévy’s Zebra test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 1.000 | 0.914 | 1.86 | ckpt | log |
pose_resnet_101 | 160x160 | 1.000 | 0.916 | 1.82 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.921 | 1.66 | ckpt | log |
Body(2D,Kpt,Sview,Img)¶
Aic Dataset¶
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
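The multi-scale numbers are obtained by enlarging the set of test scales in the model's test configuration. A hedged sketch of the override is shown below; the scale_factor field inside model.test_cfg follows the associative-embedding configs of MMPose 0.x, so verify the field name against the config you actually use:

# Sketch: switch a bottom-up (associative embedding) config from single-scale
# to multi-scale testing. Field names follow the MMPose 0.x AE configs.
from mmcv import Config

cfg = Config.fromfile('configs/body/.../hrnet_w32_aic_512x512.py')  # placeholder
cfg.model.test_cfg.scale_factor = [2, 1, 0.5]  # the 3 default test scales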
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Topdown Heatmap + Hrnet on Aic¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
Topdown Heatmap + Resnet on Aic¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
Coco Dataset¶
Associative Embedding + Mobilenetv2 on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
Associative Embedding + Resnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
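The multi-scale rows are produced by running the network at each of the listed scales and fusing the heatmaps before grouping. A minimal sketch of such fusion, assuming a `model` callable that maps an image tensor to heatmaps of shape (1, K, H, W); this illustrates the idea only and is not MMPose's exact test pipeline (which also handles flip testing and tag aggregation):

import torch.nn.functional as F

def fuse_multi_scale(model, image, scales=(2, 1, 0.5)):
    # Run the model at each scale, resize all heatmaps to a common
    # resolution, and average them.
    fused, base_size = None, None
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode='bilinear',
                               align_corners=False)
        heatmaps = model(scaled)
        if base_size is None:
            base_size = heatmaps.shape[-2:]  # first scale sets the reference size
        heatmaps = F.interpolate(heatmaps, size=base_size, mode='bilinear',
                                 align_corners=False)
        fused = heatmaps if fused is None else fused + heatmaps
    return fused / len(scales)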
Associative Embedding + HigherHRNet on COCO¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
Associative Embedding + HRNet + UDP on COCO¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Associative Embedding + Hourglass + AE on COCO¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HourglassAENet (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.613 | 0.833 | 0.667 | 0.659 | 0.850 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.667 | 0.855 | 0.723 | 0.707 | 0.877 | ckpt | log |
DeepPose + ResNet on COCO¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
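Unlike the heatmap-based entries elsewhere on this page, DeepPose regresses keypoint coordinates directly from a pooled backbone feature. A minimal sketch of that design, assuming a torchvision ResNet-50 backbone (the layer choices and head are illustrative, not the exact MMPose modules); training typically minimizes an L2 or smooth-L1 loss between predicted and ground-truth normalized coordinates:

import torch.nn as nn
import torchvision

class DeepPoseRegressor(nn.Module):
    # ResNet backbone + a linear head that outputs K (x, y) pairs.
    def __init__(self, num_joints=17):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)  # pretrained=False on older torchvision
        backbone.fc = nn.Identity()               # keep the 2048-d pooled feature
        self.backbone = backbone
        self.head = nn.Linear(2048, num_joints * 2)

    def forward(self, x):
        feat = self.backbone(x)                   # (B, 2048)
        coords = self.head(feat)                  # (B, K * 2)
        return coords.view(x.size(0), -1, 2)      # (B, K, 2) normalized coordinates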
Topdown Heatmap + ShuffleNetV2 on COCO¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv2 | 256x192 | 0.599 | 0.854 | 0.663 | 0.664 | 0.899 | ckpt | log |
pose_shufflenetv2 | 384x288 | 0.636 | 0.865 | 0.705 | 0.697 | 0.909 | ckpt | log |
Topdown Heatmap + LiteHRNet on COCO¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
LiteHRNet-18 | 256x192 | 0.643 | 0.868 | 0.720 | 0.706 | 0.912 | ckpt | log |
LiteHRNet-18 | 384x288 | 0.677 | 0.878 | 0.746 | 0.735 | 0.920 | ckpt | log |
LiteHRNet-30 | 256x192 | 0.675 | 0.881 | 0.754 | 0.736 | 0.924 | ckpt | log |
LiteHRNet-30 | 384x288 | 0.700 | 0.884 | 0.776 | 0.758 | 0.928 | ckpt | log |
Topdown Heatmap + Hourglass on COCO¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
Topdown Heatmap + HRNet + Augmentation on COCO¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
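The three rows above differ only in the extra augmentation applied during training (coarse dropout, GridMask-style erasing, or photometric distortion). A standalone sketch of the coarse-dropout variant using the Albumentations library directly (parameter names per Albumentations 1.x; the hole sizes and probability here are illustrative, and the released config files define the exact pipelines):

import albumentations as A
import numpy as np

# CoarseDropout erases random rectangular patches, so the network must
# rely on body context instead of a single local cue.
augment = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=40, max_width=40, p=0.5),
])

image = np.zeros((256, 192, 3), dtype=np.uint8)  # dummy cropped person image
augmented = augment(image=image)['image']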
Topdown Heatmap + ResNet + FP16 on COCO¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
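The fp16 rows are trained with mixed precision: most of the forward and backward pass runs in half precision, with loss scaling to keep small gradients from underflowing. In MMPose configs of this generation this is usually switched on by a single config field handled by mmcv's Fp16OptimizerHook; a minimal excerpt (the loss-scale value below is the conventional one; check the released config for the exact setting):

# excerpt from a training config: enable mixed-precision training
fp16 = dict(loss_scale=512.)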
Topdown Heatmap + HRNet + FP16 on COCO¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.88 | 0.800 | 0.943 | ckpt | log |
Topdown Heatmap + MobileNetV2 on COCO¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x192 | 0.646 | 0.874 | 0.723 | 0.707 | 0.917 | ckpt | log |
pose_mobilenetv2 | 384x288 | 0.673 | 0.879 | 0.743 | 0.729 | 0.916 | ckpt | log |
Topdown Heatmap + ResNet on COCO¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
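Every top-down entry on this page shares the same two-stage pipeline: a person detector proposes boxes (the tables use a detector with 56.4 human AP), then the pose model predicts heatmaps inside each cropped box. A minimal inference sketch against MMPose's high-level Python API (0.x series); the config, checkpoint, and image paths are placeholders to be replaced with a pair from the table above:

from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                         vis_pose_result)

# placeholder paths: pick a config/checkpoint pair from the table above
pose_model = init_pose_model('path/to/res50_coco_256x192.py',
                             'path/to/res50_coco_256x192.pth')

# boxes normally come from a person detector; one dummy xywh box here
person_results = [{'bbox': [0, 0, 192, 256]}]

pose_results, _ = inference_top_down_pose_model(
    pose_model, 'demo.jpg', person_results, format='xywh')

vis_pose_result(pose_model, 'demo.jpg', pose_results, out_file='vis.jpg')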
Topdown Heatmap + HRNet on COCO¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
Topdown Heatmap + RSN on COCO¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
Topdown Heatmap + ResNeSt on COCO¶
ResNeSt (ArXiv'2020)
@article{zhang2020resnest,
title={ResNeSt: Split-Attention Networks},
author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
journal={arXiv preprint arXiv:2004.08955},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnest_50 | 256x192 | 0.721 | 0.899 | 0.802 | 0.776 | 0.938 | ckpt | log |
pose_resnest_50 | 384x288 | 0.737 | 0.900 | 0.811 | 0.789 | 0.938 | ckpt | log |
pose_resnest_101 | 256x192 | 0.725 | 0.899 | 0.807 | 0.781 | 0.939 | ckpt | log |
pose_resnest_101 | 384x288 | 0.746 | 0.906 | 0.820 | 0.798 | 0.943 | ckpt | log |
pose_resnest_200 | 256x192 | 0.732 | 0.905 | 0.812 | 0.787 | 0.942 | ckpt | log |
pose_resnest_200 | 384x288 | 0.754 | 0.908 | 0.827 | 0.807 | 0.945 | ckpt | log |
pose_resnest_269 | 256x192 | 0.738 | 0.907 | 0.819 | 0.793 | 0.945 | ckpt | log |
pose_resnest_269 | 384x288 | 0.755 | 0.908 | 0.828 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + ResNeXt on COCO¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnext_50 | 256x192 | 0.714 | 0.898 | 0.789 | 0.771 | 0.937 | ckpt | log |
pose_resnext_50 | 384x288 | 0.724 | 0.899 | 0.794 | 0.777 | 0.935 | ckpt | log |
pose_resnext_101 | 256x192 | 0.726 | 0.900 | 0.801 | 0.782 | 0.940 | ckpt | log |
pose_resnext_101 | 384x288 | 0.743 | 0.903 | 0.815 | 0.795 | 0.939 | ckpt | log |
pose_resnext_152 | 256x192 | 0.730 | 0.904 | 0.808 | 0.786 | 0.940 | ckpt | log |
pose_resnext_152 | 384x288 | 0.742 | 0.902 | 0.810 | 0.794 | 0.939 | ckpt | log |
Topdown Heatmap + MSPN on COCO¶
MSPN (ArXiv'2019)
@article{li2019rethinking,
title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
journal={arXiv preprint arXiv:1901.00148},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
Topdown Heatmap + VGG on COCO¶
VGG (ICLR'2015)
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
vgg | 256x192 | 0.698 | 0.890 | 0.768 | 0.754 | 0.929 | ckpt | log |
Topdown Heatmap + ResNet + DARK on COCO¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
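DARK changes only the decoding step: instead of taking the heatmap argmax plus a fixed quarter-pixel offset, it assumes the predicted heatmap $h$ is locally Gaussian around the discrete maximum $m$ and solves for the sub-pixel mode $\mu$ with a second-order Taylor expansion in the log domain:

$$
\mu = m - \bigl(\nabla^2 g(m)\bigr)^{-1}\,\nabla g(m), \qquad g = \ln h
$$

In practice the heatmap is first smoothed (modulated) with a Gaussian kernel so that the quadratic assumption holds near the peak.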
Topdown Heatmap + HRNet + UDP on COCO¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
Topdown Heatmap + AlexNet on COCO¶
AlexNet (NeurIPS'2012)
@inproceedings{krizhevsky2012imagenet,
title={Imagenet classification with deep convolutional neural networks},
author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},
booktitle={Advances in neural information processing systems},
pages={1097--1105},
year={2012}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_alexnet | 256x192 | 0.397 | 0.758 | 0.381 | 0.478 | 0.822 | ckpt | log |
Topdown Heatmap + SEResNet on COCO¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_seresnet_50 | 256x192 | 0.728 | 0.900 | 0.809 | 0.784 | 0.940 | ckpt | log |
pose_seresnet_50 | 384x288 | 0.748 | 0.905 | 0.819 | 0.799 | 0.941 | ckpt | log |
pose_seresnet_101 | 256x192 | 0.734 | 0.904 | 0.815 | 0.790 | 0.942 | ckpt | log |
pose_seresnet_101 | 384x288 | 0.753 | 0.907 | 0.823 | 0.805 | 0.943 | ckpt | log |
pose_seresnet_152* | 256x192 | 0.730 | 0.899 | 0.810 | 0.786 | 0.940 | ckpt | log |
pose_seresnet_152* | 384x288 | 0.753 | 0.906 | 0.823 | 0.806 | 0.945 | ckpt | log |
Note that * means the model is trained without ImageNet pre-training.
Topdown Heatmap + ShuffleNetV1 on COCO¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv1 | 256x192 | 0.585 | 0.845 | 0.650 | 0.651 | 0.894 | ckpt | log |
pose_shufflenetv1 | 384x288 | 0.622 | 0.859 | 0.685 | 0.684 | 0.901 | ckpt | log |
Topdown Heatmap + CPM on COCO¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
Topdown Heatmap + HRNet + DARK on COCO¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + SCNet on COCO¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_scnet_50 | 256x192 | 0.728 | 0.899 | 0.807 | 0.784 | 0.938 | ckpt | log |
pose_scnet_50 | 384x288 | 0.751 | 0.906 | 0.818 | 0.802 | 0.943 | ckpt | log |
pose_scnet_101 | 256x192 | 0.733 | 0.903 | 0.813 | 0.790 | 0.941 | ckpt | log |
pose_scnet_101 | 384x288 | 0.752 | 0.906 | 0.823 | 0.804 | 0.943 | ckpt | log |
Topdown Heatmap + ResNetV1D on COCO¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnetv1d_50 | 256x192 | 0.722 | 0.897 | 0.799 | 0.777 | 0.933 | ckpt | log |
pose_resnetv1d_50 | 384x288 | 0.730 | 0.900 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_resnetv1d_101 | 256x192 | 0.731 | 0.899 | 0.809 | 0.786 | 0.938 | ckpt | log |
pose_resnetv1d_101 | 384x288 | 0.748 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnetv1d_152 | 256x192 | 0.737 | 0.902 | 0.812 | 0.791 | 0.940 | ckpt | log |
pose_resnetv1d_152 | 384x288 | 0.752 | 0.909 | 0.821 | 0.802 | 0.944 | ckpt | log |
Topdown Heatmap + ViPNAS on COCO¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a human detector having an AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.700 | 0.887 | 0.778 | 0.757 | 0.929 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.711 | 0.893 | 0.789 | 0.769 | 0.934 | ckpt | log |
CrowdPose Dataset¶
Associative Embedding + HigherHRNet on CrowdPose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test. AP (E), AP (M) and AP (H) denote AP on the easy, medium and hard subsets of the test set, partitioned by Crowd Index.
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Topdown Heatmap + HRNet on CrowdPose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with a YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
Topdown Heatmap + ResNet on CrowdPose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with a YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
H36M Dataset¶
Topdown Heatmap + HRNet on H36M¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M test set with ground truth 2D detections
Arch | Input Size | EPE | PCK | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 9.43 | 0.911 | ckpt | log |
pose_hrnet_w48 | 256x256 | 7.36 | 0.932 | ckpt | log |
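For reference, EPE is the mean end-point error in pixels and PCK the fraction of keypoints falling within a threshold $\tau$ of the ground truth (the threshold convention depends on the dataset); over $N$ annotated keypoints with predictions $\hat{p}_i$ and ground truth $p_i$:

$$
\mathrm{EPE} = \frac{1}{N}\sum_{i=1}^{N}\lVert \hat{p}_i - p_i \rVert_2, \qquad \mathrm{PCK} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\bigl[\lVert \hat{p}_i - p_i \rVert_2 \le \tau\bigr]
$$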
JHMDB Dataset¶
Topdown Heatmap + CPM on JHMDB¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
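The two blocks above report the same PCK metric under two normalizations: an estimate counts as correct when its error, divided by a reference size $s$ (the person size in the first block, the torso size in the second), is at most a threshold $\alpha$:

$$
\mathrm{PCK} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\!\left[\frac{\lVert \hat{p}_i - p_i \rVert_2}{s} \le \alpha\right]
$$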
Topdown Heatmap + ResNet on JHMDB¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
MHP Dataset¶
Associative Embedding + HRNet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Topdown Heatmap + ResNet on MHP¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious when using these results in papers.
MPII Dataset¶
DeepPose + ResNet on MPII¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
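MPII accuracy is PCKh, which normalizes the keypoint error by the head-segment length $\ell_{\mathrm{head}}$; in these tables, Mean is PCKh at the standard threshold 0.5 and Mean@0.1 uses the stricter threshold 0.1:

$$
\mathrm{PCKh}@\alpha = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\bigl[\lVert \hat{p}_i - p_i \rVert_2 \le \alpha\,\ell_{\mathrm{head}}\bigr]
$$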
Topdown Heatmap + ResNet on MPII¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Topdown Heatmap + SCNet on MPII¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_scnet_101 | 256x256 | 0.886 | 0.293 | ckpt | log |
Topdown Heatmap + ResNetV1D on MPII¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnetv1d_50 | 256x256 | 0.881 | 0.290 | ckpt | log |
pose_resnetv1d_101 | 256x256 | 0.883 | 0.295 | ckpt | log |
pose_resnetv1d_152 | 256x256 | 0.888 | 0.300 | ckpt | log |
Topdown Heatmap + SEResNet on MPII¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_seresnet_50 | 256x256 | 0.884 | 0.292 | ckpt | log |
pose_seresnet_101 | 256x256 | 0.884 | 0.295 | ckpt | log |
pose_seresnet_152* | 256x256 | 0.884 | 0.287 | ckpt | log |
Note: * indicates that the model was trained without ImageNet pre-training.
Topdown Heatmap + Shufflenetv1 on Mpii¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv1 | 256x256 | 0.823 | 0.195 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Mpii¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.854 | 0.235 | ckpt | log |
Topdown Heatmap + CPM on Mpii¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
cpm | 368x368 | 0.876 | 0.285 | ckpt | log |
Topdown Heatmap + Hourglass on Mpii¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Resnext on Mpii¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnext_152 | 256x256 | 0.887 | 0.294 | ckpt | log |
Topdown Heatmap + Litehrnet on Mpii¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.859 | 0.260 | ckpt | log |
LiteHRNet-30 | 256x256 | 0.869 | 0.271 | ckpt | log |
Topdown Heatmap + Shufflenetv2 on Mpii¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv2 | 256x256 | 0.828 | 0.205 | ckpt | log |
Topdown Heatmap + Hrnet on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
Mpii_trb Dataset¶
Topdown Heatmap + Resnet + Mpii on Mpii_trb¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
Ochuman Dataset¶
Topdown Heatmap + Resnet on Ochuman¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset; a command-line evaluation sketch follows the table below.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
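Because these models are trained on COCO but evaluated on OCHuman, reproducing a row amounts to pointing the test script at an OCHuman config while loading COCO-trained weights. A hedged sketch, assuming MMPose 0.x's tools/test.py and using placeholder file names:
# Placeholder config and checkpoint names.
python tools/test.py \
    configs/top_down/resnet/ochuman/res50_ochuman_256x192.py \
    checkpoints/res50_coco_256x192.pth \
    --eval mAP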
Topdown Heatmap + Hrnet on Ochuman¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.591 | 0.748 | 0.641 | 0.631 | 0.775 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.606 | 0.748 | 0.650 | 0.647 | 0.776 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.611 | 0.752 | 0.663 | 0.648 | 0.778 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.616 | 0.749 | 0.663 | 0.653 | 0.773 | ckpt | log |
Posetrack18 Dataset¶
Topdown Heatmap + Hrnet on Posetrack18¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
pose_hrnet_w32 | 384x288 | 87.0 | 88.8 | 85.0 | 80.1 | 80.5 | 82.6 | 79.4 | 83.6 | ckpt | log |
pose_hrnet_w48 | 256x192 | 88.2 | 90.1 | 85.8 | 80.8 | 80.7 | 83.3 | 80.3 | 84.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 87.8 | 90.0 | 85.9 | 81.3 | 81.1 | 83.3 | 80.9 | 84.5 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
pose_hrnet_w32 | 384x288 | 79.9 | 83.6 | 80.4 | 74.5 | 74.8 | 76.1 | 70.5 | 77.3 | ckpt | log |
pose_hrnet_w48 | 256x192 | 80.1 | 83.4 | 80.6 | 74.8 | 74.3 | 76.8 | 70.4 | 77.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 80.2 | 83.8 | 80.9 | 75.2 | 74.7 | 76.7 | 71.7 | 77.8 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Topdown Heatmap + Resnet on Posetrack18¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Body(2D,Kpt,Sview,Vid)¶
Posetrack18 Dataset¶
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper can be split into two stages.
The first stage fine-tunes the main backbone on PoseTrack18 in a single-frame setting, starting from a checkpoint pre-trained on the COCO dataset.
The second stage starts from the last checkpoint of the first stage and learns the warping offsets in a multi-frame setting, with the backbone frozen.
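A command-line sketch of the two stages, assuming MMPose's tools/train.py; the config names below are placeholders, and whether the override flag is --cfg-options or --options depends on the MMPose version:
# Stage 1: fine-tune the backbone on PoseTrack18 in a single-frame setting
# (initialized from a COCO-pretrained checkpoint, typically via load_from in the config).
python tools/train.py configs/posewarper/hrnet_w48_posetrack18_posewarper_stage1.py

# Stage 2: freeze the backbone and learn the warping offsets in a multi-frame setting,
# starting from the last stage-1 checkpoint.
python tools/train.py configs/posewarper/hrnet_w48_posetrack18_posewarper_stage2.py \
    --cfg-options load_from=work_dirs/posewarper_stage1/latest.pth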
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files1.
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json and place it at $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json to be consistent with the config; a shell sketch is given below. Please refer to DATA Preparation for more details about data preparation.
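A minimal shell sketch of the file placement described above, assuming the commands are run from the $mmpose root:
mkdir -p data/posetrack18/posetrack18_precomputed_boxes
cp $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json \
   data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json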
Body(3D,Kpt,Sview,Img)¶
H36m Dataset¶
Pose Lift + Simplebaseline3d on H36m¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325--1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground-truth 2D detections
Arch | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|
simple_baseline_3d_tcn1 | 43.4 | 34.3 | ckpt | log |
1 Unlike the original paper, we did not apply the max-norm constraint, as we found this led to better convergence and performance.
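For reference, the two metrics reported here are standard: MPJPE is the mean Euclidean distance between predicted joints \(\hat{p}_i\) and ground-truth joints \(p_i\), and P-MPJPE is the same error after a rigid Procrustes (similarity) alignment of the prediction to the ground truth:
\mathrm{MPJPE} = \frac{1}{N}\sum_{i=1}^{N}\lVert \hat{p}_i - p_i \rVert_2,
\qquad
\mathrm{P\text{-}MPJPE} = \min_{s,\,R,\,t}\ \frac{1}{N}\sum_{i=1}^{N}\lVert (sR\hat{p}_i + t) - p_i \rVert_2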
Mpi_inf_3dhp Dataset¶
Pose Lift + Simplebaseline3d on Mpi_inf_3dhp¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground-truth 2D detections
Arch | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|
simple_baseline_3d_tcn1 | 84.3 | 53.2 | 85.0 | 52.0 | ckpt | log |
1 Unlike the original paper, we did not apply the max-norm constraint, as we found this led to better convergence and performance.
Body(3D,Kpt,Sview,Vid)¶
H36m Dataset¶
Video Pose Lift + Videopose3d on H36m¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325--1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground-truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 27 | 40.0 | 30.1 | ckpt | log |
VideoPose3D | 81 | 38.9 | 29.2 | ckpt | log |
VideoPose3D | 243 | 37.6 | 28.3 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 1 | 52.9 | 41.3 | ckpt | log |
VideoPose3D | 243 | 47.9 | 38.0 | ckpt | log |
Results on Human3.6M dataset with ground-truth 2D detections, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 58.1 | 42.8 | 54.7 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 67.4 | 50.1 | 63.2 | ckpt | log |
1 CPN 2D detections are provided by the official repo. The reformatted version used in this repository can be downloaded from train_detection and test_detection.
Mpi_inf_3dhp Dataset¶
Video Pose Lift + Videopose3d on Mpi_inf_3dhp¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground-truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|---|
VideoPose3D | 1 | 58.3 | 40.6 | 94.1 | 63.1 | ckpt | log |
Body(3D,Kpt,Mview,Img)¶
Panoptic Dataset¶
Voxelpose + Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
CMU Panoptic (ICCV'2015)
@inproceedings{joo_iccv_2015,
author = {Hanbyul Joo and Hao Liu and Lei Tan and Lin Gui and Bart Nabbe and Iain Matthews and Takeo Kanade and Shohei Nobuhara and Yaser Sheikh},
title = {Panoptic Studio: A Massively Multiview System for Social Motion Capture},
booktitle = {ICCV},
year = {2015}
}
Results on the CMU Panoptic dataset.
Arch | mAP | mAR | MPJPE | Recall@500mm | ckpt | log |
---|---|---|---|---|---|---|
prn64_cpn80_res50 | 97.31 | 97.99 | 17.57 | 99.85 | ckpt | log |
Body(3D,Mesh,Sview,Img)¶
Mixed Dataset¶
HMR + Resnet on Mixed¶
HMR (CVPR'2018)
@inProceedings{kanazawaHMR18,
title={End-to-end Recovery of Human Shape and Pose},
author = {Angjoo Kanazawa and Michael J. Black and David W. Jacobs and Jitendra Malik},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325--1339},
month = {jul},
year = {2014}
}
Results on Human3.6M with ground-truth bounding boxes. The model achieves an MPJPE-PA of 52.60 mm under Protocol 2.
Arch | Input Size | MPJPE (P1) | MPJPE-PA (P1) | MPJPE (P2) | MPJPE-PA (P2) | ckpt | log |
---|---|---|---|---|---|---|---|
hmr_resnet_50 | 224x224 | 80.75 | 55.08 | 80.35 | 52.60 | ckpt | log |
Face¶
300w Dataset¶
Topdown Heatmap + Hrnetv2 on 300w¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
300W (IMAVIS'2016)
@article{sagonas2016300,
title={300 faces in-the-wild challenge: Database and results},
author={Sagonas, Christos and Antonakos, Epameinondas and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja},
journal={Image and vision computing},
volume={47},
pages={3--18},
year={2016},
publisher={Elsevier}
}
Results on 300W dataset
The model is trained on the 300W train set.
Arch | Input Size | NME (common) | NME (challenge) | NME (full) | NME (test) | ckpt | log |
---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 2.86 | 5.45 | 3.37 | 3.97 | ckpt | log |
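The NME columns report the Normalized Mean Error: the average landmark-to-ground-truth distance divided by a normalizing distance d, which for 300W is commonly taken as the inter-ocular distance:
\mathrm{NME} = \frac{1}{N}\sum_{i=1}^{N}\frac{\lVert \hat{x}_i - x_i \rVert_2}{d}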
Aflw Dataset¶
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on the AFLW train set and evaluated on the AFLW full and frontal sets.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on the AFLW train set and evaluated on the AFLW full and frontal sets.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 1.41 | 1.27 | ckpt | log |
Coco_wholebody_face Dataset¶
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_face¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.0612 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.0586 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_res50 | 256x256 | 0.0566 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.0569 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.0565 | ckpt | log |
Cofw Dataset¶
Topdown Heatmap + Hrnetv2 on Cofw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COFW (ICCV'2013)
@inproceedings{burgos2013robust,
title={Robust face landmark estimation under occlusion},
author={Burgos-Artizzu, Xavier P and Perona, Pietro and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1513--1520},
year={2013}
}
Results on COFW dataset
The model is trained on the COFW train set.
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 3.40 | ckpt | log |
WFLW Dataset¶
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Deeppose + Resnet on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50 | 256x256 | 4.85 | 8.50 | 4.81 | 5.69 | 5.45 | 4.82 | 5.20 | ckpt | log |
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages={2235--2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
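For context, the Wing loss from the cited paper behaves logarithmically for small errors and linearly for large ones; with width w and curvature \(\epsilon\), and the constant C chosen to join the two pieces continuously:
\mathrm{wing}(x) =
\begin{cases}
w \ln\left(1 + |x|/\epsilon\right), & |x| < w \\
|x| - C, & \text{otherwise}
\end{cases}
\qquad
C = w - w\ln\left(1 + w/\epsilon\right)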
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 4.06 | 6.98 | 3.99 | 4.83 | 4.59 | 3.92 | 4.33 | ckpt | log |
Fashion¶
Deepfashion Dataset¶
Deeppose + Resnet on Deepfashion¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | deeppose_resnet_50 | 256x256 | 0.965 | 0.535 | 17.2 | ckpt | log |
lower | deeppose_resnet_50 | 256x256 | 0.971 | 0.678 | 11.8 | ckpt | log |
full | deeppose_resnet_50 | 256x256 | 0.983 | 0.602 | 14.0 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | pose_resnet_50 | 256x256 | 0.954 | 0.578 | 16.8 | ckpt | log |
lower | pose_resnet_50 | 256x256 | 0.965 | 0.744 | 10.5 | ckpt | log |
full | pose_resnet_50 | 256x256 | 0.977 | 0.664 | 12.7 | ckpt | log |
Hand(2D)¶
Coco_wholebody_hand Dataset¶
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.813 | 0.840 | 4.39 | ckpt | log |
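The hand metrics above follow the common convention: PCK@0.2 is the fraction of keypoints predicted within 0.2 of a normalizing size s (for these hand benchmarks, typically the hand bounding-box size), AUC is the area under the PCK curve over a range of thresholds, and EPE is the mean end-point error in pixels:
\mathrm{PCK}@0.2 = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\!\left[\frac{\lVert \hat{x}_i - x_i \rVert_2}{s} < 0.2\right],
\qquad
\mathrm{EPE} = \frac{1}{N}\sum_{i=1}^{N}\lVert \hat{x}_i - x_i \rVert_2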
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.804 | 0.835 | 4.54 | ckpt | log |
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_hand¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.795 | 0.829 | 4.77 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.800 | 0.833 | 4.64 | ckpt | log |
Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.795 | 0.830 | 4.77 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.803 | 0.834 | 4.55 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Freihand2d Dataset¶
Topdown Heatmap + Resnet on Freihand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FreiHand (ICCV'2019)
@inproceedings{zimmermann2019freihand,
title={Freihand: A dataset for markerless capture of hand pose and shape from single rgb images},
author={Zimmermann, Christian and Ceylan, Duygu and Yang, Jimei and Russell, Bryan and Argus, Max and Brox, Thomas},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={813--822},
year={2019}
}
Results on FreiHand val & test set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
val | pose_resnet_50 | 224x224 | 0.993 | 0.868 | 3.25 | ckpt | log |
test | pose_resnet_50 | 224x224 | 0.992 | 0.868 | 3.27 | ckpt | log |
Interhand2d Dataset¶
Topdown Heatmap + Resnet on Interhand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set (H: human-annotated subset, M: machine-annotated subset)
Train Set | Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|---|
Human_annot | val(M) | pose_resnet_50 | 256x256 | 0.973 | 0.828 | 5.15 | ckpt | log |
Human_annot | test(H) | pose_resnet_50 | 256x256 | 0.973 | 0.826 | 5.27 | ckpt | log |
Human_annot | test(M) | pose_resnet_50 | 256x256 | 0.975 | 0.841 | 4.90 | ckpt | log |
Human_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.975 | 0.839 | 4.97 | ckpt | log |
Machine_annot | val(M) | pose_resnet_50 | 256x256 | 0.970 | 0.824 | 5.39 | ckpt | log |
Machine_annot | test(H) | pose_resnet_50 | 256x256 | 0.969 | 0.821 | 5.52 | ckpt | log |
Machine_annot | test(M) | pose_resnet_50 | 256x256 | 0.972 | 0.838 | 5.03 | ckpt | log |
Machine_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.972 | 0.837 | 5.11 | ckpt | log |
All | val(M) | pose_resnet_50 | 256x256 | 0.977 | 0.840 | 4.66 | ckpt | log |
All | test(H) | pose_resnet_50 | 256x256 | 0.979 | 0.839 | 4.65 | ckpt | log |
All | test(M) | pose_resnet_50 | 256x256 | 0.979 | 0.838 | 4.42 | ckpt | log |
All | test(H+M) | pose_resnet_50 | 256x256 | 0.979 | 0.851 | 4.46 | ckpt | log |
Onehand10k Dataset¶
Deeppose + Resnet on Onehand10k¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.990 | 0.486 | 34.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.990 | 0.568 | 24.16 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Onehand10k¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.986 | 0.537 | 28.60 | ckpt | log |
Topdown Heatmap + Resnet on Onehand10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.989 | 0.555 | 25.19 | ckpt | log |
Panoptic2d Dataset¶
Deeppose + Resnet on Panoptic2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.999 | 0.686 | 9.36 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.999 | 0.744 | 7.79 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Panoptic2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.998 | 0.694 | 9.70 | ckpt | log |
Topdown Heatmap + Resnet on Panoptic2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.999 | 0.713 | 9.00 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Rhd2d Dataset¶
Deeppose + Resnet on Rhd2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.988 | 0.865 | 3.29 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Rhd2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.985 | 0.883 | 2.80 | ckpt | log |
Topdown Heatmap + Resnet on Rhd2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.991 | 0.898 | 2.33 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.992 | 0.902 | 2.21 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.992 | 0.902 | 2.19 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Hand(3D)¶
Interhand3d Dataset¶
Internet + Resnet on Interhand3d¶
InterNet (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set (H: human-annotated subset, M: machine-annotated subset)
Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
All | test(H+M) | InterNet_resnet_50 | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | ckpt | log |
All | val(M) | InterNet_resnet_50 | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | ckpt | log |
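For reference, MPJPE (mean per-joint position error) in the table above is the average Euclidean distance, in millimeters, between predicted and ground-truth 3D joint positions after root-joint alignment, and MRRPE measures the error of the predicted relative position between the two hands' root joints. In compact form (our notation, not copied from the InterHand2.6M paper):

\mathrm{MPJPE} = \frac{1}{K} \sum_{k=1}^{K} \left\lVert \hat{J}_k - J_k \right\rVert_2

where \hat{J}_k and J_k are the predicted and ground-truth 3D positions of joint k.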
Wholebody¶
Coco-Wholebody Dataset¶
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset; we find this leads to better performance.
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset; we find this leads to better performance.
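Unlike the top-down entries elsewhere on this page, the associative-embedding models above are bottom-up and need no person detector. A minimal inference sketch with the MMPose 0.x Python API (the config and checkpoint paths are placeholders, not files shipped with this document):

from mmpose.apis import init_pose_model, inference_bottom_up_pose_model

# Placeholder paths: substitute a real config and its matching checkpoint.
pose_model = init_pose_model('associative_embedding_config.py',
                             'associative_embedding_checkpoint.pth',
                             device='cuda:0')
# Bottom-up models take the whole image and group keypoints into people.
pose_results, _ = inference_bottom_up_pose_model(pose_model, 'demo.jpg')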
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset; we find this leads to better performance.
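The + fine-tuning scheme can be reproduced by initializing training from a COCO-pretrained checkpoint. In MMPose 0.x configs this is a single load_from line at the top level of the config file (the URL below is a placeholder, not an official weight file):

# At the top level of the training config (a Python file).
# Placeholder URL: substitute a real COCO-pretrained checkpoint.
load_from = 'https://download.openmmlab.com/mmpose/<coco-pretrained>.pth'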
Topdown Heatmap + Hrnet on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.700 | 0.746 | 0.567 | 0.645 | 0.637 | 0.688 | 0.473 | 0.546 | 0.553 | 0.626 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.701 | 0.773 | 0.586 | 0.692 | 0.727 | 0.783 | 0.516 | 0.604 | 0.586 | 0.674 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.700 | 0.776 | 0.672 | 0.785 | 0.656 | 0.743 | 0.534 | 0.639 | 0.579 | 0.681 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.722 | 0.790 | 0.694 | 0.799 | 0.777 | 0.834 | 0.587 | 0.679 | 0.631 | 0.716 | ckpt | log |
Topdown Heatmap + Resnet on Coco-Wholebody¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.652 | 0.739 | 0.614 | 0.746 | 0.608 | 0.716 | 0.460 | 0.584 | 0.520 | 0.633 | ckpt | log |
pose_resnet_50 | 384x288 | 0.666 | 0.747 | 0.635 | 0.763 | 0.732 | 0.812 | 0.537 | 0.647 | 0.573 | 0.671 | ckpt | log |
pose_resnet_101 | 256x192 | 0.670 | 0.754 | 0.640 | 0.767 | 0.611 | 0.723 | 0.463 | 0.589 | 0.533 | 0.647 | ckpt | log |
pose_resnet_101 | 384x288 | 0.692 | 0.770 | 0.680 | 0.798 | 0.747 | 0.822 | 0.549 | 0.658 | 0.597 | 0.692 | ckpt | log |
pose_resnet_152 | 256x192 | 0.682 | 0.764 | 0.662 | 0.788 | 0.624 | 0.728 | 0.482 | 0.606 | 0.548 | 0.661 | ckpt | log |
pose_resnet_152 | 384x288 | 0.703 | 0.780 | 0.693 | 0.813 | 0.751 | 0.825 | 0.559 | 0.667 | 0.610 | 0.705 | ckpt | log |
Topdown Heatmap + Vipnas on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.619 | 0.700 | 0.477 | 0.608 | 0.585 | 0.689 | 0.386 | 0.505 | 0.473 | 0.578 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.643 | 0.726 | 0.553 | 0.694 | 0.587 | 0.698 | 0.410 | 0.529 | 0.495 | 0.607 | ckpt | log |
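The "with detector having human AP of 56.4" captions in this section mean that person boxes come from a separately trained human detector rather than from ground truth. A minimal top-down inference sketch with the MMPose 0.x Python API (the paths and the box below are placeholders):

from mmpose.apis import init_pose_model, inference_top_down_pose_model

# Placeholder paths: substitute a real config and its matching checkpoint.
pose_model = init_pose_model('topdown_config.py', 'topdown_checkpoint.pth',
                             device='cuda:0')
# In the benchmarks above these boxes come from a human detector with
# 56.4 AP on COCO val2017; here we hand in one dummy (x, y, w, h) box.
person_results = [{'bbox': [50, 50, 200, 400]}]
pose_results, _ = inference_top_down_pose_model(
    pose_model, 'demo.jpg', person_results, format='xywh')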
Halpe Dataset¶
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the Halpe dataset; we find this leads to better performance.
Algorithms¶
SimpleBaseline2D (ECCV’2018)¶
Topdown Heatmap + Resnet on Animalpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.688 | 0.945 | 0.772 | 0.733 | 0.952 | ckpt | log |
pose_resnet_101 | 256x256 | 0.696 | 0.948 | 0.785 | 0.737 | 0.954 | ckpt | log |
pose_resnet_152 | 256x256 | 0.709 | 0.948 | 0.797 | 0.749 | 0.951 | ckpt | log |
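The AP/AR columns here and in the following tables follow COCO-style keypoint evaluation: predictions are matched to ground truth by object keypoint similarity (OKS), and AP averages precision over OKS thresholds 0.50:0.05:0.95. The standard OKS definition (COCO's, reproduced for reference) is:

\mathrm{OKS} = \frac{\sum_i \exp\!\left(-d_i^2 / (2 s^2 k_i^2)\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}

where d_i is the distance between predicted and true keypoint i, s is the object scale, k_i a per-keypoint constant, and v_i the visibility flag.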
Topdown Heatmap + Resnet on Ap10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.699 | 0.940 | 0.760 | 0.570 | 0.703 | ckpt | log |
pose_resnet_101 | 256x256 | 0.698 | 0.943 | 0.754 | 0.543 | 0.702 | ckpt | log |
Topdown Heatmap + Resnet on Atrw¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.900 | 0.973 | 0.932 | 0.929 | 0.985 | ckpt | log |
pose_resnet_101 | 256x256 | 0.898 | 0.973 | 0.936 | 0.927 | 0.985 | ckpt | log |
pose_resnet_152 | 256x256 | 0.896 | 0.973 | 0.931 | 0.927 | 0.985 | ckpt | log |
Topdown Heatmap + Resnet on Fly¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Vinegar Fly (Nature Methods'2019)
@article{pereira2019fast,
title={Fast animal pose estimation using deep neural networks},
author={Pereira, Talmo D and Aldarondo, Diego E and Willmore, Lindsay and Kislin, Mikhail and Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W},
journal={Nature methods},
volume={16},
number={1},
pages={117--125},
year={2019},
publisher={Nature Publishing Group}
}
Results on Vinegar Fly test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 192x192 | 0.996 | 0.910 | 2.00 | ckpt | log |
pose_resnet_101 | 192x192 | 0.996 | 0.912 | 1.95 | ckpt | log |
pose_resnet_152 | 192x192 | 0.997 | 0.917 | 1.78 | ckpt | log |
Topdown Heatmap + Resnet on Horse10¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_resnet_50 | 256x256 | 0.956 | 0.113 | ckpt | log |
split2 | pose_resnet_50 | 256x256 | 0.954 | 0.111 | ckpt | log |
split3 | pose_resnet_50 | 256x256 | 0.946 | 0.129 | ckpt | log |
split1 | pose_resnet_101 | 256x256 | 0.958 | 0.115 | ckpt | log |
split2 | pose_resnet_101 | 256x256 | 0.955 | 0.115 | ckpt | log |
split3 | pose_resnet_101 | 256x256 | 0.946 | 0.126 | ckpt | log |
split1 | pose_resnet_152 | 256x256 | 0.969 | 0.105 | ckpt | log |
split2 | pose_resnet_152 | 256x256 | 0.970 | 0.103 | ckpt | log |
split3 | pose_resnet_152 | 256x256 | 0.957 | 0.131 | ckpt | log |
Topdown Heatmap + Resnet on Locust¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Desert Locust (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Desert Locust test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 0.999 | 0.899 | 2.27 | ckpt | log |
pose_resnet_101 | 160x160 | 0.999 | 0.907 | 2.03 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.926 | 1.48 | ckpt | log |
Topdown Heatmap + Resnet on Macaque¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.799 | 0.952 | 0.919 | 0.837 | 0.964 | ckpt | log |
pose_resnet_101 | 256x192 | 0.790 | 0.953 | 0.908 | 0.828 | 0.967 | ckpt | log |
pose_resnet_152 | 256x192 | 0.794 | 0.951 | 0.915 | 0.834 | 0.968 | ckpt | log |
Topdown Heatmap + Resnet on Zebra¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Grévy’s Zebra (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Grévy’s Zebra test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 1.000 | 0.914 | 1.86 | ckpt | log |
pose_resnet_101 | 160x160 | 1.000 | 0.916 | 1.82 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.921 | 1.66 | ckpt | log |
Topdown Heatmap + Resnet on Aic¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
Topdown Heatmap + Resnet + Fp16 on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
Topdown Heatmap + Resnet on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Resnet on Crowdpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
Topdown Heatmap + Resnet on JHMDB¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
Topdown Heatmap + Resnet on MHP¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 val set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious if you use these results in papers.
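The adapted mAP mentioned above follows the COCO keypoint evaluation protocol (OKS-based AP/AR). A minimal sketch of that protocol with pycocotools; the annotation and result file names are placeholders:
# COCO-style keypoint evaluation (OKS-based AP/AR); file names are placeholders
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
coco_gt = COCO('annotations/mhp_val.json')           # ground-truth keypoints
coco_dt = coco_gt.loadRes('results_keypoints.json')  # predictions in COCO result format
evaluator = COCOeval(coco_gt, coco_dt, 'keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, AR, AR50, ...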
Topdown Heatmap + Resnet on Mpii¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Topdown Heatmap + Resnet + Mpii on Mpii_trb¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
Topdown Heatmap + Resnet on Ochuman¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
Topdown Heatmap + Resnet on Posetrack18¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
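In MMPose configs, this pre-train/fine-tune scheme amounts to initializing the PoseTrack18 run from COCO weights. A minimal sketch with hypothetical file names; _base_ and load_from are the standard mmcv config fields:
# hypothetical fine-tuning config: inherit the PoseTrack18 config and
# initialize from COCO-pretrained weights
_base_ = ['./res50_posetrack18_256x192.py']   # hypothetical base config name
load_from = 'path/to/res50_coco_256x192.pth'  # placeholder for the COCO checkpoint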
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_res50 | 256x256 | 0.0566 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | pose_resnet_50 | 256x256 | 0.954 | 0.578 | 16.8 | ckpt | log |
lower | pose_resnet_50 | 256x256 | 0.965 | 0.744 | 10.5 | ckpt | log |
full | pose_resnet_50 | 256x256 | 0.977 | 0.664 | 12.7 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.800 | 0.833 | 4.64 | ckpt | log |
Topdown Heatmap + Resnet on Freihand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FreiHand (ICCV'2019)
@inproceedings{zimmermann2019freihand,
title={Freihand: A dataset for markerless capture of hand pose and shape from single rgb images},
author={Zimmermann, Christian and Ceylan, Duygu and Yang, Jimei and Russell, Bryan and Argus, Max and Brox, Thomas},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={813--822},
year={2019}
}
Results on FreiHand val & test set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
val | pose_resnet_50 | 224x224 | 0.993 | 0.868 | 3.25 | ckpt | log |
test | pose_resnet_50 | 224x224 | 0.992 | 0.868 | 3.27 | ckpt | log |
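The hand benchmarks report PCK@0.2 (fraction of keypoints within 0.2 of the normalization size), AUC (area under the PCK curve over thresholds), and EPE (end-point error in pixels). A minimal NumPy sketch of PCK and EPE, omitting the invisible-joint masking that the real evaluation applies:
import numpy as np

def keypoint_epe(pred, gt):
    # pred, gt: (N, K, 2) keypoint coordinates in pixels
    return np.linalg.norm(pred - gt, axis=-1).mean()

def keypoint_pck(pred, gt, thr, normalize):
    # normalize: (N, 2) per-sample scale, e.g. the bounding-box size
    dist = np.linalg.norm((pred - gt) / normalize[:, None, :], axis=-1)
    return (dist < thr).mean()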
Topdown Heatmap + Resnet on Interhand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|---|
Human_annot | val(M) | pose_resnet_50 | 256x256 | 0.973 | 0.828 | 5.15 | ckpt | log |
Human_annot | test(H) | pose_resnet_50 | 256x256 | 0.973 | 0.826 | 5.27 | ckpt | log |
Human_annot | test(M) | pose_resnet_50 | 256x256 | 0.975 | 0.841 | 4.90 | ckpt | log |
Human_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.975 | 0.839 | 4.97 | ckpt | log |
Machine_annot | val(M) | pose_resnet_50 | 256x256 | 0.970 | 0.824 | 5.39 | ckpt | log |
Machine_annot | test(H) | pose_resnet_50 | 256x256 | 0.969 | 0.821 | 5.52 | ckpt | log |
Machine_annot | test(M) | pose_resnet_50 | 256x256 | 0.972 | 0.838 | 5.03 | ckpt | log |
Machine_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.972 | 0.837 | 5.11 | ckpt | log |
All | val(M) | pose_resnet_50 | 256x256 | 0.977 | 0.840 | 4.66 | ckpt | log |
All | test(H) | pose_resnet_50 | 256x256 | 0.979 | 0.839 | 4.65 | ckpt | log |
All | test(M) | pose_resnet_50 | 256x256 | 0.979 | 0.838 | 4.42 | ckpt | log |
All | test(H+M) | pose_resnet_50 | 256x256 | 0.979 | 0.851 | 4.46 | ckpt | log |
Topdown Heatmap + Resnet on Onehand10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.989 | 0.555 | 25.19 | ckpt | log |
Topdown Heatmap + Resnet on Panoptic2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.999 | 0.713 | 9.00 | ckpt | log |
Topdown Heatmap + Resnet on Rhd2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet50 | 256x256 | 0.991 | 0.898 | 2.33 | ckpt | log |
Topdown Heatmap + Resnet on Coco-Wholebody¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.652 | 0.739 | 0.614 | 0.746 | 0.608 | 0.716 | 0.460 | 0.584 | 0.520 | 0.633 | ckpt | log |
pose_resnet_50 | 384x288 | 0.666 | 0.747 | 0.635 | 0.763 | 0.732 | 0.812 | 0.537 | 0.647 | 0.573 | 0.671 | ckpt | log |
pose_resnet_101 | 256x192 | 0.670 | 0.754 | 0.640 | 0.767 | 0.611 | 0.723 | 0.463 | 0.589 | 0.533 | 0.647 | ckpt | log |
pose_resnet_101 | 384x288 | 0.692 | 0.770 | 0.680 | 0.798 | 0.747 | 0.822 | 0.549 | 0.658 | 0.597 | 0.692 | ckpt | log |
pose_resnet_152 | 256x192 | 0.682 | 0.764 | 0.662 | 0.788 | 0.624 | 0.728 | 0.482 | 0.606 | 0.548 | 0.661 | ckpt | log |
pose_resnet_152 | 384x288 | 0.703 | 0.780 | 0.693 | 0.813 | 0.751 | 0.825 | 0.559 | 0.667 | 0.610 | 0.705 | ckpt | log |
HMR (CVPR’2018)¶
HMR + Resnet on Mixed¶
HMR (CVPR'2018)
@inProceedings{kanazawaHMR18,
title={End-to-end Recovery of Human Shape and Pose},
author = {Angjoo Kanazawa
and Michael J. Black
and David W. Jacobs
and Jitendra Malik},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M with ground-truth bounding boxes. The model achieves an MPJPE-PA of 52.60 mm under Protocol 2.
Arch | Input Size | MPJPE (P1) | MPJPE-PA (P1) | MPJPE (P2) | MPJPE-PA (P2) | ckpt | log |
---|---|---|---|---|---|---|---|
hmr_resnet_50 | 224x224 | 80.75 | 55.08 | 80.35 | 52.60 | ckpt | log |
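For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, and MPJPE-PA computes the same distance after a Procrustes (similarity) alignment of the prediction to the ground truth. A minimal NumPy sketch of both metrics, ignoring the per-sequence details of the official protocols:
import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (J, 3) joint positions in mm
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    # Procrustes-align pred to gt (rotation, scale, translation), then MPJPE
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])          # reflection guard
    R = Vt.T @ D @ U.T                  # optimal rotation
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    aligned = scale * (R @ p.T).T + mu_g
    return mpjpe(aligned, gt)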
UDP (CVPR’2020)¶
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
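As a rough illustration, the UDP configs in MMPose switch on the unbiased transform and encoding in the data pipeline. A minimal sketch; treat the exact field names as assumptions based on the 0.x config style:
train_pipeline = [
    # ... loading and augmentation transforms ...
    dict(type='TopDownAffine', use_udp=True),             # unbiased coordinate transform
    dict(type='TopDownGenerateTarget', sigma=2,
         encoding='UDP', target_type='GaussianHeatmap'),  # unbiased (DARK-style) encoding
    # ... collection/formatting transforms ...
]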
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
CPM (CVPR’2016)¶
Topdown Heatmap + CPM on Coco¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
Topdown Heatmap + CPM on JHMDB¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
Topdown Heatmap + CPM on Mpii¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
cpm | 368x368 | 0.876 | 0.285 | ckpt | log |
VoxelPose (ECCV’2020)¶
Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
CMU Panoptic (ICCV'2015)
@inproceedings{joo_iccv_2015,
author = {Hanbyul Joo and Hao Liu and Lei Tan and Lin Gui and Bart Nabbe and Iain Matthews and Takeo Kanade and Shohei Nobuhara and Yaser Sheikh},
title = {Panoptic Studio: A Massively Multiview System for Social Motion Capture},
booktitle = {ICCV},
year = {2015}
}
Results on CMU Panoptic dataset.
Arch | mAP | mAR | MPJPE | Recall@500mm | ckpt | log |
---|---|---|---|---|---|---|
prn64_cpn80_res50 | 97.31 | 97.99 | 17.57 | 99.85 | ckpt | log |
AdaptiveWingloss (ICCV’2019)¶
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
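NME here is the mean landmark error normalized by the inter-ocular distance, reported on the full WFLW test set and on its attribute subsets. A minimal NumPy sketch; the outer-eye-corner indices (60 and 72 for WFLW) are an assumption:
import numpy as np

def face_nme(pred, gt, idx_a=60, idx_b=72):
    # pred, gt: (K, 2) landmarks; idx_a/idx_b: outer eye corners (assumed WFLW indices)
    interocular = np.linalg.norm(gt[idx_a] - gt[idx_b])
    return np.linalg.norm(pred - gt, axis=-1).mean() / interocular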
HigherHRNet (CVPR’2020)¶
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used; a sketch of the heatmap fusion step follows the table
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
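The multi-scale test above runs the network at each scale and fuses the resulting heatmaps before keypoint grouping. A minimal NumPy/OpenCV sketch of the usual fusion step (resize every scale to a common resolution, then average); the exact aggregation in MMPose may differ:
import cv2
import numpy as np

def aggregate_multiscale_heatmaps(heatmaps, base_hw):
    # heatmaps: list of (K, H_s, W_s) arrays, one per test scale
    # base_hw: (H, W) tuple giving the common output resolution
    acc = np.zeros((heatmaps[0].shape[0],) + tuple(base_hw), dtype=np.float32)
    for hm in heatmaps:
        acc += np.stack([cv2.resize(c, base_hw[::-1]) for c in hm])
    return acc / len(heatmaps)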
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
LiteHRNet (CVPR’2021)¶
Topdown Heatmap + Litehrnet on Coco¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
LiteHRNet-18 | 256x192 | 0.643 | 0.868 | 0.720 | 0.706 | 0.912 | ckpt | log |
LiteHRNet-18 | 384x288 | 0.677 | 0.878 | 0.746 | 0.735 | 0.920 | ckpt | log |
LiteHRNet-30 | 256x192 | 0.675 | 0.881 | 0.754 | 0.736 | 0.924 | ckpt | log |
LiteHRNet-30 | 384x288 | 0.700 | 0.884 | 0.776 | 0.758 | 0.928 | ckpt | log |
Topdown Heatmap + Litehrnet on Mpii¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.859 | 0.260 | ckpt | log |
LiteHRNet-30 | 256x256 | 0.869 | 0.271 | ckpt | log |
Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.795 | 0.830 | 4.77 | ckpt | log |
RSN (ECCV’2020)¶
Topdown Heatmap + RSN on Coco¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
PoseWarper (NeurIPS’2019)¶
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper is split into two stages.
The first stage starts from a checkpoint pre-trained on the COCO dataset; the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
The second stage starts from the last checkpoint of the first stage; the warping offsets are learned in a multi-frame setting while the backbone is frozen.
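A minimal sketch of how the two stages chain together in config terms; the file names are hypothetical and the freezing flag is an assumption (the actual PoseWarper config field may differ):
# stage 1: single-frame fine-tuning on PoseTrack18, initialized from a COCO checkpoint
stage1 = dict(load_from='path/to/hrnet_w48_coco_384x288.pth')
# stage 2: learn the warping offsets on multi-frame input with the backbone frozen,
# initialized from the last checkpoint of stage 1
stage2 = dict(
    load_from='work_dirs/stage1/latest.pth',
    model=dict(freeze_backbone=True),  # assumption: placeholder for the real flag
)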
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files (see footnote 1)
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json and place it at $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json to be consistent with the config. Please refer to DATA Preparation for more details about data preparation.
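A minimal Python sketch of the placement step described above, keeping the variable names from the footnote:
import os
import shutil

src = os.path.expandvars(
    '$PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json')
dst = 'data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json'
os.makedirs(os.path.dirname(dst), exist_ok=True)
shutil.copy(src, dst)  # place the precomputed boxes where the config expects them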
MSPN (ArXiv’2019)¶
Topdown Heatmap + MSPN on Coco¶
MSPN (ArXiv'2019)
@article{li2019rethinking,
title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
journal={arXiv preprint arXiv:1901.00148},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
HRNet (CVPR’2019)¶
Topdown Heatmap + Hrnet on Animalpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.736 | 0.959 | 0.832 | 0.775 | 0.966 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.737 | 0.959 | 0.823 | 0.778 | 0.962 | ckpt | log |
Topdown Heatmap + Hrnet on Ap10k¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | AP (M) | AP (L) | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.738 | 0.958 | 0.808 | 0.592 | 0.743 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.744 | 0.959 | 0.807 | 0.589 | 0.748 | ckpt | log |
Topdown Heatmap + Hrnet on Atrw¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.912 | 0.973 | 0.959 | 0.938 | 0.985 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.911 | 0.972 | 0.946 | 0.937 | 0.985 | ckpt | log |
Topdown Heatmap + Hrnet on Horse10¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_hrnet_w32 | 256x256 | 0.951 | 0.122 | ckpt | log |
split2 | pose_hrnet_w32 | 256x256 | 0.949 | 0.116 | ckpt | log |
split3 | pose_hrnet_w32 | 256x256 | 0.939 | 0.153 | ckpt | log |
split1 | pose_hrnet_w48 | 256x256 | 0.973 | 0.095 | ckpt | log |
split2 | pose_hrnet_w48 | 256x256 | 0.969 | 0.101 | ckpt | log |
split3 | pose_hrnet_w48 | 256x256 | 0.961 | 0.128 | ckpt | log |
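The Horse-10 table above reports PCK@0.3 and NME. As a reference, here is a minimal numpy sketch of both metrics; the choice of normalization length (a bounding-box-derived reference here) is dataset-specific and is an assumption of this sketch.

```python
import numpy as np

def pck(pred, gt, norm, thr=0.3):
    """Percentage of Correct Keypoints.

    pred, gt: (N, K, 2) arrays of predicted / ground-truth coordinates.
    norm:     (N,) per-instance normalization length (assumed here to be
              a bounding-box-derived reference; the exact choice is
              dataset-specific).
    A keypoint counts as correct if its error is below thr * norm.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, K)
    return (dist < thr * norm[:, None]).mean()

def nme(pred, gt, norm):
    """Normalized Mean Error: mean keypoint distance divided by norm."""
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, K)
    return (dist / norm[:, None]).mean()
```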
Topdown Heatmap + Hrnet on Macaque¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.814 | 0.953 | 0.918 | 0.851 | 0.969 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.818 | 0.963 | 0.917 | 0.855 | 0.971 | ckpt | log |
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
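The multi-scale rows above average predictions over resized copies of the input. Below is a minimal PyTorch sketch of this evaluation trick, assuming a model that maps an image batch to a single heatmap tensor; real bottom-up pipelines additionally aggregate tag maps and flipped inputs.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_heatmaps(model, image, scales=(2, 1, 0.5)):
    """Average bottom-up heatmaps over several input scales.

    image: (1, 3, H, W) float tensor. `model` is assumed to return a
    heatmap tensor of shape (1, K, h, w) per forward pass.
    """
    _, _, H, W = image.shape
    acc = None
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        heatmaps = model(resized)
        # Resize each scale's heatmaps to a common resolution before averaging.
        heatmaps = F.interpolate(heatmaps, size=(H, W), mode='bilinear',
                                 align_corners=False)
        acc = heatmaps if acc is None else acc + heatmaps
    return acc / len(scales)
```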
Topdown Heatmap + Hrnet on Aic¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Topdown Heatmap + Hrnet + Augmentation on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
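The three rows correspond to different regularizing augmentations applied during training. MMPose wires these through its own data pipeline; the snippet below is only a rough equivalent expressed with the albumentations API cited above, with illustrative hyper-parameters (GridMask itself is a custom transform; `A.GridDropout` is the closest built-in).

```python
import albumentations as A

# Rough equivalents of the three augmentation settings above.
# Hyper-parameter values are illustrative, not the ones used for the
# reported checkpoints.
coarse_dropout = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=40, max_width=40,
                    min_holes=1, min_height=10, min_width=10, p=0.5),
])
grid_like = A.Compose([
    # GridMask is a custom transform in MMPose; GridDropout is the
    # closest built-in albumentations analogue.
    A.GridDropout(ratio=0.3, p=0.5),
])
photometric = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.HueSaturationValue(p=0.5),
])

# Usage: augmented = photometric(image=img)['image']
```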
Topdown Heatmap + Hrnet + Fp16 on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.880 | 0.800 | 0.943 | ckpt | log |
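The fp16 model is trained with mixed precision. The reported checkpoint was produced through the framework's own fp16 support; as a plain-PyTorch illustration of the same idea, a minimal `torch.cuda.amp` training step looks like this:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()

def train_step(model, images, targets, criterion, optimizer):
    optimizer.zero_grad()
    with autocast():                      # forward + loss in float16
        heatmaps = model(images)
        loss = criterion(heatmaps, targets)
    scaler.scale(loss).backward()         # scale loss to avoid gradient underflow
    scaler.step(optimizer)                # unscales gradients, then steps
    scaler.update()                       # adapts the loss scale over time
    return loss.item()
```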
Topdown Heatmap + Hrnet on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
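Top-down results like these depend on an external human detector (here one with 56.4 AP on COCO val2017): each detected box is cropped, resized to the model input size, and decoded independently. Below is a simplified sketch, assuming a model that maps a (1, 3, 256, 192) crop to per-joint heatmaps; real pipelines use aspect-ratio-preserving affine warps and flip testing.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def topdown_inference(model, image, boxes, input_size=(192, 256)):
    """image: (3, H, W) float tensor; boxes: list of (x1, y1, x2, y2)."""
    w_in, h_in = input_size
    results = []
    for x1, y1, x2, y2 in boxes:
        crop = image[:, int(y1):int(y2), int(x1):int(x2)].unsqueeze(0)
        crop = F.interpolate(crop, size=(h_in, w_in), mode='bilinear',
                             align_corners=False)
        heat = model(crop)[0]                         # (K, h, w)
        K, h, w = heat.shape
        idx = heat.view(K, -1).argmax(dim=1)          # per-joint argmax
        xs = (idx % w).float()
        ys = torch.div(idx, w, rounding_mode='floor').float()
        # Map heatmap coordinates back into the original image (this is
        # the simple, slightly biased mapping that UDP later corrects).
        xs = x1 + xs / w * (x2 - x1)
        ys = y1 + ys / h * (y2 - y1)
        results.append(torch.stack([xs, ys], dim=1))  # (K, 2)
    return results
```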
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
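The essence of UDP is an unbiased coordinate mapping between the image and the heatmap grid: resampling is measured in unit lengths of `size - 1` pixels so that the first and last pixel centers map exactly onto each other. A small numeric sketch of the difference (not MMPose code):

```python
import numpy as np

def biased_map(x, src_w, dst_w):
    return x * dst_w / src_w               # standard scaling, drifts

def unbiased_map(x, src_w, dst_w):
    return x * (dst_w - 1) / (src_w - 1)   # UDP-style, endpoint-aligned

x = np.array([0.0, 191.0])                 # first / last pixel of a 192-wide crop
print(biased_map(x, 192, 48))              # [ 0.    47.75]  last pixel drifts
print(unbiased_map(x, 192, 48))            # [ 0.    47.  ]  endpoints preserved
```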
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
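DARK refines the integer argmax of a heatmap with a second-order Taylor expansion of its logarithm. Here is a minimal numpy sketch, assuming the maximum does not lie on the heatmap border and omitting the Gaussian pre-smoothing step of the full method:

```python
import numpy as np

def dark_refine(heatmap, x, y):
    """Sub-pixel refinement of an argmax location (x, y): offset = -H^{-1} g,
    where g and H are the gradient and Hessian of log(heatmap) at (x, y).
    Assumes 1 <= x <= W-2 and 1 <= y <= H-2."""
    h = np.log(np.maximum(heatmap, 1e-10))
    # First derivatives (central differences).
    dx = 0.5 * (h[y, x + 1] - h[y, x - 1])
    dy = 0.5 * (h[y + 1, x] - h[y - 1, x])
    # Second derivatives forming the Hessian.
    dxx = h[y, x + 1] - 2 * h[y, x] + h[y, x - 1]
    dyy = h[y + 1, x] - 2 * h[y, x] + h[y - 1, x]
    dxy = 0.25 * (h[y + 1, x + 1] - h[y + 1, x - 1]
                  - h[y - 1, x + 1] + h[y - 1, x - 1])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    grad = np.array([dx, dy])
    if abs(np.linalg.det(hess)) > 1e-12:
        off_x, off_y = -np.linalg.solve(hess, grad)
        return x + off_x, y + off_y
    return float(x), float(y)
```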
Topdown Heatmap + Hrnet on Crowdpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
Topdown Heatmap + Hrnet on H36m¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M test set with ground-truth 2D detections
Arch | Input Size | EPE | PCK | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 9.43 | 0.911 | ckpt | log |
pose_hrnet_w48 | 256x256 | 7.36 | 0.932 | ckpt | log |
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnet on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
Topdown Heatmap + Hrnet on Ochuman¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.591 | 0.748 | 0.641 | 0.631 | 0.775 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.606 | 0.748 | 0.650 | 0.647 | 0.776 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.611 | 0.752 | 0.663 | 0.648 | 0.778 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.616 | 0.749 | 0.663 | 0.653 | 0.773 | ckpt | log |
Topdown Heatmap + Hrnet on Posetrack18¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
pose_hrnet_w32 | 384x288 | 87.0 | 88.8 | 85.0 | 80.1 | 80.5 | 82.6 | 79.4 | 83.6 | ckpt | log |
pose_hrnet_w48 | 256x192 | 88.2 | 90.1 | 85.8 | 80.8 | 80.7 | 83.3 | 80.3 | 84.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 87.8 | 90.0 | 85.9 | 81.3 | 81.1 | 83.3 | 80.9 | 84.5 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
pose_hrnet_w32 | 384x288 | 79.9 | 83.6 | 80.4 | 74.5 | 74.8 | 76.1 | 70.5 | 77.3 | ckpt | log |
pose_hrnet_w48 | 256x192 | 80.1 | 83.4 | 80.6 | 74.8 | 74.3 | 76.8 | 70.4 | 77.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 80.2 | 83.8 | 80.9 | 75.2 | 74.7 | 76.7 | 71.7 | 77.8 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper is split into two stages.
The first stage starts from a checkpoint pre-trained on the COCO dataset, and the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
The second stage starts from the last checkpoint of the first stage, and the warping offsets are learned in a multi-frame setting while the backbone is frozen, as sketched below.
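A minimal PyTorch sketch of the second stage under these assumptions (the module name `warping_offsets` is purely illustrative, as are the optimizer and learning rate):

```python
import torch

def start_stage2(model, stage1_ckpt):
    """Resume from the stage-1 checkpoint and train only the offset branch."""
    state = torch.load(stage1_ckpt, map_location='cpu')
    state = state.get('state_dict', state)
    model.load_state_dict(state, strict=False)
    for name, param in model.named_parameters():
        # Freeze everything except the (hypothetical) warping-offset branch.
        param.requires_grad = name.startswith('warping_offsets')
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```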
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files1.
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it at $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more details.
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
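A minimal sketch of this pre-train-then-fine-tune initialization (the checkpoint path and key names are placeholders):

```python
import torch

def init_from_coco(model, coco_ckpt='coco_pretrained.pth'):
    """Load COCO-trained weights before fine-tuning on COCO-WholeBody."""
    state = torch.load(coco_ckpt, map_location='cpu')
    state = state.get('state_dict', state)
    # strict=False: the output head differs (17 COCO joints vs. 133
    # whole-body keypoints), so mismatching layers are simply skipped.
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing, unexpected
```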
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
Topdown Heatmap + Hrnet on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.700 | 0.746 | 0.567 | 0.645 | 0.637 | 0.688 | 0.473 | 0.546 | 0.553 | 0.626 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.701 | 0.773 | 0.586 | 0.692 | 0.727 | 0.783 | 0.516 | 0.604 | 0.586 | 0.674 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.700 | 0.776 | 0.672 | 0.785 | 0.656 | 0.743 | 0.534 | 0.639 | 0.579 | 0.681 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.722 | 0.790 | 0.694 | 0.799 | 0.777 | 0.834 | 0.587 | 0.679 | 0.631 | 0.716 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the Halpe dataset. We find this leads to better performance.
Associative Embedding (NIPS’2017)¶
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
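Associative embedding predicts a per-keypoint identity "tag" alongside each heatmap; detected keypoints are then grouped into people by tag proximity. Below is a greedy grouping sketch of the core idea (real decoders use confidence-weighted matching rather than this simple nearest-tag rule):

```python
def group_by_tags(detections, tag_thr=1.0):
    """detections: list over joint types; each entry is a list of
    (x, y, score, tag) tuples detected for that joint type."""
    people = []                                      # one dict per person
    for joint_id, kpts in enumerate(detections):
        for x, y, score, tag in kpts:
            best, best_dist = None, tag_thr
            for person in people:
                if joint_id in person['joints']:
                    continue                         # one joint of each type
                mean_tag = sum(person['tags']) / len(person['tags'])
                dist = abs(mean_tag - tag)
                if dist < best_dist:                 # nearest compatible person
                    best, best_dist = person, dist
            if best is None:                         # no match: new person
                best = {'joints': {}, 'tags': []}
                people.append(best)
            best['joints'][joint_id] = (x, y, score)
            best['tags'].append(tag)
    return people
```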
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Associative Embedding + Mobilenetv2 on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
Associative Embedding + Resnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Associative Embedding + Hourglass + Ae on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HourglassAENet (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.613 | 0.833 | 0.667 | 0.659 | 0.850 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.667 | 0.855 | 0.723 | 0.707 | 0.877 | ckpt | log |
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
HRNetv2 (TPAMI’2019)¶
Topdown Heatmap + Hrnetv2 on 300w¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
300W (IMAVIS'2016)
@article{sagonas2016300,
title={300 faces in-the-wild challenge: Database and results},
author={Sagonas, Christos and Antonakos, Epameinondas and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja},
journal={Image and vision computing},
volume={47},
pages={3--18},
year={2016},
publisher={Elsevier}
}
Results on 300W dataset
The model is trained on 300W train.
Arch | Input Size | NMEcommon | NMEchallenge | NMEfull | NMEtest | ckpt | log |
---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 2.86 | 5.45 | 3.37 | 3.97 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 1.41 | 1.27 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.0569 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Cofw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COFW (ICCV'2013)
@inproceedings{burgos2013robust,
title={Robust face landmark estimation under occlusion},
author={Burgos-Artizzu, Xavier P and Perona, Pietro and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1513--1520},
year={2013}
}
Results on COFW dataset
The model is trained on COFW train.
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 3.40 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 4.06 | 6.98 | 3.99 | 4.83 | 4.59 | 3.92 | 4.33 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.813 | 0.840 | 4.39 | ckpt | log |
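As a reading aid for the hand tables, PCK@0.2, AUC and EPE can be sketched in a few lines of numpy. This is a simplified illustration that assumes errors are normalised by the ground-truth bounding-box size; the exact protocol (including the AUC threshold range) is defined by MMPose's evaluation code.

import numpy as np

def hand_keypoint_metrics(pred, gt, bbox_size, thr=0.2, auc_steps=20):
    # pred, gt: (N, K, 2); bbox_size: (N,) per-sample normalising scale.
    err = np.linalg.norm(pred - gt, axis=-1)        # (N, K) pixel errors
    epe = float(err.mean())                         # end-point error
    norm_err = err / bbox_size[:, None]
    pck = float((norm_err < thr).mean())            # PCK@thr
    # AUC: mean PCK over evenly spaced thresholds in (0, 1].
    thrs = np.linspace(0.0, 1.0, auc_steps + 1)[1:]
    auc = float(np.mean([(norm_err < t).mean() for t in thrs]))
    return pck, auc, epe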
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.990 | 0.568 | 24.16 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.999 | 0.744 | 7.79 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.992 | 0.902 | 2.21 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.992 | 0.902 | 2.19 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Hourglass (ECCV’2016)¶
Topdown Heatmap + Hourglass on Coco¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
Topdown Heatmap + Hourglass on Mpii¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.0586 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.804 | 0.835 | 4.54 | ckpt | log |
DeepPose (CVPR’2014)¶
Deeppose + Resnet on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
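Unlike heatmap models, DeepPose regresses keypoint coordinates directly, but it is served by the same top-down inference API: person boxes come from a detector (the tables above assume one with a human AP of 56.4) and the pose model runs on each crop. A hedged sketch, with placeholder paths and a hand-written box standing in for real detector output:

from mmpose.apis import init_pose_model, inference_top_down_pose_model

config_file = 'deeppose_res50_coco_256x192.py'   # placeholder
checkpoint_file = 'deeppose_res50_coco.pth'      # placeholder
model = init_pose_model(config_file, checkpoint_file, device='cuda:0')

# One person box in xyxy format, standing in for detector output.
person_results = [{'bbox': [50, 50, 250, 400]}]
pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xyxy')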
Deeppose + Resnet on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Deeppose + Resnet on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50 | 256x256 | 4.85 | 8.50 | 4.81 | 5.69 | 5.45 | 4.82 | 5.20 | ckpt | log |
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages={2235--2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
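Wing loss replaces plain L1/L2 regression losses with a curve that is logarithmic for small errors (amplifying their gradients) and L1-like for large ones. A minimal PyTorch sketch of the loss as defined in the paper, using its default omega=10 and epsilon=2; MMPose's built-in WingLoss additionally supports per-keypoint target weights.

import math
import torch

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    # Logarithmic regime for |x| < omega, shifted L1 elsewhere; C makes
    # the two pieces meet continuously at |x| = omega.
    diff = (pred - target).abs()
    C = omega - omega * math.log(1.0 + omega / epsilon)
    loss = torch.where(diff < omega,
                       omega * torch.log(1.0 + diff / epsilon),
                       diff - C)
    return loss.mean()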
Deeppose + Resnet on Deepfashion¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | deeppose_resnet_50 | 256x256 | 0.965 | 0.535 | 17.2 | ckpt | log |
lower | deeppose_resnet_50 | 256x256 | 0.971 | 0.678 | 11.8 | ckpt | log |
full | deeppose_resnet_50 | 256x256 | 0.983 | 0.602 | 14.0 | ckpt | log |
Deeppose + Resnet on Onehand10k¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.990 | 0.486 | 34.28 | ckpt | log |
Deeppose + Resnet on Panoptic2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.999 | 0.686 | 9.36 | ckpt | log |
Deeppose + Resnet on Rhd2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.988 | 0.865 | 3.29 | ckpt | log |
Wingloss (CVPR’2018)¶
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages={2235--2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
ViPNAS (CVPR’2021)¶
Topdown Heatmap + Vipnas on Coco¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.700 | 0.887 | 0.778 | 0.757 | 0.929 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.711 | 0.893 | 0.789 | 0.769 | 0.934 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Vipnas on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.619 | 0.700 | 0.477 | 0.608 | 0.585 | 0.689 | 0.386 | 0.505 | 0.473 | 0.578 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.643 | 0.726 | 0.553 | 0.694 | 0.587 | 0.698 | 0.410 | 0.529 | 0.495 | 0.607 | ckpt | log |
DarkPose (CVPR’2020)¶
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
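DARK does not change the network; it changes how coordinates are decoded from the heatmap. After Gaussian smoothing, the integer argmax m is shifted by -H^{-1}g, where g and H are the gradient and Hessian of the log-heatmap at m. A condensed numpy sketch of that refinement step, assuming a single smoothed heatmap whose peak is not on the border:

import numpy as np

def dark_refine(heatmap, peak):
    # peak: integer (x, y) argmax of a Gaussian-smoothed heatmap.
    x, y = peak
    logh = np.log(np.maximum(heatmap, 1e-10))
    dx = 0.5 * (logh[y, x + 1] - logh[y, x - 1])
    dy = 0.5 * (logh[y + 1, x] - logh[y - 1, x])
    dxx = logh[y, x + 1] - 2.0 * logh[y, x] + logh[y, x - 1]
    dyy = logh[y + 1, x] - 2.0 * logh[y, x] + logh[y - 1, x]
    dxy = 0.25 * (logh[y + 1, x + 1] - logh[y + 1, x - 1]
                  - logh[y - 1, x + 1] + logh[y - 1, x - 1])
    grad = np.array([dx, dy])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    if abs(np.linalg.det(hess)) > 1e-10:  # Hessian must be invertible
        return np.array([x, y], float) - np.linalg.solve(hess, grad)
    return np.array([x, y], float)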
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
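The dark suffix in these entries refers to DARK's distribution-aware decoding: the argmax of each predicted heatmap is refined to sub-pixel precision with a second-order Taylor expansion of the log-heatmap around the peak. A minimal NumPy sketch of that refinement step (the pre-decoding Gaussian modulation of the heatmap and full border handling are simplified here):

import numpy as np

def dark_refine(heatmap, eps=1e-10):
    """Sub-pixel refinement of the heatmap argmax via a second-order
    Taylor expansion of the log-heatmap (the DARK decoding idea).
    Returns (x, y) in heatmap pixel coordinates. Illustrative sketch."""
    h = np.maximum(heatmap, eps)
    y, x = np.unravel_index(np.argmax(h), h.shape)
    # stay one pixel from the border so finite differences are defined
    if not (1 <= x < h.shape[1] - 1 and 1 <= y < h.shape[0] - 1):
        return float(x), float(y)
    logh = np.log(h)
    # first derivatives (central differences) at the peak
    dx = 0.5 * (logh[y, x + 1] - logh[y, x - 1])
    dy = 0.5 * (logh[y + 1, x] - logh[y - 1, x])
    # second derivatives / Hessian at the peak
    dxx = logh[y, x + 1] - 2 * logh[y, x] + logh[y, x - 1]
    dyy = logh[y + 1, x] - 2 * logh[y, x] + logh[y - 1, x]
    dxy = 0.25 * (logh[y + 1, x + 1] - logh[y + 1, x - 1]
                  - logh[y - 1, x + 1] + logh[y - 1, x - 1])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    grad = np.array([dx, dy])
    if np.linalg.det(hess) != 0:
        offset = -np.linalg.solve(hess, grad)  # mu = m - H^{-1} g
        offset = np.clip(offset, -1, 1)        # keep the step sub-pixel
        return x + offset[0], y + offset[1]
    return float(x), float(y)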
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the Halpe dataset. We find this leads to better performance.
SCNet (CVPR’2020)¶
Topdown Heatmap + Scnet on Coco¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_scnet_50 | 256x192 | 0.728 | 0.899 | 0.807 | 0.784 | 0.938 | ckpt | log |
pose_scnet_50 | 384x288 | 0.751 | 0.906 | 0.818 | 0.802 | 0.943 | ckpt | log |
pose_scnet_101 | 256x192 | 0.733 | 0.903 | 0.813 | 0.790 | 0.941 | ckpt | log |
pose_scnet_101 | 384x288 | 0.752 | 0.906 | 0.823 | 0.804 | 0.943 | ckpt | log |
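AP and AR in the COCO tables are computed by matching predictions to ground truth with Object Keypoint Similarity (OKS) and averaging precision/recall over OKS thresholds 0.50:0.05:0.95. A minimal sketch of the OKS score for one prediction/ground-truth pair, using the published COCO per-keypoint constants:

import numpy as np

# Published COCO per-keypoint falloff constants (sigmas)
COCO_SIGMAS = np.array([
    .026, .025, .025, .035, .035, .079, .079, .072, .072,
    .062, .062, .107, .107, .087, .087, .089, .089])

def oks(pred, gt, visible, area):
    """Object Keypoint Similarity for one pose pair.
    pred, gt: (17, 2) arrays; visible: (17,) bool mask with at least one
    visible keypoint; area: ground-truth object segment/box area."""
    d2 = np.sum((pred - gt) ** 2, axis=-1)
    var = (2 * COCO_SIGMAS) ** 2
    e = d2 / (2 * var * (area + np.spacing(1)))
    return float(np.mean(np.exp(-e[visible])))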
Topdown Heatmap + Scnet on Mpii¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_scnet_101 | 256x256 | 0.886 | 0.293 | ckpt | log |
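For the MPII tables, Mean is PCKh@0.5 and Mean@0.1 is PCKh@0.1: a joint counts as correct when its error is within a fraction of the ground-truth head segment length. A small sketch of the computation (the per-joint visibility masking used by the full benchmark is omitted):

import numpy as np

def pckh(pred, gt, head_sizes, thr=0.5):
    """PCKh over a batch. pred, gt: (N, J, 2) keypoints in pixels;
    head_sizes: (N,) head segment lengths; thr=0.5 gives the 'Mean'
    column, thr=0.1 the 'Mean@0.1' column."""
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, J)
    return float(np.mean(dist <= thr * head_sizes[:, None]))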
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.0565 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.803 | 0.834 | 4.55 | ckpt | log |
SoftWingloss (TIP’2021)¶
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
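NME here is the mean point-to-point landmark error normalized by a face-specific reference distance; for WFLW the customary choice is the inter-ocular distance between the outer eye corners (indices 60 and 72 in the 98-point layout, used below as an assumption). A minimal sketch:

import numpy as np

def nme(pred, gt, left_eye=60, right_eye=72):
    """Normalized Mean Error for one face. pred, gt: (98, 2) landmark
    arrays; the error is normalized by the outer-eye-corner distance."""
    inter_ocular = np.linalg.norm(gt[right_eye] - gt[left_eye])
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)) / inter_ocular)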
SimpleBaseline3D (ICCV’2017)¶
Pose Lift + Simplebaseline3d on H36m¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|
simple_baseline_3d_tcn1 | 43.4 | 34.3 | ckpt | log |
1 Differing from the original paper, we did not apply the max-norm constraint, because we found that omitting it led to better convergence and performance.
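MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints (in mm); P-MPJPE computes the same error after rigidly aligning the prediction to the ground truth with a similarity (Procrustes) transform. A self-contained NumPy sketch of both:

import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) poses, in input units."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def p_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (optimal scale, rotation and
    translation of pred onto gt)."""
    p = pred - pred.mean(axis=0)
    g = gt - gt.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ g)     # SVD of the cross-covariance
    if np.linalg.det(vt.T @ u.T) < 0:     # avoid an improper rotation
        vt[-1] *= -1
        s[-1] *= -1
    r = vt.T @ u.T                        # optimal rotation
    scale = s.sum() / (p ** 2).sum()      # optimal scale
    aligned = scale * p @ r.T + gt.mean(axis=0)
    return mpjpe(aligned, gt)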
Pose Lift + Simplebaseline3d on Mpi_inf_3dhp¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|
simple_baseline_3d_tcn1 | 84.3 | 53.2 | 85.0 | 52.0 | ckpt | log |
1 Differing from the original paper, we did not apply the max-norm constraint, because we found that omitting it led to better convergence and performance.
InterNet (ECCV’2020)¶
Internet + Internet on Interhand3d¶
InterNet (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
All | test(H+M) | InterNet_resnet_50 | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | ckpt | log |
All | val(M) | InterNet_resnet_50 | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | ckpt | log |
VideoPose3D (CVPR’2019)¶
Video Pose Lift + Videopose3d on H36m¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 27 | 40.0 | 30.1 | ckpt | log |
VideoPose3D | 81 | 38.9 | 29.2 | ckpt | log |
VideoPose3D | 243 | 37.6 | 28.3 | ckpt | log |
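The Receptive Field column counts how many input frames contribute to each 3D pose estimate. Assuming the VideoPose3D-style stack of an initial width-3 temporal convolution followed by residual blocks whose dilation grows as 3, 9, 27, ..., the arithmetic works out as below; 2, 3 and 4 blocks reproduce the 27-, 81- and 243-frame variants in the table.

def receptive_field(kernel=3, num_blocks=2):
    """Temporal receptive field of a dilated-convolution stack: an initial
    conv of width `kernel`, then `num_blocks` blocks each adding
    (kernel - 1) * kernel**i frames."""
    rf = kernel
    for i in range(1, num_blocks + 1):
        rf += (kernel - 1) * kernel ** i
    return rf

assert [receptive_field(3, b) for b in (2, 3, 4)] == [27, 81, 243]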
Results on Human3.6M dataset with CPN 2D detections1, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 1 | 52.9 | 41.3 | ckpt | log |
VideoPose3D | 243 | 47.9 | 38.0 | ckpt | log |
Results on Human3.6M dataset with ground truth 2D detections, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 58.1 | 42.8 | 54.7 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 67.4 | 50.1 | 63.2 | ckpt | log |
1 CPN 2D detections are provided by the official repo. The reformatted version used in this repository can be downloaded from train_detection and test_detection.
Video Pose Lift + Videopose3d on Mpi_inf_3dhp¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|---|
VideoPose3D | 1 | 58.3 | 40.6 | 94.1 | 63.1 | ckpt | log |
Backbones¶
CPM (CVPR’2016)¶
Topdown Heatmap + CPM on Coco¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
Topdown Heatmap + CPM on JHMDB¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
Topdown Heatmap + CPM on Mpii¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
cpm | 368x368 | 0.876 | 0.285 | ckpt | log |
ResNetV1D (CVPR’2019)¶
Topdown Heatmap + Resnetv1d on Coco¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnetv1d_50 | 256x192 | 0.722 | 0.897 | 0.799 | 0.777 | 0.933 | ckpt | log |
pose_resnetv1d_50 | 384x288 | 0.730 | 0.900 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_resnetv1d_101 | 256x192 | 0.731 | 0.899 | 0.809 | 0.786 | 0.938 | ckpt | log |
pose_resnetv1d_101 | 384x288 | 0.748 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnetv1d_152 | 256x192 | 0.737 | 0.902 | 0.812 | 0.791 | 0.940 | ckpt | log |
pose_resnetv1d_152 | 384x288 | 0.752 | 0.909 | 0.821 | 0.802 | 0.944 | ckpt | log |
Topdown Heatmap + Resnetv1d on Mpii¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnetv1d_50 | 256x256 | 0.881 | 0.290 | ckpt | log |
pose_resnetv1d_101 | 256x256 | 0.883 | 0.295 | ckpt | log |
pose_resnetv1d_152 | 256x256 | 0.888 | 0.300 | ckpt | log |
VGG (ICLR’2015)¶
Topdown Heatmap + VGG on Coco¶
VGG (ICLR'2015)
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
vgg | 256x192 | 0.698 | 0.890 | 0.768 | 0.754 | 0.929 | ckpt | log |
MobilenetV2 (CVPR’2018)¶
Associative Embedding + Mobilenetv2 on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
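Multi-scale testing runs the bottom-up network at several input scales (here [2, 1, 0.5]) and aggregates the resulting heatmaps before decoding. A minimal PyTorch sketch of one common aggregation, resizing every scale to the largest resolution and averaging; this is an illustration, not the exact MMPose implementation:

import torch
import torch.nn.functional as F

def aggregate_multiscale(heatmaps):
    """Average per-scale heatmaps for multi-scale testing. `heatmaps` is a
    list of (1, K, H_s, W_s) tensors, one per test scale; all are resized
    to the largest resolution and averaged."""
    h = max(m.shape[-2] for m in heatmaps)
    w = max(m.shape[-1] for m in heatmaps)
    resized = [F.interpolate(m, size=(h, w), mode='bilinear',
                             align_corners=False) for m in heatmaps]
    return torch.stack(resized).mean(dim=0)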
Topdown Heatmap + Mobilenetv2 on Coco¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x192 | 0.646 | 0.874 | 0.723 | 0.707 | 0.917 | ckpt | log |
pose_mobilenetv2 | 384x288 | 0.673 | 0.879 | 0.743 | 0.729 | 0.916 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Mpii¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.854 | 0.235 | ckpt | log |
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_face¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.0612 | ckpt | log |
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_hand¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.795 | 0.829 | 4.77 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Onehand10k¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.986 | 0.537 | 28.60 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Panoptic2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.998 | 0.694 | 9.70 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Rhd2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.985 | 0.883 | 2.80 | ckpt | log |
ShufflenetV2 (ECCV’2018)¶
Topdown Heatmap + Shufflenetv2 on Coco¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv2 | 256x192 | 0.599 | 0.854 | 0.663 | 0.664 | 0.899 | ckpt | log |
pose_shufflenetv2 | 384x288 | 0.636 | 0.865 | 0.705 | 0.697 | 0.909 | ckpt | log |
Topdown Heatmap + Shufflenetv2 on Mpii¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv2 | 256x256 | 0.828 | 0.205 | ckpt | log |
AlexNet (NeurIPS’2012)¶
Topdown Heatmap + Alexnet on Coco¶
AlexNet (NeurIPS'2012)
@inproceedings{krizhevsky2012imagenet,
title={Imagenet classification with deep convolutional neural networks},
author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},
booktitle={Advances in neural information processing systems},
pages={1097--1105},
year={2012}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_alexnet | 256x192 | 0.397 | 0.758 | 0.381 | 0.478 | 0.822 | ckpt | log |
HigherHRNet (CVPR’2020)¶
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
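Associative embedding is what makes these models bottom-up: the network predicts a scalar tag for every detected keypoint, and keypoints whose tags are close are grouped into the same person. A greedy grouping sketch under that idea (the tag_thr value and the peak format are assumptions for illustration; the paper formulates the per-joint matching more carefully):

import numpy as np

def greedy_group(peaks, tag_thr=1.0):
    """peaks: per-joint-type lists of (x, y, score, tag) tuples.
    Returns a list of people, each a dict of joint -> (x, y, score)."""
    people = []
    for joint, candidates in enumerate(peaks):
        for x, y, score, tag in candidates:
            best, best_d = None, tag_thr
            for person in people:
                if joint in person['kpts']:
                    continue          # one keypoint of each type per person
                d = abs(tag - np.mean(person['tags']))
                if d < best_d:
                    best, best_d = person, d
            if best is None:          # no close person: start a new one
                best = {'kpts': {}, 'tags': []}
                people.append(best)
            best['kpts'][joint] = (x, y, score)
            best['tags'].append(tag)
    return people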
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
LiteHRNet (CVPR’2021)¶
Topdown Heatmap + Litehrnet on Coco¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
LiteHRNet-18 | 256x192 | 0.643 | 0.868 | 0.720 | 0.706 | 0.912 | ckpt | log |
LiteHRNet-18 | 384x288 | 0.677 | 0.878 | 0.746 | 0.735 | 0.920 | ckpt | log |
LiteHRNet-30 | 256x192 | 0.675 | 0.881 | 0.754 | 0.736 | 0.924 | ckpt | log |
LiteHRNet-30 | 384x288 | 0.700 | 0.884 | 0.776 | 0.758 | 0.928 | ckpt | log |
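Any checkpoint in these tables can be paired with its config file for top-down inference. A minimal sketch assuming the MMPose 0.x Python API (init_pose_model / inference_top_down_pose_model); the config and checkpoint paths below are placeholders to be replaced with the files behind the ckpt links:

from mmpose.apis import init_pose_model, inference_top_down_pose_model

# placeholder paths: substitute the config/checkpoint of the row you need
config = 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/litehrnet_18_coco_256x192.py'
checkpoint = 'litehrnet18_coco_256x192.pth'
model = init_pose_model(config, checkpoint, device='cpu')

# one detected person; bbox is [x, y, w, h, score] in xywh format
person_results = [{'bbox': [50, 50, 200, 400, 0.99]}]
pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xywh')
print(pose_results[0]['keypoints'].shape)  # (17, 3): x, y, score per joint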
Topdown Heatmap + Litehrnet on Mpii¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.859 | 0.260 | ckpt | log |
LiteHRNet-30 | 256x256 | 0.869 | 0.271 | ckpt | log |
Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.795 | 0.830 | 4.77 | ckpt | log |
RSN (ECCV’2020)¶
Topdown Heatmap + RSN on Coco¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
MSPN (ArXiv’2019)¶
Topdown Heatmap + MSPN on Coco¶
MSPN (ArXiv'2019)
@article{li2019rethinking,
title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
journal={arXiv preprint arXiv:1901.00148},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
HRNet (CVPR’2019)¶
Topdown Heatmap + Hrnet on Animalpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.736 | 0.959 | 0.832 | 0.775 | 0.966 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.737 | 0.959 | 0.823 | 0.778 | 0.962 | ckpt | log |
Topdown Heatmap + Hrnet on Ap10k¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.738 | 0.958 | 0.808 | 0.592 | 0.743 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.744 | 0.959 | 0.807 | 0.589 | 0.748 | ckpt | log |
Topdown Heatmap + Hrnet on Atrw¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.912 | 0.973 | 0.959 | 0.938 | 0.985 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.911 | 0.972 | 0.946 | 0.937 | 0.985 | ckpt | log |
Topdown Heatmap + Hrnet on Horse10¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_hrnet_w32 | 256x256 | 0.951 | 0.122 | ckpt | log |
split2 | pose_hrnet_w32 | 256x256 | 0.949 | 0.116 | ckpt | log |
split3 | pose_hrnet_w32 | 256x256 | 0.939 | 0.153 | ckpt | log |
split1 | pose_hrnet_w48 | 256x256 | 0.973 | 0.095 | ckpt | log |
split2 | pose_hrnet_w48 | 256x256 | 0.969 | 0.101 | ckpt | log |
split3 | pose_hrnet_w48 | 256x256 | 0.961 | 0.128 | ckpt | log |
Topdown Heatmap + Hrnet on Macaque¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.814 | 0.953 | 0.918 | 0.851 | 0.969 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.818 | 0.963 | 0.917 | 0.855 | 0.971 | ckpt | log |
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
Topdown Heatmap + Hrnet on Aic¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
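The associative-embedding models above predict, besides the keypoint heatmaps, a 1-D tag per keypoint and group keypoints into persons by tag similarity. Following Newell et al. (2017), with h_k(x_{nk}) the tag of keypoint k of person n and reference embedding

\bar{h}_n = \frac{1}{K} \sum_k h_k(x_{nk}),

the grouping loss pulls tags of the same person toward their reference and pushes references of different persons apart:

L_g = \frac{1}{NK} \sum_n \sum_k \left(\bar{h}_n - h_k(x_{nk})\right)^2 + \frac{1}{N^2} \sum_n \sum_{n'} \exp\left(-\frac{1}{2\sigma^2}\left(\bar{h}_n - \bar{h}_{n'}\right)^2\right)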
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Topdown Heatmap + Hrnet + Augmentation on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
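The three rows above differ only in the pixel-level augmentation applied during training. As an illustration only (not the exact MMPose training pipeline, and the hole counts/sizes below are hypothetical), the coarse-dropout style of augmentation can be reproduced with the Albumentations library directly:

import albumentations as A
import numpy as np

# Randomly zero out rectangular patches of the input (illustrative parameters).
aug = A.Compose([
    A.CoarseDropout(max_holes=8, max_height=40, max_width=40,
                    min_holes=1, min_height=10, min_width=10, p=0.5),
])

img = np.random.randint(0, 256, (256, 192, 3), dtype=np.uint8)  # dummy image
augmented = aug(image=img)['image']

The gridmask and photometric rows correspond to structured grid-shaped dropout and color/brightness jitter, respectively.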
Topdown Heatmap + Hrnet + Fp16 on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.88 | 0.800 | 0.943 | ckpt | log |
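FP16 training keeps master weights in FP32 while running most of the forward/backward pass in half precision, with loss scaling to avoid gradient underflow. In mmcv-based configs this is typically enabled by a single entry such as fp16 = dict(loss_scale=512.) (the exact loss-scale value varies per config). The generic PyTorch equivalent is sketched below, only to illustrate the mechanism rather than MMPose's internals:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(64, 17).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()      # dynamic loss scaling

x = torch.randn(8, 64, device='cuda')
target = torch.randn(8, 17, device='cuda')

with torch.cuda.amp.autocast():           # run eligible ops in FP16
    loss = F.mse_loss(model(x), target)

scaler.scale(loss).backward()             # scale the loss before backward
scaler.step(optimizer)                    # unscales gradients, then steps
scaler.update()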
Topdown Heatmap + Hrnet on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
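All top-down rows follow a two-step protocol: a person detector proposes boxes (here one with a human AP of 56.4 on COCO val2017), and the pose model estimates keypoints inside each box. A minimal inference sketch, assuming the MMPose 0.x Python API used by the official demos (init_pose_model / inference_top_down_pose_model); the config, checkpoint, image path, and box are placeholders:

from mmpose.apis import init_pose_model, inference_top_down_pose_model

pose_model = init_pose_model(
    'configs/top_down/hrnet/coco/hrnet_w32_coco_256x192.py',  # placeholder
    'hrnet_w32_coco_256x192.pth',                             # placeholder
    device='cuda:0')

# Boxes normally come from a person detector; format: [x1, y1, x2, y2, score].
person_results = [{'bbox': [100, 50, 280, 400, 0.99]}]

pose_results, _ = inference_top_down_pose_model(
    pose_model, 'demo.jpg', person_results,
    bbox_thr=0.3, format='xyxy', dataset='TopDownCocoDataset')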
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
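For reference, DARK refines the discrete argmax m of a (Gaussian-smoothed) heatmap H by a second-order Taylor expansion of D = \log H around m, giving the sub-pixel estimate

\hat{\mu} = m - \left(D''(m)\right)^{-1} D'(m)

where D'(m) and D''(m) are the gradient and Hessian of D evaluated at m.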
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + Hrnet on Crowdpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
Topdown Heatmap + Hrnet on H36m¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325--1339},
month = {jul},
year = {2014}
}
Results on Human3.6M test set with ground-truth 2D detections
Arch | Input Size | EPE | PCK | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 9.43 | 0.911 | ckpt | log |
pose_hrnet_w48 | 256x256 | 7.36 | 0.932 | ckpt | log |
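For the keypoint tables in this document, EPE, PCK, and AUC follow their standard definitions, stated here for reference (the normalization length d is a per-benchmark convention, e.g. head size for PCKh or a box scale for hand PCK@0.2):

\mathrm{EPE} = \frac{1}{K} \sum_{k=1}^{K} \lVert \hat{p}_k - p_k \rVert_2, \qquad \mathrm{PCK}@t = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}\left[\frac{\lVert \hat{p}_k - p_k \rVert_2}{d} \le t\right]

AUC is the area under the PCK curve as the threshold t sweeps a fixed range.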
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnet on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
Topdown Heatmap + Hrnet on Ochuman¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.591 | 0.748 | 0.641 | 0.631 | 0.775 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.606 | 0.748 | 0.650 | 0.647 | 0.776 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.611 | 0.752 | 0.663 | 0.648 | 0.778 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.616 | 0.749 | 0.663 | 0.653 | 0.773 | ckpt | log |
Topdown Heatmap + Hrnet on Posetrack18¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
pose_hrnet_w32 | 384x288 | 87.0 | 88.8 | 85.0 | 80.1 | 80.5 | 82.6 | 79.4 | 83.6 | ckpt | log |
pose_hrnet_w48 | 256x192 | 88.2 | 90.1 | 85.8 | 80.8 | 80.7 | 83.3 | 80.3 | 84.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 87.8 | 90.0 | 85.9 | 81.3 | 81.1 | 83.3 | 80.9 | 84.5 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
pose_hrnet_w32 | 384x288 | 79.9 | 83.6 | 80.4 | 74.5 | 74.8 | 76.1 | 70.5 | 77.3 | ckpt | log |
pose_hrnet_w48 | 256x192 | 80.1 | 83.4 | 80.6 | 74.8 | 74.3 | 76.8 | 70.4 | 77.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 80.2 | 83.8 | 80.9 | 75.2 | 74.7 | 76.7 | 71.7 | 77.8 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 32},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper can be split into two stages.
In the first stage, the model is initialized from a COCO pre-trained checkpoint and the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
In the second stage, training starts from the final checkpoint of the first stage; the warping offsets are learned in a multi-frame setting while the backbone is frozen.
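A rough sketch of what "the backbone is frozen" amounts to, using a stand-in model (the checkpoint hand-off between stages is done with the standard mmcv load_from config field; the tiny module below is purely illustrative):

import torch

class TinyPoseNet(torch.nn.Module):               # stand-in for the real network
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(4, 4)     # plays the role of HRNet
        self.offset_head = torch.nn.Linear(4, 2)  # plays the role of the warping offsets

model = TinyPoseNet()
# Stage 2: train only the warping offsets while the backbone stays frozen.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)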
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files (see note [1] below).
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
[1] Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it at $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more details on data preparation.
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset; we find this leads to better performance.
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset; we find this leads to better performance.
Topdown Heatmap + Hrnet on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.700 | 0.746 | 0.567 | 0.645 | 0.637 | 0.688 | 0.473 | 0.546 | 0.553 | 0.626 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.701 | 0.773 | 0.586 | 0.692 | 0.727 | 0.783 | 0.516 | 0.604 | 0.586 | 0.674 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.700 | 0.776 | 0.672 | 0.785 | 0.656 | 0.743 | 0.534 | 0.639 | 0.579 | 0.681 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.722 | 0.790 | 0.694 | 0.799 | 0.777 | 0.834 | 0.587 | 0.679 | 0.631 | 0.716 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the Halpe dataset; we find this leads to better performance.
SEResNet (CVPR’2018)¶
Topdown Heatmap + Seresnet on Coco¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_seresnet_50 | 256x192 | 0.728 | 0.900 | 0.809 | 0.784 | 0.940 | ckpt | log |
pose_seresnet_50 | 384x288 | 0.748 | 0.905 | 0.819 | 0.799 | 0.941 | ckpt | log |
pose_seresnet_101 | 256x192 | 0.734 | 0.904 | 0.815 | 0.790 | 0.942 | ckpt | log |
pose_seresnet_101 | 384x288 | 0.753 | 0.907 | 0.823 | 0.805 | 0.943 | ckpt | log |
pose_seresnet_152* | 256x192 | 0.730 | 0.899 | 0.810 | 0.786 | 0.940 | ckpt | log |
pose_seresnet_152* | 384x288 | 0.753 | 0.906 | 0.823 | 0.806 | 0.945 | ckpt | log |
Note that * means the model was trained without ImageNet pre-training.
Topdown Heatmap + Seresnet on Mpii¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_seresnet_50 | 256x256 | 0.884 | 0.292 | ckpt | log |
pose_seresnet_101 | 256x256 | 0.884 | 0.295 | ckpt | log |
pose_seresnet_152* | 256x256 | 0.884 | 0.287 | ckpt | log |
Note that * means the model was trained without ImageNet pre-training.
HRNetv2 (TPAMI’2019)¶
Topdown Heatmap + Hrnetv2 on 300w¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
300W (IMAVIS'2016)
@article{sagonas2016300,
title={300 faces in-the-wild challenge: Database and results},
author={Sagonas, Christos and Antonakos, Epameinondas and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja},
journal={Image and vision computing},
volume={47},
pages={3--18},
year={2016},
publisher={Elsevier}
}
Results on 300W dataset
The model is trained on 300W train.
Arch | Input Size | NME (common) | NME (challenge) | NME (full) | NME (test) | ckpt | log |
---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 2.86 | 5.45 | 3.37 | 3.97 | ckpt | log |
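NME in these face-landmark tables is the mean point-to-point error normalized by a dataset-specific length d (inter-ocular distance for 300W and WFLW; for AFLW the face bounding-box size is commonly used, so treat the exact normalizer as a per-dataset convention):

\mathrm{NME} = \frac{1}{K} \sum_{k=1}^{K} \frac{\lVert \hat{p}_k - p_k \rVert_2}{d}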
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NME (full) | NME (frontal) | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 1.41 | 1.27 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.0569 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Cofw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COFW (ICCV'2013)
@inproceedings{burgos2013robust,
title={Robust face landmark estimation under occlusion},
author={Burgos-Artizzu, Xavier P and Perona, Pietro and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1513--1520},
year={2013}
}
Results on COFW dataset
The model is trained on COFW train.
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 3.40 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 4.06 | 6.98 | 3.99 | 4.83 | 4.59 | 3.92 | 4.33 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.813 | 0.840 | 4.39 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.990 | 0.568 | 24.16 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.999 | 0.744 | 7.79 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.992 | 0.902 | 2.21 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Hourglass (ECCV’2016)¶
Topdown Heatmap + Hourglass on Coco¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on the COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
Topdown Heatmap + Hourglass on Mpii¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.0586 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.804 | 0.835 | 4.54 | ckpt | log |
ViPNAS (CVPR’2021)¶
Topdown Heatmap + Vipnas on Coco¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.700 | 0.887 | 0.778 | 0.757 | 0.929 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.711 | 0.893 | 0.789 | 0.769 | 0.934 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Vipnas on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.619 | 0.700 | 0.477 | 0.608 | 0.585 | 0.689 | 0.386 | 0.505 | 0.473 | 0.578 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.643 | 0.726 | 0.553 | 0.694 | 0.587 | 0.698 | 0.410 | 0.529 | 0.495 | 0.607 | ckpt | log |
ResNeSt (ArXiv’2020)¶
Topdown Heatmap + Resnest on Coco¶
ResNeSt (ArXiv'2020)
@article{zhang2020resnest,
title={ResNeSt: Split-Attention Networks},
author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
journal={arXiv preprint arXiv:2004.08955},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnest_50 | 256x192 | 0.721 | 0.899 | 0.802 | 0.776 | 0.938 | ckpt | log |
pose_resnest_50 | 384x288 | 0.737 | 0.900 | 0.811 | 0.789 | 0.938 | ckpt | log |
pose_resnest_101 | 256x192 | 0.725 | 0.899 | 0.807 | 0.781 | 0.939 | ckpt | log |
pose_resnest_101 | 384x288 | 0.746 | 0.906 | 0.820 | 0.798 | 0.943 | ckpt | log |
pose_resnest_200 | 256x192 | 0.732 | 0.905 | 0.812 | 0.787 | 0.942 | ckpt | log |
pose_resnest_200 | 384x288 | 0.754 | 0.908 | 0.827 | 0.807 | 0.945 | ckpt | log |
pose_resnest_269 | 256x192 | 0.738 | 0.907 | 0.819 | 0.793 | 0.945 | ckpt | log |
pose_resnest_269 | 384x288 | 0.755 | 0.908 | 0.828 | 0.806 | 0.943 | ckpt | log |
SCNet (CVPR’2020)¶
Topdown Heatmap + Scnet on Coco¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_scnet_50 | 256x192 | 0.728 | 0.899 | 0.807 | 0.784 | 0.938 | ckpt | log |
pose_scnet_50 | 384x288 | 0.751 | 0.906 | 0.818 | 0.802 | 0.943 | ckpt | log |
pose_scnet_101 | 256x192 | 0.733 | 0.903 | 0.813 | 0.790 | 0.941 | ckpt | log |
pose_scnet_101 | 384x288 | 0.752 | 0.906 | 0.823 | 0.804 | 0.943 | ckpt | log |
Topdown Heatmap + Scnet on Mpii¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_scnet_101 | 256x256 | 0.886 | 0.293 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.0565 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.803 | 0.834 | 4.55 | ckpt | log |
ShufflenetV1 (CVPR’2018)¶
Topdown Heatmap + Shufflenetv1 on Coco¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv1 | 256x192 | 0.585 | 0.845 | 0.650 | 0.651 | 0.894 | ckpt | log |
pose_shufflenetv1 | 384x288 | 0.622 | 0.859 | 0.685 | 0.684 | 0.901 | ckpt | log |
Topdown Heatmap + Shufflenetv1 on Mpii¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv1 | 256x256 | 0.823 | 0.195 | ckpt | log |
ResNext (CVPR’2017)¶
Topdown Heatmap + Resnext on Coco¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnext_50 | 256x192 | 0.714 | 0.898 | 0.789 | 0.771 | 0.937 | ckpt | log |
pose_resnext_50 | 384x288 | 0.724 | 0.899 | 0.794 | 0.777 | 0.935 | ckpt | log |
pose_resnext_101 | 256x192 | 0.726 | 0.900 | 0.801 | 0.782 | 0.940 | ckpt | log |
pose_resnext_101 | 384x288 | 0.743 | 0.903 | 0.815 | 0.795 | 0.939 | ckpt | log |
pose_resnext_152 | 256x192 | 0.730 | 0.904 | 0.808 | 0.786 | 0.940 | ckpt | log |
pose_resnext_152 | 384x288 | 0.742 | 0.902 | 0.810 | 0.794 | 0.939 | ckpt | log |
Topdown Heatmap + Resnext on Mpii¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnext_152 | 256x256 | 0.887 | 0.294 | ckpt | log |
ResNet (CVPR’2016)¶
Topdown Heatmap + Resnet on Aic¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
Associative Embedding + Resnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
Results on COCO val2017 with multi-scale test. The 3 default scales ([2, 1, 0.5]) are used.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
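Multi-scale testing runs the network at several image scales and aggregates the outputs. The full associative-embedding pipeline also aggregates tag maps and flip-tested predictions, but the heatmap-averaging core looks roughly like the sketch below (illustrative, not the exact MMPose implementation):

```python
import torch
import torch.nn.functional as F

def multi_scale_heatmaps(model, image, scales=(2, 1, 0.5)):
    """image: (1, 3, H, W) tensor; model maps it to (1, K, h, w) heatmaps."""
    h, w = image.shape[-2:]
    acc = None
    for s in scales:
        resized = F.interpolate(image, scale_factor=s, mode='bilinear',
                                align_corners=False)
        with torch.no_grad():
            heatmaps = model(resized)
        # Bring every scale back to a common resolution before averaging.
        heatmaps = F.interpolate(heatmaps, size=(h, w), mode='bilinear',
                                 align_corners=False)
        acc = heatmaps if acc is None else acc + heatmaps
    return acc / len(scales)
```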
Deeppose + Resnet on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
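Unlike the heatmap-based entries elsewhere in this section, DeepPose regresses keypoint coordinates directly from a global feature vector. A self-contained sketch of that idea on a torchvision ResNet-50 (the head and loss are illustrative, not the exact MMPose modules):

```python
import torch
import torch.nn as nn
import torchvision

class DeepPoseNet(nn.Module):
    """Direct coordinate regression: backbone features -> (K, 2) coords."""

    def __init__(self, num_joints=17):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()            # keep the 2048-d feature
        self.backbone = backbone
        self.head = nn.Linear(2048, num_joints * 2)
        self.num_joints = num_joints

    def forward(self, x):
        # Normalized (x, y) in [0, 1]; the caller rescales to pixels.
        coords = torch.sigmoid(self.head(self.backbone(x)))
        return coords.view(-1, self.num_joints, 2)

model = DeepPoseNet()
pred = model(torch.randn(2, 3, 256, 192))       # (2, 17, 2)
loss = nn.functional.smooth_l1_loss(pred, torch.rand(2, 17, 2))
```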
Topdown Heatmap + Resnet + Fp16 on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
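The fp16 entry above trains with mixed precision (Micikevicius et al., 2017); in the MMPose 0.x configs this is switched on with an `fp16` option rather than in user code. The underlying mechanism, sketched at the plain PyTorch level with `torch.cuda.amp` (the loop is illustrative):

```python
import torch

scaler = torch.cuda.amp.GradScaler()           # dynamic loss scaling

def train_step(model, optimizer, criterion, images, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # run fp16 where it is safe
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()              # scale to avoid underflow
    scaler.step(optimizer)                     # unscales grads, then steps
    scaler.update()                            # adapt the scale factor
    return loss.detach()
```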
Topdown Heatmap + Resnet on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector that has a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
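The `_dark` rows use the distribution-aware decoding of Zhang et al. (DarkPose): after smoothing the predicted heatmap with a Gaussian (omitted below), the peak is refined with a second-order Taylor expansion of the log-heatmap instead of the usual quarter-pixel shift. A NumPy sketch of the refinement step:

```python
import numpy as np

def dark_refine(heatmap):
    """heatmap: (h, w) array; returns the sub-pixel (x, y) of the peak."""
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    if not (1 <= x < w - 1 and 1 <= y < h - 1):
        return float(x), float(y)              # no refinement at the border
    logh = np.log(np.maximum(heatmap, 1e-10))
    # First and second derivatives by central finite differences.
    dx = 0.5 * (logh[y, x + 1] - logh[y, x - 1])
    dy = 0.5 * (logh[y + 1, x] - logh[y - 1, x])
    dxx = logh[y, x + 1] - 2 * logh[y, x] + logh[y, x - 1]
    dyy = logh[y + 1, x] - 2 * logh[y, x] + logh[y - 1, x]
    dxy = 0.25 * (logh[y + 1, x + 1] - logh[y + 1, x - 1]
                  - logh[y - 1, x + 1] + logh[y - 1, x - 1])
    hess = np.array([[dxx, dxy], [dxy, dyy]])
    if abs(np.linalg.det(hess)) < 1e-10:
        return float(x), float(y)
    offset = -np.linalg.solve(hess, np.array([dx, dy]))
    return float(x + offset[0]), float(y + offset[1])
```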
Topdown Heatmap + Resnet on Crowdpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with a YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
Topdown Heatmap + Resnet on JHMDB¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = {Dec},
pages = {3192--3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
Topdown Heatmap + Resnet on MHP¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 val set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from that of the official evaluation code. Please be cautious if you use these results in papers.
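Concretely, COCO-style keypoint mAP can be computed with pycocotools; the file names below are placeholders for ground-truth annotations and predictions in the standard COCO json formats:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations.json')              # ground-truth annotations
coco_dt = coco_gt.loadRes('predictions.json')   # keypoint detections

evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints AP, AP50, AP75, AR, ... as in the tables
```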
Deeppose + Resnet on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
Topdown Heatmap + Resnet on Mpii¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Bernt Schiele},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Topdown Heatmap + Resnet + Mpii on Mpii_trb¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
Topdown Heatmap + Resnet on Ochuman¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
Topdown Heatmap + Resnet on Posetrack18¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset, and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with an MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset, and then fine-tuned on PoseTrack18.
HMR + Resnet on Mixed¶
HMR (CVPR'2018)
@inProceedings{kanazawaHMR18,
title={End-to-end Recovery of Human Shape and Pose},
author = {Angjoo Kanazawa
and Michael J. Black
and David W. Jacobs
and Jitendra Malik},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325--1339},
month = {jul},
year = {2014}
}
Results on Human3.6M with ground-truth bounding boxes; the model reaches an MPJPE-PA of 52.60 mm under Protocol 2
Arch | Input Size | MPJPE (P1) | MPJPE-PA (P1) | MPJPE (P2) | MPJPE-PA (P2) | ckpt | log |
---|---|---|---|---|---|---|---|
hmr_resnet_50 | 224x224 | 80.75 | 55.08 | 80.35 | 52.60 | ckpt | log |
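MPJPE is the mean per-joint position error in millimetres; the -PA variant first aligns the prediction to the ground truth with a similarity (Procrustes) transform, so it measures the pose up to scale, rotation and translation. A NumPy sketch of both metrics:

```python
import numpy as np

def mpjpe(pred, gt):
    """pred, gt: (J, 3) joint positions in mm."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def procrustes_align(pred, gt):
    """Similarity transform (scale, rotation, translation) of pred onto gt."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(u @ vt) < 0:              # avoid an improper rotation
        vt[-1] *= -1
        s[-1] *= -1
    rot = (u @ vt).T
    scale = s.sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_g

def mpjpe_pa(pred, gt):
    return mpjpe(procrustes_align(pred, gt), gt)
```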
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_res50 | 256x256 | 0.0566 | ckpt | log |
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Deeppose + Resnet on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50 | 256x256 | 4.85 | 8.50 | 4.81 | 5.69 | 5.45 | 4.82 | 5.20 | ckpt | log |
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages = {2235--2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on the WFLW train set.
Arch | Input Size | NME (test) | NME (pose) | NME (illumination) | NME (occlusion) | NME (blur) | NME (makeup) | NME (expression) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
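The wing loss of Feng et al. (CVPR'2018) behaves logarithmically for small localization errors (amplifying their gradient) and linearly for large ones, with a constant C that keeps the two branches continuous. A PyTorch sketch using the paper's default parameters:

```python
import math
import torch

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    """pred, target: (N, K, 2) landmark coordinates in pixels."""
    delta = (pred - target).abs()
    # C makes the two branches meet at |x| = omega.
    c = omega - omega * math.log(1.0 + omega / epsilon)
    loss = torch.where(delta < omega,
                       omega * torch.log(1.0 + delta / epsilon),
                       delta - c)
    return loss.mean()
```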
Deeppose + Resnet on Deepfashion¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | deeppose_resnet_50 | 256x256 | 0.965 | 0.535 | 17.2 | ckpt | log |
lower | deeppose_resnet_50 | 256x256 | 0.971 | 0.678 | 11.8 | ckpt | log |
full | deeppose_resnet_50 | 256x256 | 0.983 | 0.602 | 14.0 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | pose_resnet_50 | 256x256 | 0.954 | 0.578 | 16.8 | ckpt | log |
lower | pose_resnet_50 | 256x256 | 0.965 | 0.744 | 10.5 | ckpt | log |
full | pose_resnet_50 | 256x256 | 0.977 | 0.664 | 12.7 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.800 | 0.833 | 4.64 | ckpt | log |
Topdown Heatmap + Resnet on Freihand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FreiHand (ICCV'2019)
@inproceedings{zimmermann2019freihand,
title={Freihand: A dataset for markerless capture of hand pose and shape from single rgb images},
author={Zimmermann, Christian and Ceylan, Duygu and Yang, Jimei and Russell, Bryan and Argus, Max and Brox, Thomas},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={813--822},
year={2019}
}
Results on FreiHand val & test set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
val | pose_resnet_50 | 224x224 | 0.993 | 0.868 | 3.25 | ckpt | log |
test | pose_resnet_50 | 224x224 | 0.992 | 0.868 | 3.27 | ckpt | log |
Topdown Heatmap + Resnet on Interhand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|---|
Human_annot | val(M) | pose_resnet_50 | 256x256 | 0.973 | 0.828 | 5.15 | ckpt | log |
Human_annot | test(H) | pose_resnet_50 | 256x256 | 0.973 | 0.826 | 5.27 | ckpt | log |
Human_annot | test(M) | pose_resnet_50 | 256x256 | 0.975 | 0.841 | 4.90 | ckpt | log |
Human_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.975 | 0.839 | 4.97 | ckpt | log |
Machine_annot | val(M) | pose_resnet_50 | 256x256 | 0.970 | 0.824 | 5.39 | ckpt | log |
Machine_annot | test(H) | pose_resnet_50 | 256x256 | 0.969 | 0.821 | 5.52 | ckpt | log |
Machine_annot | test(M) | pose_resnet_50 | 256x256 | 0.972 | 0.838 | 5.03 | ckpt | log |
Machine_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.972 | 0.837 | 5.11 | ckpt | log |
All | val(M) | pose_resnet_50 | 256x256 | 0.977 | 0.840 | 4.66 | ckpt | log |
All | test(H) | pose_resnet_50 | 256x256 | 0.979 | 0.839 | 4.65 | ckpt | log |
All | test(M) | pose_resnet_50 | 256x256 | 0.979 | 0.838 | 4.42 | ckpt | log |
All | test(H+M) | pose_resnet_50 | 256x256 | 0.979 | 0.851 | 4.46 | ckpt | log |
Deeppose + Resnet on Onehand10k¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.990 | 0.486 | 34.28 | ckpt | log |
Topdown Heatmap + Resnet on Onehand10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.989 | 0.555 | 25.19 | ckpt | log |
Deeppose + Resnet on Panoptic2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.999 | 0.686 | 9.36 | ckpt | log |
Topdown Heatmap + Resnet on Panoptic2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.999 | 0.713 | 9.00 | ckpt | log |
Deeppose + Resnet on Rhd2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.988 | 0.865 | 3.29 | ckpt | log |
Topdown Heatmap + Resnet on Rhd2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet50 | 256x256 | 0.991 | 0.898 | 2.33 | ckpt | log |
Internet + Internet on Interhand3d¶
InterNet (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
All | test(H+M) | InterNet_resnet_50 | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | ckpt | log |
All | val(M) | InterNet_resnet_50 | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | ckpt | log |
Datasets¶
InterHand2.6M (ECCV’2020)¶
Topdown Heatmap + Resnet on Interhand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|---|
Human_annot | val(M) | pose_resnet_50 | 256x256 | 0.973 | 0.828 | 5.15 | ckpt | log |
Human_annot | test(H) | pose_resnet_50 | 256x256 | 0.973 | 0.826 | 5.27 | ckpt | log |
Human_annot | test(M) | pose_resnet_50 | 256x256 | 0.975 | 0.841 | 4.90 | ckpt | log |
Human_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.975 | 0.839 | 4.97 | ckpt | log |
Machine_annot | val(M) | pose_resnet_50 | 256x256 | 0.970 | 0.824 | 5.39 | ckpt | log |
Machine_annot | test(H) | pose_resnet_50 | 256x256 | 0.969 | 0.821 | 5.52 | ckpt | log |
Machine_annot | test(M) | pose_resnet_50 | 256x256 | 0.972 | 0.838 | 5.03 | ckpt | log |
Machine_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.972 | 0.837 | 5.11 | ckpt | log |
All | val(M) | pose_resnet_50 | 256x256 | 0.977 | 0.840 | 4.66 | ckpt | log |
All | test(H) | pose_resnet_50 | 256x256 | 0.979 | 0.839 | 4.65 | ckpt | log |
All | test(M) | pose_resnet_50 | 256x256 | 0.979 | 0.838 | 4.42 | ckpt | log |
All | test(H+M) | pose_resnet_50 | 256x256 | 0.979 | 0.851 | 4.46 | ckpt | log |
Internet + Internet on Interhand3d¶
InterNet (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
All | test(H+M) | InterNet_resnet_50 | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | ckpt | log |
All | val(M) | InterNet_resnet_50 | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | ckpt | log |
COCO-WholeBody-Face (ECCV’2020)¶
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
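NME (normalized mean error), reported in the face tables, is the mean per-landmark Euclidean error divided by a reference length. A minimal sketch is given below; normalizing by the inter-ocular distance is a common convention for face benchmarks and is an assumption here, not necessarily the exact normalizer used for COCO-WholeBody-Face.

```python
import numpy as np

def nme(pred, gt, left_eye_idx, right_eye_idx):
    """Illustrative NME for (N, K, 2) landmark arrays, normalized by the
    inter-ocular distance (one common convention; the benchmark's exact
    normalizer may differ)."""
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=1)                 # (N,)
    iod = np.linalg.norm(gt[:, left_eye_idx] - gt[:, right_eye_idx], axis=-1)
    return float(np.mean(err / iod))
```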
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_face¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.0612 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.0586 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_res50 | 256x256 | 0.0566 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.0569 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_face¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.0565 | ckpt | log |
ATRW (ACM MM’2020)¶
Topdown Heatmap + Resnet on Atrw¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.900 | 0.973 | 0.932 | 0.929 | 0.985 | ckpt | log |
pose_resnet_101 | 256x256 | 0.898 | 0.973 | 0.936 | 0.927 | 0.985 | ckpt | log |
pose_resnet_152 | 256x256 | 0.896 | 0.973 | 0.931 | 0.927 | 0.985 | ckpt | log |
Topdown Heatmap + Hrnet on Atrw¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.912 | 0.973 | 0.959 | 0.938 | 0.985 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.911 | 0.972 | 0.946 | 0.937 | 0.985 | ckpt | log |
Horse-10 (WACV’2021)¶
Topdown Heatmap + Resnet on Horse10¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_resnet_50 | 256x256 | 0.956 | 0.113 | ckpt | log |
split2 | pose_resnet_50 | 256x256 | 0.954 | 0.111 | ckpt | log |
split3 | pose_resnet_50 | 256x256 | 0.946 | 0.129 | ckpt | log |
split1 | pose_resnet_101 | 256x256 | 0.958 | 0.115 | ckpt | log |
split2 | pose_resnet_101 | 256x256 | 0.955 | 0.115 | ckpt | log |
split3 | pose_resnet_101 | 256x256 | 0.946 | 0.126 | ckpt | log |
split1 | pose_resnet_152 | 256x256 | 0.969 | 0.105 | ckpt | log |
split2 | pose_resnet_152 | 256x256 | 0.970 | 0.103 | ckpt | log |
split3 | pose_resnet_152 | 256x256 | 0.957 | 0.131 | ckpt | log |
Topdown Heatmap + Hrnet on Horse10¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_hrnet_w32 | 256x256 | 0.951 | 0.122 | ckpt | log |
split2 | pose_hrnet_w32 | 256x256 | 0.949 | 0.116 | ckpt | log |
split3 | pose_hrnet_w32 | 256x256 | 0.939 | 0.153 | ckpt | log |
split1 | pose_hrnet_w48 | 256x256 | 0.973 | 0.095 | ckpt | log |
split2 | pose_hrnet_w48 | 256x256 | 0.969 | 0.101 | ckpt | log |
split3 | pose_hrnet_w48 | 256x256 | 0.961 | 0.128 | ckpt | log |
COCO-WholeBody-Hand (ECCV’2020)¶
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.813 | 0.840 | 4.39 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.804 | 0.835 | 4.54 | ckpt | log |
Topdown Heatmap + Mobilenetv2 + Coco + Wholebody on Coco_wholebody_hand¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.795 | 0.829 | 4.77 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.800 | 0.833 | 4.64 | ckpt | log |
Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.795 | 0.830 | 4.77 | ckpt | log |
Topdown Heatmap + Scnet + Coco + Wholebody on Coco_wholebody_hand¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.803 | 0.834 | 4.55 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
PoseTrack18 (CVPR’2018)¶
Topdown Heatmap + Hrnet on Posetrack18¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
pose_hrnet_w32 | 384x288 | 87.0 | 88.8 | 85.0 | 80.1 | 80.5 | 82.6 | 79.4 | 83.6 | ckpt | log |
pose_hrnet_w48 | 256x192 | 88.2 | 90.1 | 85.8 | 80.8 | 80.7 | 83.3 | 80.3 | 84.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 87.8 | 90.0 | 85.9 | 81.3 | 81.1 | 83.3 | 80.9 | 84.5 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
pose_hrnet_w32 | 384x288 | 79.9 | 83.6 | 80.4 | 74.5 | 74.8 | 76.1 | 70.5 | 77.3 | ckpt | log |
pose_hrnet_w48 | 256x192 | 80.1 | 83.4 | 80.6 | 74.8 | 74.3 | 76.8 | 70.4 | 77.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 80.2 | 83.8 | 80.9 | 75.2 | 74.7 | 76.7 | 71.7 | 77.8 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
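In MMPose-style configs, such fine-tuning is typically expressed by initializing from the pretrained checkpoint via load_from. A minimal, hypothetical config fragment (the base config name and checkpoint path are placeholders, not actual file names):

```python
# Hypothetical fine-tuning config fragment; file names are placeholders.
_base_ = ['./hrnet_w32_posetrack18_256x192.py']  # base config (placeholder)

# Initialize from a COCO-pretrained checkpoint, then train on PoseTrack18.
load_from = 'path/to/hrnet_w32_coco_256x192.pth'  # placeholder path

# Fine-tuning typically uses a smaller learning rate than training from scratch.
optimizer = dict(type='Adam', lr=5e-4)
```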
Topdown Heatmap + Resnet on Posetrack18¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper is split into two stages.
The first stage starts from a checkpoint pre-trained on COCO and fine-tunes the main backbone on PoseTrack18 in a single-frame setting.
The second stage starts from the last checkpoint of the first stage and learns the warping offsets in a multi-frame setting while the backbone is frozen.
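As a rough illustration of the second stage, the backbone parameters are frozen and only the offset layers receive gradients; a minimal PyTorch sketch with hypothetical stand-in modules (not the actual PoseWarper implementation):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real modules, for illustration only.
class ToyPoseWarper(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 48, 3, padding=1)           # trained in stage 1
        self.warping_offsets = nn.Conv2d(48, 34, 3, padding=1)   # learned in stage 2

model = ToyPoseWarper()
# model.load_state_dict(torch.load('stage1.pth'))  # stage-1 checkpoint (placeholder)

# Stage 2: freeze the backbone, train only the warping offsets.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```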
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files (this link1).
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it at $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more details.
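Copying the file into place could look like the following minimal sketch (assuming the PoseWarper_supp_files environment variable points at the downloaded supplementary files):

```python
import os
import shutil

# Source: the downloaded PoseWarper supplementary files (path via env var).
src = os.path.join(os.environ['PoseWarper_supp_files'],
                   'posetrack18_precomputed_boxes', 'val_boxes.json')
# Destination expected by the config, relative to the mmpose root.
dst = 'data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json'
os.makedirs(os.path.dirname(dst), exist_ok=True)
shutil.copy(src, dst)
```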
AI Challenger (ArXiv’2017)¶
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
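Multi-scale testing, as used in the table above, runs inference at several input scales and aggregates the resulting heatmaps at a common resolution before decoding. A minimal sketch of the idea (the model interface and averaging-based aggregation are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def multi_scale_heatmaps(model, img, scales=(2, 1, 0.5)):
    """Average heatmaps predicted at multiple input scales (illustrative)."""
    h, w = img.shape[-2:]
    agg = None
    for s in scales:
        x = F.interpolate(img, scale_factor=s, mode='bilinear',
                          align_corners=False)
        hm = model(x)                                    # (N, K, h', w')
        hm = F.interpolate(hm, size=(h, w), mode='bilinear',
                           align_corners=False)          # back to common size
        agg = hm if agg is None else agg + hm
    return agg / len(scales)
```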
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Topdown Heatmap + Hrnet on Aic¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
Topdown Heatmap + Resnet on Aic¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
RHD (ICCV’2017)¶
Deeppose + Resnet on Rhd2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.988 | 0.865 | 3.29 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Rhd2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.985 | 0.883 | 2.80 | ckpt | log |
Topdown Heatmap + Resnet on Rhd2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet50 | 256x256 | 0.991 | 0.898 | 2.33 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.992 | 0.902 | 2.21 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Human3.6M (TPAMI’2014)¶
Topdown Heatmap + Hrnet on H36m¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M test set with ground truth 2D detections
Arch | Input Size | EPE | PCK | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 9.43 | 0.911 | ckpt | log |
pose_hrnet_w48 | 256x256 | 7.36 | 0.932 | ckpt | log |
Pose Lift + Simplebaseline3d on H36m¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|
simple_baseline_3d_tcn1 | 43.4 | 34.3 | ckpt | log |
1 Differing from the original paper, we do not apply the max-norm constraint,
as we found that omitting it led to better convergence and performance.
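For reference, the max-norm constraint in the original paper clips each weight matrix so that its norm does not exceed a fixed bound after every update; a minimal illustrative sketch:

```python
import torch

def apply_max_norm(model, max_norm=1.0):
    """Clip each weight matrix to a maximum Frobenius norm (illustrative;
    called after each optimizer step)."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:                      # weight matrices only
                norm = p.norm()
                if norm > max_norm:
                    p.mul_(max_norm / norm)
```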
Video Pose Lift + Videopose3d on H36m¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 27 | 40.0 | 30.1 | ckpt | log |
VideoPose3D | 81 | 38.9 | 29.2 | ckpt | log |
VideoPose3D | 243 | 37.6 | 28.3 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 1 | 52.9 | 41.3 | ckpt | log |
VideoPose3D | 243 | 47.9 | 38.0 | ckpt | log |
Results on Human3.6M dataset with ground truth 2D detections, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 58.1 | 42.8 | 54.7 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 67.4 | 50.1 | 63.2 | ckpt | log |
1 CPN 2D detections are provided by the official repo. The reformatted version used in this repository can be downloaded from train_detection and test_detection.
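The receptive fields above (27, 81, 243) follow from stacking temporal convolutions of kernel size 3 whose dilations grow by a factor of 3, so each additional convolution multiplies the receptive field by 3. A small arithmetic check (the layer layout is simplified relative to the actual VideoPose3D architecture):

```python
def receptive_field(num_convs, kernel=3):
    """Receptive field of stacked dilated temporal convolutions with
    dilations kernel**0, kernel**1, ... (the VideoPose3D pattern)."""
    rf, dilation = 1, 1
    for _ in range(num_convs):
        rf += (kernel - 1) * dilation
        dilation *= kernel
    return rf

# 3, 4 and 5 stacked convolutions give the 27-, 81- and 243-frame models.
assert [receptive_field(n) for n in (3, 4, 5)] == [27, 81, 243]
```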
HMR + Resnet on Mixed¶
HMR (CVPR'2018)
@inProceedings{kanazawaHMR18,
title={End-to-end Recovery of Human Shape and Pose},
author = {Angjoo Kanazawa
and Michael J. Black
and David W. Jacobs
and Jitendra Malik},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M with ground-truth bounding boxes. The model achieves an MPJPE-PA of 52.60 mm under Protocol 2.
Arch | Input Size | MPJPE (P1) | MPJPE-PA (P1) | MPJPE (P2) | MPJPE-PA (P2) | ckpt | log |
---|---|---|---|---|---|---|---|
hmr_resnet_50 | 224x224 | 80.75 | 55.08 | 80.35 | 52.60 | ckpt | log |
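MPJPE is the mean per-joint position error, while MPJPE-PA first applies a similarity (Procrustes) alignment between prediction and ground truth. A compact NumPy sketch of the Procrustes-aligned variant (batching and joint selection in the actual evaluation code may differ):

```python
import numpy as np

def p_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE for a single pose of shape (J, 3);
    illustrative sketch only."""
    p = pred - pred.mean(0)
    g = gt - gt.mean(0)
    # Optimal rotation and scale via SVD of the 3x3 covariance
    # (orthogonal Procrustes problem).
    u, s, vt = np.linalg.svd(p.T @ g)
    r = u @ vt
    if np.linalg.det(r) < 0:          # fix an improper rotation (reflection)
        u[:, -1] *= -1
        s[-1] *= -1
        r = u @ vt
    scale = s.sum() / (p ** 2).sum()
    aligned = scale * p @ r + gt.mean(0)
    return float(np.linalg.norm(aligned - gt, axis=1).mean())
```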
MPII-TRB (ICCV’2019)¶
Topdown Heatmap + Resnet + Mpii on Mpii_trb¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
COFW (ICCV’2013)¶
Topdown Heatmap + Hrnetv2 on Cofw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COFW (ICCV'2013)
@inproceedings{burgos2013robust,
title={Robust face landmark estimation under occlusion},
author={Burgos-Artizzu, Xavier P and Perona, Pietro and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1513--1520},
year={2013}
}
Results on COFW dataset
The model is trained on the COFW train set.
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 3.40 | ckpt | log |
CrowdPose (CVPR’2019)¶
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Topdown Heatmap + Hrnet on Crowdpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
Topdown Heatmap + Resnet on Crowdpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
OCHuman (CVPR’2019)¶
Topdown Heatmap + Resnet on Ochuman¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
Topdown Heatmap + Hrnet on Ochuman¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on the COCO train set and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.591 | 0.748 | 0.641 | 0.631 | 0.775 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.606 | 0.748 | 0.650 | 0.647 | 0.776 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.611 | 0.752 | 0.663 | 0.648 | 0.778 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.616 | 0.749 | 0.663 | 0.653 | 0.773 | ckpt | log |
Halpe (CVPR’2020)¶
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the Halpe dataset. We find this leads to better performance.
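In config terms, this pre-train-then-fine-tune recipe amounts to initializing training from the COCO checkpoint. A minimal sketch, assuming the standard mmcv load_from mechanism; the base config name and checkpoint path are placeholders:

# Fine-tuning sketch (mmpose 0.x style): start from COCO-pretrained weights
# and inherit the Halpe training setup from a base config (hypothetical name).
_base_ = ['./hrnet_w48_halpe_384x288_dark.py']

load_from = 'checkpoints/hrnet_w48_coco_384x288_dark.pth'  # COCO-pretrained weights

# A shorter schedule and smaller learning rate are typical for fine-tuning.
total_epochs = 50
optimizer = dict(type='Adam', lr=5e-4)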
Vinegar Fly (Nature Methods’2019)¶
Topdown Heatmap + Resnet on Fly¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Vinegar Fly (Nature Methods'2019)
@article{pereira2019fast,
title={Fast animal pose estimation using deep neural networks},
author={Pereira, Talmo D and Aldarondo, Diego E and Willmore, Lindsay and Kislin, Mikhail and Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W},
journal={Nature methods},
volume={16},
number={1},
pages={117--125},
year={2019},
publisher={Nature Publishing Group}
}
Results on Vinegar Fly test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 192x192 | 0.996 | 0.910 | 2.00 | ckpt | log |
pose_resnet_101 | 192x192 | 0.996 | 0.912 | 1.95 | ckpt | log |
pose_resnet_152 | 192x192 | 0.997 | 0.917 | 1.78 | ckpt | log |
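For reference on these metrics: PCK@0.2 counts a keypoint as correct when its error is within 0.2 of a normalization length (e.g. the bounding-box size), EPE is the mean end-point error in pixels, and AUC integrates PCK over a sweep of thresholds. A simplified numpy sketch, assuming aligned (N, K, 2) prediction/ground-truth arrays and ignoring the visibility masking that the real evaluation applies:

import numpy as np

def keypoint_metrics(pred, gt, norm, thr=0.2, num_step=20):
    """pred, gt: (N, K, 2) keypoints; norm: (N,) normalization lengths.
    Returns (PCK@thr, AUC, EPE)."""
    dist = np.linalg.norm(pred - gt, axis=-1)            # (N, K) pixel errors
    epe = dist.mean()                                    # end-point error
    ndist = dist / norm[:, None]                         # normalized errors
    pck = (ndist <= thr).mean()                          # PCK at one threshold
    thrs = np.linspace(0, 1, num_step, endpoint=False)   # threshold sweep
    auc = np.mean([(ndist <= t).mean() for t in thrs])   # area under PCK curve
    return pck, auc, epe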
CMU Panoptic HandDB (CVPR’2017)¶
Deeppose + Resnet on Panoptic2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.999 | 0.686 | 9.36 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.999 | 0.744 | 7.79 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Panoptic2d¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.998 | 0.694 | 9.70 | ckpt | log |
Topdown Heatmap + Resnet on Panoptic2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.999 | 0.713 | 9.00 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Animal-Pose (ICCV’2019)¶
Topdown Heatmap + Hrnet on Animalpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.736 | 0.959 | 0.832 | 0.775 | 0.966 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.737 | 0.959 | 0.823 | 0.778 | 0.962 | ckpt | log |
Topdown Heatmap + Resnet on Animalpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.688 | 0.945 | 0.772 | 0.733 | 0.952 | ckpt | log |
pose_resnet_101 | 256x256 | 0.696 | 0.948 | 0.785 | 0.737 | 0.954 | ckpt | log |
pose_resnet_152 | 256x256 | 0.709 | 0.948 | 0.797 | 0.749 | 0.951 | ckpt | log |
COCO (ECCV’2014)¶
Associative Embedding + Mobilenetv2 on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
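Multi-scale testing runs the bottom-up network at several image scales and aggregates the heatmaps before grouping. In the mmpose 0.x associative-embedding configs this is controlled by the test-time scale factors; a minimal override sketch (the base config name is a placeholder, and the key names should be checked against the exact config in use):

# Multi-scale test sketch for a bottom-up model (mmpose 0.x style config).
_base_ = ['./mobilenetv2_coco_512x512.py']  # hypothetical base config

model = dict(
    test_cfg=dict(
        scale_factor=[2, 1, 0.5],  # the 3 default test scales in the tables
        flip_test=True))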
Associative Embedding + Resnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Associative Embedding + Hourglass + Ae on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HourglassAENet (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.613 | 0.833 | 0.667 | 0.659 | 0.850 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.667 | 0.855 | 0.723 | 0.707 | 0.877 | ckpt | log |
Deeppose + Resnet on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
Topdown Heatmap + Shufflenetv2 on Coco¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv2 | 256x192 | 0.599 | 0.854 | 0.663 | 0.664 | 0.899 | ckpt | log |
pose_shufflenetv2 | 384x288 | 0.636 | 0.865 | 0.705 | 0.697 | 0.909 | ckpt | log |
Topdown Heatmap + Litehrnet on Coco¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
LiteHRNet-18 | 256x192 | 0.643 | 0.868 | 0.720 | 0.706 | 0.912 | ckpt | log |
LiteHRNet-18 | 384x288 | 0.677 | 0.878 | 0.746 | 0.735 | 0.920 | ckpt | log |
LiteHRNet-30 | 256x192 | 0.675 | 0.881 | 0.754 | 0.736 | 0.924 | ckpt | log |
LiteHRNet-30 | 384x288 | 0.700 | 0.884 | 0.776 | 0.758 | 0.928 | ckpt | log |
Topdown Heatmap + Hourglass on Coco¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
Topdown Heatmap + Hrnet + Augmentation on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
Topdown Heatmap + Resnet + Fp16 on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
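Mixed-precision training is switched on in the *_fp16 configs through mmcv's FP16 support. A minimal sketch of the relevant override; the base config name is a placeholder and the loss scale is the commonly used static default rather than a tuned value:

# FP16 training sketch (mmpose 0.x / mmcv style).
_base_ = ['./res50_coco_256x192.py']  # hypothetical base config

# Enable mixed-precision training with a static loss scale.
fp16 = dict(loss_scale=512.)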
Topdown Heatmap + Hrnet + Fp16 on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.880 | 0.800 | 0.943 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Coco¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x192 | 0.646 | 0.874 | 0.723 | 0.707 | 0.917 | ckpt | log |
pose_mobilenetv2 | 384x288 | 0.673 | 0.879 | 0.743 | 0.729 | 0.916 | ckpt | log |
Topdown Heatmap + Resnet on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
Topdown Heatmap + Hrnet on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
Topdown Heatmap + RSN on Coco¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
Topdown Heatmap + Resnest on Coco¶
ResNeSt (ArXiv'2020)
@article{zhang2020resnest,
title={ResNeSt: Split-Attention Networks},
author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
journal={arXiv preprint arXiv:2004.08955},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnest_50 | 256x192 | 0.721 | 0.899 | 0.802 | 0.776 | 0.938 | ckpt | log |
pose_resnest_50 | 384x288 | 0.737 | 0.900 | 0.811 | 0.789 | 0.938 | ckpt | log |
pose_resnest_101 | 256x192 | 0.725 | 0.899 | 0.807 | 0.781 | 0.939 | ckpt | log |
pose_resnest_101 | 384x288 | 0.746 | 0.906 | 0.820 | 0.798 | 0.943 | ckpt | log |
pose_resnest_200 | 256x192 | 0.732 | 0.905 | 0.812 | 0.787 | 0.942 | ckpt | log |
pose_resnest_200 | 384x288 | 0.754 | 0.908 | 0.827 | 0.807 | 0.945 | ckpt | log |
pose_resnest_269 | 256x192 | 0.738 | 0.907 | 0.819 | 0.793 | 0.945 | ckpt | log |
pose_resnest_269 | 384x288 | 0.755 | 0.908 | 0.828 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Resnext on Coco¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnext_50 | 256x192 | 0.714 | 0.898 | 0.789 | 0.771 | 0.937 | ckpt | log |
pose_resnext_50 | 384x288 | 0.724 | 0.899 | 0.794 | 0.777 | 0.935 | ckpt | log |
pose_resnext_101 | 256x192 | 0.726 | 0.900 | 0.801 | 0.782 | 0.940 | ckpt | log |
pose_resnext_101 | 384x288 | 0.743 | 0.903 | 0.815 | 0.795 | 0.939 | ckpt | log |
pose_resnext_152 | 256x192 | 0.730 | 0.904 | 0.808 | 0.786 | 0.940 | ckpt | log |
pose_resnext_152 | 384x288 | 0.742 | 0.902 | 0.810 | 0.794 | 0.939 | ckpt | log |
Topdown Heatmap + MSPN on Coco¶
MSPN (ArXiv'2019)
@article{li2019rethinking,
title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
journal={arXiv preprint arXiv:1901.00148},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
Topdown Heatmap + VGG on Coco¶
VGG (ICLR'2015)
@article{simonyan2014very,
title={Very deep convolutional networks for large-scale image recognition},
author={Simonyan, Karen and Zisserman, Andrew},
journal={arXiv preprint arXiv:1409.1556},
year={2014}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
vgg | 256x192 | 0.698 | 0.890 | 0.768 | 0.754 | 0.929 | ckpt | log |
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
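DARK's unbiased decoding refines the integer heatmap argmax with a second-order Taylor expansion: the sub-pixel offset is -H^{-1}∇h at the peak of the log-heatmap, where H is the local Hessian. A minimal numpy sketch for a single heatmap, omitting the Gaussian modulation step the paper also applies:

import numpy as np

def dark_refine(heatmap):
    """Refine the argmax of one (H, W) heatmap to sub-pixel precision
    with a 2nd-order Taylor expansion (DarkPose-style decoding)."""
    h = np.log(np.maximum(heatmap, 1e-10))  # decode in log space
    y, x = np.unravel_index(np.argmax(h), h.shape)
    if 1 <= x < h.shape[1] - 1 and 1 <= y < h.shape[0] - 1:
        dx = 0.5 * (h[y, x + 1] - h[y, x - 1])           # first derivatives
        dy = 0.5 * (h[y + 1, x] - h[y - 1, x])
        dxx = h[y, x + 1] - 2 * h[y, x] + h[y, x - 1]    # Hessian entries
        dyy = h[y + 1, x] - 2 * h[y, x] + h[y - 1, x]
        dxy = 0.25 * (h[y + 1, x + 1] - h[y + 1, x - 1]
                      - h[y - 1, x + 1] + h[y - 1, x - 1])
        hess = np.array([[dxx, dxy], [dxy, dyy]])
        if abs(np.linalg.det(hess)) > 1e-10:
            ox, oy = -np.linalg.solve(hess, np.array([dx, dy]))
            return np.array([x + ox, y + oy])
    return np.array([x, y], dtype=float)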
Topdown Heatmap + Alexnet on Coco¶
AlexNet (NeurIPS'2012)
@inproceedings{krizhevsky2012imagenet,
title={Imagenet classification with deep convolutional neural networks},
author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},
booktitle={Advances in neural information processing systems},
pages={1097--1105},
year={2012}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_alexnet | 256x192 | 0.397 | 0.758 | 0.381 | 0.478 | 0.822 | ckpt | log |
Topdown Heatmap + Seresnet on Coco¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_seresnet_50 | 256x192 | 0.728 | 0.900 | 0.809 | 0.784 | 0.940 | ckpt | log |
pose_seresnet_50 | 384x288 | 0.748 | 0.905 | 0.819 | 0.799 | 0.941 | ckpt | log |
pose_seresnet_101 | 256x192 | 0.734 | 0.904 | 0.815 | 0.790 | 0.942 | ckpt | log |
pose_seresnet_101 | 384x288 | 0.753 | 0.907 | 0.823 | 0.805 | 0.943 | ckpt | log |
pose_seresnet_152* | 256x192 | 0.730 | 0.899 | 0.810 | 0.786 | 0.940 | ckpt | log |
pose_seresnet_152* | 384x288 | 0.753 | 0.906 | 0.823 | 0.806 | 0.945 | ckpt | log |
Note that * means the model is trained from scratch, without ImageNet pre-training.
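In the configs, the difference between the starred and unstarred rows comes down to the backbone initialization; a sketch of the relevant field, assuming the mmpose 0.x model-level pretrained key (the torchvision URL is illustrative):

# '*' rows: train from scratch, i.e. no ImageNet-pretrained backbone weights.
model = dict(
    pretrained=None,
    # pretrained='torchvision://resnet50',  # typical ImageNet initialization
)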
Topdown Heatmap + Shufflenetv1 on Coco¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv1 | 256x192 | 0.585 | 0.845 | 0.650 | 0.651 | 0.894 | ckpt | log |
pose_shufflenetv1 | 384x288 | 0.622 | 0.859 | 0.685 | 0.684 | 0.901 | ckpt | log |
Topdown Heatmap + CPM on Coco¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + Scnet on Coco¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_scnet_50 | 256x192 | 0.728 | 0.899 | 0.807 | 0.784 | 0.938 | ckpt | log |
pose_scnet_50 | 384x288 | 0.751 | 0.906 | 0.818 | 0.802 | 0.943 | ckpt | log |
pose_scnet_101 | 256x192 | 0.733 | 0.903 | 0.813 | 0.790 | 0.941 | ckpt | log |
pose_scnet_101 | 384x288 | 0.752 | 0.906 | 0.823 | 0.804 | 0.943 | ckpt | log |
Topdown Heatmap + Resnetv1d on Coco¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnetv1d_50 | 256x192 | 0.722 | 0.897 | 0.799 | 0.777 | 0.933 | ckpt | log |
pose_resnetv1d_50 | 384x288 | 0.730 | 0.900 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_resnetv1d_101 | 256x192 | 0.731 | 0.899 | 0.809 | 0.786 | 0.938 | ckpt | log |
pose_resnetv1d_101 | 384x288 | 0.748 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnetv1d_152 | 256x192 | 0.737 | 0.902 | 0.812 | 0.791 | 0.940 | ckpt | log |
pose_resnetv1d_152 | 384x288 | 0.752 | 0.909 | 0.821 | 0.802 | 0.944 | ckpt | log |
Topdown Heatmap + Vipnas on Coco¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.700 | 0.887 | 0.778 | 0.757 | 0.929 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.711 | 0.893 | 0.789 | 0.769 | 0.934 | ckpt | log |
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph and Tran, Du and Shi, Jianbo and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper can be split into two stages.
The first stage is trained from a checkpoint pre-trained on the COCO dataset, and the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
The second stage is trained from the last checkpoint of the first stage, and the warping offsets are learned in a multi-frame setting while the backbone is frozen (see the sketch below).
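Since only the warping offsets are updated in the second stage, a minimal PyTorch sketch of the freezing pattern may help; this is an illustration only, not the actual MMPose training code, and `model.backbone` is a placeholder attribute name:

```python
import torch

def make_stage2_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Minimal sketch of stage-2-style training: freeze the backbone so only
    the warping offsets (and any other head parameters) receive gradients.
    `model.backbone` is a placeholder attribute name, not the MMPose API."""
    for param in model.backbone.parameters():
        param.requires_grad = False      # no gradients flow into the backbone
    model.backbone.eval()                # keep BatchNorm statistics fixed
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```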
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from the PoseWarper supplementary data files (this link1).
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it here: $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more details about data preparation.
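For convenience, here is a small Python sketch of the copy step described above; it assumes `PoseWarper_supp_files` and `mmpose` are available as environment variables, so adjust the paths to your own layout:

```python
import os
import shutil
from pathlib import Path

# Copy the precomputed PoseTrack18 val boxes to the location the config
# expects. The two environment variables are assumptions of this sketch.
src = (Path(os.environ['PoseWarper_supp_files'])
       / 'posetrack18_precomputed_boxes' / 'val_boxes.json')
dst = (Path(os.environ['mmpose'])
       / 'data' / 'posetrack18' / 'posetrack18_precomputed_boxes' / 'val_boxes.json')
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(src, dst)
```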
MPII (CVPR’2014)¶
Deeppose + Resnet on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
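In the MPII tables, Mean is the PCKh@0.5 score and Mean@0.1 the stricter PCKh@0.1 variant: a keypoint counts as correct if its error is below the threshold times the head segment length. A minimal NumPy sketch of the metric (array shapes are assumptions for illustration, not the MMPose evaluation code):

```python
import numpy as np

def pckh(pred, gt, head_size, visible, thr=0.5):
    """PCKh: fraction of visible keypoints whose error is below
    thr * head segment length.

    pred, gt:  (N, K, 2) predicted / ground-truth keypoint coordinates
    head_size: (N,)      per-image head segment length
    visible:   (N, K)    boolean visibility mask
    """
    dist = np.linalg.norm(pred - gt, axis=-1)   # (N, K) pixel errors
    norm = dist / head_size[:, None]            # normalize by head size
    return float((norm[visible] <= thr).mean())
```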
Topdown Heatmap + Resnet on Mpii¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Topdown Heatmap + Scnet on Mpii¶
SCNet (CVPR'2020)
@inproceedings{liu2020improving,
title={Improving Convolutional Networks with Self-Calibrated Convolutions},
author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10096--10105},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_scnet_101 | 256x256 | 0.886 | 0.293 | ckpt | log |
Topdown Heatmap + Resnetv1d on Mpii¶
ResNetV1D (CVPR'2019)
@inproceedings{he2019bag,
title={Bag of tricks for image classification with convolutional neural networks},
author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={558--567},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnetv1d_50 | 256x256 | 0.881 | 0.290 | ckpt | log |
pose_resnetv1d_101 | 256x256 | 0.883 | 0.295 | ckpt | log |
pose_resnetv1d_152 | 256x256 | 0.888 | 0.300 | ckpt | log |
Topdown Heatmap + Seresnet on Mpii¶
SEResNet (CVPR'2018)
@inproceedings{hu2018squeeze,
title={Squeeze-and-excitation networks},
author={Hu, Jie and Shen, Li and Sun, Gang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7132--7141},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_seresnet_50 | 256x256 | 0.884 | 0.292 | ckpt | log |
pose_seresnet_101 | 256x256 | 0.884 | 0.295 | ckpt | log |
pose_seresnet_152* | 256x256 | 0.884 | 0.287 | ckpt | log |
Note that * means the model was trained without ImageNet pre-training.
Topdown Heatmap + Shufflenetv1 on Mpii¶
ShufflenetV1 (CVPR'2018)
@inproceedings{zhang2018shufflenet,
title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={6848--6856},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv1 | 256x256 | 0.823 | 0.195 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Mpii¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.854 | 0.235 | ckpt | log |
Topdown Heatmap + CPM on Mpii¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
cpm | 368x368 | 0.876 | 0.285 | ckpt | log |
Topdown Heatmap + Hourglass on Mpii¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Resnext on Mpii¶
ResNext (CVPR'2017)
@inproceedings{xie2017aggregated,
title={Aggregated residual transformations for deep neural networks},
author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1492--1500},
year={2017}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnext_152 | 256x256 | 0.887 | 0.294 | ckpt | log |
Topdown Heatmap + Litehrnet on Mpii¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.859 | 0.260 | ckpt | log |
LiteHRNet-30 | 256x256 | 0.869 | 0.271 | ckpt | log |
Topdown Heatmap + Shufflenetv2 on Mpii¶
ShufflenetV2 (ECCV'2018)
@inproceedings{ma2018shufflenet,
title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={116--131},
year={2018}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv2 | 256x256 | 0.828 | 0.205 | ckpt | log |
Topdown Heatmap + Hrnet on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
JHMDB (ICCV’2013)¶
Topdown Heatmap + CPM on JHMDB¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
Topdown Heatmap + Resnet on JHMDB¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
OneHand10K (TCSVT’2019)¶
Deeppose + Resnet on Onehand10k¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.990 | 0.486 | 34.28 | ckpt | log |
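The hand and animal tables report PCK@0.2, AUC and EPE. As a rough sketch of these metrics (the normalization length and the threshold grid for AUC are assumptions for illustration; the MMPose evaluation code is the reference):

```python
import numpy as np

def keypoint_metrics(pred, gt, norm_len, thrs=np.linspace(0.0, 1.0, 21)):
    """Sketch of PCK@0.2 / AUC / EPE.

    pred, gt: (N, K, 2) keypoint coordinates
    norm_len: (N,) per-sample normalization length (e.g. bounding-box size)
    """
    err = np.linalg.norm(pred - gt, axis=-1)       # (N, K) pixel distances
    epe = float(err.mean())                        # end-point error in pixels
    norm_err = err / norm_len[:, None]
    pck = lambda t: float((norm_err <= t).mean())
    auc = float(np.mean([pck(t) for t in thrs]))   # area under the PCK curve
    return pck(0.2), auc, epe
```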
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.990 | 0.568 | 24.16 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Mobilenetv2 on Onehand10k¶
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_mobilenet_v2 | 256x256 | 0.986 | 0.537 | 28.60 | ckpt | log |
Topdown Heatmap + Resnet on Onehand10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.989 | 0.555 | 25.19 | ckpt | log |
Desert Locust (Elife’2019)¶
Topdown Heatmap + Resnet on Locust¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Desert Locust (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Desert Locust test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 0.999 | 0.899 | 2.27 | ckpt | log |
pose_resnet_101 | 160x160 | 0.999 | 0.907 | 2.03 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.926 | 1.48 | ckpt | log |
AFLW (ICCVW’2011)¶
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
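NMEfull and NMEfrontal are the normalized mean errors on the AFLW full and frontal protocols. A hedged sketch of NME follows; the choice of normalization factor (e.g. the face box size for AFLW) is an assumption here:

```python
import numpy as np

def nme(pred, gt, norm_factor):
    """Normalized mean error for face landmarks, reported in percent.

    pred, gt:    (N, K, 2) landmark coordinates
    norm_factor: (N,) per-face normalization (assumed: face box size)
    """
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=1)  # (N,) mean error
    return float((err / norm_factor).mean() * 100)
```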
Topdown Heatmap + Hrnetv2 on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 1.41 | 1.27 | ckpt | log |
FreiHand (ICCV’2019)¶
Topdown Heatmap + Resnet on Freihand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FreiHand (ICCV'2019)
@inproceedings{zimmermann2019freihand,
title={Freihand: A dataset for markerless capture of hand pose and shape from single rgb images},
author={Zimmermann, Christian and Ceylan, Duygu and Yang, Jimei and Russell, Bryan and Argus, Max and Brox, Thomas},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={813--822},
year={2019}
}
Results on FreiHand val & test set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
val | pose_resnet_50 | 224x224 | 0.993 | 0.868 | 3.25 | ckpt | log |
test | pose_resnet_50 | 224x224 | 0.992 | 0.868 | 3.27 | ckpt | log |
MHP (ACM MM’2018)¶
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. The 3 default scales ([2, 1, 0.5]) are used; see the fusion sketch after the table.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
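Multi-scale testing runs the bottom-up network once per test scale and fuses the predictions before keypoint grouping. A rough sketch of one common fusion step, resizing the per-scale heatmaps to a common resolution and averaging them (an illustration under these assumptions, not the exact MMPose implementation):

```python
import torch
import torch.nn.functional as F

def aggregate_multiscale(heatmaps):
    """Average bottom-up heatmaps predicted at several test scales.

    heatmaps: list of (1, K, H_i, W_i) tensors, one per test scale
              (e.g. the default scales [2, 1, 0.5]).
    """
    target = max(hm.shape[-2:] for hm in heatmaps)  # finest resolution
    resized = [F.interpolate(hm, size=tuple(target), mode='bilinear',
                             align_corners=False) for hm in heatmaps]
    return torch.stack(resized).mean(dim=0)
```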
Topdown Heatmap + Resnet on MHP¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 val set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious if you use these results in papers.
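COCO-style keypoint mAP is computed by thresholding the Object Keypoint Similarity (OKS) between predictions and ground truths. For reference, a minimal sketch of the OKS score (the per-keypoint constants `k` and visibility handling follow the usual COCO definition; MMPose delegates the actual evaluation to the COCO tooling):

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt: (K, 2) keypoint coordinates   visible: (K,) boolean mask
    area:     object scale s**2 (segment area)
    k:        (K,) per-keypoint falloff constants
    """
    d2 = ((pred - gt) ** 2).sum(axis=-1)           # squared pixel distances
    sim = np.exp(-d2 / (2 * area * k ** 2))        # per-keypoint similarity
    return float(sim[visible].mean())              # average over labeled joints
```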
Grévy’s Zebra (Elife’2019)¶
Topdown Heatmap + Resnet on Zebra¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Grévy’s Zebra (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Grévy’s Zebra test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 1.000 | 0.914 | 1.86 | ckpt | log |
pose_resnet_101 | 160x160 | 1.000 | 0.916 | 1.82 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.921 | 1.66 | ckpt | log |
MacaquePose (bioRxiv’2020)¶
Topdown Heatmap + Resnet on Macaque¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.799 | 0.952 | 0.919 | 0.837 | 0.964 | ckpt | log |
pose_resnet_101 | 256x192 | 0.790 | 0.953 | 0.908 | 0.828 | 0.967 | ckpt | log |
pose_resnet_152 | 256x192 | 0.794 | 0.951 | 0.915 | 0.834 | 0.968 | ckpt | log |
Topdown Heatmap + Hrnet on Macaque¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.814 | 0.953 | 0.918 | 0.851 | 0.969 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.818 | 0.963 | 0.917 | 0.855 | 0.971 | ckpt | log |
COCO-WholeBody (ECCV’2020)¶
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + means the model is first pre-trained on the original COCO dataset, and then fine-tuned on the COCO-WholeBody dataset. We find this leads to better performance.
Topdown Heatmap + Hrnet on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.700 | 0.746 | 0.567 | 0.645 | 0.637 | 0.688 | 0.473 | 0.546 | 0.553 | 0.626 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.701 | 0.773 | 0.586 | 0.692 | 0.727 | 0.783 | 0.516 | 0.604 | 0.586 | 0.674 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.700 | 0.776 | 0.672 | 0.785 | 0.656 | 0.743 | 0.534 | 0.639 | 0.579 | 0.681 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.722 | 0.790 | 0.694 | 0.799 | 0.777 | 0.834 | 0.587 | 0.679 | 0.631 | 0.716 | ckpt | log |
Topdown Heatmap + Resnet on Coco-Wholebody¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.652 | 0.739 | 0.614 | 0.746 | 0.608 | 0.716 | 0.460 | 0.584 | 0.520 | 0.633 | ckpt | log |
pose_resnet_50 | 384x288 | 0.666 | 0.747 | 0.635 | 0.763 | 0.732 | 0.812 | 0.537 | 0.647 | 0.573 | 0.671 | ckpt | log |
pose_resnet_101 | 256x192 | 0.670 | 0.754 | 0.640 | 0.767 | 0.611 | 0.723 | 0.463 | 0.589 | 0.533 | 0.647 | ckpt | log |
pose_resnet_101 | 384x288 | 0.692 | 0.770 | 0.680 | 0.798 | 0.747 | 0.822 | 0.549 | 0.658 | 0.597 | 0.692 | ckpt | log |
pose_resnet_152 | 256x192 | 0.682 | 0.764 | 0.662 | 0.788 | 0.624 | 0.728 | 0.482 | 0.606 | 0.548 | 0.661 | ckpt | log |
pose_resnet_152 | 384x288 | 0.703 | 0.780 | 0.693 | 0.813 | 0.751 | 0.825 | 0.559 | 0.667 | 0.610 | 0.705 | ckpt | log |
Topdown Heatmap + Vipnas on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@inproceedings{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with a detector having a human AP of 56.4 on COCO val2017
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.619 | 0.700 | 0.477 | 0.608 | 0.585 | 0.689 | 0.386 | 0.505 | 0.473 | 0.578 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.643 | 0.726 | 0.553 | 0.694 | 0.587 | 0.698 | 0.410 | 0.529 | 0.495 | 0.607 | ckpt | log |
MPI-INF-3DHP (3DV’2017)¶
Pose Lift + Simplebaseline3d on Mpi_inf_3dhp¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|
simple_baseline_3d_tcn1 | 84.3 | 53.2 | 85.0 | 52.0 | ckpt | log |
1 Differing from the original paper, we did not apply the max-norm constraint, since we found that omitting it led to better convergence and performance.
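For reference, the max-norm constraint mentioned above rescales a weight matrix whenever its norm exceeds a bound after an optimizer step. A minimal PyTorch sketch of the constraint that was deliberately left out here (the norm bound and parameter selection are assumptions for illustration):

```python
import torch

def apply_max_norm(model: torch.nn.Module, max_norm: float = 1.0) -> None:
    """Clamp the norm of each weight matrix to `max_norm` after an update
    (the constraint used in the original paper, not in this config)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() > 1 and 'weight' in name:
                norm = param.norm()
                if norm > max_norm:
                    param.mul_(max_norm / norm)
```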
Video Pose Lift + Videopose3d on Mpi_inf_3dhp¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|---|
VideoPose3D | 1 | 58.3 | 40.6 | 94.1 | 63.1 | ckpt | log |
CMU Panoptic (ICCV’2015)¶
Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
CMU Panoptic (ICCV'2015)
@inproceedings{joo_iccv_2015,
author = {Hanbyul Joo and Hao Liu and Lei Tan and Lin Gui and Bart Nabbe and Iain Matthews and Takeo Kanade and Shohei Nobuhara and Yaser Sheikh},
title = {Panoptic Studio: A Massively Multiview System for Social Motion Capture},
booktitle = {ICCV},
year = {2015}
}
Results on CMU Panoptic dataset.
Arch | mAP | mAR | MPJPE | Recall@500mm | ckpt | log |
---|---|---|---|---|---|---|
prn64_cpn80_res50 | 97.31 | 97.99 | 17.57 | 99.85 | ckpt | log |
DeepFashion (CVPR’2016)¶
Deeppose + Resnet on Deepfashion¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | deeppose_resnet_50 | 256x256 | 0.965 | 0.535 | 17.2 | ckpt | log |
lower | deeppose_resnet_50 | 256x256 | 0.971 | 0.678 | 11.8 | ckpt | log |
full | deeppose_resnet_50 | 256x256 | 0.983 | 0.602 | 14.0 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | pose_resnet_50 | 256x256 | 0.954 | 0.578 | 16.8 | ckpt | log |
lower | pose_resnet_50 | 256x256 | 0.965 | 0.744 | 10.5 | ckpt | log |
full | pose_resnet_50 | 256x256 | 0.977 | 0.664 | 12.7 | ckpt | log |
AP-10K (NeurIPS’2021)¶
Topdown Heatmap + Hrnet on Ap10k¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.738 | 0.958 | 0.808 | 0.592 | 0.743 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.744 | 0.959 | 0.807 | 0.589 | 0.748 | ckpt | log |
Topdown Heatmap + Resnet on Ap10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.699 | 0.940 | 0.760 | 0.570 | 0.703 | ckpt | log |
pose_resnet_101 | 256x256 | 0.698 | 0.943 | 0.754 | 0.543 | 0.702 | ckpt | log |
300W (IMAVIS’2016)¶
Topdown Heatmap + Hrnetv2 on 300w¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
300W (IMAVIS'2016)
@article{sagonas2016300,
title={300 faces in-the-wild challenge: Database and results},
author={Sagonas, Christos and Antonakos, Epameinondas and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja},
journal={Image and vision computing},
volume={47},
pages={3--18},
year={2016},
publisher={Elsevier}
}
Results on 300W dataset
The model is trained on 300W train.
Arch | Input Size | NMEcommon | NMEchallenge | NMEfull | NMEtest | ckpt | log |
---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 2.86 | 5.45 | 3.37 | 3.97 | ckpt | log |
WFLW (CVPR’2018)¶
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Deeppose + Resnet on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50 | 256x256 | 4.85 | 8.50 | 4.81 | 5.69 | 5.45 | 4.82 | 5.20 | ckpt | log |
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages ={2235-2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 4.06 | 6.98 | 3.99 | 4.83 | 4.59 | 3.92 | 4.33 | ckpt | log |
Techniques¶
UDP (CVPR’2020)¶
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
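In MMPose configs, UDP is switched on through the data pipeline rather than the model itself. A minimal sketch of the UDP-specific pipeline fields (names follow the mmpose 0.x UDP configs; treat the exact values, in particular encoding='UDP' and target_type='GaussianHeatmap', as illustrative assumptions):
train_pipeline = [
    # ... preceding transforms (load, flip, scale/rotation) as usual ...
    dict(type='TopDownAffine', use_udp=True),  # unbiased affine data processing
    dict(
        type='TopDownGenerateTarget',
        sigma=2,
        encoding='UDP',  # unbiased heatmap encoding/decoding
        target_type='GaussianHeatmap'),
    # ... ToTensor / NormalizeTensor / Collect as usual ...
]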
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
AdaptiveWingloss (ICCV’2019)¶
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Wingloss (CVPR’2018)¶
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages ={2235-2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
DarkPose (CVPR’2020)¶
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: + indicates that the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset, which we find leads to better performance.
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.531 | 0.642 | ckpt | log |
Note: + indicates that the model is first pre-trained on the original COCO dataset and then fine-tuned on the Halpe dataset, which we find leads to better performance.
FP16 (ArXiv’2017)¶
Topdown Heatmap + Resnet + Fp16 on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
Topdown Heatmap + Hrnet + Fp16 on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.88 | 0.800 | 0.943 | ckpt | log |
SoftWingloss (TIP’2021)¶
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Albumentations (Information’2020)¶
Topdown Heatmap + Hrnet + Augmentation on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
Tutorial 0: Model Config Files¶
We use Python files as configs and combine modular and inheritance design in the config system, which is convenient for conducting various experiments.
You can find all the provided configs under $MMPose/configs. To inspect a config file, you can run
python tools/analysis/print_config.py /PATH/TO/CONFIG
to see the complete config.
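For example, to print the top-down ResNet-50 COCO config that is annotated in the Config System section below (the path follows the repository layout cited there):
python tools/analysis/print_config.py configs/top_down/resnet/coco/res50_coco_256x192.py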
Modify config through script arguments
Config file naming convention
Config system
FAQ
Use intermediate variables in configs
Modify Config through Script Arguments¶
When submitting jobs using "tools/train.py" or "tools/test.py", you can specify --cfg-options to modify the config.

Update config keys of dict chains.
The config options can be specified following the order of the dict keys in the original config. For example, --cfg-options model.backbone.norm_eval=False changes all BN modules in the backbone to train mode.

Update keys inside a list of configs.
Some config dicts form a list in the config file. For example, the training pipeline data.train.pipeline is normally a list, e.g. [dict(type='LoadImageFromFile'), dict(type='TopDownRandomFlip', flip_prob=0.5), ...]. If you want to change 'flip_prob=0.5' to 'flip_prob=0.0' in the pipeline, you can specify --cfg-options data.train.pipeline.1.flip_prob=0.0.

Update values of lists/tuples.
If the value to be updated is a list or a tuple: for example, the config file normally sets workflow=[('train', 1)]. If you want to change this key, you can specify --cfg-options workflow="[(train,1),(val,1)]". Note that the quotation marks " are necessary to support list/tuple data types, and that no whitespace is allowed inside the quotation marks of the specified value.
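Putting these together, a single training launch that applies several overrides might look like the following (the config path is illustrative; --cfg-options accepts multiple key=value pairs after one flag):
python tools/train.py configs/top_down/resnet/coco/res50_coco_256x192.py \
    --cfg-options model.backbone.norm_eval=False data.train.pipeline.1.flip_prob=0.0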
Config File Naming Convention¶
We follow the style below to name config files, and contributors are advised to follow the same style.
configs/{topic}/{task}/{algorithm}/{dataset}/{backbone}_[model_setting]_{dataset}_[input_size]_[technique].py
{xxx} is a required field and [yyy] is optional.
{topic}: topic type, e.g. body, face, hand, animal, etc.
{task}: task type, in the form [2d | 3d]_[kpt | mesh]_[sview | mview]_[rgb | rgbd]_[img | vid]. The task type is defined along five dimensions: (1) 2D or 3D pose estimation; (2) pose representation: keypoint (kpt), mesh (mesh) or dense pose (dense); (3) single-view (sview) or multi-view (mview); (4) RGB or RGBD; and (5) image (img) or video (vid). E.g. 2d_kpt_sview_rgb_img, 3d_kpt_sview_rgb_vid, etc.
{algorithm}: algorithm type, e.g. associative_embedding, deeppose, etc.
{dataset}: dataset name, e.g. coco, etc.
{backbone}: backbone type, e.g. res50 (ResNet-50), etc.
[model_setting]: specific settings for some models.
[input_size]: input size of the model.
[technique]: some specific techniques, including loss functions, data augmentations, training tricks, etc., e.g. wingloss, udp, fp16, etc.
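As a worked example, res50_coco_256x192.py (the config annotated in the next section) decodes as: backbone res50 (ResNet-50), dataset coco, input size 256x192, with no optional [model_setting] or [technique] fields; a hypothetical UDP variant of the same model would be named res50_coco_256x192_udp.py.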
Config System¶
An example of a heatmap-based 2D top-down human pose estimation method
To help users get a basic idea of the complete config structure and of the modules in the config system, we briefly annotate the config file https://github.com/open-mmlab/mmpose/tree/e1ec589884235bee875c89102170439a991f8450/configs/top_down/resnet/coco/res50_coco_256x192.py below. For more detailed usage of each parameter in each module and the available alternatives, please refer to the API documentation.
# runtime settings
log_level = 'INFO'  # logging level
load_from = None  # load a pre-trained model from the given path
resume_from = None  # resume from a checkpoint at the given path; training continues from the epoch at which the checkpoint was saved
dist_params = dict(backend='nccl')  # parameters for distributed training; the port can also be set
workflow = [('train', 1)]  # workflow of the runner. [('train', 1)] means there is only one workflow, named 'train', and it is executed once
checkpoint_config = dict(  # config of the checkpoint hook; see https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for the implementation
    interval=10)  # interval at which checkpoints are saved
evaluation = dict(  # config of evaluation during training
    interval=10,  # interval at which evaluation is performed
    metric='mAP',  # evaluation metric to use
    key_indicator='AP')  # set `AP` as the key indicator to save the best checkpoint
# optimizer
optimizer = dict(  # config used to build the optimizer; supports (1) all optimizers in PyTorch, with the same parameters as in PyTorch, and (2) custom optimizers, which are built via `constructor`; see "tutorials/4_new_modules.md" for the implementation
    type='Adam',  # type of the optimizer; see https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details
    lr=5e-4,  # learning rate; see the PyTorch documentation for detailed usage of the parameters
)
optimizer_config = dict(grad_clip=None)  # do not clip gradients
# learning rate policy
lr_config = dict(  # config of the learning-rate scheduler used to register the LrUpdater hook
    policy='step',  # scheduler policy; CosineAnnealing, Cyclic, etc. are also supported; see https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9 for the supported LrUpdaters
    warmup='linear',  # type of warmup; can be None (no warmup), 'constant', 'linear' or 'exp'
    warmup_iters=500,  # number of iterations or epochs for warmup
    warmup_ratio=0.001,  # starting learning rate of warmup, equal to warmup_ratio * initial learning rate
    step=[170, 200])  # epochs at which the learning rate is decayed
total_epochs = 210  # total number of epochs to train the model
log_config = dict(  # config to register the logger hook
    interval=50,  # interval at which logs are printed
    hooks=[
        dict(type='TextLoggerHook'),  # logger used to record the training process
        # dict(type='TensorboardLoggerHook')  # the Tensorboard logger is also supported
    ])
channel_cfg = dict(
    num_output_channels=17,  # number of output channels of the keypoint head
    dataset_joints=17,  # number of joints in the dataset
    dataset_channel=[  # channels supported by the dataset
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[  # output channels
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])
# model settings
model = dict(  # config of the model
    type='TopDown',  # type of the model
    pretrained='torchvision://resnet50',  # url/path of the pre-trained model
    backbone=dict(  # dict of the backbone
        type='ResNet',  # name of the backbone
        depth=50),  # depth of the ResNet model
    keypoint_head=dict(  # dict of the keypoint head
        type='TopdownHeatmapSimpleHead',  # name of the keypoint head
        in_channels=2048,  # number of input channels of the keypoint head
        out_channels=channel_cfg['num_output_channels'],  # number of output channels of the keypoint head
        loss_keypoint=dict(  # dict of the keypoint loss
            type='JointsMSELoss',  # name of the keypoint loss
            use_target_weight=True)),  # whether to consider target weights when computing the loss
    train_cfg=dict(),  # config of training hyper-parameters
    test_cfg=dict(  # config of testing hyper-parameters
        flip_test=True,  # whether to use flip testing during inference
        post_process='default',  # use the 'default' post-processing method
        shift_heatmap=True,  # shift and align the flipped heatmap for better performance
        modulate_kernel=11))  # Gaussian kernel size for modulation; only used when "post_process='unbiased'"
data_cfg = dict(
    image_size=[192, 256],  # input resolution of the model
    heatmap_size=[48, 64],  # size of the output heatmap
    num_output_channels=channel_cfg['num_output_channels'],  # number of output channels
    num_joints=channel_cfg['dataset_joints'],  # number of joints
    dataset_channel=channel_cfg['dataset_channel'],  # channels supported by the dataset
    inference_channel=channel_cfg['inference_channel'],  # output channels
    soft_nms=False,  # whether to perform soft-nms during inference
    nms_thr=1.0,  # non-maximum suppression threshold
    oks_thr=0.9,  # OKS (object keypoint similarity) score threshold used during nms
    vis_thr=0.2,  # keypoint visibility threshold
    use_gt_bbox=False,  # whether to use ground-truth bounding boxes during testing
    det_bbox_thr=0.0,  # score threshold for detected bounding boxes; used when 'use_gt_bbox=False'
    bbox_file='data/coco/person_detection_results/'  # path to the bounding-box detection file
    'COCO_val2017_detections_AP_H_56_person.json',
)
train_pipeline = [
    dict(type='LoadImageFromFile'),  # load the image from file
    dict(type='TopDownRandomFlip',  # perform random flip augmentation
         flip_prob=0.5),  # probability of flipping
    dict(
        type='TopDownHalfBodyTransform',  # config of the TopDownHalfBodyTransform augmentation
        num_joints_half_body=8,  # threshold (number of joints) for applying the half-body transform
        prob_half_body=0.3),  # probability of applying the half-body transform
    dict(
        type='TopDownGetRandomScaleRotation',  # config of TopDownGetRandomScaleRotation
        rot_factor=40,  # rotate within ``[-2*rot_factor, 2*rot_factor]``
        scale_factor=0.5),  # scale within ``[1-scale_factor, 1+scale_factor]``
    dict(type='TopDownAffine',  # apply an affine transform to the image to form the model input
         use_udp=False),  # do not use unbiased data processing
    dict(type='ToTensor'),  # convert other types to tensors
    dict(
        type='NormalizeTensor',  # normalize the input tensor
        mean=[0.485, 0.456, 0.406],  # per-channel means for normalization
        std=[0.229, 0.224, 0.225]),  # per-channel standard deviations for normalization
    dict(type='TopDownGenerateTarget',  # generate the heatmap target; different encoding types are supported
         sigma=2),  # sigma of the heatmap Gaussian
    dict(
        type='Collect',  # collect pipeline that decides which keys of the data should be passed to the detector
        keys=['img', 'target', 'target_weight'],  # input keys
        meta_keys=[  # meta keys of the input
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]
val_pipeline = [
    dict(type='LoadImageFromFile'),  # load the image from file
    dict(type='TopDownAffine'),  # apply an affine transform to the image to form the model input
    dict(type='ToTensor'),  # config of ToTensor
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],  # per-channel means for normalization
        std=[0.229, 0.224, 0.225]),  # per-channel standard deviations for normalization
    dict(
        type='Collect',  # collect pipeline that decides which keys of the data should be passed to the detector
        keys=['img'],  # input keys
        meta_keys=[  # meta keys of the input
            'image_file', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs'
        ]),
]
test_pipeline = val_pipeline
data_root = 'data/coco'
# dataset settings
data = dict(
    samples_per_gpu=64,  # batch size per GPU during training
    workers_per_gpu=2,  # number of workers per GPU for prefetching data
    val_dataloader=dict(samples_per_gpu=32),  # batch size per GPU during validation
    test_dataloader=dict(samples_per_gpu=32),  # batch size per GPU during testing
    train=dict(  # config of the training dataset
        type='TopDownCocoDataset',  # name of the dataset
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',  # path to the annotation file
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(  # config of the validation dataset
        type='TopDownCocoDataset',  # name of the dataset
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',  # path to the annotation file
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
    test=dict(  # config of the test dataset
        type='TopDownCocoDataset',  # name of the dataset
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',  # path to the annotation file
        img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)
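Configs like the one above can also be loaded and inspected programmatically, which is handy for checking what script-argument overrides actually produce. A minimal sketch using mmcv's Config class (the config path follows the repository layout cited above):
from mmcv import Config

# load the annotated config and inspect / override a few fields
cfg = Config.fromfile('configs/top_down/resnet/coco/res50_coco_256x192.py')
print(cfg.model.type)              # -> 'TopDown'
print(cfg.data_cfg['image_size'])  # -> [192, 256]
cfg.total_epochs = 30              # in-code overrides take effect before training starts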
Tutorial 1: How to Finetune Models¶
Pretraining on the COCO dataset and then fine-tuning on another dataset (such as COCO-WholeBody) often improves model performance. This tutorial shows how to take pretrained models from the model zoo and fine-tune them on other datasets.
Outline
Modify the Head
Modify the Dataset
Modify the Training Schedule
Use Pretrained Models
Outline¶
Fine-tuning a model on a new dataset takes two steps:
Support the new dataset. See Tutorial 2: How to Add New Datasets for details.
Modify the config file. This part is discussed in detail in this tutorial.
For example, to fine-tune a COCO-pretrained model on a custom dataset, you need to modify four parts of the config file: the head, the dataset, the training schedule, and the pretrained model.
Modify the Head¶
If the number of keypoints in the custom dataset differs from COCO, you need to change the out_channels argument in keypoint_head accordingly.
The pretrained parameters of the last layer of the head are not loaded, while all other layers load their pretrained parameters normally.
For example, COCO-WholeBody has 133 keypoints, so 17 (the number of keypoints in COCO) needs to be changed to 133.
channel_cfg = dict(
    num_output_channels=133,  # changed from 17 to 133
    dataset_joints=133,  # changed from 17 to 133
    dataset_channel=[
        list(range(133)),  # changed from 17 to 133
    ],
    inference_channel=list(range(133)))  # changed from 17 to 133
# model settings
model = dict(
type='TopDown',
pretrained='https://download.openmmlab.com/mmpose/'
'pretrain_models/hrnet_w48-8ef0771d.pth',
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(48, 96)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(48, 96, 192)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(48, 96, 192, 384))),
),
keypoint_head=dict(
type='TopdownHeatmapSimpleHead',
in_channels=48,
        out_channels=channel_cfg['num_output_channels'],  # updated accordingly
num_deconv_layers=0,
extra=dict(final_conv_kernel=1, ),
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(
flip_test=True,
post_process='unbiased',
shift_heatmap=True,
modulate_kernel=17))
Here, pretrained='https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w48-8ef0771d.pth' initializes the backbone with ImageNet-pretrained weights.
However, pretrained only initializes the backbone, not the head. Therefore, the pretrained weights used for fine-tuning are usually specified via load_from rather than pretrained.
Modify the Dataset¶
MMPose supports more than ten datasets, including COCO, COCO-WholeBody, MPII and MPII-TRB. You can convert a custom dataset into an existing dataset format and modify the following fields.
data_root = 'data/coco'
data = dict(
samples_per_gpu=32,
workers_per_gpu=2,
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
        type='TopDownCocoWholeBodyDataset',  # change the dataset name accordingly
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',  # change the annotation path accordingly
img_prefix=f'{data_root}/train2017/',
data_cfg=data_cfg,
pipeline=train_pipeline),
val=dict(
        type='TopDownCocoWholeBodyDataset',  # change the dataset name accordingly
        ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',  # change the annotation path accordingly
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=val_pipeline),
test=dict(
        type='TopDownCocoWholeBodyDataset',  # change the dataset name accordingly
        ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',  # change the annotation path accordingly
img_prefix=f'{data_root}/val2017/',
data_cfg=data_cfg,
pipeline=val_pipeline)
)
Modify the Training Schedule¶
In most cases, fine-tuning with a smaller learning rate and fewer training epochs is enough to obtain good results.
# optimizer
optimizer = dict(
type='Adam',
    lr=5e-4,  # can be reduced appropriately
)
optimizer_config = dict(grad_clip=None)
# learning rate schedule
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
    step=[170, 200])  # can be reduced appropriately
total_epochs = 210  # can be reduced appropriately
Use Pretrained Models¶
The pretrained field in the model settings only loads pretrained parameters for the backbone. To load pretrained parameters for the whole network, specify the checkpoint path or link via load_from.
# use the pretrained model for the whole HRNet
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288_dark-741844ba_20200812.pth'  # the model path can be found in the model zoo
Tutorial 2: How to Add New Datasets¶
Convert the Dataset to COCO Format¶
First, convert the custom dataset into the COCO dataset format.
A JSON annotation file in COCO format has the following keys:
'images': [
{
'file_name': '000000001268.jpg',
'height': 427,
'width': 640,
'id': 1268
},
...
],
'annotations': [
{
'segmentation': [[426.36,
...
424.34,
223.3]],
'keypoints': [0,0,0,
0,0,0,
0,0,0,
427,220,2,
443,222,2,
414,228,2,
449,232,2,
408,248,1,
454,261,2,
0,0,0,
0,0,0,
411,287,2,
431,287,2,
0,0,0,
458,265,2,
0,0,0,
466,300,1],
'num_keypoints': 10,
'area': 3894.5826,
'iscrowd': 0,
'image_id': 1268,
'bbox': [402.34, 205.02, 65.26, 88.45],
'category_id': 1,
'id': 215218
},
...
],
'categories': [
{'id': 1, 'name': 'person'},
]
The JSON file must contain the following three keys (a conversion sketch follows the list):
images: a list of image information, providing the file_name, height, width and id of each image.
annotations: a list of instance annotations.
categories: the category name ('person') and its corresponding ID (1).
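For concreteness, here is a minimal conversion sketch (the record layout, the file names and the 17-keypoint count are assumptions for illustration, not part of MMPose); it assembles the three required keys with plain Python:

import json

# Hypothetical input: (file_name, width, height, keypoints) records, where
# keypoints is a flat [x1, y1, v1, x2, y2, v2, ...] list in COCO style.
records = [('000001.jpg', 640, 427, [0, 0, 0] * 17)]

images, annotations = [], []
for idx, (file_name, width, height, keypoints) in enumerate(records, start=1):
    images.append(dict(file_name=file_name, height=height, width=width, id=idx))
    num_visible = sum(1 for v in keypoints[2::3] if v > 0)
    annotations.append(dict(
        keypoints=keypoints,
        num_keypoints=num_visible,
        iscrowd=0,
        image_id=idx,
        bbox=[0, 0, width, height],  # placeholder; use the real person bbox
        area=float(width * height),  # placeholder; use the real bbox area
        category_id=1,
        id=idx))

coco = dict(
    images=images,
    annotations=annotations,
    categories=[dict(id=1, name='person')])
with open('custom_train.json', 'w') as f:
    json.dump(coco, f)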
Create a dataset_info Config File for the Custom Dataset¶
Add a new dataset config file at the following location.
configs/_base_/datasets/custom.py
A sample dataset config file is shown below.
keypoint_info contains the information of each keypoint, where:
name: the keypoint name. Each keypoint name in a dataset must be unique.
id: the keypoint ID.
color: ([R, G, B]) used for keypoint visualization.
type: either 'upper' or 'lower', used for data augmentation.
swap: the name of the keypoint that is the mirror counterpart of the current keypoint.
skeleton_info contains the connections between keypoints, mainly used for visualization.
joint_weights assigns each keypoint a different loss weight for training.
sigmas is used to compute the OKS score; see keypoints-eval for details.
dataset_info = dict(
dataset_name='coco',
paper_info=dict(
author='Lin, Tsung-Yi and Maire, Michael and '
'Belongie, Serge and Hays, James and '
'Perona, Pietro and Ramanan, Deva and '
r'Doll{\'a}r, Piotr and Zitnick, C Lawrence',
title='Microsoft coco: Common objects in context',
container='European conference on computer vision',
year='2014',
homepage='http://cocodataset.org/',
),
keypoint_info={
0:
dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''),
1:
dict(
name='left_eye',
id=1,
color=[51, 153, 255],
type='upper',
swap='right_eye'),
2:
dict(
name='right_eye',
id=2,
color=[51, 153, 255],
type='upper',
swap='left_eye'),
3:
dict(
name='left_ear',
id=3,
color=[51, 153, 255],
type='upper',
swap='right_ear'),
4:
dict(
name='right_ear',
id=4,
color=[51, 153, 255],
type='upper',
swap='left_ear'),
5:
dict(
name='left_shoulder',
id=5,
color=[0, 255, 0],
type='upper',
swap='right_shoulder'),
6:
dict(
name='right_shoulder',
id=6,
color=[255, 128, 0],
type='upper',
swap='left_shoulder'),
7:
dict(
name='left_elbow',
id=7,
color=[0, 255, 0],
type='upper',
swap='right_elbow'),
8:
dict(
name='right_elbow',
id=8,
color=[255, 128, 0],
type='upper',
swap='left_elbow'),
9:
dict(
name='left_wrist',
id=9,
color=[0, 255, 0],
type='upper',
swap='right_wrist'),
10:
dict(
name='right_wrist',
id=10,
color=[255, 128, 0],
type='upper',
swap='left_wrist'),
11:
dict(
name='left_hip',
id=11,
color=[0, 255, 0],
type='lower',
swap='right_hip'),
12:
dict(
name='right_hip',
id=12,
color=[255, 128, 0],
type='lower',
swap='left_hip'),
13:
dict(
name='left_knee',
id=13,
color=[0, 255, 0],
type='lower',
swap='right_knee'),
14:
dict(
name='right_knee',
id=14,
color=[255, 128, 0],
type='lower',
swap='left_knee'),
15:
dict(
name='left_ankle',
id=15,
color=[0, 255, 0],
type='lower',
swap='right_ankle'),
16:
dict(
name='right_ankle',
id=16,
color=[255, 128, 0],
type='lower',
swap='left_ankle')
},
skeleton_info={
0:
dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]),
1:
dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]),
2:
dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]),
3:
dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]),
4:
dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]),
5:
dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]),
6:
dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]),
7:
dict(
link=('left_shoulder', 'right_shoulder'),
id=7,
color=[51, 153, 255]),
8:
dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]),
9:
dict(
link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]),
10:
dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
11:
dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]),
12:
dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]),
13:
dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]),
14:
dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]),
15:
dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]),
16:
dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]),
17:
dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]),
18:
dict(
link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255])
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
1.5
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
])
Create a Custom Dataset Class¶
First, create a package under the mmpose/datasets/datasets folder, for example named custom.
Define the dataset class and register it:
@DATASETS.register_module(name='MyCustomDataset')
class MyCustomDataset(SomeOtherBaseClassAsPerYourNeed):
Create mmpose/datasets/datasets/custom/__init__.py for your custom class, and update mmpose/datasets/__init__.py; a sketch of both files follows.
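A minimal sketch of the two files (my_custom_dataset.py is a hypothetical module name, and the exact import list in mmpose/datasets/__init__.py depends on your MMPose version):

# mmpose/datasets/datasets/custom/__init__.py
from .my_custom_dataset import MyCustomDataset  # hypothetical module file

__all__ = ['MyCustomDataset']

# mmpose/datasets/__init__.py -- append to the existing imports and __all__
from .datasets.custom import MyCustomDataset  # noqa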
Create and Modify a Training Config File¶
Create and modify a training config file to use your custom dataset.
In configs/my_custom_config.py, modify the following lines.
...
# dataset settings
dataset_type = 'MyCustomDataset'
...
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file='path/to/your/train/json',
img_prefix='path/to/your/train/img',
...),
val=dict(
type=dataset_type,
ann_file='path/to/your/val/json',
img_prefix='path/to/your/val/img',
...),
test=dict(
type=dataset_type,
ann_file='path/to/your/test/json',
img_prefix='path/to/your/test/img',
...))
...
Tutorial 3: Customize Data Pipelines¶
Design Data Pipelines¶
Following common conventions, MMPose uses Dataset and DataLoader for multi-process data loading. Dataset returns a dict that serves as the model input.
Since the data in pose estimation may not be of the same size (image size, bounding-box size, etc.), MMPose uses DataContainer from MMCV to collect and distribute data of different sizes. See here for details.
The data pipeline and the dataset are decoupled. Usually, the dataset defines how to process the annotation file, while the data pipeline turns the raw data into the network input. The pipeline consists of a sequence of operations. Each operation takes a dict as input, adds/updates/removes relevant fields, and finally outputs the updated dict as the input of the next operation.
The pipeline operations can be categorized into data loading, preprocessing, formatting and supervision generation (described in detail below).
Here we take the data pipeline of Simple Baseline (ResNet50) as an example:
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(type='TopDownHalfBodyTransform', num_joints_half_body=8, prob_half_body=0.3),
dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTarget', sigma=2),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'bbox_score', 'flip_pairs'
]),
]
val_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='Collect',
keys=['img'],
meta_keys=[
'image_file', 'center', 'scale', 'rotation', 'bbox_score',
'flip_pairs'
]),
]
The fields added/updated/removed by each operation are listed below.
Preprocessing¶
TopDownRandomFlip
Updates: img, joints_3d, joints_3d_visible, center
TopDownHalfBodyTransform
Updates: center, scale
TopDownGetRandomScaleRotation
Updates: scale, rotation
TopDownAffine
Updates: img, joints_3d, joints_3d_visible
NormalizeTensor
Updates: img
Extend and Use Custom Pipelines¶
Write a new pipeline operation in any file, e.g. my_pipeline.py. It takes a dict as input and returns an updated dict.
from mmpose.datasets import PIPELINES

@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        results['dummy'] = True
        return results
Import the newly defined class.
from .my_pipeline import MyTransform
Use it in your config file.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='MyTransform'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]
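To sanity-check the new operation outside of training, the pipeline can be run by hand. This sketch assumes Compose is importable from mmpose.datasets.pipelines and that MyTransform has been registered and imported as in steps 1-2; the input dict is a placeholder:

from mmpose.datasets.pipelines import Compose

# build a tiny pipeline that contains only the custom transform
pipeline = Compose([dict(type='MyTransform')])

results = dict(img=None)  # placeholder input dict
results = pipeline(results)
print(results['dummy'])   # True -- the key added by MyTransform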
Tutorial 4: How to Add New Modules¶
Customize Optimizers¶
In this tutorial, we introduce how to customize optimizers for a project.
Assume you want to add an optimizer named MyOptimizer, which has arguments a, b and c. You first need to implement it in a file, e.g. mmpose/core/optimizer/my_optimizer.py:
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        ...
Then add it to mmpose/core/optimizer/__init__.py so that the registry can find the new optimizer and add it:
from .my_optimizer import MyOptimizer
After that, you can use MyOptimizer in the optimizer field of config files.
In configs, the optimizer is defined by the optimizer field, as follows:
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
To use your own newly defined optimizer, the field can be modified to:
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
We already support all optimizers implemented by PyTorch; the only modification needed is to change the optimizer field of the config file.
For example, to use Adam, the modification can be as follows, although this will degrade the network's performance:
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
You can set the parameters directly according to the PyTorch API documentation.
Customize Optimizer Constructors¶
Some models may have parameter-specific optimization settings for different layers, e.g. weight decay for BatchNorm layers. Such fine-grained parameter adjustments can be made through a custom optimizer constructor; a fleshed-out sketch follows the skeleton below.
from mmcv.utils import build_from_cfg

from mmcv.runner import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmpose.utils import get_root_logger
from .cocktail_optimizer import CocktailOptimizer


@OPTIMIZER_BUILDERS.register_module()
class CocktailOptimizerConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        ...

    def __call__(self, model):
        ...
        return my_optimizer
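Since CocktailOptimizer above is only a placeholder, the following is a hedged, self-contained sketch of what such a constructor might do: it disables weight decay for normalization layers and builds the optimizer from optimizer_cfg. The class name and the layer-type check are illustrative choices, not an MMPose API:

import torch.nn as nn
from mmcv.runner import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmcv.utils import build_from_cfg


@OPTIMIZER_BUILDERS.register_module()
class NoDecayForNormConstructor:
    """Sketch: use weight_decay=0 for normalization-layer parameters."""

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = paramwise_cfg or {}

    def __call__(self, model):
        base_wd = self.optimizer_cfg.get('weight_decay', 0.)
        norm_params, other_params = [], []
        for module in model.modules():
            params = list(module.parameters(recurse=False))
            if isinstance(module, (nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
                norm_params += params
            else:
                other_params += params
        cfg = self.optimizer_cfg.copy()
        cfg['params'] = [
            dict(params=norm_params, weight_decay=0.),
            dict(params=other_params, weight_decay=base_wd),
        ]
        return build_from_cfg(cfg, OPTIMIZERS)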
Develop New Components¶
MMPose divides model components into three basic types:
detector: the whole detector model pipeline, usually containing a backbone and a keypoint head.
backbone: usually an FCN network that extracts feature maps, e.g. ResNet, HRNet.
keypoint_head: the component for pose estimation, usually containing a series of deconvolution layers.
Create a new file mmpose/models/backbones/my_model.py.
import torch.nn as nn

from ..builder import BACKBONES


@BACKBONES.register_module()
class MyModel(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

    def init_weights(self, pretrained=None):
        pass
Import the new backbone in mmpose/models/backbones/__init__.py.
from .my_model import MyModel
Create a new file mmpose/models/keypoint_heads/my_head.py.
You can write a new keypoint head by inheriting from nn.Module and overriding the init_weights(self) and forward(self, x) methods.
from ..builder import HEADS


@HEADS.register_module()
class MyHead(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):
        pass

    def init_weights(self):
        pass
Import the new keypoint head in mmpose/models/keypoint_heads/__init__.py.
from .my_head import MyHead
Use it in your config file.
For top-down 2D pose estimation models, we set the model type to TopDown.
model = dict(
type='TopDown',
backbone=dict(
type='MyModel',
arg1=xxx,
arg2=xxx),
keypoint_head=dict(
type='MyHead',
arg1=xxx,
arg2=xxx))
Add New Loss Functions¶
Assume you want to add a new loss function MyLoss for keypoint estimation. To add it, implement it in mmpose/models/losses/my_loss.py. The decorator weighted_loss enables the loss to be weighted for each element.
import torch
import torch.nn as nn

from mmpose.models import LOSSES


def my_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    loss = torch.mean(loss)
    return loss


@LOSSES.register_module()
class MyLoss(nn.Module):

    def __init__(self, use_target_weight=False):
        super(MyLoss, self).__init__()
        self.criterion = my_loss  # note: assign the function itself, not its call result
        self.use_target_weight = use_target_weight

    def forward(self, output, target, target_weight):
        batch_size = output.size(0)
        num_joints = output.size(1)
        heatmaps_pred = output.reshape(
            (batch_size, num_joints, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)
        loss = 0.
        for idx in range(num_joints):
            heatmap_pred = heatmaps_pred[idx].squeeze(1)
            heatmap_gt = heatmaps_gt[idx].squeeze(1)
            if self.use_target_weight:
                loss += self.criterion(
                    heatmap_pred * target_weight[:, idx],
                    heatmap_gt * target_weight[:, idx])
            else:
                loss += self.criterion(heatmap_pred, heatmap_gt)
        return loss / num_joints
Then add it to mmpose/models/losses/__init__.py:
from .my_loss import MyLoss, my_loss
To use the new loss function, modify the loss_keypoint field in the model config:
loss_keypoint=dict(type='MyLoss', use_target_weight=False)
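To verify the registration, the loss can be built directly from the config dict. This sketch assumes build_loss is exported by mmpose.models; the tensor shapes are arbitrary:

import torch
from mmpose.models import build_loss

loss = build_loss(dict(type='MyLoss', use_target_weight=False))
output = torch.rand(2, 17, 64, 48)  # (batch, joints, H, W)
target = torch.rand(2, 17, 64, 48)
print(loss(output, target, None))   # target_weight is unused here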
Tutorial 5: How to Export a Model to ONNX¶
Open Neural Network Exchange (ONNX) is a model interchange format shared by many frameworks, which lets AI developers conveniently deploy models to the framework of their choice.
Supported Models
How to Use
Prerequisites
How to Use¶
You can use the script here to export models to the ONNX format.
Prerequisites¶
First, install onnx:
pip install onnx onnxruntime
MMPose provides a Python script to export a PyTorch model trained with MMPose to ONNX.
python tools/deployment/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--shape ${SHAPE}] \
[--verify] [--show] [--output-file ${OUTPUT_FILE}] [--is-localizer] [--opset-version ${VERSION}]
Optional arguments (see the sketch below for a quick runtime check):
--shape: the shape of the model input tensor. For 2D keypoint detection models (e.g. HRNet), the input shape should be $batch $channel $height $width (e.g. 1 3 256 192).
--verify: whether to verify the exported model, e.g. whether it is runnable and whether the numerical values are correct. Defaults to False if not specified.
--show: whether to print the structure of the exported model. Defaults to False if not specified.
--output-file: the filename of the exported ONNX model. Defaults to tmp.onnx if not specified.
--opset-version: the ONNX opset version. MMPose recommends a recent opset (e.g. version 11) for stability. Defaults to 11 if not specified.
If you find that a provided checkpoint fails to export, or that the exported model loses precision, feel free to open an issue in this repo.
Tutorial 6: Customize Runtime Settings¶
Under construction…
Useful Tools¶
Under construction…
FAQ¶
Under construction…
mmpose.apis¶
- mmpose.apis.extract_pose_sequence(pose_results, frame_idx, causal, seq_len, step=1)[源代码]¶
Extract the target frame from 2D pose results, and pad the sequence to a fixed length.
- Parameters
pose_results (list[list[dict]]) –
Multi-frame pose detection results stored in a nested list. Each element of the outer list is the pose detection results of a single frame, and each element of the inner list is the pose information of one person, which contains:
keypoints (ndarray[K, 2 or 3]): x, y, [score]
track_id (int): unique id of each person, required when with_track_id==True.
bbox ((4, ) or (5, )): left, top, right, bottom, [score]
frame_idx (int) – The index of the frame in the original video.
causal (bool) – If True, the target frame is the last frame in a sequence. Otherwise, the target frame is in the middle of a sequence.
seq_len (int) – The number of frames in the input sequence.
step (int) – Step size to extract frames from the video.
- Returns
Multi-frame pose detection results stored in a nested list with a length of seq_len.
- Return type
list[list[dict]]
- mmpose.apis.get_track_id(results, results_last, next_id, min_keypoints=3, use_oks=False, tracking_thr=0.3, use_one_euro=False, fps=None)[源代码]¶
Get track id for each person instance on the current frame.
- Parameters
results (list[dict]) – The bbox & pose results of the current frame (bbox_result, pose_result).
results_last (list[dict]) – The bbox & pose & track_id info of the last frame (bbox_result, pose_result, track_id).
next_id (int) – The track id for the new person instance.
min_keypoints (int) – Minimum number of keypoints recognized as a person. Default: 3.
use_oks (bool) – Flag to use OKS tracking. Default: False.
tracking_thr (float) – The threshold for tracking.
use_one_euro (bool) – Option to use the one-euro filter. Default: False.
fps (optional) – Frame rate of the video, used to set d_cutoff when the one-euro filter is applied. Default: None.
- Returns
results (list[dict]): The bbox & pose & track_id info of the current frame (bbox_result, pose_result, track_id).
next_id (int): The track id for the new person instance.
- Return type
tuple
- mmpose.apis.inference_bottom_up_pose_model(model, img_or_path, dataset='BottomUpCocoDataset', dataset_info=None, pose_nms_thr=0.9, return_heatmap=False, outputs=None)[源代码]¶
Inference a single image with a bottom-up pose model.
Note
num_people: P
num_keypoints: K
bbox height: H
bbox width: W
- Parameters
model (nn.Module) – The loaded pose model.
img_or_path (str| np.ndarray) – Image filename or loaded image.
dataset (str) – Dataset name, e.g. ‘BottomUpCocoDataset’. It is deprecated. Please use dataset_info instead.
dataset_info (DatasetInfo) – A class containing all dataset info.
pose_nms_thr (float) – retain oks overlap < pose_nms_thr, default: 0.9.
return_heatmap (bool) – Flag to return heatmap, default: False.
outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned, default: None.
- Returns
pose_results (list[np.ndarray]): The predicted pose info. The length of the list is the number of people (P). Each item in the list is a ndarray, containing each person's pose (np.ndarray[Kx3]): x, y, score.
returned_outputs (list[dict[np.ndarray[N, K, H, W] | torch.Tensor[N, K, H, W]]]): Output feature maps from layers specified in outputs. Includes 'heatmap' if return_heatmap is True.
- Return type
tuple
- mmpose.apis.inference_interhand_3d_model(model, img_or_path, det_results, bbox_thr=None, format='xywh', dataset='InterHand3DDataset')[源代码]¶
Inference a single image with a list of hand bounding boxes.
Note
num_bboxes: N
num_keypoints: K
- Parameters
model (nn.Module) – The loaded pose model.
img_or_path (str | np.ndarray) – Image filename or loaded image.
det_results (list[dict]) – The 2D bbox sequences stored in a list. Each element of the list is the bbox of one person, whose shape is ndarray[4 or 5], containing 4 box coordinates (and score).
dataset (str) – Dataset name.
format – bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’. ‘xyxy’ means (left, top, right, bottom), ‘xywh’ means (left, top, width, height).
- Returns
3D pose inference results. Each element is the result of an instance, which contains the predicted 3D keypoints with shape (ndarray[K, 3]). If there is no valid instance, an empty list will be returned.
- Return type
list[dict]
- mmpose.apis.inference_mesh_model(model, img_or_path, det_results, bbox_thr=None, format='xywh', dataset='MeshH36MDataset')[源代码]¶
Inference a single image with a list of bounding boxes.
Note
num_bboxes: N
num_keypoints: K
num_vertices: V
num_faces: F
- Parameters
model (nn.Module) – The loaded pose model.
img_or_path (str | np.ndarray) – Image filename or loaded image.
det_results (list[dict]) – The 2D bbox sequences stored in a list. Each element of the list is the bbox of one person. “bbox” (ndarray[4 or 5]): The person bounding box, which contains 4 box coordinates (and score).
bbox_thr (float | None) – Threshold for bounding boxes. Only bboxes with higher scores will be fed into the pose detector. If bbox_thr is None, all boxes will be used.
format (str) –
bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’.
’xyxy’ means (left, top, right, bottom),
’xywh’ means (left, top, width, height).
dataset (str) – Dataset name.
- Returns
3D pose inference results. Each element is the result of an instance, which contains:
’bbox’ (ndarray[4]): instance bounding bbox
’center’ (ndarray[2]): bbox center
’scale’ (ndarray[2]): bbox scale
’keypoints_3d’ (ndarray[K,3]): predicted 3D keypoints
’camera’ (ndarray[3]): camera parameters
’vertices’ (ndarray[V, 3]): predicted 3D vertices
’faces’ (ndarray[F, 3]): mesh faces
If there is no valid instance, an empty list will be returned.
- Return type
list[dict]
- mmpose.apis.inference_pose_lifter_model(model, pose_results_2d, dataset=None, dataset_info=None, with_track_id=True, image_size=None, norm_pose_2d=False)[源代码]¶
Inference 3D pose from 2D pose sequences using a pose lifter model.
- Parameters
model (nn.Module) – The loaded pose lifter model
pose_results_2d (list[list[dict]]) –
The 2D pose sequences stored in a nested list. Each element of the outer list is the 2D pose results of a single frame, and each element of the inner list is the 2D pose of one person, which contains:
”keypoints” (ndarray[K, 2 or 3]): x, y, [score]
”track_id” (int)
dataset (str) – Dataset name, e.g. ‘Body3DH36MDataset’
with_track_id – If True, the element in pose_results_2d is expected to contain “track_id”, which will be used to gather the pose sequence of a person from multiple frames. Otherwise, the pose results in each frame are expected to have a consistent number and order of identities. Default is True.
image_size (tuple|list) – image width, image height. If None, image size will not be contained in dict data.
norm_pose_2d (bool) – If True, scale the bbox (along with the 2D pose) to the average bbox scale of the dataset, and move the bbox (along with the 2D pose) to the average bbox center of the dataset.
- Returns
3D pose inference results. Each element is the result of an instance, which contains:
”keypoints_3d” (ndarray[K, 3]): predicted 3D keypoints
"keypoints" (ndarray[K, 2 or 3]): from the last frame in pose_results_2d.
"track_id" (int): from the last frame in pose_results_2d.
If there is no valid instance, an empty list will be returned.
- Return type
list[dict]
- mmpose.apis.inference_top_down_pose_model(model, img_or_path, person_results=None, bbox_thr=None, format='xywh', dataset='TopDownCocoDataset', dataset_info=None, return_heatmap=False, outputs=None)[源代码]¶
Inference a single image with a list of person bounding boxes.
Note
num_people: P
num_keypoints: K
bbox height: H
bbox width: W
- Parameters
model (nn.Module) – The loaded pose model.
img_or_path (str| np.ndarray) – Image filename or loaded image.
person_results (list(dict), optional) –
a list of detected persons that contains bbox and/or track_id:
bbox ((4, ) or (5, )): The person bounding box, which contains 4 box coordinates (and score).
track_id (int): The unique id for each human instance. If not provided, a dummy person result with a bbox covering the entire image will be used. Default: None.
bbox_thr (float | None) – Threshold for bounding boxes. Only bboxes with higher scores will be fed into the pose detector. If bbox_thr is None, all boxes will be used.
format (str) –
bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’.
xyxy means (left, top, right, bottom),
xywh means (left, top, width, height).
dataset (str) – Dataset name, e.g. ‘TopDownCocoDataset’. It is deprecated. Please use dataset_info instead.
dataset_info (DatasetInfo) – A class containing all dataset info.
return_heatmap (bool) – Flag to return heatmap, default: False
outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned. Default: None.
- Returns
pose_results (list[dict]): The bbox & pose info. Each item in the list is a dictionary, containing the bbox: (left, top, right, bottom, [score]) and the pose (ndarray[Kx3]): x, y, score.
returned_outputs (list[dict[np.ndarray[N, K, H, W] | torch.Tensor[N, K, H, W]]]): Output feature maps from layers specified in outputs. Includes 'heatmap' if return_heatmap is True.
- Return type
tuple
- mmpose.apis.init_pose_model(config, checkpoint=None, device='cuda:0')[源代码]¶
Initialize a pose model from config file.
- Parameters
config (str or mmcv.Config) – Config file path or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
- Returns
The constructed detector.
- Return type
nn.Module
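Putting init_pose_model, inference_top_down_pose_model and vis_pose_result (documented below) together, a typical top-down call sequence is sketched here; the config/checkpoint paths, test image and person bbox are placeholders:

from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)

config = 'configs/.../res50_coco_256x192.py'  # placeholder paths
checkpoint = 'res50_coco_256x192.pth'
pose_model = init_pose_model(config, checkpoint, device='cuda:0')

person_results = [{'bbox': [100, 100, 200, 400]}]  # one xywh box
pose_results, _ = inference_top_down_pose_model(
    pose_model, 'demo.jpg', person_results, format='xywh')
vis_pose_result(pose_model, 'demo.jpg', pose_results, out_file='vis.jpg')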
- mmpose.apis.init_random_seed(seed=None, device='cuda')[源代码]¶
Initialize random seed.
If the seed is not set, the seed will be automatically randomized, and then broadcast to all processes to prevent some potential bugs.
- Parameters
seed (int, Optional) – The seed. Default to None.
device (str) – The device where the seed will be put on. Default to ‘cuda’.
- Returns
Seed to be used.
- Return type
int
- mmpose.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[源代码]¶
Test the model with multiple gpus.
This method tests the model with multiple gpus and collects the results under two different modes: gpu and cpu. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication for result collection. In cpu mode, it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
- Returns
The prediction results.
- Return type
list
- mmpose.apis.process_mmdet_results(mmdet_results, cat_id=1)[源代码]¶
Process mmdet results, and return a list of bboxes.
- Parameters
mmdet_results (list|tuple) – mmdet results.
cat_id (int) – category id (default: 1 for human)
- Returns
a list of detected bounding boxes
- Return type
person_results (list)
- mmpose.apis.single_gpu_test(model, data_loader)[源代码]¶
Test the model with a single gpu.
This method tests the model with a single gpu and displays a test progress bar.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
- Returns
The prediction results.
- Return type
list
- mmpose.apis.train_model(model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None)[源代码]¶
Train model entry function.
- Parameters
model (nn.Module) – The model to be trained.
dataset (Dataset) – Train dataset.
cfg (dict) – The config dict for training.
distributed (bool) – Whether to use distributed training. Default: False.
validate (bool) – Whether to do evaluation. Default: False.
timestamp (str | None) – Local time for runner. Default: None.
meta (dict | None) – Meta dict to record some important information. Default: None
- mmpose.apis.vis_3d_mesh_result(model, result, img=None, show=False, out_file=None)[源代码]¶
Visualize the 3D mesh estimation results.
- Parameters
model (nn.Module) – The loaded model.
result (list[dict]) – 3D mesh estimation results.
- mmpose.apis.vis_3d_pose_result(model, result, img=None, dataset='Body3DH36MDataset', dataset_info=None, kpt_score_thr=0.3, radius=8, thickness=2, num_instances=- 1, show=False, out_file=None)[源代码]¶
Visualize the 3D pose estimation results.
- Parameters
model (nn.Module) – The loaded model.
result (list[dict]) –
- mmpose.apis.vis_pose_result(model, img, result, radius=4, thickness=1, kpt_score_thr=0.3, bbox_color='green', dataset='TopDownCocoDataset', dataset_info=None, show=False, out_file=None)[源代码]¶
Visualize the detection results on the image.
- Parameters
model (nn.Module) – The loaded detector.
img (str | np.ndarray) – Image filename or loaded image.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
kpt_score_thr (float) – The threshold to visualize the keypoints.
skeleton (list[tuple]) – Default: None.
show (bool) – Whether to show the image. Default True.
out_file (str|None) – The filename of the output visualization image.
- mmpose.apis.vis_pose_tracking_result(model, img, result, radius=4, thickness=1, kpt_score_thr=0.3, dataset='TopDownCocoDataset', dataset_info=None, show=False, out_file=None)[源代码]¶
Visualize the pose tracking results on the image.
- Parameters
model (nn.Module) – The loaded detector.
img (str | np.ndarray) – Image filename or loaded image.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
kpt_score_thr (float) – The threshold to visualize the keypoints.
skeleton (list[tuple]) – Default None.
show (bool) – Whether to show the image. Default True.
out_file (str|None) – The filename of the output visualization image.
mmpose.core¶
evaluation¶
- class mmpose.core.evaluation.DistEvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=['acc', 'ap', 'ar', 'pck', 'auc', '3dpck', 'p-3dpck', '3dauc', 'p-3dauc'], less_keys=['loss', 'epe', 'nme', 'mpjpe', 'p-mpjpe', 'n-mpjpe'], broadcast_bn_buffer=True, tmpdir=None, gpu_collect=False, **eval_kwargs)[源代码]¶
- class mmpose.core.evaluation.EvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=['acc', 'ap', 'ar', 'pck', 'auc', '3dpck', 'p-3dpck', '3dauc', 'p-3dauc'], less_keys=['loss', 'epe', 'nme', 'mpjpe', 'p-mpjpe', 'n-mpjpe'], **eval_kwargs)[源代码]¶
- mmpose.core.evaluation.aggregate_scale(feature_maps_list, align_corners=False, aggregate_scale='average')[源代码]¶
Aggregate multi-scale outputs.
Note
batch size: N
num keypoints: K
heatmap width: W
heatmap height: H
- Parameters
feature_maps_list (list[Tensor]) – Aggregated feature maps.
align_corners (bool) – Align corners when performing interpolation.
aggregate_scale (str) –
Methods to aggregate multi-scale feature maps. Options: 'average', 'unsqueeze_concat'.
'average': Get the average of the feature maps.
'unsqueeze_concat': Concatenate the feature maps along a new axis.
Default: 'average'.
- Returns
Aggregated feature maps.
- Return type
Tensor
- mmpose.core.evaluation.aggregate_stage_flip(feature_maps, feature_maps_flip, index=- 1, project2image=True, size_projected=None, align_corners=False, aggregate_stage='concat', aggregate_flip='average')[源代码]¶
Run model inference to get multi-stage outputs (heatmaps & tags), and resize them to base sizes.
- Parameters
feature_maps (list[Tensor]) – feature_maps can be heatmaps, tags, and pafs.
feature_maps_flip (list[Tensor] | None) – flipped feature_maps. feature maps can be heatmaps, tags, and pafs.
project2image (bool) – Option to resize to base scale.
size_projected (list[int, int]) – Base size of heatmaps [w, h].
align_corners (bool) – Align corners when performing interpolation.
aggregate_stage (str) –
Methods to aggregate multi-stage feature maps. Options: 'concat', 'average'. Default: 'concat'.
'concat': Concatenate the original and the flipped feature maps.
'average': Get the average of the original and the flipped feature maps.
aggregate_flip (str) –
Methods to aggregate the original and the flipped feature maps. Options: 'concat', 'average', 'none'. Default: 'average'.
'concat': Concatenate the original and the flipped feature maps.
'average': Get the average of the original and the flipped feature maps.
'none': no flipped feature maps.
- Returns
Aggregated feature maps with shape [NxKxWxH].
- Return type
list[Tensor]
- mmpose.core.evaluation.compute_similarity_transform(source_points, target_points)[源代码]¶
Computes a similarity transform (sR, t) that takes a set of 3D points source_points (N x 3) closest to a set of 3D points target_points, where R is a 3x3 rotation matrix, t is a 3x1 translation vector and s is a scale factor, and returns the transformed 3D points source_points_hat (N x 3), i.e. it solves the orthogonal Procrustes problem.
Note
Points number: N
- Parameters
source_points (np.ndarray) – Source point set with shape [N, 3].
target_points (np.ndarray) – Target point set with shape [N, 3].
- Returns
Transformed source point set with shape [N, 3].
- Return type
np.ndarray
- mmpose.core.evaluation.flip_feature_maps(feature_maps, flip_index=None)[源代码]¶
Flip the feature maps and swap the channels.
- Parameters
feature_maps (list[Tensor]) – Feature maps.
flip_index (list[int] | None) – Channel-flip indexes. If None, do not flip channels.
- Returns
Flipped feature_maps.
- Return type
list[Tensor]
- mmpose.core.evaluation.get_group_preds(grouped_joints, center, scale, heatmap_size, use_udp=False)[源代码]¶
Transform the grouped joints back to the image.
- Parameters
grouped_joints (list) – Grouped person joints.
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
heatmap_size (np.ndarray[2, ]) – Size of the destination heatmaps.
use_udp (bool) – Unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR’2020).
- Returns
List of the pose result for each person.
- Return type
list
- mmpose.core.evaluation.keypoint_3d_auc(pred, gt, mask, alignment='none')[源代码]¶
Calculate the Area Under the Curve (3DAUC) computed for a range of 3DPCK thresholds.
Paper ref: 'Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision', 3DV'2017. This implementation is derived from mpii_compute_3d_pck.m, which is provided as part of the MPI-INF-3DHP test data release.
Note
batch_size: N
num_keypoints: K
keypoint_dims: C
- Parameters
pred (np.ndarray[N, K, C]) – Predicted keypoint location.
gt (np.ndarray[N, K, C]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
alignment (str, optional) –
method to align the prediction with the groundtruth. Supported options are:
'none': no alignment will be applied
'scale': align in the least-square sense in scale
'procrustes': align in the least-square sense in scale, rotation and translation
- Returns
AUC computed for a range of 3DPCK thresholds.
- Return type
auc
- mmpose.core.evaluation.keypoint_3d_pck(pred, gt, mask, alignment='none', threshold=0.15)[源代码]¶
Calculate the Percentage of Correct Keypoints (3DPCK) w. or w/o rigid alignment.
Paper ref: 'Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision', 3DV'2017.
Note
batch_size: N
num_keypoints: K
keypoint_dims: C
- Parameters
pred (np.ndarray[N, K, C]) – Predicted keypoint location.
gt (np.ndarray[N, K, C]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
alignment (str, optional) –
method to align the prediction with the groundtruth. Supported options are:
'none': no alignment will be applied
'scale': align in the least-square sense in scale
'procrustes': align in the least-square sense in scale, rotation and translation
threshold – If L2 distance between the prediction and the groundtruth is less then threshold, the predicted result is considered as correct. Default: 0.15 (m).
- Returns
percentage of correct keypoints.
- Return type
pck
- mmpose.core.evaluation.keypoint_auc(pred, gt, mask, normalize, num_step=20)[源代码]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.
Note
batch_size: N
num_keypoints: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
normalize (float) – Normalization factor.
- Returns
Area under curve.
- Return type
float
- mmpose.core.evaluation.keypoint_epe(pred, gt, mask)[源代码]¶
Calculate the end-point error.
Note
batch_size: N
num_keypoints: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
- Returns
Average end-point error.
- Return type
float
- mmpose.core.evaluation.keypoint_mpjpe(pred, gt, mask, alignment='none')[源代码]¶
Calculate the mean per-joint position error (MPJPE) and the error after rigid alignment with the ground truth (P-MPJPE).
Note
batch_size: N
num_keypoints: K
keypoint_dims: C
- Parameters
pred (np.ndarray) – Predicted keypoint location with shape [N, K, C].
gt (np.ndarray) – Groundtruth keypoint location with shape [N, K, C].
mask (np.ndarray) – Visibility of the target with shape [N, K]. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
alignment (str, optional) –
method to align the prediction with the groundtruth. Supported options are:
'none': no alignment will be applied
'scale': align in the least-square sense in scale
'procrustes': align in the least-square sense in scale, rotation and translation
- Returns
A tuple containing joint position errors
(float | np.ndarray): mean per-joint position error (mpjpe).
(float | np.ndarray): mpjpe after rigid alignment with the ground truth (p-mpjpe).
- Return type
tuple
- mmpose.core.evaluation.keypoint_pck_accuracy(pred, gt, mask, thr, normalize)[源代码]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
batch_size: N
num_keypoints: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
acc (np.ndarray[K]): Accuracy of each keypoint.
avg_acc (float): Averaged accuracy across all keypoints.
cnt (int): Number of valid keypoints.
- Return type
tuple
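As a toy numeric illustration (all shapes and values are made up), keypoint_pck_accuracy can be exercised on dummy arrays:

import numpy as np
from mmpose.core.evaluation import keypoint_pck_accuracy

pred = np.array([[[10., 10.], [20., 20.]]])  # [N=1, K=2, 2]
gt = np.array([[[11., 10.], [40., 40.]]])
mask = np.ones((1, 2), dtype=bool)           # both joints visible
normalize = np.full((1, 2), 100.)            # e.g. normalize by a 100-px box

acc, avg_acc, cnt = keypoint_pck_accuracy(
    pred, gt, mask, thr=0.05, normalize=normalize)
print(acc, avg_acc, cnt)  # joint 1 is within the threshold, joint 2 is not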
- mmpose.core.evaluation.keypoints_from_heatmaps(heatmaps, center, scale, unbiased=False, post_process='default', kernel=11, valid_radius_factor=0.0546875, use_udp=False, target_type='GaussianHeatmap')[源代码]¶
Get final keypoint predictions from heatmaps and transform them back to the image.
Note
batch size: N
num keypoints: K
heatmap height: H
heatmap width: W
- Parameters
heatmaps (np.ndarray[N, K, H, W]) – model predicted heatmaps.
center (np.ndarray[N, 2]) – Center of the bounding box (x, y).
scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.
post_process (str/None) – Choice of methods to post-process heatmaps. Currently supported: None, ‘default’, ‘unbiased’, ‘megvii’.
unbiased (bool) – Option to use unbiased decoding. Mutually exclusive with megvii. Note: this arg is deprecated and unbiased=True can be replaced by post_process=’unbiased’ Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
kernel (int) – Gaussian kernel size (K) for modulation, which should match the heatmap gaussian sigma when training. K=17 for sigma=3 and K=11 for sigma=2.
valid_radius_factor (float) – The radius factor of the positive area in classification heatmap for UDP.
use_udp (bool) – Use unbiased data processing.
target_type (str) – ‘GaussianHeatmap’ or ‘CombinedTarget’. GaussianHeatmap: Classification target with gaussian distribution. CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- Returns
A tuple containing keypoint predictions and scores.
preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.
maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.
- Return type
tuple
- mmpose.core.evaluation.keypoints_from_heatmaps3d(heatmaps, center, scale)[源代码]¶
Get final keypoint predictions from 3d heatmaps and transform them back to the image.
Note
batch size: N
num keypoints: K
heatmap depth size: D
heatmap height: H
heatmap width: W
- Parameters
heatmaps (np.ndarray[N, K, D, H, W]) – model predicted heatmaps.
center (np.ndarray[N, 2]) – Center of the bounding box (x, y).
scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.
- Returns
A tuple containing keypoint predictions and scores.
preds (np.ndarray[N, K, 3]): Predicted 3d keypoint location in images.
maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.
- Return type
tuple
- mmpose.core.evaluation.keypoints_from_regression(regression_preds, center, scale, img_size)[源代码]¶
Get final keypoint predictions from regression vectors and transform them back to the image.
Note
batch_size: N
num_keypoints: K
- Parameters
regression_preds (np.ndarray[N, K, 2]) – model prediction.
center (np.ndarray[N, 2]) – Center of the bounding box (x, y).
scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.
img_size (list(img_width, img_height)) – model input image size.
- Returns
preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.
maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.
- Return type
tuple
- mmpose.core.evaluation.multilabel_classification_accuracy(pred, gt, mask, thr=0.5)[源代码]¶
Get multi-label classification accuracy.
Note
batch size: N
label number: L
- Parameters
pred (np.ndarray[N, L, 2]) – model predicted labels.
gt (np.ndarray[N, L, 2]) – ground-truth labels.
mask (np.ndarray[N, 1] or np.ndarray[N, L]) – reliability of ground-truth labels.
- Returns
multi-label classification accuracy.
- Return type
float
- mmpose.core.evaluation.pose_pck_accuracy(output, target, mask, thr=0.05, normalize=None)[源代码]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
output (np.ndarray[N, K, H, W]) – Model output heatmaps.
target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
np.ndarray[K]: Accuracy of each keypoint.
float: Averaged accuracy across all keypoints.
int: Number of valid keypoints.
- Return type
tuple
- mmpose.core.evaluation.post_dark_udp(coords, batch_heatmaps, kernel=3)[源代码]¶
DARK post-processing, implemented with UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020); Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
Note
batch size: B
num keypoints: K
num persons: N
height of heatmaps: H
width of heatmaps: W
B=1 for bottom_up paradigm where all persons share the same heatmap. B=N for top_down paradigm where each person has its own heatmaps.
- Parameters
coords (np.ndarray[N, K, 2]) – Initial coordinates of human pose.
batch_heatmaps (np.ndarray[B, K, H, W]) – batch_heatmaps
kernel (int) – Gaussian kernel size (K) for modulation.
- Returns
Refined coordinates.
- Return type
np.ndarray([N, K, 2])
- mmpose.core.evaluation.split_ae_outputs(outputs, num_joints, with_heatmaps, with_ae, select_output_index)[源代码]¶
Split multi-stage outputs into heatmaps & tags.
- Parameters
outputs (list(Tensor)) – Outputs of network
num_joints (int) – Number of joints
with_heatmaps (list[bool]) – Option to output heatmaps for different stages.
with_ae (list[bool]) – Option to output ae tags for different stages.
select_output_index (list[int]) – Output keep the selected index
- Returns
A tuple containing multi-stage outputs.
list[Tensor]: multi-stage heatmaps.
list[Tensor]: multi-stage tags.
- Return type
tuple
fp16¶
- class mmpose.core.fp16.Fp16OptimizerHook(grad_clip=None, coalesce=True, bucket_size_mb=- 1, loss_scale=512.0, distributed=True)[源代码]¶
FP16 optimizer hook.
The steps of the fp16 optimizer are as follows: 1. Scale the loss value. 2. Backpropagate in the fp16 model. 3. Copy gradients from the fp16 model to the fp32 weight copy. 4. Update the fp32 weights. 5. Copy the updated parameters from the fp32 weight copy back to the fp16 model.
Refer to https://arxiv.org/abs/1710.03740 for more details.
- Parameters
loss_scale (float) – Scale factor multiplied with loss.
- after_train_iter(runner)[源代码]¶
Backward optimization steps for Mixed Precision Training.
Scale the loss by a scale factor.
Backward the loss to obtain the gradients (fp16).
Copy gradients from the model to the fp32 weight copy.
Scale the gradients back and update the fp32 weight copy.
Copy back the params from fp32 weight copy to the fp16 model.
- Parameters
runner (mmcv.Runner) – The underlying training runner.
- before_run(runner)[源代码]¶
Preparing steps before Mixed Precision Training.
Make a master copy of fp32 weights for optimization.
Convert the main model from fp32 to fp16.
- Parameters
runner (mmcv.Runner) – The underlying training runner.
- mmpose.core.fp16.auto_fp16(apply_to=None, out_fp32=False)[源代码]¶
Decorator to enable fp16 training automatically.
This decorator is useful when you write custom modules and want to support mixed precision training. If inputs arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp32 (bool) – Whether to convert the output back to fp32.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp16
>>>     @auto_fp16()
>>>     def forward(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp16
>>>     @auto_fp16(apply_to=('pred', ))
>>>     def do_something(self, pred, others):
>>>         pass
- mmpose.core.fp16.cast_tensor_type(inputs, src_type, dst_type)[源代码]¶
Recursively convert Tensor in inputs from src_type to dst_type.
- Parameters
inputs – Inputs to be cast.
src_type (torch.dtype) – Source type.
dst_type (torch.dtype) – Destination type.
- Returns
The same type as inputs, but with all contained Tensors cast to the destination type.
- mmpose.core.fp16.force_fp32(apply_to=None, out_fp16=False)[源代码]¶
Decorator to convert input arguments to fp32 in force.
This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If inputs arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp16 (bool) – Whether to convert the output back to fp16.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp32
>>>     @force_fp32()
>>>     def loss(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp32
>>>     @force_fp32(apply_to=('pred', ))
>>>     def post_process(self, pred, others):
>>>         pass
utils¶
- class mmpose.core.utils.WeightNormClipHook(max_norm=1.0, module_param_names='weight')[源代码]¶
Apply weight norm clip regularization.
The module's parameters will be clipped to a given maximum norm before each forward pass.
- Parameters
max_norm (float) – The maximum norm of the parameter.
module_param_names (str|list) – The parameter name (or name list) to apply weight norm clip.
- property hook_type¶
Hook type. Subclasses should override this property to return a string value in {forward, forward_pre, backward}.
- mmpose.core.utils.allreduce_grads(params, coalesce=True, bucket_size_mb=- 1)[源代码]¶
Allreduce gradients.
- Parameters
params (list[torch.Parameters]) – List of parameters of a model
coalesce (bool, optional) – Whether allreduce parameters as a whole. Default: True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.
post_processing¶
- mmpose.core.post_processing.affine_transform(pt, trans_mat)[源代码]¶
Apply an affine transformation to the points.
- Parameters
pt (np.ndarray) – a 2 dimensional point to be transformed
trans_mat (np.ndarray) – 2x3 matrix of an affine transform
- Returns
Transformed points.
- Return type
np.ndarray
- mmpose.core.post_processing.flip_back(output_flipped, flip_pairs, target_type='GaussianHeatmap')[源代码]¶
Flip the flipped heatmaps back to the original form.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
output_flipped (np.ndarray[N, K, H, W]) – The output heatmaps obtained from the flipped images.
flip_pairs (list[tuple]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
target_type (str) – GaussianHeatmap or CombinedTarget
- Returns
Heatmaps flipped back to match the original image.
- Return type
np.ndarray
- mmpose.core.post_processing.fliplr_joints(joints_3d, joints_3d_visible, img_width, flip_pairs)[源代码]¶
Flip human joints horizontally.
Note
num_keypoints: K
- Parameters
joints_3d (np.ndarray([K, 3])) – Coordinates of keypoints.
joints_3d_visible (np.ndarray([K, 1])) – Visibility of keypoints.
img_width (int) – Image width.
flip_pairs (list[tuple]) – Pairs of keypoints which are mirrored (for example, left ear and right ear).
- Returns
Flipped human joints.
joints_3d_flipped (np.ndarray([K, 3])): Flipped joints.
joints_3d_visible_flipped (np.ndarray([K, 1])): Joint visibility.
- Return type
tuple
- mmpose.core.post_processing.fliplr_regression(regression, flip_pairs, center_mode='static', center_x=0.5, center_index=0)[源代码]¶
Flip human joints horizontally.
Note
batch_size: N
num_keypoint: K
- Parameters
regression (np.ndarray([..., K, C])) –
Coordinates of keypoints, where K is the joint number and C is the dimension. Example shapes are:
[N, K, C]: a batch of keypoints, where N is the batch size.
[N, T, K, C]: a batch of pose sequences, where T is the frame number.
flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).
center_mode (str) –
The mode to set the center location on the x-axis to flip around. Options are:
static: use a static x value (see center_x also)
root: use a root joint (see center_index also)
center_x (float) – Set the x-axis location of the flip center. Only used when center_mode=static.
center_index (int) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode=root.
- Returns
Flipped joints.
- Return type
np.ndarray([…, K, C])
- mmpose.core.post_processing.get_affine_transform(center, scale, rot, output_size, shift=(0.0, 0.0), inv=False)[源代码]¶
Get the affine transform matrix, given the center/scale/rot/output_size.
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (tuple[float]) – Shift translation ratio wrt the width/height, in the range 0-100%. Default: (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
- Returns
The transform matrix.
- Return type
np.ndarray
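A small usage sketch of get_affine_transform together with affine_transform; the center/scale values are invented, and note that MMPose conventionally expresses scale as the bbox size divided by 200 px (an assumption to verify against your data):

import numpy as np
from mmpose.core.post_processing import (affine_transform,
                                         get_affine_transform)

center = np.array([320., 240.])  # bbox center (x, y)
scale = np.array([1.0, 1.33])    # bbox w/h in units of 200 px
trans = get_affine_transform(center, scale, 0., np.array([192, 256]))

pt = affine_transform(np.array([320., 240.]), trans)
print(pt)  # the bbox center maps to the patch center, ~(96, 128)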
- mmpose.core.post_processing.get_warp_matrix(theta, size_input, size_dst, size_target)[源代码]¶
Calculate the transformation matrix under the constraint of unbiased. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- Parameters
theta (float) – Rotation angle in degrees.
size_input (np.ndarray) – Size of input image [w, h].
size_dst (np.ndarray) – Size of output image [w, h].
size_target (np.ndarray) – Size of ROI in input plane [w, h].
- Returns
A matrix for transformation.
- Return type
np.ndarray
- mmpose.core.post_processing.oks_iou(g, d, a_g, a_d, sigmas=None, vis_thr=None)[源代码]¶
Calculate oks ious.
- Parameters
g – Ground truth keypoints.
d – Detected keypoints.
a_g – Area of the ground truth object.
a_d – Area of the detected object.
sigmas – standard deviation of keypoint labelling.
vis_thr – threshold of the keypoint visibility.
- Returns
The oks ious.
- Return type
list
- mmpose.core.post_processing.oks_nms(kpts_db, thr, sigmas=None, vis_thr=None, score_per_joint=False)[源代码]¶
OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – Retain overlap < thr.
sigmas – standard deviation of keypoint labelling.
vis_thr – threshold of the keypoint visibility.
score_per_joint – the input scores (in kpts_db) are per joint scores
- Returns
indexes to keep.
- Return type
np.ndarray
- mmpose.core.post_processing.rotate_point(pt, angle_rad)[源代码]¶
Rotate a point by an angle.
- Parameters
pt (list[float]) – 2 dimensional point to be rotated
angle_rad (float) – rotation angle in radians
- Returns
Rotated point.
- Return type
list[float]
- mmpose.core.post_processing.soft_oks_nms(kpts_db, thr, max_dets=20, sigmas=None, vis_thr=None, score_per_joint=False)[源代码]¶
Soft OKS NMS implementations.
- Parameters
kpts_db – keypoints.
thr – retain oks overlap < thr.
max_dets – max number of detections to keep.
sigmas – Keypoint labelling uncertainty.
score_per_joint – the input scores (in kpts_db) are per joint scores
- Returns
indexes to keep.
- Return type
np.ndarray
- mmpose.core.post_processing.transform_preds(coords, center, scale, output_size, use_udp=False)[源代码]¶
Get final keypoint predictions from heatmaps and apply scaling and translation to map them back to the image.
Note
num_keypoints: K
- Parameters
coords (np.ndarray[K, ndims]) –
If ndims=2, coords are predicted keypoint locations.
If ndims=4, coords are composed of (x, y, scores, tags).
If ndims=5, coords are composed of (x, y, scores, tags, flipped_tags).
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
use_udp (bool) – Use unbiased data processing
- Returns
Predicted coordinates in the images.
- Return type
np.ndarray
- mmpose.core.post_processing.warp_affine_joints(joints, mat)[源代码]¶
Apply affine transformation defined by the transform matrix on the joints.
- Parameters
joints (np.ndarray[..., 2]) – Origin coordinate of joints.
mat (np.ndarray[3, 2]) – The affine matrix.
- Returns
Result coordinate of joints.
- Return type
np.ndarray[…, 2]
mmpose.models¶
backbones¶
- class mmpose.models.backbones.AlexNet(num_classes=- 1)[源代码]¶
AlexNet backbone.
The input for AlexNet is a 224x224 RGB image.
- Parameters
num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
- class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'})[源代码]¶
CPM backbone.
Convolutional Pose Machines. More details can be found in the paper.
- Parameters
in_channels (int) – The input channels of the CPM.
out_channels (int) – The output channels of the CPM.
feat_channels (int) – Feature channel of each CPM stage.
middle_channels (int) – Feature channel of conv after the middle stage.
num_stages (int) – Number of stages.
norm_cfg (dict) – Dictionary to construct and config norm layer.
Example
>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
- class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1)[source]¶
HRNet backbone.
High-Resolution Representations for Labeling Pixels and Regions
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
Examples
>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
- init_weights(pretrained=None)[source]¶
Initialize the weights in backbone.
- Parameters
pretrained (str, optional) – Path to pre-trained weights. Defaults to None.
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmpose.models.backbones.HourglassAENet(downsample_times=4, num_stacks=1, out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]¶
Hourglass-AE Network proposed by Newell et al.
Associative Embedding: End-to-End Learning for Joint Detection and Grouping.
More details can be found in the paper.
- Parameters
downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channels (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
Examples
>>> from mmpose.models import HourglassAENet
>>> import torch
>>> self = HourglassAENet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 512)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 34, 128, 128)
- class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]¶
HourglassNet backbone.
Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.
- Parameters
downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channel (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
Examples
>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
- class mmpose.models.backbones.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False)[source]¶
Lite-HRNet backbone.
Lite-HRNet: A Lightweight High-Resolution Network.
Code adapted from https://github.com/HRNet/Lite-HRNet.
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
Examples
>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra = dict(
>>>     stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
>>>     num_stages=3,
>>>     stages_spec=dict(
>>>         num_modules=(2, 4, 2),
>>>         num_branches=(2, 3, 4),
>>>         num_blocks=(2, 2, 2),
>>>         module_type=('LITE', 'LITE', 'LITE'),
>>>         with_fuse=(True, True, True),
>>>         reduce_ratios=(8, 8, 8),
>>>         num_channels=(
>>>             (40, 80),
>>>             (40, 80, 160),
>>>             (40, 80, 160, 320),
>>>         )),
>>>     with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)
- class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64)[source]¶
MSPN backbone. Paper ref: Li et al. "Rethinking on Multi-Stage Networks for Human Pose Estimation" (CVPR 2020).
- Parameters
unit_channels (int) – Number of Channels in an upsample unit. Default: 256
num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4
num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4 Note: Make sure num_units == len(self.num_blocks)
num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2]
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.
Examples
>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
- class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False)[source]¶
MobileNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- make_layer(out_channels, num_blocks, stride, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(-1, ), frozen_stages=-1, norm_eval=False, with_cp=False)[source]¶
MobileNetV3 backbone.
- Parameters
arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
out_indices (None or Sequence[int]) – Output from which stages. Default: (-1, ), which means output tensors from final stage.
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
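A minimal usage sketch (the input size is illustrative, and output shapes depend on arch and out_indices, so none are asserted):
import torch
from mmpose.models.backbones import MobileNetV3

model = MobileNetV3(arch='small')
model.eval()
with torch.no_grad():
    outs = model(torch.rand(1, 3, 224, 224))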
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26)[source]¶
Residual Steps Network backbone. Paper ref: Cai et al. "Learning Delicate Local Representations for Multi-Person Pose Estimation" (ECCV 2020).
- Parameters
unit_channels (int) – Number of channels in an upsample unit. Default: 256
num_stages (int) – Number of stages in a multi-stage RSN. Default: 4
num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4 Note: Make sure num_units == len(self.num_blocks)
num_blocks (list) – Number of RSBs (Residual Steps Block) in each downsample unit. Default: [2, 2, 2, 2]
num_steps (int) – Number of steps in a RSB. Default: 4
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type='BN')
res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.
expand_times (int) – Times by which the in_channels are expanded in RSB. Default: 26.
Examples
>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
- class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]¶
RegNet backbone.
More details can be found in the paper.
- Parameters
arch (dict) – The parameter of RegNets. - w0 (int): initial width - wa (float): slope of width - wm (float): quantization parameter to quantize the width - depth (int): depth of the backbone - group_w (int): width of group - bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Examples
>>> from mmpose.models import RegNet
>>> import torch
>>> self = RegNet(arch=dict(w0=88, wa=26.31, wm=2.25, group_w=48, depth=25, bot_mul=1.0), out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
- adjust_width_group(widths, bottleneck_ratio, groups)[source]¶
Adjusts the compatibility of widths and groups.
- Parameters
widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – number of groups in each stage
- Returns
The adjusted widths and groups of each stage.
- Return type
tuple(list)
- static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]¶
Generates per-block width from RegNet parameters.
- Parameters
initial_width ([int]) – Initial width of the backbone
width_slope ([float]) – Slope of the quantized linear function
width_parameter ([int]) – Parameter used to quantize the width.
depth ([int]) – Depth of the backbone.
divisor (int, optional) – The divisor of channels. Defaults to 8.
- Returns
a list of widths of each stage and the number of stages
- Return type
list, int
- class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152, 200}.
groups (int) – Groups of conv2 in Bottleneck. Default: 1.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
radix (int) – Radix of SplitAttentionConv2d. Default: 2.
reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
ResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
- class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]¶
ResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Examples
>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- init_weights(pretrained=None)[source]¶
Initialize the weights in backbone.
- Parameters
pretrained (str, optional) – Path to pre-trained weights. Defaults to None.
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- class mmpose.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant described in Bag of Tricks.
Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
- class mmpose.models.backbones.SCNet(depth, **kwargs)[source]¶
SCNet backbone.
Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Parameters
depth (int) – Depth of scnet, from {50, 101}.
in_channels (int) – Number of input image channels. Normally 3.
base_channels (int) – Number of base channels of hidden layer.
num_stages (int) – SCNet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
Examples
>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
SEResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Examples
>>> from mmpose.models import SEResNeXt
>>> import torch
>>> self = SEResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]¶
SEResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Examples
>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]¶
ShuffleNetV1 backbone.
- Parameters
groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.
widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, )
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- make_layer(out_channels, num_blocks, first_block=False)[source]¶
Stack ShuffleUnit blocks to make a layer.
- Parameters
out_channels (int) – out_channels of the block.
num_blocks (int) – Number of blocks.
first_block (bool, optional) – Whether is the first ShuffleUnit of a sequential ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]¶
ShuffleNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
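A minimal usage sketch (the input size is illustrative; output shapes are not asserted):
import torch
from mmpose.models.backbones import ShuffleNetV2

model = ShuffleNetV2(widen_factor=1.0)
model.eval()
with torch.no_grad():
    outs = model(torch.rand(1, 3, 224, 224))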
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None)[source]¶
TCN backbone.
Temporal Convolutional Networks. More details can be found in the paper.
- Parameters
in_channels (int) – Number of input channels, which equals num_keypoints * num_features.
stem_channels (int) – Number of feature channels. Default: 1024.
num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.
kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).
causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.
residual (bool) – Use residual connection. Default: True.
use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False
conv_cfg (dict) – dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN1d’).
max_norm (float|None) – if not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.
Examples
>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
- class mmpose.models.backbones.V2VNet(input_channels, output_channels, mid_channels=32)[source]¶
V2VNet.
Please refer to the paper <https://arxiv.org/abs/1711.07399> for details.
- Parameters
input_channels (int) – Number of channels of the input feature volume.
output_channels (int) – Number of channels of the output volume.
mid_channels (int) – Input and output channels of the encoder-decoder block.
- class mmpose.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True)[source]¶
VGG backbone.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_norm (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.
with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
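A minimal usage sketch (with the default num_classes=-1 the classifier head is omitted; output shapes are not asserted):
import torch
from mmpose.models.backbones import VGG

model = VGG(depth=16)
model.eval()
with torch.no_grad():
    outs = model(torch.rand(1, 3, 224, 224))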
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.ViPNAS_MobileNetV3(wid=[16, 16, 24, 40, 80, 112, 160], expan=[None, 1, 5, 4, 5, 5, 6], dep=[None, 1, 4, 4, 4, 4, 4], ks=[3, 3, 7, 7, 5, 7, 5], group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], conv_cfg=None, norm_cfg={'type': 'BN'}, frozen_stages=-1, norm_eval=False, with_cp=False)[source]¶
ViPNAS_MobileNetV3 backbone.
"ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search" More details can be found in the paper.
- Parameters
wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
stride (list(int)) – Stride config for each stage.
act (list(dict)) – Activation config for each stage.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- init_weights(pretrained=None)[source]¶
Init backbone weights.
- Parameters
pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmpose.models.backbones.ViPNAS_ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, wid=[48, 80, 160, 304, 608], expan=[None, 1, 1, 1, 1], dep=[None, 4, 6, 7, 3], ks=[7, 3, 5, 5, 5], group=[None, 16, 16, 16, 16], att=[None, True, False, True, True])[source]¶
ViPNAS_ResNet backbone.
"ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search" More details can be found in the paper.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
necks¶
- class mmpose.models.necks.GlobalAveragePooling[source]¶
Global Average Pooling neck.
Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
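The note above can be demonstrated with plain PyTorch: with a batch of size 1, squeeze() would also drop the batch dimension, while a view keeps it:
import torch

x = torch.rand(1, 512, 7, 7)
pooled = torch.nn.functional.adaptive_avg_pool2d(x, (1, 1))  # (1, 512, 1, 1)
print(pooled.squeeze().shape)            # torch.Size([512])   -- batch dim lost
print(pooled.view(x.size(0), -1).shape)  # torch.Size([1, 512]) -- batch dim kept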
- class mmpose.models.necks.PoseWarperNeck(in_channels, out_channels, inner_channels, deform_groups=17, dilations=(3, 6, 12, 18, 24), trans_conv_kernel=1, res_blocks_cfg=None, offsets_kernel=3, deform_conv_kernel=3, in_index=0, input_transform=None, freeze_trans_layer=True, norm_eval=False, im2col_step=80)[source]¶
PoseWarper neck.
"Learning temporal pose estimation from sparsely-labeled videos".
- Parameters
in_channels (int) – Number of input channels from backbone
out_channels (int) – Number of output channels
inner_channels (int) – Number of intermediate channels of the res block
deform_groups (int) – Number of groups in the deformable conv
dilations (list|tuple) – different dilations of the offset conv layers
trans_conv_kernel (int) – the kernel of the trans conv layer, which is used to get heatmap from the output of backbone. Default: 1
res_blocks_cfg (dict|None) –
config of residual blocks. If None, use the default values. If not None, it should contain the following keys:
block (str): the type of residual block, Default: ‘BASIC’.
num_blocks (int): the number of blocks, Default: 20.
offsets_kernel (int) – the kernel of offset conv layer.
deform_conv_kernel (int) – the kernel of deformable conv layer.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) –
Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.
'resize_concat': Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.
'multiple_select': Multiple feature maps will be bundled into a list and passed into decode head.
None: Only one select feature map is allowed.
freeze_trans_layer (bool) – Whether to freeze the transition layer (stop grad and set eval mode). Default: True.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
im2col_step (int) – the argument im2col_step in deformable conv, Default: 80.
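Like other MMPose modules, the neck is typically built from a config dict. The fragment below is a hypothetical sketch; the channel numbers are illustrative assumptions, with in_channels matching the backbone output and out_channels the number of heatmap channels:
# Hypothetical config fragment for building the neck via the registry.
neck = dict(
    type='PoseWarperNeck',
    in_channels=48,
    out_channels=17,
    inner_channels=128,
    deform_groups=17,
    dilations=(3, 6, 12, 18, 24))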
- forward(inputs, frame_weight)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
detectors¶
- class mmpose.models.detectors.AssociativeEmbedding(backbone, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]¶
Associative embedding pose detectors.
- Parameters
backbone (dict) – Backbone modules to extract feature.
keypoint_head (dict) – Keypoint head to process feature.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained models.
loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.
- forward(img=None, targets=None, masks=None, joints=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[source]¶
Calls either forward_train or forward_test depending on whether return_loss is True.
Note
batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
max_num_people: M
- Parameters
img (torch.Tensor[N,C,imgH,imgW]) – Input image.
targets (list(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.
masks (list(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps
joints (list(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss
img_metas (dict) –
Information about val & test. By default it includes:
”image_file”: image path
”aug_data”: input
”test_scale_factor”: test scale factor
”base_size”: base size of input
”center”: center of image
”scale”: scale of image
”flip_index”: flip index of keypoints
return_loss (bool) – return_loss=True for training, return_loss=False for validation & test.
return_heatmap (bool) – Option to return heatmap.
- Returns
if return_loss is true, then return losses. Otherwise, return predicted poses, scores, image paths and heatmaps.
- Return type
dict|tuple
- forward_dummy(img)[source]¶
Used for computing network FLOPs.
See tools/get_flops.py.
- Parameters
img (torch.Tensor) – Input image.
- Returns
Outputs.
- Return type
Tensor
- forward_test(img, img_metas, return_heatmap=False, **kwargs)[source]¶
Inference the bottom-up model.
Note
batch_size: N (currently only batch_size = 1 is supported)
num_img_channel: C
img_width: imgW
img_height: imgH
- Parameters
flip_index (List(int)) –
aug_data (List(Tensor[NxCximgHximgW])) – Multi-scale image
test_scale_factor (List(float)) – Multi-scale factor
base_size (Tuple(int)) – Base size of image when scale is 1
center (np.ndarray) – center of image
scale (np.ndarray) – the scale of image
- forward_train(img, targets, masks, joints, img_metas, **kwargs)[source]¶
Forward the bottom-up model and calculate the loss.
Note
batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
max_num_people: M
- Parameters
img (torch.Tensor[N,C,imgH,imgW]) – Input image.
targets (List(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.
masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps
joints (List(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss
img_metas (dict) –
Information about val & test. By default this includes:
"image_file": image path
"aug_data": input
"test_scale_factor": test scale factor
"base_size": base size of input
"center": center of image
"scale": scale of image
"flip_index": flip index of keypoints
- Returns
The total loss for bottom-up.
- Return type
dict
- show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color=None, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[source]¶
Draw result over img.
- Parameters
img (str or Tensor) – The image to be displayed.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.
pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
win_name (str) – The window name.
show (bool) – Whether to show the image. Default: False.
show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
- Returns
Visualized image, only if not show or out_file.
- Return type
Tensor
- property with_keypoint¶
Check if has keypoint_head.
- class mmpose.models.detectors.Interhand3D(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]¶
Top-down interhand 3D pose detector of paper ref: Gyeongsik Moon.
“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”. A child class of TopDown detector.
- forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, **kwargs)[source]¶
Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. list[Tensor], list[list[dict]]), with the outer list indicating test time augmentations.
Note
batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W
- Parameters
img (torch.Tensor[NxCximgHximgW]) – Input images.
target (list[torch.Tensor]) – Target heatmaps, relative hand root depth and hand type.
target_weight (list[torch.Tensor]) – Weights for target heatmaps, relative hand root depth and hand type.
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
"heatmap3d_depth_bound": depth bound of hand keypoint 3D heatmap
"root_depth_bound": depth bound of relative root depth 1D heatmap
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
- Returns
if return_loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths, heatmaps, relative hand root depth and hand type.
- Return type
dict|tuple
- forward_test(img, img_metas, **kwargs)[source]¶
Defines the computation performed at every call when testing.
- show_result(result, img=None, skeleton=None, kpt_score_thr=0.3, radius=8, bbox_color='green', thickness=2, pose_kpt_color=None, pose_link_color=None, vis_height=400, num_instances=-1, win_name='', show=False, wait_time=0, out_file=None)[source]¶
Visualize 3D pose estimation results.
- Parameters
result (list[dict]) –
The pose estimation results containing:
"keypoints_3d" ([K,4]): 3D keypoints
"keypoints" ([K,3] or [T,K,3]): Optional for visualizing 2D inputs. If a sequence is given, only the last frame will be used for visualization.
"bbox" ([4,] or [T,4]): Optional for visualizing 2D inputs
"title" (str): title for the subplot
img (str or Tensor) – Optional. The image to visualize 2D inputs on.
skeleton (list of [idx_i,idx_j]) – Skeleton described by a list of links, each is a pair of joint indices.
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
radius (int) – Radius of circles.
bbox_color (str or tuple or Color) – Color of bbox lines.
thickness (int) – Thickness of lines.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.
pose_link_color (np.array[Mx3]) – Color of M limbs. If None, do not draw limbs.
vis_height (int) – The image height of the visualization. The width will be N*vis_height depending on the number of visualized items.
num_instances (int) – Number of instances to be shown in 3D. If smaller than 0, all the instances in the pose_result will be shown. Otherwise, pad or truncate the pose_result to a length of num_instances.
win_name (str) – The window name.
show (bool) – Whether to show the image. Default: False.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
- Returns
Visualized img, only if not show or out_file.
- Return type
Tensor
- class mmpose.models.detectors.MultiTask(backbone, heads, necks=None, head2neck=None, pretrained=None)[source]¶
Multi-task detectors.
- Parameters
backbone (dict) – Backbone modules to extract feature.
heads (list[dict]) – heads to output predictions.
necks (list[dict] | None) – necks to process feature.
head2neck (dict{int: int}) – mapping from head index to neck index.
pretrained (str) – Path to the pretrained models.
- forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, **kwargs)[source]¶
Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.
Note
batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W
- Parameters
img (torch.Tensor[N,C,imgH,imgW]) – Input images.
target (list[torch.Tensor]) – Targets.
target_weight (List[torch.Tensor]) – Weights.
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
- Returns
if return_loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.
- Return type
dict|tuple
- forward_dummy(img)[source]¶
Used for computing network FLOPs. See tools/get_flops.py.
- 参数
img (torch.Tensor) – Input image.
- 返回
Outputs.
- 返回类型
list[Tensor]
- forward_test(img, img_metas, **kwargs)[源代码]¶
Defines the computation performed at every call when testing.
- forward_train(img, target, target_weight, img_metas, **kwargs)[源代码]¶
Defines the computation performed at every call when training.
- property with_necks¶
Check if has necks.
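A hedged configuration sketch for MultiTask: two heads share one backbone, and head2neck routes the regression head through a pooling neck. The concrete module types and channel numbers below are illustrative assumptions, not a tested configuration:
model = dict(
    type='MultiTask',
    backbone=dict(type='ResNet', depth=50),
    heads=[
        dict(type='TopdownHeatmapSimpleHead', in_channels=2048, out_channels=17),
        dict(type='DeepposeRegressionHead', in_channels=2048, num_joints=17),
    ],
    necks=[dict(type='GlobalAveragePooling')],
    head2neck={1: 0},  # head index 1 consumes the output of neck index 0
    pretrained=None,
)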
- class mmpose.models.detectors.ParametricMesh(backbone, mesh_head, smpl, disc=None, loss_gan=None, loss_mesh=None, train_cfg=None, test_cfg=None, pretrained=None)[源代码]¶
Model-based 3D human mesh detector. Take a single color image as input and output 3D joints, SMPL parameters and camera parameters.
- 参数
backbone (dict) – Backbone modules to extract feature.
mesh_head (dict) – Mesh head to process feature.
smpl (dict) – Config for SMPL model.
disc (dict) – Discriminator for SMPL parameters. Default: None.
loss_gan (dict) – Config for adversarial loss. Default: None.
loss_mesh (dict) – Config for mesh loss. Default: None.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained models.
- forward(img, img_metas=None, return_loss=False, **kwargs)[源代码]¶
Forward function.
Calls either forward_train or forward_test depending on whether return_loss=True.
注解
batch_size: N
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
- 参数
img (torch.Tensor[N x C x imgH x imgW]) – Input images.
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
- 返回
Return predicted 3D joints, SMPL parameters, boxes and image paths.
- forward_dummy(img)[源代码]¶
Used for computing network FLOPs. See tools/get_flops.py.
- 参数
img (torch.Tensor) – Input image.
- 返回
Outputs.
- 返回类型
Tensor
- forward_test(img, img_metas, return_vertices=False, return_faces=False, **kwargs)[源代码]¶
Defines the computation performed at every call when testing.
- forward_train(*args, **kwargs)[源代码]¶
Forward function for training.
For ParametricMesh, we do not use this interface.
- get_3d_joints_from_mesh(vertices)[源代码]¶
Get 3D joints from 3D mesh using predefined joints regressor.
- show_result(result, img, show=False, out_file=None, win_name='', wait_time=0, bbox_color='green', mesh_color=(76, 76, 204), **kwargs)[源代码]¶
Visualize 3D mesh estimation results.
- 参数
result (list[dict]) –
The mesh estimation results containing:
"bbox" (ndarray[4]): instance bounding bbox
"center" (ndarray[2]): bbox center
"scale" (ndarray[2]): bbox scale
"keypoints_3d" (ndarray[K,3]): predicted 3D keypoints
"camera" (ndarray[3]): camera parameters
"vertices" (ndarray[V, 3]): predicted 3D vertices
"faces" (ndarray[F, 3]): mesh faces
img (str or Tensor) – Optional. The image to visualize 2D inputs on.
win_name (str) – The window name.
show (bool) – Whether to show the image. Default: False.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
bbox_color (str or tuple or Color) – Color of bbox lines.
mesh_color (str or tuple or Color) – Color of mesh surface.
- 返回
Visualized image, returned only if show is False and out_file is None.
- 返回类型
ndarray
- train_step(data_batch, optimizer, **kwargs)[源代码]¶
Train step function.
In this function, the detector will finish the train step following the pipeline:
get fake and real SMPL parameters
optimize discriminator (if any)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- 参数
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- 返回
Dict with loss, information for logger, the number of samples.
- 返回类型
outputs (dict)
- class mmpose.models.detectors.PoseLifter(backbone, neck=None, keypoint_head=None, traj_backbone=None, traj_neck=None, traj_head=None, loss_semi=None, train_cfg=None, test_cfg=None, pretrained=None)[源代码]¶
Pose lifter that lifts 2D pose to 3D pose.
The basic model is a pose model that predicts root-relative pose. If traj_head is not None, a trajectory model that predicts absolute root joint position is also built.
- 参数
backbone (dict) – Config for the backbone of pose model.
neck (dict|None) – Config for the neck of pose model.
keypoint_head (dict|None) – Config for the head of pose model.
traj_backbone (dict|None) – Config for the backbone of trajectory model. If traj_backbone is None and traj_head is not None, trajectory model will share backbone with pose model.
traj_neck (dict|None) – Config for the neck of trajectory model.
traj_head (dict|None) – Config for the head of trajectory model.
loss_semi (dict|None) – Config for semi-supervision loss.
train_cfg (dict|None) – Config for keypoint head during training.
test_cfg (dict|None) – Config for keypoint head during testing.
pretrained (str|None) – Path to pretrained weights.
- forward(input, target=None, target_weight=None, metas=None, return_loss=True, **kwargs)[源代码]¶
Calls either forward_train or forward_test depending on whether return_loss=True.
注解
batch_size: N
num_input_keypoints: Ki
input_keypoint_dim: Ci
input_sequence_len: Ti
num_output_keypoints: Ko
output_keypoint_dim: Co
output_sequence_len: To
- 参数
input (torch.Tensor[NxKixCixTi]) – Input keypoint coordinates.
target (torch.Tensor[NxKoxCoxTo]) – Output keypoint coordinates. Defaults to None.
target_weight (torch.Tensor[NxKox1]) – Weights across different joint types. Defaults to None.
metas (list(dict)) – Information about data augmentation
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
- 返回
If return_loss is true, return losses. Otherwise, return predicted poses.
- 返回类型
dict|Tensor
- forward_dummy(input)[源代码]¶
Used for computing network FLOPs. See tools/get_flops.py.
- 参数
input (torch.Tensor) – Input pose
- 返回
Model output
- 返回类型
Tensor
- forward_test(input, metas, **kwargs)[源代码]¶
Defines the computation performed at every call when testing.
- forward_train(input, target, target_weight, metas, **kwargs)[源代码]¶
Defines the computation performed at every call when training.
- show_result(result, img=None, skeleton=None, pose_kpt_color=None, pose_link_color=None, radius=8, thickness=2, vis_height=400, num_instances=- 1, win_name='', show=False, wait_time=0, out_file=None)[源代码]¶
Visualize 3D pose estimation results.
- 参数
result (list[dict]) –
The pose estimation results containing:
"keypoints_3d" ([K,4]): 3D keypoints
"keypoints" ([K,3] or [T,K,3]): Optional for visualizing 2D inputs. If a sequence is given, only the last frame will be used for visualization.
"bbox" ([4,] or [T,4]): Optional for visualizing 2D inputs
"title" (str): title for the subplot
img (str or Tensor) – Optional. The image to visualize 2D inputs on.
skeleton (list of [idx_i,idx_j]) – Skeleton described by a list of links, each is a pair of joint indices.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.
pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
vis_height (int) – The image height of the visualization. The width will be N*vis_height depending on the number of visualized items.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
- 返回
Visualized image, returned only if show is False and out_file is None.
- 返回类型
Tensor
- property with_keypoint¶
Check if has keypoint_head.
- property with_neck¶
Check if has keypoint_neck.
- property with_traj¶
Check if has trajectory_head.
- property with_traj_backbone¶
Check if has trajectory_backbone.
- property with_traj_neck¶
Check if has trajectory_neck.
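A hedged configuration sketch of a PoseLifter in the style of the VideoPose3D configs; the exact channel numbers and kernel sizes are assumptions and should be checked against the shipped configs:
model = dict(
    type='PoseLifter',
    backbone=dict(
        type='TCN',
        in_channels=2 * 17,   # 17 input joints with (x, y) each
        stem_channels=1024,
        num_blocks=2,
        kernel_sizes=(3, 3, 3),
        dropout=0.25),
    keypoint_head=dict(
        type='TemporalRegressionHead',
        in_channels=1024,
        num_joints=17,
        loss_keypoint=dict(type='MPJPELoss')),
    train_cfg=dict(),
    test_cfg=dict())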
- class mmpose.models.detectors.PoseWarper(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None, concat_tensors=True)[源代码]¶
Top-down pose detectors for multi-frame settings for video inputs.
“Learning temporal pose estimation from sparsely-labeled videos”.
A child class of the TopDown detector. The main difference between PoseWarper and TopDown is that the former takes a list of tensors as the input images in its forward method, while the latter takes a single tensor.
- 参数
backbone (dict) – Backbone modules to extract features.
neck (dict) – intermediate modules to transform features.
keypoint_head (dict) – Keypoint head to process feature.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained models.
loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.
concat_tensors (bool) – Whether to concat the tensors on the batch dim, which can speed up computation. Default: True.
- forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]¶
Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.
注解
number of frames: F
batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W
- 参数
imgs (list[F,torch.Tensor[N,C,imgH,imgW]]) – Multiple input frames.
target (torch.Tensor[N,K,H,W]) – Target heatmaps for one frame.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": paths to multiple video frames
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
return_heatmap (bool) – Option to return heatmap.
- 返回
If return_loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.
- 返回类型
dict|tuple
- forward_dummy(img)[源代码]¶
Used for computing network FLOPs. See tools/get_flops.py.
- 参数
img (torch.Tensor[N,C,imgH,imgW], or list|tuple of tensors) – Multiple input frames, N >= 2.
- 返回
Output heatmaps.
- 返回类型
Tensor
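Since PoseWarper consumes a list of frames rather than a single tensor, a dry-run input can be sketched like this (the shapes and frame count are illustrative, and the model object is hypothetical):
import torch

frames = [torch.randn(1, 3, 256, 192) for _ in range(5)]  # five RGB frames
# Assuming `model` is a built PoseWarper detector:
# heatmaps = model.forward_dummy(frames)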
- class mmpose.models.detectors.TopDown(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[源代码]¶
Top-down pose detectors.
- 参数
backbone (dict) – Backbone modules to extract feature.
keypoint_head (dict) – Keypoint head to process feature.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained models.
loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.
- forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]¶
Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.
注解
batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W
- 参数
img (torch.Tensor[NxCximgHximgW]) – Input images.
target (torch.Tensor[NxKxHxW]) – Target heatmaps.
target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
return_heatmap (bool) – Option to return heatmap.
- 返回
If return_loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.
- 返回类型
dict|tuple
- forward_dummy(img)[源代码]¶
Used for computing network FLOPs. See tools/get_flops.py.
- 参数
img (torch.Tensor) – Input image.
- 返回
Output heatmaps.
- 返回类型
Tensor
- forward_test(img, img_metas, return_heatmap=False, **kwargs)[源代码]¶
Defines the computation performed at every call when testing.
- forward_train(img, target, target_weight, img_metas, **kwargs)[源代码]¶
Defines the computation performed at every call when training.
- show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color='green', pose_kpt_color=None, pose_link_color=None, text_color='white', radius=4, thickness=1, font_scale=0.5, bbox_thickness=1, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[源代码]¶
Draw result over img.
- 参数
img (str or Tensor) – The image to be displayed.
result (list[dict]) – The results to draw over img (bbox_result, pose_result).
skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.
kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.
bbox_color (str or tuple or Color) – Color of bbox lines.
pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.
pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.
text_color (str or tuple or Color) – Color of texts.
radius (int) – Radius of circles.
thickness (int) – Thickness of lines.
font_scale (float) – Font scales of texts.
win_name (str) – The window name.
show (bool) – Whether to show the image. Default: False.
show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.
wait_time (int) – Value of waitKey param. Default: 0.
out_file (str or None) – The filename to write the image. Default: None.
- 返回
Visualized image, returned only if show is False and out_file is None.
- 返回类型
Tensor
- property with_keypoint¶
Check if has keypoint_head.
- property with_neck¶
Check if has neck.
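A hedged sketch of building a TopDown detector from a config dict via mmpose.models.build_posenet; the backbone and head settings below are illustrative assumptions, not a recommended configuration:
import torch
from mmpose.models import build_posenet

cfg = dict(
    type='TopDown',
    backbone=dict(type='ResNet', depth=50),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=2048,
        out_channels=17,
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True, post_process='default'))
model = build_posenet(cfg)
heatmaps = model.forward_dummy(torch.randn(1, 3, 256, 192))  # FLOPs-style dry run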
- class mmpose.models.detectors.VoxelPose(detector_2d, space_3d, project_layer, center_net, center_head, pose_net, pose_head, train_cfg=None, test_cfg=None, pretrained=None, freeze_2d=True)[源代码]¶
VoxelPose. Please refer to the paper for details: https://arxiv.org/abs/2004.06239.
- 参数
detector_2d (ConfigDict) – Dictionary to construct the 2D pose detector
space_3d (ConfigDict) – Dictionary that contains 3D space information:
space_size (list): Size of the 3D space.
cube_size (list): Size of the input volume to the center net.
space_center (list): Coordinate of the center of the 3D space.
sub_space_size (list): Size of the cuboid human proposal.
sub_cube_size (list): Size of the input volume to the pose net.
project_layer (ConfigDict) – Dictionary to construct the project layer.
center_net (ConfigDict) – Dictionary to construct the center net.
center_head (ConfigDict) – Dictionary to construct the center head.
pose_net (ConfigDict) – Dictionary to construct the pose net.
pose_head (ConfigDict) – Dictionary to construct the pose head.
train_cfg (ConfigDict) – Config for training. Default: None.
test_cfg (ConfigDict) – Config for testing. Default: None.
pretrained (str) – Path to the pretrained 2D model. Default: None.
freeze_2d (bool) – Whether to freeze the 2D model in training. Default: True.
- assign2gt(center_candidates, gt_centers, gt_num_persons)[源代码]¶
Assign gt id to each valid human center candidate.
- forward(img, img_metas, return_loss=True, targets=None, masks=None, targets_3d=None, input_heatmaps=None, **kwargs)[源代码]¶
注解
batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
volume_length: cubeL
volume_width: cubeW
volume_height: cubeH
- 参数
img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.
img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.
return_loss – Option to return loss. return_loss=True for training, return_loss=False for validation & test.
targets (list(torch.Tensor[NxKxHxW])) – Multi-camera target heatmaps of the 2D model.
masks (list(torch.Tensor[NxHxW])) – Multi-camera masks of the input to the 2D model.
targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.
input_heatmaps (list(torch.Tensor[NxKxHxW])) – Multi-camera heatmaps when the 2D model is not available. Default: None.
**kwargs –
- 返回
If return_loss is true, then return losses. Otherwise, return predicted poses, human centers and sample_id.
- 返回类型
dict
- forward_test(img, img_metas, input_heatmaps=None)[源代码]¶
注解
batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
volume_length: cubeL
volume_width: cubeW
volume_height: cubeH
- 参数
img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.
img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.
input_heatmaps (list(torch.Tensor[NxKxHxW])) – Multi-camera heatmaps when the 2D model is not available. Default: None.
- 返回
predicted poses, human centers and sample_id
- 返回类型
dict
- forward_train(img, img_metas, targets=None, masks=None, targets_3d=None, input_heatmaps=None)[源代码]¶
注解
batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
volume_length: cubeL
volume_width: cubeW
volume_height: cubeH
- 参数
img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.
img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.
targets (list(torch.Tensor[NxKxHxW])) – Multi-camera target heatmaps of the 2D model.
masks (list(torch.Tensor[NxHxW])) – Multi-camera masks of the input to the 2D model.
targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.
input_heatmaps (list(torch.Tensor[NxKxHxW])) – Multi-camera heatmaps when the 2D model is not available. Default: None.
- 返回
losses.
- 返回类型
dict
- train(mode=True)[源代码]¶
Sets the module in training mode.
- 参数
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- 返回
self
- 返回类型
Module
- train_step(data_batch, optimizer, **kwargs)[源代码]¶
The iteration step during training.
This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.
- 参数
data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.
- 返回
It should contain at least 3 keys: loss, log_vars, num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
- 返回类型
dict
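To make this contract concrete, here is a minimal sketch of the dictionary a custom train_step could return; the required key names follow the description above, and all values are placeholders:
import torch

batch_size = 4  # illustrative per-GPU batch size
total_loss = torch.tensor(0.8, requires_grad=True)  # placeholder scalar loss

outputs = dict(
    loss=total_loss,                          # tensor used for back propagation
    log_vars=dict(loss_2d=0.5, loss_3d=0.3),  # plain floats sent to the logger
    num_samples=batch_size,                   # used for averaging the logs
)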
heads¶
- class mmpose.models.heads.AEHigherResolutionHead(in_channels, num_joints, tag_per_joint=True, extra=None, num_deconv_layers=1, num_deconv_filters=(32), num_deconv_kernels=(4), num_basic_blocks=4, cat_output=None, with_ae_loss=None, loss_keypoint=None)[源代码]¶
Associative embedding with higher resolution head. paper ref: Bowen Cheng et al. “HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation”.
- 参数
in_channels (int) – Number of input channels.
num_joints (int) – Number of joints
tag_per_joint (bool) – If tag_per_joint is True, the dimension of tags equals to num_joints, else the dimension of tags is 1. Default: True
extra (dict) – Configs for extra conv layers. Default: None
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
cat_output (list[bool]) – Option to concat outputs.
with_ae_loss (list[bool]) – Option to use ae loss.
loss_keypoint (dict) – Config for loss. Default: None.
- get_loss(outputs, targets, masks, joints)[源代码]¶
Calculate bottom-up keypoint loss.
注解
batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W
- 参数
outputs (list(torch.Tensor[N,K,H,W])) – Multi-scale output heatmaps.
targets (List(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.
masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps
joints (List(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss
- class mmpose.models.heads.AEMultiStageHead(in_channels, out_channels, num_stages=1, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, loss_keypoint=None)[源代码]¶
Associative embedding multi-stage head. paper ref: Alejandro Newell et al. “Associative Embedding: End-to-end Learning for Joint Detection and Grouping”
- 参数
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
loss_keypoint (dict) – Config for loss. Default: None.
- forward(x)[源代码]¶
Forward function.
- 返回
a list of heatmaps from multiple stages.
- 返回类型
out (list[Tensor])
- get_loss(output, targets, masks, joints)[源代码]¶
Calculate bottom-up keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (List(torch.Tensor[NxKxHxW])) – Output heatmaps.
targets (List(List(torch.Tensor[NxKxHxW]))) – Multi-stage and multi-scale target heatmaps.
masks (List(List(torch.Tensor[NxHxW]))) – Masks of multi-stage and multi-scale target heatmaps
joints (List(List(torch.Tensor[NxMxKx2]))) – Joints of multi-stage multi-scale target heatmaps for ae loss
- class mmpose.models.heads.AESimpleHead(in_channels, num_joints, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), tag_per_joint=True, with_ae_loss=None, extra=None, loss_keypoint=None)[源代码]¶
Associative embedding simple head. paper ref: Alejandro Newell et al. “Associative Embedding: End-to-end Learning for Joint Detection and Grouping”
- 参数
in_channels (int) – Number of input channels.
num_joints (int) – Number of joints.
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
tag_per_joint (bool) – If tag_per_joint is True, the dimension of tags equals to num_joints, else the dimension of tags is 1. Default: True
with_ae_loss (list[bool]) – Option to use ae loss or not.
loss_keypoint (dict) – Config for loss. Default: None.
- get_loss(outputs, targets, masks, joints)[源代码]¶
Calculate bottom-up keypoint loss.
注解
batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W
- 参数
outputs (list(torch.Tensor[N,K,H,W])) – Multi-scale output heatmaps.
targets (List(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.
masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps
joints (List(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss
- class mmpose.models.heads.CuboidCenterHead(cfg)[源代码]¶
Get results from the 3D human center heatmap. In this module, human 3D centers are local maxima obtained from the 3D heatmap via NMS (max-pooling).
- 参数
cfg (dict) –
space_size (list[3]): The size of the 3D space.
cube_size (list[3]): The size of the heatmap volume.
space_center (list[3]): The coordinate of the space center.
max_num (int): Maximum number of human center detections.
max_pool_kernel (int): Kernel size of the max-pool kernel in nms.
- class mmpose.models.heads.CuboidPoseHead(beta)[源代码]¶
- forward(heatmap_volumes, grid_coordinates)[源代码]¶
- 参数
heatmap_volumes (torch.Tensor(NxKxLxWxH)) – 3D human pose heatmaps predicted by the network.
grid_coordinates (torch.Tensor(Nx(LxWxH)x3)) – Coordinates of the grids in the heatmap volumes.
- 返回
Coordinates of human poses.
- 返回类型
human_poses (torch.Tensor(NxKx3))
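As a usage sketch (shapes follow the docstring above; beta and the grid size are illustrative assumptions, not recommended values):
import torch
from mmpose.models.heads import CuboidPoseHead

head = CuboidPoseHead(beta=100.0)
# N=1 sample, K=15 joints, an 8x8x8 volume flattened to 512 grid points.
heatmap_volumes = torch.rand(1, 15, 8, 8, 8)
grid_coordinates = torch.rand(1, 8 * 8 * 8, 3)
human_poses = head(heatmap_volumes, grid_coordinates)  # torch.Size([1, 15, 3])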
- class mmpose.models.heads.DeconvHead(in_channels=3, out_channels=17, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None)[源代码]¶
Simple deconv head.
- 参数
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) –
Transformation type of input features. Options: 'resize_concat', 'multiple_select', None. Default: None.
'resize_concat': Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.
'multiple_select': Multiple feature maps will be bundled into a list and passed into the decode head.
None: Only one selected feature map is allowed.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_keypoint (dict) – Config for loss. Default: None.
- get_loss(outputs, targets, masks)[源代码]¶
Calculate bottom-up masked mse loss.
注解
batch_size: N
num_channels: C
heatmaps height: H
heatmaps width: W
- 参数
outputs (List(torch.Tensor[N,C,H,W])) – Multi-scale outputs.
targets (List(torch.Tensor[N,C,H,W])) – Multi-scale targets.
masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale targets.
- class mmpose.models.heads.DeepposeRegressionHead(in_channels, num_joints, loss_keypoint=None, train_cfg=None, test_cfg=None)[源代码]¶
Deeppose regression head with fully connected layers.
“DeepPose: Human Pose Estimation via Deep Neural Networks”.
- 参数
in_channels (int) – Number of input channels
num_joints (int) – Number of joints
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- decode(img_metas, output, **kwargs)[源代码]¶
Decode the keypoints from output regression.
- 参数
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
output (np.ndarray[N, K, 2]) – predicted regression vector.
kwargs – dict contains ‘img_size’. img_size (tuple(img_width, img_height)): input image size.
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
- 参数
output (torch.Tensor[N, K, 2]) – Output keypoints.
target (torch.Tensor[N, K, 2]) – Target keypoints.
target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.
- class mmpose.models.heads.HMRMeshHead(in_channels, smpl_mean_params=None, n_iter=3)[源代码]¶
SMPL parameters regressor head of simple baseline. “End-to-end Recovery of Human Shape and Pose”, CVPR’2018.
- 参数
in_channels (int) – Number of input channels
smpl_mean_params (str) – The file name of the mean SMPL parameters
n_iter (int) – The iterations of estimating delta parameters
- class mmpose.models.heads.Interhand3DHead(keypoint_head_cfg, root_head_cfg, hand_type_head_cfg, loss_keypoint=None, loss_root_depth=None, loss_hand_type=None, train_cfg=None, test_cfg=None)[源代码]¶
Interhand 3D head of paper ref: Gyeongsik Moon. “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”.
- 参数
keypoint_head_cfg (dict) – Configs of Heatmap3DHead for hand keypoint estimation.
root_head_cfg (dict) – Configs of Heatmap1DHead for relative hand root depth estimation.
hand_type_head_cfg (dict) – Configs of MultilabelClassificationHead for hand type classification.
loss_keypoint (dict) – Config for keypoint loss. Default: None.
loss_root_depth (dict) – Config for relative root depth loss. Default: None.
loss_hand_type (dict) – Config for hand type classification loss. Default: None.
- decode(img_metas, output, **kwargs)[源代码]¶
Decode hand keypoint, relative root depth and hand type.
- 参数
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
"heatmap3d_depth_bound": depth bound of hand keypoint 3D heatmap
"root_depth_bound": depth bound of relative root depth 1D heatmap
output (list[np.ndarray]) – model predicted 3D heatmaps, relative root depth and hand type.
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for hand type.
- 参数
output (list[Tensor]) – a list of outputs from multiple heads.
target (list[Tensor]) – a list of targets for multiple heads.
target_weight (list[Tensor]) – a list of targets weight for multiple heads.
- get_loss(output, target, target_weight)[源代码]¶
Calculate loss for hand keypoint heatmaps, relative root depth and hand type.
- 参数
output (list[Tensor]) – a list of outputs from multiple heads.
target (list[Tensor]) – a list of targets for multiple heads.
target_weight (list[Tensor]) – a list of targets weight for multiple heads.
- class mmpose.models.heads.TemporalRegressionHead(in_channels, num_joints, max_norm=None, loss_keypoint=None, is_trajectory=False, train_cfg=None, test_cfg=None)[源代码]¶
Regression head of VideoPose3D.
“3D human pose estimation in video with temporal convolutions and semi-supervised training”, CVPR’2019.
- 参数
in_channels (int) – Number of input channels
num_joints (int) – Number of joints
loss_keypoint (dict) – Config for keypoint loss. Default: None.
max_norm (float|None) – If not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.
is_trajectory (bool) – If the model only predicts root joint position, then this arg should be set to True. In this case, traj_loss will be calculated. Otherwise, it should be set to False. Default: False.
- decode(metas, output)[源代码]¶
Decode the keypoints from output regression.
- 参数
metas (list(dict)) –
Information about data augmentation including:
target_image_path (str): Optional, path to the image file.
target_mean (float): Optional, normalization parameter of the target pose.
target_std (float): Optional, normalization parameter of the target pose.
root_position (np.ndarray[3,1]): Optional, global position of the root joint.
root_index (torch.ndarray[1,]): Optional, original index of the root joint before root-centering.
output (np.ndarray[N, K, 3]) – predicted regression vector.
- get_accuracy(output, target, target_weight, metas)[源代码]¶
Calculate accuracy for keypoint loss.
注解
batch_size: N
num_keypoints: K
- 参数
output (torch.Tensor[N, K, 3]) – Output keypoints.
target (torch.Tensor[N, K, 3]) – Target keypoints.
target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types.
metas (list(dict)) –
Information about data augmentation including:
target_image_path (str): Optional, path to the image file.
target_mean (float): Optional, normalization parameter of the target pose.
target_std (float): Optional, normalization parameter of the target pose.
root_position (np.ndarray[3,1]): Optional, global position of the root joint.
root_index (torch.ndarray[1,]): Optional, original index of the root joint before root-centering.
- get_loss(output, target, target_weight)[源代码]¶
Calculate keypoint loss.
注解
batch_size: N
num_keypoints: K
- 参数
output (torch.Tensor[N, K, 3]) – Output keypoints.
target (torch.Tensor[N, K, 3]) – Target keypoints.
target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types. If self.is_trajectory is True and target_weight is None, target_weight will be set inversely proportional to joint depth.
- class mmpose.models.heads.TopdownHeatmapBaseHead[源代码]¶
Base class for top-down heatmap heads.
All top-down heatmap heads should subclass it. All subclasses should override the following methods:
get_loss: calculates the keypoint loss.
get_accuracy: calculates the keypoint accuracy.
forward: performs the forward pass.
inference_model: runs model inference.
- decode(img_metas, output, **kwargs)[源代码]¶
Decode keypoints from heatmaps.
- 参数
img_metas (list(dict)) –
Information about data augmentation. By default this includes:
"image_file": path to the image file
"center": center of the bbox
"scale": scale of the bbox
"rotation": rotation of the bbox
"bbox_score": score of bbox
output (np.ndarray[N, K, H, W]) – model predicted heatmaps.
- class mmpose.models.heads.TopdownHeatmapMSMUHead(out_shape, unit_channels=256, out_channels=17, num_stages=4, num_units=4, use_prm=False, norm_cfg={'type': 'BN'}, loss_keypoint=None, train_cfg=None, test_cfg=None)[源代码]¶
Head for the multi-stage multi-unit architectures used in the Multi-Stage Pose estimation Network (MSPN) and Residual Steps Networks (RSN).
- 参数
unit_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
out_shape (tuple) – Shape of the output heatmap.
num_stages (int) – Number of stages.
num_units (int) – Number of units in each stage.
use_prm (bool) – Whether to use pose refine machine (PRM). Default: False.
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- forward(x)[源代码]¶
Forward function.
- 返回
a list of heatmaps from multiple stages and units.
- 返回类型
out (list[Tensor])
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
- get_loss(output, target, target_weight)[源代码]¶
Calculate top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,O,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,O,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,O,K,1]) – Weights across different joint types.
- class mmpose.models.heads.TopdownHeatmapMultiStageHead(in_channels=512, out_channels=17, num_stages=1, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, loss_keypoint=None, train_cfg=None, test_cfg=None)[源代码]¶
Top-down heatmap multi-stage head.
TopdownHeatmapMultiStageHead consists of multiple branches, each of which has num_deconv_layers (>=0) deconv layers and a simple conv2d layer.
- 参数
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
num_stages (int) – Number of stages.
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- forward(x)[源代码]¶
Forward function.
- 返回
a list of heatmaps from multiple stages.
- 返回类型
out (list[Tensor])
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
- get_loss(output, target, target_weight)[源代码]¶
Calculate top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
- class mmpose.models.heads.TopdownHeatmapSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[源代码]¶
Top-down heatmap simple head. Paper ref: Bin Xiao et al. “Simple Baselines for Human Pose Estimation and Tracking”.
TopdownHeatmapSimpleHead consists of (>=0) deconv layers and a simple conv2d layer.
- 参数
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) –
Transformation type of input features. Options: 'resize_concat', 'multiple_select', None. Default: None.
'resize_concat': Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.
'multiple_select': Multiple feature maps will be bundled into a list and passed into the decode head.
None: Only one selected feature map is allowed.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
- get_loss(output, target, target_weight)[源代码]¶
Calculate top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
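A minimal forward-pass sketch of this head, assuming a ResNet-50-style feature map; the shapes below are illustrative only:
import torch
from mmpose.models.heads import TopdownHeatmapSimpleHead

# The three default deconv layers upsample the 8x6 feature map by 2**3.
head = TopdownHeatmapSimpleHead(in_channels=2048, out_channels=17)
features = torch.randn(1, 2048, 8, 6)
heatmaps = head(features)  # torch.Size([1, 17, 64, 48])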
- class mmpose.models.heads.ViPNASHeatmapSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(144, 144, 144), num_deconv_kernels=(4, 4, 4), num_deconv_groups=(16, 16, 16), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[源代码]¶
ViPNAS heatmap simple head.
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search. More details can be found in the paper.
ViPNASHeatmapSimpleHead consists of (>=0) deconv layers and a simple conv2d layer.
- 参数
in_channels (int) – Number of input channels
out_channels (int) – Number of output channels
num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should >= 0. Note that 0 means no deconv layers.
num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.
num_deconv_kernels (list|tuple) – Kernel sizes.
num_deconv_groups (list|tuple) – Group number.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) –
Transformation type of input features. Options: 'resize_concat', 'multiple_select', None. Default: None.
'resize_concat': Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.
'multiple_select': Multiple feature maps will be bundled into a list and passed into the decode head.
None: Only one selected feature map is allowed.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_keypoint (dict) – Config for keypoint loss. Default: None.
- get_accuracy(output, target, target_weight)[源代码]¶
Calculate accuracy for top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
- get_loss(output, target, target_weight)[源代码]¶
Calculate top-down keypoint loss.
注解
batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W
- 参数
output (torch.Tensor[N,K,H,W]) – Output heatmaps.
target (torch.Tensor[N,K,H,W]) – Target heatmaps.
target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.
losses¶
- class mmpose.models.losses.AELoss(loss_type)[源代码]¶
Associative Embedding loss.
Associative Embedding: End-to-End Learning for Joint Detection and Grouping.
- class mmpose.models.losses.AdaptiveWingLoss(alpha=2.1, omega=14, epsilon=1, theta=0.5, use_target_weight=False, loss_weight=1.0)[源代码]¶
Adaptive wing loss. paper ref: ‘Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression’ Wang et al. ICCV’2019.
- 参数
alpha (float), omega (float), epsilon (float), theta (float) – Hyper-parameters of the adaptive wing loss.
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- class mmpose.models.losses.BCELoss(use_target_weight=False, loss_weight=1.0)[源代码]¶
Binary Cross Entropy loss.
- class mmpose.models.losses.BoneLoss(joint_parents, use_target_weight=False, loss_weight=1.0)[源代码]¶
Bone length loss.
- 参数
joint_parents (list) – Indices of each joint’s parent joint.
use_target_weight (bool) – Option to use weighted bone loss. Different bone types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[源代码]¶
Forward function.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K-1]) – Weights across different bone types.
- class mmpose.models.losses.GANLoss(gan_type, real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0)[源代码]¶
Define GAN loss.
- 参数
gan_type (str) – Support ‘vanilla’, ‘lsgan’, ‘wgan’, ‘hinge’.
real_label_val (float) – The value for real label. Default: 1.0.
fake_label_val (float) – The value for fake label. Default: 0.0.
loss_weight (float) – Loss weight. Default: 1.0. Note that loss_weight is only for generators; and it is always 1.0 for discriminators.
- class mmpose.models.losses.HeatmapLoss(supervise_empty=True)[源代码]¶
Accumulate the heatmap loss for each image in the batch.
- 参数
supervise_empty (bool) – Whether to supervise empty channels.
- class mmpose.models.losses.JointsMSELoss(use_target_weight=False, loss_weight=1.0)[源代码]¶
MSE loss for heatmaps.
- 参数
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
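A minimal usage sketch of the weighted heatmap MSE, following the N/K/H/W shape convention used on this page (all tensor values are random placeholders):
import torch
from mmpose.models.losses import JointsMSELoss

criterion = JointsMSELoss(use_target_weight=True)
output = torch.rand(2, 17, 64, 48)    # predicted heatmaps
target = torch.rand(2, 17, 64, 48)    # ground-truth heatmaps
target_weight = torch.ones(2, 17, 1)  # per-joint weights
loss = criterion(output, target, target_weight)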
- class mmpose.models.losses.JointsOHKMMSELoss(use_target_weight=False, topk=8, loss_weight=1.0)[源代码]¶
MSE loss with online hard keypoint mining.
- 参数
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
topk (int) – Only top k joint losses are kept.
loss_weight (float) – Weight of the loss. Default: 1.0.
- class mmpose.models.losses.MPJPELoss(use_target_weight=False, loss_weight=1.0)[源代码]¶
MPJPE (Mean Per Joint Position Error) loss.
- 参数
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[源代码]¶
Forward function.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N,K,D]) – Weights across different joint types.
- class mmpose.models.losses.MSELoss(use_target_weight=False, loss_weight=1.0)[源代码]¶
MSE loss for coordinate regression.
- class mmpose.models.losses.MeshLoss(joints_2d_loss_weight, joints_3d_loss_weight, vertex_loss_weight, smpl_pose_loss_weight, smpl_beta_loss_weight, img_res, focal_length=5000)[源代码]¶
Mix loss for 3D human mesh. It is composed of loss on 2D joints, 3D joints, mesh vertices and smpl parameters (if any).
- 参数
joints_2d_loss_weight (float) – Weight for loss on 2D joints.
joints_3d_loss_weight (float) – Weight for loss on 3D joints.
vertex_loss_weight (float) – Weight for loss on 3D vertices.
smpl_pose_loss_weight (float) – Weight for loss on SMPL pose parameters.
smpl_beta_loss_weight (float) – Weight for loss on SMPL shape parameters.
img_res (int) – Input image resolution.
focal_length (float) – Focal length of camera model. Default=5000.
- forward(output, target)[源代码]¶
Forward function.
- 参数
output (dict) – dict of network predicted results. Keys: ‘vertices’, ‘joints_3d’, ‘camera’, ‘pose’(optional), ‘beta’(optional)
target (dict) – dict of ground-truth labels. Keys: ‘vertices’, ‘joints_3d’, ‘joints_3d_visible’, ‘joints_2d’, ‘joints_2d_visible’, ‘pose’, ‘beta’, ‘has_smpl’
- 返回
dict of losses.
- 返回类型
dict
- joints_2d_loss(pred_joints_2d, gt_joints_2d, joints_2d_visible)[源代码]¶
Compute 2D reprojection loss on the joints.
The loss is weighted by joints_2d_visible.
- joints_3d_loss(pred_joints_3d, gt_joints_3d, joints_3d_visible)[源代码]¶
Compute 3D joints loss for the examples that 3D joint annotations are available.
The loss is weighted by joints_3d_visible.
- project_points(points_3d, camera)[源代码]¶
Perform orthographic projection of 3D points using the camera parameters, return projected 2D points in image plane.
注解
batch size: B
point number: N
- 参数
points_3d (Tensor([B, N, 3])) – 3D points.
camera (Tensor([B, 3])) – camera parameters with the 3 channels as (scale, translation_x, translation_y)
- 返回
projected 2D points in image space.
- 返回类型
Tensor([B, N, 2])
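For intuition, a simplified weak-perspective projection is sketched below: scale the x/y coordinates and add a translation. Note that the actual project_points implementation may use a different convention (it also involves focal_length and img_res), so this is illustrative only:
import torch

def weak_perspective_project(points_3d, camera):
    # points_3d: [B, N, 3]; camera: [B, 3] as (scale, trans_x, trans_y).
    scale = camera[:, None, :1]                # [B, 1, 1]
    trans = camera[:, None, 1:]                # [B, 1, 2]
    return scale * points_3d[..., :2] + trans  # [B, N, 2]

points = torch.rand(1, 24, 3)
camera = torch.tensor([[1.0, 0.0, 0.0]])
points_2d = weak_perspective_project(points, camera)  # torch.Size([1, 24, 2])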
- class mmpose.models.losses.MultiLossFactory(num_joints, num_stages, ae_loss_type, with_ae_loss, push_loss_factor, pull_loss_factor, with_heatmaps_loss, heatmaps_loss_factor, supervise_empty=True)[源代码]¶
Loss for bottom-up models.
- 参数
num_joints (int) – Number of keypoints.
num_stages (int) – Number of stages.
ae_loss_type (str) – Type of ae loss.
with_ae_loss (list[bool]) – Use ae loss or not in multi-heatmap.
push_loss_factor (list[float]) – Parameter of push loss in multi-heatmap.
pull_loss_factor (list[float]) – Parameter of pull loss in multi-heatmap.
with_heatmaps_loss (list[bool]) – Use heatmap loss or not in multi-heatmap.
heatmaps_loss_factor (list[float]) – Parameter of heatmap loss in multi-heatmap.
supervise_empty (bool) – Whether to supervise empty channels.
- forward(outputs, heatmaps, masks, joints)[源代码]¶
Forward function to calculate losses.
注解
batch_size: N
heatmaps width: W
heatmaps height: H
max_num_people: M
num_keypoints: K
output_channel: C (C=2K if AE loss is used, otherwise C=K)
- 参数
outputs (list(torch.Tensor[N,C,H,W])) – outputs of stages.
heatmaps (list(torch.Tensor[N,K,H,W])) – target of heatmaps.
masks (list(torch.Tensor[N,H,W])) – masks of heatmaps.
joints (list(torch.Tensor[N,M,K,2])) – joints of ae loss.
- class mmpose.models.losses.SemiSupervisionLoss(joint_parents, projection_loss_weight=1.0, bone_loss_weight=1.0, warmup_iterations=0)[源代码]¶
Semi-supervision loss for unlabeled data. It is composed of projection loss and bone loss.
Paper ref: “3D human pose estimation in video with temporal convolutions and semi-supervised training”, Dario Pavllo et al., CVPR’2019.
- 参数
joint_parents (list) – Indices of each joint’s parent joint.
projection_loss_weight (float) – Weight for projection loss.
bone_loss_weight (float) – Weight for bone loss.
warmup_iterations (int) – Number of warmup iterations. In the first warmup_iterations iterations, the model is trained only on labeled data, and the semi-supervision loss will be 0. This is a workaround since currently we cannot access the epoch number in loss functions. Note that the number of iterations in an epoch can change with the number of GPUs in multi-GPU settings, so please set this parameter carefully: warmup_iterations = dataset_size // samples_per_gpu // gpu_num * warmup_epochs (a worked example follows below).
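For example, a back-of-the-envelope computation of warmup_iterations under assumed training settings (the numbers below are purely illustrative):
dataset_size = 10000    # labeled samples
samples_per_gpu = 64
gpu_num = 8
warmup_epochs = 5
warmup_iterations = dataset_size // samples_per_gpu // gpu_num * warmup_epochs
# 10000 // 64 = 156; 156 // 8 = 19; 19 * 5 = 95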
- forward(output, target)[源代码]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
注解
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmpose.models.losses.SmoothL1Loss(use_target_weight=False, loss_weight=1.0)[源代码]¶
SmoothL1 loss.
- 参数
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[源代码]¶
Forward function.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
- class mmpose.models.losses.SoftWingLoss(omega1=2.0, omega2=20.0, epsilon=0.5, use_target_weight=False, loss_weight=1.0)[源代码]¶
Soft Wing Loss. ‘Structure-Coherent Deep Feature Learning for Robust Face Alignment’ Lin et al. TIP’2021.
- 参数
omega1 (float) – The first threshold.
omega2 (float) – The second threshold.
epsilon (float) – Also referred to as curvature.
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- criterion(pred, target)[源代码]¶
Criterion of wingloss.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
pred (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
- forward(output, target, target_weight=None)[源代码]¶
Forward function.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
- class mmpose.models.losses.WingLoss(omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0)[源代码]¶
Wing Loss. paper ref: ‘Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks’ Feng et al. CVPR’2018.
- 参数
omega (float) – Also referred to as width.
epsilon (float) – Also referred to as curvature.
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- criterion(pred, target)[源代码]¶
Criterion of wingloss.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
pred (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
- forward(output, target, target_weight=None)[源代码]¶
Forward function.
注解
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- 参数
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N,K,D]) – Weights across different joint types.
misc¶
mmpose.datasets¶
- class mmpose.datasets.AnimalATRWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
ATRW dataset for animal pose estimation.
“ATRW: A Benchmark for Amur Tiger Re-identification in the Wild” ACM MM’2020. More details can be found in the paper.
The dataset loads raw features and apply specified transforms to return a dict containing the image tensors and other information.
ATRW keypoint indexes:
0: "left_ear", 1: "right_ear", 2: "nose", 3: "right_shoulder", 4: "right_front_paw", 5: "left_shoulder", 6: "left_front_paw", 7: "right_hip", 8: "right_knee", 9: "right_back_paw", 10: "left_hip", 11: "left_knee", 12: "left_back_paw", 13: "tail", 14: "center"
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017 /000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
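A hedged sketch of assembling the outputs list for evaluate(); the shapes follow the docstring above (15 ATRW keypoints), while the image path and the dataset object are hypothetical:
import numpy as np

outputs = [dict(
    preds=np.zeros((1, 15, 3)),   # (x, y, score) per keypoint
    boxes=np.zeros((1, 6)),       # [center_x, center_y, scale_x, scale_y, area, score]
    image_paths=['data/atrw/images/000001.jpg'],  # hypothetical path
    bbox_id=[0],
)]
# Assuming `dataset` is a built AnimalATRWDataset:
# results = dataset.evaluate(outputs, res_folder='work_dirs/eval', metric='mAP')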
- class mmpose.datasets.AnimalFlyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
AnimalFlyDataset for animal pose estimation.
“Fast animal pose estimation using deep neural networks” Nature Methods’2019. More details can be found in the paper.
The dataset loads raw features and apply specified transforms to return a dict containing the image tensors and other information.
Vinegar Fly keypoint indexes:
0: "head", 1: "eyeL", 2: "eyeR", 3: "neck", 4: "thorax", 5: "abdomen", 6: "forelegR1", 7: "forelegR2", 8: "forelegR3", 9: "forelegR4", 10: "midlegR1", 11: "midlegR2", 12: "midlegR3", 13: "midlegR4", 14: "hindlegR1", 15: "hindlegR2", 16: "hindlegR3", 17: "hindlegR4", 18: "forelegL1", 19: "forelegL2", 20: "forelegL3", 21: "forelegL4", 22: "midlegL1", 23: "midlegL2", 24: "midlegL3", 25: "midlegL4", 26: "hindlegL1", 27: "hindlegL2", 28: "hindlegL3", 29: "hindlegL4", 30: "wingL", 31: "wingR"
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[源代码]¶
Evaluate Fly keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.AnimalHorse10Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
AnimalHorse10Dataset for animal pose estimation.
“Pretraining boosts out-of-domain robustness for pose estimation”, WACV’2021. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Horse-10 keypoint indexes:
0: 'Nose', 1: 'Eye', 2: 'Nearknee', 3: 'Nearfrontfetlock', 4: 'Nearfrontfoot', 5: 'Offknee', 6: 'Offfrontfetlock', 7: 'Offfrontfoot', 8: 'Shoulder', 9: 'Midshoulder', 10: 'Elbow', 11: 'Girth', 12: 'Wither', 13: 'Nearhindhock', 14: 'Nearhindfetlock', 15: 'Nearhindfoot', 16: 'Hip', 17: 'Stifle', 18: 'Offhindhock', 19: 'Offhindfetlock', 20: 'Offhindfoot', 21: 'Ischium'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate Horse-10 keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.AnimalLocustDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
AnimalLocustDataset for animal pose estimation.
“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, Elife’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Desert Locust keypoint indexes:
0: "head", 1: "neck", 2: "thorax", 3: "abdomen1", 4: "abdomen2", 5: "anttipL", 6: "antbaseL", 7: "eyeL", 8: "forelegL1", 9: "forelegL2", 10: "forelegL3", 11: "forelegL4", 12: "midlegL1", 13: "midlegL2", 14: "midlegL3", 15: "midlegL4", 16: "hindlegL1", 17: "hindlegL2", 18: "hindlegL3", 19: "hindlegL4", 20: "anttipR", 21: "antbaseR", 22: "eyeR", 23: "forelegR1", 24: "forelegR2", 25: "forelegR3", 26: "forelegR4", 27: "midlegR1", 28: "midlegR2", 29: "midlegR3", 30: "midlegR4", 31: "hindlegR1", 32: "hindlegR2", 33: "hindlegR3", 34: "hindlegR4"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate Desert Locust keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.AnimalMacaqueDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
MacaquePose dataset for animal pose estimation.
“MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture”, bioRxiv’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Macaque keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]¶
Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.AnimalPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Animal-Pose dataset for animal pose estimation.
“Cross-domain Adaptation For Animal Pose Estimation”, ICCV’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Animal-Pose keypoint indexes:
0: 'L_Eye', 1: 'R_Eye', 2: 'L_EarBase', 3: 'R_EarBase', 4: 'Nose', 5: 'Throat', 6: 'TailBase', 7: 'Withers', 8: 'L_F_Elbow', 9: 'R_F_Elbow', 10: 'L_B_Elbow', 11: 'R_B_Elbow', 12: 'L_F_Knee', 13: 'R_F_Knee', 14: 'L_B_Knee', 15: 'R_B_Knee', 16: 'L_F_Paw', 17: 'R_F_Paw', 18: 'L_B_Paw', 19: 'R_B_Paw'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]¶
Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.AnimalZebraDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
AnimalZebraDataset for animal pose estimation.
“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, Elife’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Zebra keypoint indexes:
0: "snout", 1: "head", 2: "neck", 3: "forelegL1", 4: "forelegR1", 5: "hindlegL1", 6: "hindlegR1", 7: "tailbase", 8: "tailtip"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate Zebra keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.Body3DH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Human3.6M dataset for 3D human pose estimation.
“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.
Human3.6M keypoint indexes:
0: 'root (pelvis)', 1: 'right_hip', 2: 'right_knee', 3: 'right_foot', 4: 'left_hip', 5: 'left_knee', 6: 'left_foot', 7: 'spine', 8: 'thorax', 9: 'neck_base', 10: 'head', 11: 'left_shoulder', 12: 'left_elbow', 13: 'left_wrist', 14: 'right_shoulder', 15: 'right_elbow', 16: 'right_wrist'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.BottomUpAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
AIC dataset for bottom-up pose estimation.
“AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
AIC keypoint indexes:
0: "right_shoulder", 1: "right_elbow", 2: "right_wrist", 3: "left_shoulder", 4: "left_elbow", 5: "left_wrist", 6: "right_hip", 7: "right_knee", 8: "right_ankle", 9: "left_hip", 10: "left_knee", 11: "left_ankle", 12: "head_top", 13: "neck"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
COCO dataset for bottom-up pose estimation.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]¶
Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
num_people: P
num_keypoints: K
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.
scores (list[P]): List of person scores.
image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.BottomUpCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CocoWholeBodyDataset dataset for bottom-up pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
In total, there are 133 keypoints for whole-body pose estimation.
COCO-WholeBody keypoint indexes:
0-16: 17 body keypoints, 17-22: 6 foot keypoints, 23-90: 68 face keypoints, 91-132: 42 hand keypoints
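Because these index ranges are fixed, a whole-body prediction can be split into its parts by simple slicing; a small illustration with a dummy array:

import numpy as np

kpts = np.zeros((133, 3))  # dummy prediction: (x, y, score) per keypoint
body = kpts[0:17]          # 17 body keypoints
foot = kpts[17:23]         # 6 foot keypoints
face = kpts[23:91]         # 68 face keypoints
hands = kpts[91:133]       # 42 hand keypoints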
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CrowdPose dataset for bottom-up pose estimation.
“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
CrowdPose keypoint indexes:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'top_head', 13: 'neck'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
MHPv2.0 dataset for bottom-up pose estimation.
“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
MHP keypoint indexes:
0: "right ankle", 1: "right knee", 2: "right hip", 3: "left hip", 4: "left knee", 5: "left ankle", 6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top", 10: "right wrist", 11: "right elbow", 12: "right shoulder", 13: "left shoulder", 14: "left elbow", 15: "left wrist"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.Compose(transforms)[source]¶
Compose a data pipeline with a sequence of transforms.
- Parameters
transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
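A short sketch of building a pipeline from config dicts. LoadImageFromFile and ToTensor are transforms registered in mmpose.datasets.pipelines; the image path is a hypothetical placeholder:

from mmpose.datasets import Compose

pipeline = Compose([
    dict(type='LoadImageFromFile'),  # reads results['image_file'] into results['img']
    dict(type='ToTensor'),           # converts results['img'] to a torch.Tensor
])

results = pipeline(dict(image_file='demo/demo.jpg'))  # hypothetical image path
print(type(results['img']))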
- class mmpose.datasets.DeepFashionDataset(ann_file, img_prefix, data_cfg, pipeline, subset='', dataset_info=None, test_mode=False)[source]¶
DeepFashion dataset (full-body clothes) for fashion landmark detection.
“DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations”, CVPR’2016. “Fashion Landmark Detection in the Wild”, ECCV’2016.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
The dataset contains three categories: full-body, upper-body and lower-body clothes (a construction sketch follows the parameter list below).
Fashion landmark indexes for upper-body clothes:
0: 'left collar', 1: 'right collar', 2: 'left sleeve', 3: 'right sleeve', 4: 'left hem', 5: 'right hem'
Fashion landmark indexes for lower-body clothes:
0: 'left waistline', 1: 'right waistline', 2: 'left hem', 3: 'right hem'
Fashion landmark indexes for full-body clothes:
0: 'left collar', 1: 'right collar', 2: 'left sleeve', 3: 'right sleeve', 4: 'left waistline', 5: 'right waistline', 6: 'left hem', 7: 'right hem'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
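A construction sketch for the upper-body subset. The subset values ('upper', 'lower', 'full'), the paths and the data_cfg values are all assumptions for illustration, not a tested configuration:

from mmpose.datasets import DeepFashionDataset

dataset = DeepFashionDataset(
    ann_file='data/fld/annotations/fld_upper_train.json',  # hypothetical path
    img_prefix='data/fld/img/',                            # hypothetical path
    data_cfg=dict(
        image_size=[192, 256],
        heatmap_size=[48, 64],
        num_output_channels=6,  # upper-body clothes have 6 landmarks
        num_joints=6,
        dataset_channel=[list(range(6))],
        inference_channel=list(range(6))),
    pipeline=[],
    subset='upper')  # assumed values: 'upper', 'lower' or 'full'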
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate DeepFashion keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘img_00000001.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[source]¶
DistributedSampler inheriting from torch.utils.data.DistributedSampler.
In older versions of PyTorch, DistributedSampler has no shuffle argument; this subclass adds one (see the usage sketch below).
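A usage sketch: the class behaves like torch.utils.data.DistributedSampler, so set_epoch() should be called once per epoch to reshuffle. world_size and rank are assumed to come from the initialized process group:

from torch.utils.data import DataLoader
from mmpose.datasets import DistributedSampler

# `dataset` is any dataset on this page; world_size and rank come from the
# launched distributed process group (e.g. torch.distributed).
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank,
                             shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    sampler.set_epoch(epoch)  # make the shuffling differ across epochs
    for batch in loader:
        pass  # training step goes here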
- class mmpose.datasets.Face300WDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Face300W dataset for top-down face keypoint localization.
“300 Faces In-the-Wild Challenge: Database and results”, Image and Vision Computing (IMAVIS) 2019.
The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.
The landmark annotations follow the 68-point mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='NME', **kwargs)[source]¶
Evaluate 300W keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.FaceAFLWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Face AFLW dataset for top-down face keypoint localization.
“Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization”. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.
The landmark annotations follow the 19-point mark-up. The definition can be found in https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='NME', **kwargs)[source]¶
Evaluate AFLW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.FaceCOFWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Face COFW dataset for top-down face keypoint localization.
“Robust face landmark estimation under occlusion”, ICCV’2013.
The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.
The landmark annotations follow the 29-point mark-up. The definition can be found in http://www.vision.caltech.edu/xpburgos/ICCV13/.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='NME', **kwargs)[source]¶
Evaluate COFW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.FaceCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CocoWholeBodyDataset for face keypoint localization.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
The face landmark annotations follow the 68-point mark-up.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='NME', **kwargs)[source]¶
Evaluate COCO-WholeBody Face keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.FaceWFLWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Face WFLW dataset for top-down face keypoint localization.
“Look at Boundary: A Boundary-Aware Face Alignment Algorithm”, CVPR’2018.
The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.
The landmark annotations follow the 98-point mark-up. The definition can be found in https://wywu.github.io/projects/LAB/WFLW.html.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='NME', **kwargs)[source]¶
Evaluate WFLW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘NME’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.FreiHandDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
FreiHand dataset for top-down hand pose estimation.
“FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images”, ICCV’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
FreiHand keypoint indexes:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate FreiHand keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘training/rgb/00031426.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.HandCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CocoWholeBodyDataset for top-down hand pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
COCO-WholeBody Hand keypoint indexes:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate COCO-WholeBody Hand keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.InterHand2DDataset(ann_file, camera_file, joint_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
InterHand2.6M 2D dataset for top-down hand pose estimation.
“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
InterHand2.6M keypoint indexes:
0: 'thumb4', 1: 'thumb3', 2: 'thumb2', 3: 'thumb1', 4: 'forefinger4', 5: 'forefinger3', 6: 'forefinger2', 7: 'forefinger1', 8: 'middle_finger4', 9: 'middle_finger3', 10: 'middle_finger2', 11: 'middle_finger1', 12: 'ring_finger4', 13: 'ring_finger3', 14: 'ring_finger2', 15: 'ring_finger1', 16: 'pinky_finger4', 17: 'pinky_finger3', 18: 'pinky_finger2', 19: 'pinky_finger1', 20: 'wrist'
- Parameters
ann_file (str) – Path to the annotation file.
camera_file (str) – Path to the camera file.
joint_file (str) – Path to the joint file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate interhand2d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Capture12/0390_dh_touchROM/cam410209/image62434.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.InterHand3DDataset(ann_file, camera_file, joint_file, img_prefix, data_cfg, pipeline, use_gt_root_depth=True, rootnet_result_file=None, dataset_info=None, test_mode=False)[source]¶
InterHand2.6M 3D dataset for top-down hand pose estimation.
“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
InterHand2.6M keypoint indexes:
0: 'r_thumb4', 1: 'r_thumb3', 2: 'r_thumb2', 3: 'r_thumb1', 4: 'r_index4', 5: 'r_index3', 6: 'r_index2', 7: 'r_index1', 8: 'r_middle4', 9: 'r_middle3', 10: 'r_middle2', 11: 'r_middle1', 12: 'r_ring4', 13: 'r_ring3', 14: 'r_ring2', 15: 'r_ring1', 16: 'r_pinky4', 17: 'r_pinky3', 18: 'r_pinky2', 19: 'r_pinky1', 20: 'r_wrist', 21: 'l_thumb4', 22: 'l_thumb3', 23: 'l_thumb2', 24: 'l_thumb1', 25: 'l_index4', 26: 'l_index3', 27: 'l_index2', 28: 'l_index1', 29: 'l_middle4', 30: 'l_middle3', 31: 'l_middle2', 32: 'l_middle1', 33: 'l_ring4', 34: 'l_ring3', 35: 'l_ring2', 36: 'l_ring1', 37: 'l_pinky4', 38: 'l_pinky3', 39: 'l_pinky2', 40: 'l_pinky1', 41: 'l_wrist'
- Parameters
ann_file (str) – Path to the annotation file.
camera_file (str) – Path to the camera file.
joint_file (str) – Path to the joint file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
use_gt_root_depth (bool) – Whether to use the ground-truth depth of the wrist; if False, the depth is taken from rootnet_result_file instead (see the construction sketch after this list).
rootnet_result_file (str) – Path to the wrist depth file.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
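A construction sketch contrasting the two root-depth sources. All paths are hypothetical placeholders, and data_cfg is assumed to be defined as in an experiment config:

from mmpose.datasets import InterHand3DDataset

dataset = InterHand3DDataset(
    ann_file='data/interhand2.6m/annotations/all/InterHand2.6M_test_data.json',
    camera_file='data/interhand2.6m/annotations/all/InterHand2.6M_test_camera.json',
    joint_file='data/interhand2.6m/annotations/all/InterHand2.6M_test_joint_3d.json',
    img_prefix='data/interhand2.6m/images/test/',
    data_cfg=data_cfg,  # assumed to be defined as in the experiment config
    pipeline=[],
    use_gt_root_depth=False,                         # take the root depth from RootNet
    rootnet_result_file='rootnet_test_result.json',  # hypothetical path
    test_mode=True)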
- evaluate(outputs, res_folder, metric='MPJPE', **kwargs)[source]¶
Evaluate interhand3d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
hand_type (np.ndarray[N, 4]): The first two columns are the hand-type labels, the last two are the corresponding scores.
rel_root_depth (np.ndarray[N]): The relative depth between the left wrist and the right wrist.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Capture6/0012_aokay_upright/cam410061/image4996.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘MRRPE’, ‘MPJPE’, ‘Handedness_acc’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.MeshAdversarialDataset(train_dataset, adversarial_dataset)[source]¶
Mix Dataset for adversarial training in the 3D human mesh estimation task.
The dataset combines data from two datasets and returns a dict containing data from both.
- Parameters
train_dataset (Dataset) – Dataset for 3D human mesh estimation.
adversarial_dataset (Dataset) – Dataset for adversarial learning, provides real SMPL parameters.
- class mmpose.datasets.MeshH36MDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]¶
Human3.6M Dataset for 3D human mesh estimation. It inherits all functions from MeshBaseDataset and has its own evaluate function.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.MeshMixDataset(configs, partition)[source]¶
Mix Dataset for 3D human mesh estimation.
The dataset combines data from multiple datasets (MeshBaseDataset) and samples data from the different datasets with the provided proportions. The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
- Parameters
configs (list) – List of configs for multiple datasets.
partition (list) – Sampling proportions of the datasets. The length of partition should match that of configs. Its elements should be non-negative and need not sum to one.
Example
>>> from mmpose.datasets import MeshMixDataset
>>> data_cfg = dict(
>>>     image_size=[256, 256],
>>>     iuv_size=[64, 64],
>>>     num_joints=24,
>>>     use_IUV=True,
>>>     uv_type='BF')
>>>
>>> mix_dataset = MeshMixDataset(
>>>     configs=[
>>>         dict(
>>>             ann_file='tests/data/h36m/test_h36m.npz',
>>>             img_prefix='tests/data/h36m',
>>>             data_cfg=data_cfg,
>>>             pipeline=[]),
>>>         dict(
>>>             ann_file='tests/data/h36m/test_h36m.npz',
>>>             img_prefix='tests/data/h36m',
>>>             data_cfg=data_cfg,
>>>             pipeline=[]),
>>>     ],
>>>     partition=[0.6, 0.4])
- class mmpose.datasets.MoshDataset(ann_file, pipeline, test_mode=False)[source]¶
Mosh Dataset for adversarial training in the 3D human mesh estimation task.
The dataset returns a dict containing real-world SMPL parameters.
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.OneHand10KDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
OneHand10K dataset for top-down hand pose estimation.
“Mask-pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Images”, TCSVT’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
OneHand10K keypoint indexes:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate OneHand10K keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘Test/source/0.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.PanopticDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Panoptic dataset for top-down hand pose estimation.
“Hand Keypoint Detection in Single Images using Multiview Bootstrapping”, CVPR’2017. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
Panoptic keypoint indexes:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]¶
Evaluate Panoptic keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘hand_labels/manual_test/000648952_02_l.jpg’]
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCKh’, ‘AUC’, ‘EPE’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
AicDataset dataset for top-down pose estimation.
“AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
AIC keypoint indexes:
0: "right_shoulder", 1: "right_elbow", 2: "right_wrist", 3: "left_shoulder", 4: "left_elbow", 5: "left_wrist", 6: "right_hip", 7: "right_knee", 8: "right_ankle", 9: "left_hip", 10: "left_knee", 11: "left_ankle", 12: "head_top", 13: "neck"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CocoDataset dataset for top-down pose estimation.
“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]¶
Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CocoWholeBodyDataset dataset for top-down pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
COCO-WholeBody keypoint indexes:
0-16: 17 body keypoints, 17-22: 6 foot keypoints, 23-90: 68 face keypoints, 91-132: 42 hand keypoints
In total, there are 133 keypoints for whole-body pose estimation.
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
CrowdPoseDataset dataset for top-down pose estimation.
“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
CrowdPose keypoint indexes:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'top_head', 13: 'neck'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.TopDownFreiHandDataset(*args, **kwargs)[source]¶
Deprecated TopDownFreiHandDataset.
- class mmpose.datasets.TopDownH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
Human3.6M dataset for top-down 2D pose estimation.
“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.
Human3.6M keypoint indexes:
0: 'root (pelvis)', 1: 'right_hip', 2: 'right_knee', 3: 'right_foot', 4: 'left_hip', 5: 'left_knee', 6: 'left_foot', 7: 'spine', 8: 'thorax', 9: 'neck_base', 10: 'head', 11: 'left_shoulder', 12: 'left_elbow', 13: 'left_wrist', 14: 'right_shoulder', 15: 'right_elbow', 16: 'right_wrist'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric, **kwargs)[source]¶
Evaluate Human3.6M 2D keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- Returns
Evaluation results for evaluation metric.
- Return type
dict
- class mmpose.datasets.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
JhmdbDataset dataset for top-down pose estimation.
“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
sub-JHMDB keypoint indexes:
0: "neck", 1: "belly", 2: "head", 3: "right_shoulder", 4: "left_shoulder", 5: "right_hip", 6: "left_hip", 7: "right_elbow", 8: "left_elbow", 9: "right_knee", 10: "left_knee", 11: "right_wrist", 12: "left_wrist", 13: "right_ankle", 14: "left_ankle"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]¶
Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str])
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK is normalized by the bounding box size, while tPCK is normalized by the torso size (see the sketch after this block).
- Returns
Evaluation results for evaluation metric.
- Return type
dict
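Since PCK and tPCK differ only in the normalization term, both can be requested in a single call; a sketch, with outputs collected as in the test-loop example earlier on this page:

results = dataset.evaluate(outputs, res_folder='work_dirs/eval',
                           metric=['PCK', 'tPCK'])
print(results)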
- class mmpose.datasets.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
MHPv2.0 dataset for top-down pose estimation.
“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code at https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose. Please be cautious if you use the results in papers.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
MHP keypoint indexes:
0: "right ankle", 1: "right knee", 2: "right hip", 3: "left hip", 4: "left knee", 5: "left ankle", 6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top", 10: "right wrist", 11: "right elbow", 12: "right shoulder", 13: "left shoulder", 14: "left elbow", 15: "left wrist"
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
MPII Dataset for top-down pose estimation.
“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
MPII keypoint indexes:
0: 'right_ankle', 1: 'right_knee', 2: 'right_hip', 3: 'left_hip', 4: 'left_knee', 5: 'left_ankle', 6: 'pelvis', 7: 'thorax', 8: 'upper_neck', 9: 'head_top', 10: 'right_wrist', 11: 'right_elbow', 12: 'right_shoulder', 13: 'left_shoulder', 14: 'left_elbow', 15: 'left_wrist'
- Parameters
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]¶
Evaluate PCKh for the MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch. Copyright (c) Microsoft, under the MIT License.
Note
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.
- Returns
PCKh for each joint
- Return type
dict
- class mmpose.datasets.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]¶
MPII-TRB Dataset dataset for top-down pose estimation.
“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.
The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.
MPII-TRB keypoint indexes:
0: 'left_shoulder' 1: 'right_shoulder' 2: 'left_elbow' 3: 'right_elbow' 4: 'left_wrist' 5: 'right_wrist' 6: 'left_hip' 7: 'right_hip' 8: 'left_knee' 9: 'right_knee' 10: 'left_ankle' 11: 'right_ankle' 12: 'head' 13: 'neck' 14: 'right_neck' 15: 'left_neck' 16: 'medial_right_shoulder' 17: 'lateral_right_shoulder' 18: 'medial_right_bow' 19: 'lateral_right_bow' 20: 'medial_right_wrist' 21: 'lateral_right_wrist' 22: 'medial_left_shoulder' 23: 'lateral_left_shoulder' 24: 'medial_left_bow' 25: 'lateral_left_bow' 26: 'medial_left_wrist' 27: 'lateral_left_wrist' 28: 'medial_right_hip' 29: 'lateral_right_hip' 30: 'medial_right_knee' 31: 'lateral_right_knee' 32: 'medial_right_ankle' 33: 'lateral_right_ankle' 34: 'medial_left_hip' 35: 'lateral_left_hip' 36: 'medial_left_knee' 37: 'lateral_left_knee' 38: 'medial_left_ankle' 39: 'lateral_left_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCKh', **kwargs)[源代码]¶
Evaluate PCKh for MPII-TRB dataset.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_ids (list[str]): For example, [‘27407’].
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.
- 返回
PCKh for each joint
- 返回类型
dict
- class mmpose.datasets.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
OCHuman dataset for top-down pose estimation.
“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.
The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images and is designed for validation and testing only. To evaluate on OCHuman, the model should be trained on the COCO train set, and then tested on OCHuman to measure its robustness to occlusion.
OCHuman keypoint indexes (same as COCO):
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.TopDownOneHand10KDataset(*args, **kwargs)[源代码]¶
Deprecated TopDownOneHand10KDataset.
- class mmpose.datasets.TopDownPanopticDataset(*args, **kwargs)[源代码]¶
Deprecated TopDownPanopticDataset.
- class mmpose.datasets.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
PoseTrack18 dataset for top-down pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
PoseTrack2018 keypoint indexes:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate posetrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
num_keypoints: K
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_id (list(int))
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.TopDownPoseTrack18VideoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False, ph_fill_len=6)[源代码]¶
PoseTrack18 dataset for top-down pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
PoseTrack2018 keypoint indexes:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where videos/images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
ph_fill_len (int) – The length of the placeholder to fill in the image filenames, default: 6 in PoseTrack18.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate posetrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
num_keypoints: K
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_id (list(int))
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- mmpose.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=True, pin_memory=True, **kwargs)[源代码]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- 参数
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: True
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
kwargs – any keyword argument to be used to initialize DataLoader
- 返回
A PyTorch dataloader.
- 返回类型
DataLoader
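A minimal usage sketch (the dataset object and the batch-size values below are illustrative placeholders; in MMPose they would normally come from the config file):
from mmpose.datasets import build_dataloader

# `dataset` is assumed to be an already-built Dataset instance.
data_loader = build_dataloader(
    dataset,
    samples_per_gpu=32,   # batch size of each GPU
    workers_per_gpu=2,    # dataloader subprocesses of each GPU
    num_gpus=1,
    dist=False,           # single-machine, non-distributed run
    shuffle=True)
for data_batch in data_loader:
    ...                   # forward the batch through the model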
- mmpose.datasets.build_dataset(cfg, default_args=None)[源代码]¶
Build a dataset from config dict.
- 参数
cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments. Default: None.
- 返回
The constructed dataset.
- 返回类型
Dataset
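A hedged sketch of building a dataset from a config dict; the paths are placeholders, and data_cfg / val_pipeline stand for the dataset-specific config dict and transform list that would be defined elsewhere in a config file:
from mmpose.datasets import build_dataset

dataset = build_dataset(
    dict(
        type='TopDownCocoDataset',   # the registered dataset class name
        ann_file='data/coco/annotations/person_keypoints_val2017.json',
        img_prefix='data/coco/val2017/',
        data_cfg=data_cfg,           # placeholder: dataset-specific config dict
        pipeline=val_pipeline,       # placeholder: list of transform config dicts
        test_mode=True))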
datasets¶
- class mmpose.datasets.datasets.top_down.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
AIC dataset for top-down pose estimation.
“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
AIC keypoint indexes:
0: "right_shoulder", 1: "right_elbow", 2: "right_wrist", 3: "left_shoulder", 4: "left_elbow", 5: "left_wrist", 6: "right_hip", 7: "right_knee", 8: "right_ankle", 9: "left_hip", 10: "left_knee", 11: "left_ankle", 12: "head_top", 13: "neck"
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
COCO dataset for top-down pose estimation.
“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
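A hedged sketch of the outputs structure expected by evaluate(), assuming dataset is an instance of this class; the arrays are dummies (real values come from model inference), and the key name bbox_ids is an assumption not spelled out in the docstring above:
import numpy as np

outputs = [{
    'preds': np.zeros((1, 17, 3)),   # (x, y, score) for each of the 17 keypoints
    'boxes': np.zeros((1, 6)),       # [center[0], center[1], scale[0], scale[1], area, score]
    'image_paths': ['data/coco/val2017/000000393226.jpg'],
    'bbox_ids': [0],                 # assumed key name for the bbox id list
}]
results = dataset.evaluate(outputs, res_folder='work_dirs/eval', metric='mAP')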
- class mmpose.datasets.datasets.top_down.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
CocoWholeBody dataset for top-down pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
COCO-WholeBody keypoint indexes:
0-16: 17 body keypoints, 17-22: 6 foot keypoints, 23-90: 68 face keypoints, 91-132: 42 hand keypoints In total, we have 133 keypoints for wholebody pose estimation.
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
CrowdPose dataset for top-down pose estimation.
“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
CrowdPose keypoint indexes:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'top_head', 13: 'neck'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
Human3.6M dataset for top-down 2D pose estimation.
“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.
Human3.6M keypoint indexes:
0: 'root (pelvis)', 1: 'right_hip', 2: 'right_knee', 3: 'right_foot', 4: 'left_hip', 5: 'left_knee', 6: 'left_foot', 7: 'spine', 8: 'thorax', 9: 'neck_base', 10: 'head', 11: 'left_shoulder', 12: 'left_elbow', 13: 'left_wrist', 14: 'right_shoulder', 15: 'right_elbow', 16: 'right_wrist'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric, **kwargs)[源代码]¶
Evaluate human3.6m 2d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap
bbox_id (list(int)).
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.datasets.top_down.TopDownHalpeDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
HalpeDataset for top-down pose estimation.
More details can be found at https://github.com/Fang-Haoshu/Halpe-FullBody.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
Halpe keypoint indexes:
0-19: 20 body keypoints, 20-25: 6 foot keypoints, 26-93: 68 face keypoints, 94-135: 42 hand keypoints In total, we have 136 keypoints for wholebody pose estimation.
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
JHMDB dataset for top-down pose estimation.
“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
sub-JHMDB keypoint indexes:
0: "neck", 1: "belly", 2: "head", 3: "right_shoulder", 4: "left_shoulder", 5: "right_hip", 6: "left_hip", 7: "right_elbow", 8: "left_elbow", 9: "right_knee", 10: "left_knee", 11: "right_wrist", 12: "left_wrist", 13: "right_ankle", 14: "left_ankle"
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCK', **kwargs)[源代码]¶
Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_path (list[str])
output_heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK means normalized by the bounding boxes, while tPCK means normalized by the torso size.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.datasets.top_down.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
MHPv2.0 dataset for top-down pose estimation.
“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code (https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose). Please be cautious if you use the results in papers.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
MHP keypoint indexes:
0: "right ankle", 1: "right knee", 2: "right hip", 3: "left hip", 4: "left knee", 5: "left ankle", 6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top", 10: "right wrist", 11: "right elbow", 12: "right shoulder", 13: "left shoulder", 14: "left elbow", 15: "left wrist",
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
MPII Dataset for top-down pose estimation.
“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
MPII keypoint indexes:
0: 'right_ankle' 1: 'right_knee', 2: 'right_hip', 3: 'left_hip', 4: 'left_knee', 5: 'left_ankle', 6: 'pelvis', 7: 'thorax', 8: 'upper_neck', 9: 'head_top', 10: 'right_wrist', 11: 'right_elbow', 12: 'right_shoulder', 13: 'left_shoulder', 14: 'left_elbow', 15: 'left_wrist'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCKh', **kwargs)[源代码]¶
Evaluate PCKh for MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch Copyright (c) Microsoft, under the MIT License.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.
- 返回
PCKh for each joint
- 返回类型
dict
- class mmpose.datasets.datasets.top_down.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
MPII-TRB dataset for top-down pose estimation.
“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
MPII-TRB keypoint indexes:
0: 'left_shoulder' 1: 'right_shoulder' 2: 'left_elbow' 3: 'right_elbow' 4: 'left_wrist' 5: 'right_wrist' 6: 'left_hip' 7: 'right_hip' 8: 'left_knee' 9: 'right_knee' 10: 'left_ankle' 11: 'right_ankle' 12: 'head' 13: 'neck' 14: 'right_neck' 15: 'left_neck' 16: 'medial_right_shoulder' 17: 'lateral_right_shoulder' 18: 'medial_right_bow' 19: 'lateral_right_bow' 20: 'medial_right_wrist' 21: 'lateral_right_wrist' 22: 'medial_left_shoulder' 23: 'lateral_left_shoulder' 24: 'medial_left_bow' 25: 'lateral_left_bow' 26: 'medial_left_wrist' 27: 'lateral_left_wrist' 28: 'medial_right_hip' 29: 'lateral_right_hip' 30: 'medial_right_knee' 31: 'lateral_right_knee' 32: 'medial_right_ankle' 33: 'lateral_right_ankle' 34: 'medial_left_hip' 35: 'lateral_left_hip' 36: 'medial_left_knee' 37: 'lateral_left_knee' 38: 'medial_left_ankle' 39: 'lateral_left_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='PCKh', **kwargs)[源代码]¶
Evaluate PCKh for MPII-TRB dataset.
注解
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_ids (list[str]): For example, [‘27407’].
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.
- 返回
PCKh for each joint
- 返回类型
dict
- class mmpose.datasets.datasets.top_down.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
OCHuman dataset for top-down pose estimation.
“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.
The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images and is designed for validation and testing only. To evaluate on OCHuman, the model should be trained on the COCO train set, and then tested on OCHuman to measure its robustness to occlusion.
OCHuman keypoint indexes (same as COCO):
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.top_down.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
PoseTrack18 dataset for top-down pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
PoseTrack2018 keypoint indexes:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate posetrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
num_keypoints: K
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_id (list(int))
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.datasets.top_down.TopDownPoseTrack18VideoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False, ph_fill_len=6)[源代码]¶
PoseTrack18 dataset for top-down pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
PoseTrack2018 keypoint indexes:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where videos/images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
ph_fill_len (int) – The length of the placeholder to fill in the image filenames, default: 6 in PoseTrack18.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate posetrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
num_keypoints: K
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.
boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]
image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]
heatmap (np.ndarray[N, K, H, W]): model output heatmap.
bbox_id (list(int))
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.datasets.bottom_up.BottomUpAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
AIC dataset for bottom-up pose estimation.
“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
AIC keypoint indexes:
0: "right_shoulder", 1: "right_elbow", 2: "right_wrist", 3: "left_shoulder", 4: "left_elbow", 5: "left_wrist", 6: "right_hip", 7: "right_knee", 8: "right_ankle", 9: "left_hip", 10: "left_knee", 11: "left_ankle", 12: "head_top", 13: "neck"
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.bottom_up.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
COCO dataset for bottom-up pose estimation.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
COCO keypoint indexes:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- evaluate(outputs, res_folder, metric='mAP', **kwargs)[源代码]¶
Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.
注解
num_people: P
num_keypoints: K
- 参数
outputs (list[dict]) –
Outputs containing the following items.
preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.
scores (list[P]): List of person scores.
image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]
heatmap (np.ndarray[N, K, H, W]): model outputs.
res_folder (str) – Path of directory to save the results.
metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
- 返回
Evaluation results for evaluation metric.
- 返回类型
dict
- class mmpose.datasets.datasets.bottom_up.BottomUpCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
CocoWholeBody dataset for bottom-up pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
In total, we have 133 keypoints for wholebody pose estimation.
COCO-WholeBody keypoint indexes:
0-16: 17 body keypoints, 17-22: 6 foot keypoints, 23-90: 68 face keypoints, 91-132: 42 hand keypoints
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.bottom_up.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
CrowdPose dataset for bottom-up pose estimation.
“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
CrowdPose keypoint indexes:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'top_head', 13: 'neck'
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
- class mmpose.datasets.datasets.bottom_up.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]¶
MHPv2.0 dataset for bottom-up pose estimation.
“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.
The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.
MHP keypoint indexes:
0: "right ankle", 1: "right knee", 2: "right hip", 3: "left hip", 4: "left knee", 5: "left ankle", 6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top", 10: "right wrist", 11: "right elbow", 12: "right shoulder", 13: "left shoulder", 14: "left elbow", 15: "left wrist",
- 参数
ann_file (str) – Path to the annotation file.
img_prefix (str) – Path to a directory where images are held. Default: None.
data_cfg (dict) – config
pipeline (list[dict | callable]) – A sequence of data transforms.
dataset_info (DatasetInfo) – A class containing all dataset info.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
pipelines¶
- class mmpose.datasets.pipelines.loading.LoadImageFromFile(to_float32=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[源代码]¶
Loading image(s) from file.
Required key: “image_file”.
Added key: “img”.
- 参数
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – Flags specifying the color type of a loaded image, candidates are ‘color’, ‘grayscale’ and ‘unchanged’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
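A minimal sketch of applying the transform directly, assuming the standard transform-call convention results = transform(results); the image path is a placeholder:
from mmpose.datasets.pipelines.loading import LoadImageFromFile

load = LoadImageFromFile(channel_order='rgb')
results = load({'image_file': 'data/coco/val2017/000000397133.jpg'})
# results['img'] now holds an H x W x 3 uint8 array in RGB order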
Albumentation augmentation (pixel-level transforms only). Adds custom pixel-level transformations from Albumentations library. Please visit https://albumentations.readthedocs.io to get more information.
Note: we only support pixel-level transforms. Please visit https://github.com/albumentations-team/albumentations#pixel-level-transforms to get more information about pixel-level transforms.
An example of transforms is as follows:
[
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
- 参数
transforms (list[dict]) – A list of Albumentation transformations
keymap (dict) – Contains {‘input key’:’albumentation-style key’}, e.g., {‘img’: ‘image’}.
Import a module from albumentations.
It resembles some of the build_from_cfg() logic.
- 参数
cfg (dict) – Config dict. It should at least contain the key “type”.
- 返回
The constructed object.
- 返回类型
obj
Dictionary mapper.
Renames keys according to keymap provided.
- 参数
d (dict) – old dict
keymap (dict) – {‘old_key’:’new_key’}
- 返回
new dict.
- 返回类型
dict
Collect data from the loader relevant to the specific task.
This keeps the items in keys as they are, and collects items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=’imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=’img_metas’, the results will be a dict with keys ‘imgs’ and ‘img_metas’, where ‘img_metas’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.
- 参数
keys (Sequence[str|tuple]) – Required keys to be collected. If a tuple (key, key_new) is given as an element, the item retrieved by key will be renamed as key_new in collected data.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.
meta_keys (Sequence[str|tuple]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depends on meta_keys.
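The prose example above, written as an illustrative pipeline config dict (assuming the transform is registered under the name ‘Collect’):
dict(
    type='Collect',
    keys=['imgs'],
    meta_keys=('filename', 'label', 'original_shape'),
    meta_name='img_metas')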
Compose a data pipeline with a sequence of transforms.
- 参数
transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
Process each item and merge multi-item results to lists.
- 参数
pipeline (dict) – Dictionary to construct pipeline for a single item.
Gather the targets for multitask heads.
- 参数
pipeline_list (list[list]) – List of pipelines for all heads.
pipeline_indices (list[int]) – Pipeline index of each head.
Normalize the Tensor image (CxHxW), with mean and std.
Required key: ‘img’. Modifies key: ‘img’.
- 参数
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
Apply photometric distortions to the image sequentially; every transformation is applied with a probability of 0.5. The position of random contrast is second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
randomly swap channels
- 参数
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
Brightness distortion.
Contrast distortion.
Multiply with alpha and add beta, with clip.
Rename the keys.
- 参数
key_pairs (Sequence[tuple]) – Required keys to be renamed. If a tuple (key_src, key_tgt) is given as an element, the item retrieved by key_src will be renamed as key_tgt.
Transform image to Tensor.
Required key: ‘img’. Modifies key: ‘img’.
- 参数
results (dict) – contains all information about training.
- class mmpose.datasets.pipelines.top_down_transform.TopDownAffine(use_udp=False)[源代码]¶
Affine transform the image to make input.
Required keys:’img’, ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’,’scale’, ‘rotation’ and ‘center’.
Modified keys:’img’, ‘joints_3d’, and ‘joints_3d_visible’.
- 参数
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTarget(sigma=2, kernel=(11, 11), valid_radius_factor=0.0546875, target_type='GaussianHeatmap', encoding='MSRA', unbiased_encoding=False)[源代码]¶
Generate the target heatmap.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’.
Modified keys: ‘target’, and ‘target_weight’.
- 参数
sigma – Sigma of heatmap gaussian for ‘MSRA’ approach.
kernel – Kernel of heatmap gaussian for ‘Megvii’ approach.
encoding (str) – Approach to generate target heatmaps. Currently supported approaches: ‘MSRA’, ‘Megvii’, ‘UDP’. Default:’MSRA’
unbiased_encoding (bool) – Option to use unbiased encoding methods. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).
keypoint_pose_distance – Keypoint pose distance for UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
target_type (str) – supported targets: ‘GaussianHeatmap’, ‘CombinedTarget’. Default:’GaussianHeatmap’ CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
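For intuition, a hedged NumPy sketch of a single-keypoint Gaussian heatmap in the spirit of the ‘MSRA’ encoding; the heatmap size, keypoint location and the absence of truncation are illustrative simplifications, not the exact implementation:
import numpy as np

H, W, sigma = 64, 48, 2
mu_x, mu_y = 24, 32                        # keypoint location on the heatmap
xs = np.arange(W)[None, :]                 # shape (1, W)
ys = np.arange(H)[:, None]                 # shape (H, 1)
heatmap = np.exp(-((xs - mu_x) ** 2 + (ys - mu_y) ** 2) / (2 * sigma ** 2))
# heatmap has shape (H, W) with a peak of 1.0 at (mu_x, mu_y)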
- class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTargetRegression[源代码]¶
Generate the target regression vector (coordinates).
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- class mmpose.datasets.pipelines.top_down_transform.TopDownGetRandomScaleRotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6)[源代码]¶
Data augmentation with random scaling & rotating.
Required key: ‘scale’.
Modifies key: ‘scale’ and ‘rotation’.
- 参数
rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].
scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].
rot_prob (float) – Probability of random rotation.
- class mmpose.datasets.pipelines.top_down_transform.TopDownHalfBodyTransform(num_joints_half_body=8, prob_half_body=0.3)[源代码]¶
Data augmentation with half-body transform. Keep only the upper body or the lower body at random.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, and ‘ann_info’.
Modifies key: ‘scale’ and ‘center’.
- 参数
num_joints_half_body (int) – Threshold of performing half-body transform. If the body has fewer than num_joints_half_body joints, this step is skipped.
prob_half_body (float) – Probability of half-body transform.
- class mmpose.datasets.pipelines.top_down_transform.TopDownRandomFlip(flip_prob=0.5)[源代码]¶
Data augmentation with random image flip.
Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘ann_info’.
Modifies key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘flipped’.
- 参数
flip (bool) – Option to perform random flip.
flip_prob (float) – Probability of flip.
- class mmpose.datasets.pipelines.top_down_transform.TopDownRandomTranslation(trans_factor=0.15, trans_prob=1.0)[源代码]¶
Data augmentation with random translation.
Required key: ‘scale’ and ‘center’.
Modifies key: ‘center’.
注解
bbox height: H
bbox width: W
- 参数
trans_factor (float) – Translating center to [-trans_factor, trans_factor] * [W, H] + center.
trans_prob (float) – Probability of random translation.
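Putting the transforms above together, a hedged example of a typical top-down training pipeline; the parameter values are common choices rather than canonical ones, and the registered names LoadImageFromFile, ToTensor, NormalizeTensor and Collect are assumed from the entries earlier in this section:
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(type='TopDownHalfBodyTransform',
         num_joints_half_body=8, prob_half_body=0.3),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor',
         mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2),
    dict(type='Collect',
         keys=['img', 'target', 'target_weight'],
         meta_keys=['image_file', 'center', 'scale', 'rotation', 'flip_pairs']),
]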
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateHeatmapTarget(sigma, use_udp=False)[源代码]¶
Generate multi-scale heatmap target for bottom-up.
- 参数
sigma (int) – Sigma of heatmap Gaussian
max_num_people (int) – Maximum number of people in an image
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGeneratePAFTarget(limb_width, skeleton=None)[源代码]¶
Generate multi-scale heatmaps and part affinity fields (PAF) target for bottom-up. Paper ref: Cao et al. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields (CVPR 2017).
- 参数
limb_width (int) – Limb width of part affinity fields
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateTarget(sigma, max_num_people, use_udp=False)[源代码]¶
Generate multi-scale heatmap target for associate embedding.
- 参数
sigma (int) – Sigma of heatmap Gaussian
max_num_people (int) – Maximum number of people in an image
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGetImgSize(test_scale_factor, current_scale=1, use_udp=False)[源代码]¶
Get multi-scale image sizes for bottom-up, including base_size and test_scale_factor. The aspect ratio is kept and the image is resized to results[‘ann_info’][‘image_size’] × current_scale.
- 参数
test_scale_factor (List[float]) – Multi scale
current_scale (int) – default 1
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomAffine(rot_factor, scale_factor, scale_type, trans_factor, use_udp=False)[源代码]¶
Data augmentation with random scaling & rotating.
- 参数
rot_factor (int) – Rotating to [-rot_factor, rot_factor]
scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor]
scale_type – wrt long or short length of the image.
trans_factor – Translation factor.
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomFlip(flip_prob=0.5)[源代码]¶
Data augmentation with random image flip for bottom-up.
- 参数
flip_prob (float) – Probability of flip.
- class mmpose.datasets.pipelines.bottom_up_transform.BottomUpResizeAlign(transforms, use_udp=False)[源代码]¶
Resize multi-scale size and align transform for bottom-up.
- 参数
transforms (List) – ToTensor & Normalize
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.HeatmapGenerator(output_size, num_joints, sigma=-1, use_udp=False)[源代码]¶
Generate heatmaps for bottom-up models.
- 参数
num_joints (int) – Number of keypoints
output_size (np.ndarray) – Size (w, h) of feature map
sigma (int) – Sigma of the heatmaps.
use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
- class mmpose.datasets.pipelines.bottom_up_transform.JointsEncoder(max_num_people, num_joints, output_size, tag_per_joint)[源代码]¶
Encodes the visible joints into (coordinate, score) pairs, where the coordinate of one joint and its score are of int type:
(idx * output_size**2 + y * output_size + x, 1) or (0, 0).
- 参数
max_num_people (int) – Max number of people in an image
num_joints (int) – Number of keypoints
output_size (np.ndarray) – Size (w, h) of feature map
tag_per_joint (bool) – Option to use one tag map per joint.
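A small worked example of the encoding formula above, assuming a square feature map of side output_size:
output_size = 64
idx, x, y = 3, 10, 20          # joint index and its integer location
code = idx * output_size ** 2 + y * output_size + x
# a visible joint is encoded as (code, 1) = (13578, 1);
# an invisible joint is encoded as (0, 0)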
- class mmpose.datasets.pipelines.bottom_up_transform.PAFGenerator(output_size, limb_width, skeleton)[源代码]¶
Generate part affinity fields.
- 参数
output_size (np.ndarray) – Size (w, h) of feature map.
limb_width (int) – Limb width of part affinity fields.
skeleton (list[list]) – connections of joints.
- class mmpose.datasets.pipelines.mesh_transform.IUVToTensor[源代码]¶
Transform IUV image to part index mask and uv coordinates image. The 3 channels of an IUV image are: part index, u coordinates, and v coordinates.
Required key: ‘iuv’, ‘ann_info’. Modifies key: ‘part_index’, ‘uv_coordinates’.
- 参数
results (dict) – contains all information about training.
- class mmpose.datasets.pipelines.mesh_transform.LoadIUVFromFile(to_float32=False)[源代码]¶
Loading IUV image from file.
- class mmpose.datasets.pipelines.mesh_transform.MeshAffine[源代码]¶
Affine transform the image to get the input image. Affine transform the 2D keypoints, 3D keypoints and IUV image too.
Required keys: ‘img’, ‘joints_2d’,’joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘pose’, ‘iuv’, ‘ann_info’,’scale’, ‘rotation’ and ‘center’. Modifies key: ‘img’, ‘joints_2d’,’joints_2d_visible’, ‘joints_3d’, ‘pose’, ‘iuv’.
- class mmpose.datasets.pipelines.mesh_transform.MeshGetRandomScaleRotation(rot_factor=30, scale_factor=0.25, rot_prob=0.6)[源代码]¶
Data augmentation with random scaling & rotating.
Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.
- 参数
rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].
scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].
rot_prob (float) – Probability of random rotation.
- class mmpose.datasets.pipelines.mesh_transform.MeshRandomChannelNoise(noise_factor=0.4)[源代码]¶
Data augmentation with random channel noise.
Required keys: ‘img’ Modifies key: ‘img’
- 参数
noise_factor (float) – Multiply each channel with a factor within [1-noise_factor, 1+noise_factor].
- class mmpose.datasets.pipelines.mesh_transform.MeshRandomFlip(flip_prob=0.5)[源代码]¶
Data augmentation with random image flip.
Required keys: ‘img’, ‘joints_2d’,’joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’, ‘pose’, ‘iuv’ and ‘ann_info’. Modifies key: ‘img’, ‘joints_2d’,’joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’, ‘pose’, ‘iuv’.
- 参数
flip_prob (float) – Probability of flip.
- class mmpose.datasets.pipelines.pose3d_transform.CameraProjection(item, mode, output_name=None, camera_type='SimpleCamera', camera_param=None)[源代码]¶
Apply camera projection to joint coordinates.
- 参数
item (str) – The name of the pose to apply camera projection.
mode (str) – The type of camera projection. Supported options are: world_to_camera, world_to_pixel, camera_to_world, camera_to_pixel.
output_name (str|None) – The name of the projected pose. If None (default) is given, the projected pose will be stored in place.
camera_type (str) – The camera class name (should be registered in CAMERA).
camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.
- Required keys:
item
camera_param (if camera parameters are not given in initialization)
- Modified keys:
output_name
- class mmpose.datasets.pipelines.pose3d_transform.CollectCameraIntrinsics(camera_param=None, need_distortion=True)[源代码]¶
Store camera intrinsics in a 1-dim array, including f, c, k, p.
- 参数
camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.
need_distortion (bool) – Whether the distortion parameters k and p are needed. Default: True.
- Required keys:
camera_param (if camera parameters are not given in initialization)
- Modified keys:
intrinsics
- class mmpose.datasets.pipelines.pose3d_transform.Generate3DHeatmapTarget(sigma=2, joint_indices=None, max_bound=1.0)[源代码]¶
Generate the target 3d heatmap.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.
- 参数
sigma – Sigma of heatmap gaussian.
joint_indices (list) – Indices of joints used for heatmap generation. If None (default) is given, all joints will be used.
max_bound (float) – The maximal value of heatmap.
- class mmpose.datasets.pipelines.pose3d_transform.GenerateVoxel3DHeatmapTarget(sigma=200.0, joint_indices=None)[源代码]¶
Generate the target 3d heatmap.
Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info_3d’. Modified keys: ‘target’, and ‘target_weight’.
- 参数
sigma – Sigma of heatmap gaussian (mm).
joint_indices (list) – Indices of joints used for heatmap generation. If None (default) is given, all joints will be used.
- class mmpose.datasets.pipelines.pose3d_transform.GetRootCenteredPose(item, root_index, visible_item=None, remove_root=False, root_name=None)[源代码]¶
Zero-center the pose around a given root joint. Optionally, the root joint can be removed from the original pose and stored as a separate item.
Note that the root-centered joints may no longer align with some annotation information (e.g. flip_pairs, num_joints, inference_channel, etc.) due to the removal of the root joint.
- 参数
item (str) – The name of the pose to apply root-centering.
root_index (int) – Root joint index in the pose.
visible_item (str) – The name of the visibility item.
remove_root (bool) – If true, remove the root joint from the pose
root_name (str) – Optional. If not none, it will be used as the key to store the root position separated from the original pose.
- Required keys:
item
- Modified keys:
item, visible_item, root_name
- class mmpose.datasets.pipelines.pose3d_transform.ImageCoordinateNormalization(item, norm_camera=False, camera_param=None)[源代码]¶
Normalize the 2D joint coordinate with image width and height. Range [0, w] is mapped to [-1, 1], while preserving the aspect ratio.
- 参数
item (str|list[str]) – The name of the pose to normalize.
norm_camera (bool) – Whether to normalize camera intrinsics. Default: False.
camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.
- Required keys:
item
- Modified keys:
item (, camera_param)
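A hedged sketch of the normalization rule described above: both axes are centered at the image center and scaled by w/2, so the x range [0, w] maps to [-1, 1] and the aspect ratio is preserved (this mirrors the docstring, not necessarily the exact source):
import numpy as np

def normalize_image_coordinates(joints_2d, w, h):
    # joints_2d: array of shape [K, 2] in pixel coordinates
    center = np.array([w / 2.0, h / 2.0])
    return (joints_2d - center) / (w / 2.0)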
- class mmpose.datasets.pipelines.pose3d_transform.NormalizeJointCoordinate(item, mean=None, std=None, norm_param_file=None)[源代码]¶
Normalize the joint coordinate with given mean and std.
- 参数
item (str) – The name of the pose to normalize.
mean (array) – Mean values of joint coordinates in shape [K, C].
std (array) – Std values of joint coordinates in shape [K, C].
norm_param_file (str) – Optionally load a dict containing mean and std from a file using mmcv.load.
- Required keys:
item
- Modified keys:
item
- class mmpose.datasets.pipelines.pose3d_transform.PoseSequenceToTensor(item)[源代码]¶
Convert pose sequence from numpy array to Tensor.
The original pose sequence should have a shape of [T,K,C] or [K,C], where T is the sequence length, K and C are keypoint number and dimension. The converted pose sequence will have a shape of [KxC, T].
- 参数
item (str) – The name of the pose sequence
- Required keys:
item
- Modified keys:
item
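A shape-convention sketch of the documented [T, K, C] -> [KxC, T] conversion; the reshape-then-transpose below is one equivalent way to realize it, not necessarily the exact source:
import numpy as np
import torch

T, K, C = 27, 17, 2          # sequence length, keypoint number, dimension
seq = np.random.rand(T, K, C).astype(np.float32)
tensor = torch.from_numpy(seq.reshape(T, K * C).T.copy())
print(tensor.shape)          # torch.Size([34, 27])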
- class mmpose.datasets.pipelines.pose3d_transform.RelativeJointRandomFlip(item, flip_cfg, visible_item=None, flip_prob=0.5, flip_camera=False, camera_param=None)[源代码]¶
Data augmentation with random horizontal joint flip around a root joint.
- 参数
item (str|list[str]) – The name of the pose to flip.
flip_cfg (dict|list[dict]) – Configurations of the fliplr_regression function. It should contain the following arguments: center_mode (the mode to set the center location on the x-axis to flip around), and center_x or center_index (the x-axis location or the root joint’s index that defines the flip center). Please refer to the docstring of the fliplr_regression function for more details.
visible_item (str|list[str]) – The name of the visibility item which will be flipped accordingly along with the pose.
flip_prob (float) – Probability of flip.
flip_camera (bool) – Whether to flip horizontal distortion coefficients.
camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.
- Required keys:
item
- Modified keys:
item (, camera_param)
samplers¶
- class mmpose.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[源代码]¶
DistributedSampler inheriting from torch.utils.data.DistributedSampler.
In lower versions of PyTorch, DistributedSampler has no shuffle argument. This child class adds one to DistributedSampler.
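A hedged usage sketch with a plain PyTorch DataLoader, assuming an existing dataset; the world size and rank are illustrative and would normally come from the distributed environment:
from torch.utils.data import DataLoader
from mmpose.datasets.samplers import DistributedSampler

sampler = DistributedSampler(dataset, num_replicas=4, rank=0, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
for epoch in range(10):
    sampler.set_epoch(epoch)   # inherited helper; varies the shuffling per epoch
    for batch in loader:
        ...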
mmpose.utils¶
- class mmpose.utils.StopWatch(window=1)[源代码]¶
A helper class to measure FPS and the detailed time consumption of each phase in a video processing loop or similar scenarios.
- 参数
window (int) – The sliding window size to calculate the running average of the time consuming.
示例
>>> from mmpose.utils import StopWatch
>>> import time
>>> stop_watch = StopWatch(window=10)
>>> with stop_watch.timeit('total'):
>>>     time.sleep(0.1)
>>>     # 'timeit' supports nested use
>>>     with stop_watch.timeit('phase1'):
>>>         time.sleep(0.1)
>>>     with stop_watch.timeit('phase2'):
>>>         time.sleep(0.2)
>>>     time.sleep(0.2)
>>> report = stop_watch.report()
- report(key=None)[源代码]¶
Report timing information.
- 返回
The key is the timer name and the value is the corresponding average time consumption.
- 返回类型
dict
- report_strings()[源代码]¶
Report timing information as text strings.
- 返回
Each element is the information string of a timed event, in the format of ‘{timer_name}: {time_in_ms}’. In particular, if timer_name is ‘_FPS_’, the result will be converted to fps.
- 返回类型
list(str)
- timeit(timer_name='_FPS_')[源代码]¶
Timing a code snippet with an assigned name.
- 参数
timer_name (str) – The unique name of the code snippet of interest, used to handle multiple timers and generate reports. Note that ‘_FPS_’ is a special key whose measurement will be in fps instead of milliseconds. Also see report and report_strings. Default: ‘_FPS_’.
注解
This function should always be used in a with statement, as shown in the example.
- mmpose.utils.get_root_logger(log_file=None, log_level=20)[源代码]¶
Use get_logger method in mmcv to get the root logger.
The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmpose”.
- 参数
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.
- 返回
The root logger.
- 返回类型
logging.Logger
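A minimal usage sketch (the log file path is a placeholder):
import logging
from mmpose.utils import get_root_logger

logger = get_root_logger(log_file='work_dirs/train.log',
                         log_level=logging.INFO)  # 20 == logging.INFO
logger.info('Logger initialized')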