Algorithms¶
MTUT (CVPR’2019)¶
Mtut + I3d on Nvgesture¶
MTUT (CVPR'2019)
@InProceedings{Abavisani_2019_CVPR,
author = {Abavisani, Mahdi and Joze, Hamid Reza Vaezi and Patel, Vishal M.},
title = {Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
I3D (CVPR'2017)
@InProceedings{Carreira_2017_CVPR,
author = {Carreira, Joao and Zisserman, Andrew},
title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017}
}
NVGesture (CVPR'2016)
@InProceedings{Molchanov_2016_CVPR,
author = {Molchanov, Pavlo and Yang, Xiaodong and Gupta, Shalini and Kim, Kihwan and Tyree, Stephen and Kautz, Jan},
title = {Online Detection and Classification of Dynamic Hand Gestures With Recurrent 3D Convolutional Neural Network},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
Results on NVGesture test set
Arch | Input Size | fps | bbox | AP_rgb | AP_depth | ckpt | log |
---|---|---|---|---|---|---|---|
I3D+MTUT* | 112x112 | 15 | $\surd$ | 0.725 | 0.730 | ckpt | log |
I3D+MTUT | 224x224 | 30 | $\surd$ | 0.782 | 0.811 | ckpt | log |
I3D+MTUT | 224x224 | 30 | $\times$ | 0.739 | 0.809 | ckpt | log |
*: MTUT supports multi-modal training and uni-modal testing. Model trained with this config can be used to recognize gestures in rgb videos with inference config.
MSPN (ArXiv’2019)¶
Topdown Heatmap + MSPN on Coco¶
MSPN (ArXiv'2019)
@article{li2019rethinking,
title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
journal={arXiv preprint arXiv:1901.00148},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
InterNet (ECCV’2020)¶
Internet + Internet on Interhand3d¶
InterNet (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | MPJPE-single | MPJPE-interacting | MPJPE-all | MRRPE | APh | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
All | test(H+M) | InterNet_resnet_50 | 256x256 | 9.47 | 13.40 | 11.59 | 29.28 | 0.99 | ckpt | log |
All | val(M) | InterNet_resnet_50 | 256x256 | 11.22 | 15.23 | 13.16 | 31.73 | 0.98 | ckpt | log |
DEKR (CVPR’2021)¶
Dekr + Hrnet on Coco¶
DEKR (CVPR'2021)
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.680 | 0.868 | 0.745 | 0.728 | 0.897 | ckpt | log |
HRNet-w48 | 640x640 | 0.709 | 0.876 | 0.773 | 0.758 | 0.909 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|
HRNet-w32* | 512x512 | 0.705 | 0.878 | 0.767 | 0.759 | 0.921 | ckpt |
HRNet-w48* | 640x640 | 0.722 | 0.882 | 0.785 | 0.778 | 0.928 | ckpt |
* these configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
The results of models provided by the authors on COCO val2017 using the same evaluation protocol
Arch | Input Size | Setting | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | single-scale | 0.678 | 0.868 | 0.744 | 0.728 | 0.897 | see official implementation |
HRNet-w48 | 640x640 | single-scale | 0.707 | 0.876 | 0.773 | 0.757 | 0.909 | see official implementation |
HRNet-w32 | 512x512 | multi-scale | 0.708 | 0.880 | 0.773 | 0.763 | 0.921 | see official implementation |
HRNet-w48 | 640x640 | multi-scale | 0.721 | 0.881 | 0.786 | 0.779 | 0.927 | see official implementation |
The discrepancy between these results and that shown in paper is attributed to the differences in implementation details in evaluation process.
Dekr + Hrnet on Crowdpose¶
DEKR (CVPR'2021)
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.663 | 0.857 | 0.715 | 0.719 | 0.893 | ckpt | log |
HRNet-w48 | 640x640 | 0.682 | 0.869 | 0.736 | 0.742 | 0.911 | ckpt | log |
Results on CrowdPose test with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|
HRNet-w32* | 512x512 | 0.692 | 0.874 | 0.748 | 0.755 | 0.926 | ckpt |
HRNet-w48* | 640x640 | 0.696 | 0.869 | 0.749 | 0.769 | 0.933 | ckpt |
* these configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
HigherHRNet (CVPR’2020)¶
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
DeepPose (CVPR’2014)¶
Deeppose + Resnet on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
Deeppose + Resnet + Rle on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
RLE (ICCV'2021)
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50_rle | 256x192 | 0.704 | 0.883 | 0.777 | 0.751 | 0.920 | ckpt | log |
deeppose_resnet_101_rle | 256x192 | 0.722 | 0.894 | 0.794 | 0.768 | 0.930 | ckpt | log |
deeppose_resnet_152_rle | 256x192 | 0.731 | 0.897 | 0.805 | 0.777 | 0.933 | ckpt | log |
deeppose_resnet_152_rle | 384x288 | 0.749 | 0.901 | 0.815 | 0.793 | 0.935 | ckpt | log |
Deeppose + Resnet on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
Deeppose + Resnet + Rle on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
RLE (ICCV'2021)
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50_rle | 256x256 | 0.860 | 0.263 | ckpt | log |
Deeppose + Resnet on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50 | 256x256 | 4.85 | 8.50 | 4.81 | 5.69 | 5.45 | 4.82 | 5.20 | ckpt | log |
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages ={2235-2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
Deeppose + Resnet on Deepfashion¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | deeppose_resnet_50 | 256x256 | 0.965 | 0.535 | 17.2 | ckpt | log |
lower | deeppose_resnet_50 | 256x256 | 0.971 | 0.678 | 11.8 | ckpt | log |
full | deeppose_resnet_50 | 256x256 | 0.983 | 0.602 | 14.0 | ckpt | log |
Deeppose + Resnet on Onehand10k¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.990 | 0.486 | 34.28 | ckpt | log |
Deeppose + Resnet on Panoptic2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.999 | 0.686 | 9.36 | ckpt | log |
Deeppose + Resnet on Rhd2d¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.988 | 0.865 | 3.29 | ckpt | log |
RLE (ICCV’2021)¶
Deeppose + Resnet + Rle on Coco¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
RLE (ICCV'2021)
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50_rle | 256x192 | 0.704 | 0.883 | 0.777 | 0.751 | 0.920 | ckpt | log |
deeppose_resnet_101_rle | 256x192 | 0.722 | 0.894 | 0.794 | 0.768 | 0.930 | ckpt | log |
deeppose_resnet_152_rle | 256x192 | 0.731 | 0.897 | 0.805 | 0.777 | 0.933 | ckpt | log |
deeppose_resnet_152_rle | 384x288 | 0.749 | 0.901 | 0.815 | 0.793 | 0.935 | ckpt | log |
Deeppose + Resnet + Rle on Mpii¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
RLE (ICCV'2021)
@inproceedings{li2021human,
title={Human pose regression with residual log-likelihood estimation},
author={Li, Jiefeng and Bian, Siyuan and Zeng, Ailing and Wang, Can and Pang, Bo and Liu, Wentao and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={11025--11034},
year={2021}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50_rle | 256x256 | 0.860 | 0.263 | ckpt | log |
SoftWingloss (TIP’2021)¶
Deeppose + Resnet + Softwingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
SoftWingloss (TIP'2021)
@article{lin2021structure,
title={Structure-Coherent Deep Feature Learning for Robust Face Alignment},
author={Lin, Chunze and Zhu, Beier and Wang, Quan and Liao, Renjie and Qian, Chen and Lu, Jiwen and Zhou, Jie},
journal={IEEE Transactions on Image Processing},
year={2021},
publisher={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_softwingloss | 256x256 | 4.41 | 7.77 | 4.37 | 5.27 | 5.01 | 4.36 | 4.70 | ckpt | log |
VideoPose3D (CVPR’2019)¶
Video Pose Lift + Videopose3d on H36m¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 27 | 40.0 | 30.1 | ckpt | log |
VideoPose3D | 81 | 38.9 | 29.2 | ckpt | log |
VideoPose3D | 243 | 37.6 | 28.3 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|---|
VideoPose3D | 1 | 52.9 | 41.3 | ckpt | log |
VideoPose3D | 243 | 47.9 | 38.0 | ckpt | log |
Results on Human3.6M dataset with ground truth 2D detections, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 58.1 | 42.8 | 54.7 | ckpt | log |
Results on Human3.6M dataset with CPN 2D detections1, semi-supervised training
Training Data | Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
---|---|---|---|---|---|---|---|
10% S1 | VideoPose3D | 27 | 67.4 | 50.1 | 63.2 | ckpt | log |
1 CPN 2D detections are provided by official repo. The reformatted version used in this repository can be downloaded from train_detection and test_detection.
Video Pose Lift + Videopose3d on Mpi_inf_3dhp¶
VideoPose3D (CVPR'2019)
@inproceedings{pavllo20193d,
title={3d human pose estimation in video with temporal convolutions and semi-supervised training},
author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7753--7762},
year={2019}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections, supervised training
Arch | Receptive Field | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|---|
VideoPose3D | 1 | 58.3 | 40.6 | 94.1 | 63.1 | ckpt | log |
Hourglass (ECCV’2016)¶
Topdown Heatmap + Hourglass on Coco¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
Topdown Heatmap + Hourglass on Mpii¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_face¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.0586 | ckpt | log |
Topdown Heatmap + Hourglass + Coco + Wholebody on Coco_wholebody_hand¶
Hourglass (ECCV'2016)
@inproceedings{newell2016stacked,
title={Stacked hourglass networks for human pose estimation},
author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
booktitle={European conference on computer vision},
pages={483--499},
year={2016},
organization={Springer}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.804 | 0.835 | 4.54 | ckpt | log |
LiteHRNet (CVPR’2021)¶
Topdown Heatmap + Litehrnet on Coco¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
LiteHRNet-18 | 256x192 | 0.643 | 0.868 | 0.720 | 0.706 | 0.912 | ckpt | log |
LiteHRNet-18 | 384x288 | 0.677 | 0.878 | 0.746 | 0.735 | 0.920 | ckpt | log |
LiteHRNet-30 | 256x192 | 0.675 | 0.881 | 0.754 | 0.736 | 0.924 | ckpt | log |
LiteHRNet-30 | 384x288 | 0.700 | 0.884 | 0.776 | 0.758 | 0.928 | ckpt | log |
Topdown Heatmap + Litehrnet on Mpii¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.859 | 0.260 | ckpt | log |
LiteHRNet-30 | 256x256 | 0.869 | 0.271 | ckpt | log |
Topdown Heatmap + Litehrnet + Coco + Wholebody on Coco_wholebody_hand¶
LiteHRNet (CVPR'2021)
@inproceedings{Yulitehrnet21,
title={Lite-HRNet: A Lightweight High-Resolution Network},
author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
booktitle={CVPR},
year={2021}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
LiteHRNet-18 | 256x256 | 0.795 | 0.830 | 4.77 | ckpt | log |
AdaptiveWingloss (ICCV’2019)¶
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
SimpleBaseline2D (ECCV’2018)¶
Topdown Heatmap + Resnet on Animalpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.688 | 0.945 | 0.772 | 0.733 | 0.952 | ckpt | log |
pose_resnet_101 | 256x256 | 0.696 | 0.948 | 0.785 | 0.737 | 0.954 | ckpt | log |
pose_resnet_152 | 256x256 | 0.709 | 0.948 | 0.797 | 0.749 | 0.951 | ckpt | log |
Topdown Heatmap + Resnet on Ap10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.681 | 0.923 | 0.740 | 0.510 | 0.688 | ckpt | log |
pose_resnet_101 | 256x256 | 0.681 | 0.922 | 0.742 | 0.534 | 0.688 | ckpt | log |
Topdown Heatmap + Resnet on Atrw¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.900 | 0.973 | 0.932 | 0.929 | 0.985 | ckpt | log |
pose_resnet_101 | 256x256 | 0.898 | 0.973 | 0.936 | 0.927 | 0.985 | ckpt | log |
pose_resnet_152 | 256x256 | 0.896 | 0.973 | 0.931 | 0.927 | 0.985 | ckpt | log |
Topdown Heatmap + Resnet on Fly¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Vinegar Fly (Nature Methods'2019)
@article{pereira2019fast,
title={Fast animal pose estimation using deep neural networks},
author={Pereira, Talmo D and Aldarondo, Diego E and Willmore, Lindsay and Kislin, Mikhail and Wang, Samuel S-H and Murthy, Mala and Shaevitz, Joshua W},
journal={Nature methods},
volume={16},
number={1},
pages={117--125},
year={2019},
publisher={Nature Publishing Group}
}
Results on Vinegar Fly test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 192x192 | 0.996 | 0.910 | 2.00 | ckpt | log |
pose_resnet_101 | 192x192 | 0.996 | 0.912 | 1.95 | ckpt | log |
pose_resnet_152 | 192x192 | 0.997 | 0.917 | 1.78 | ckpt | log |
Topdown Heatmap + Resnet on Horse10¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_resnet_50 | 256x256 | 0.956 | 0.113 | ckpt | log |
split2 | pose_resnet_50 | 256x256 | 0.954 | 0.111 | ckpt | log |
split3 | pose_resnet_50 | 256x256 | 0.946 | 0.129 | ckpt | log |
split1 | pose_resnet_101 | 256x256 | 0.958 | 0.115 | ckpt | log |
split2 | pose_resnet_101 | 256x256 | 0.955 | 0.115 | ckpt | log |
split3 | pose_resnet_101 | 256x256 | 0.946 | 0.126 | ckpt | log |
split1 | pose_resnet_152 | 256x256 | 0.969 | 0.105 | ckpt | log |
split2 | pose_resnet_152 | 256x256 | 0.970 | 0.103 | ckpt | log |
split3 | pose_resnet_152 | 256x256 | 0.957 | 0.131 | ckpt | log |
Topdown Heatmap + Resnet on Locust¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Desert Locust (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Desert Locust test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 0.999 | 0.899 | 2.27 | ckpt | log |
pose_resnet_101 | 160x160 | 0.999 | 0.907 | 2.03 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.926 | 1.48 | ckpt | log |
Topdown Heatmap + Resnet on Macaque¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.799 | 0.952 | 0.919 | 0.837 | 0.964 | ckpt | log |
pose_resnet_101 | 256x192 | 0.790 | 0.953 | 0.908 | 0.828 | 0.967 | ckpt | log |
pose_resnet_152 | 256x192 | 0.794 | 0.951 | 0.915 | 0.834 | 0.968 | ckpt | log |
Topdown Heatmap + Resnet on Zebra¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Grévy’s Zebra (Elife'2019)
@article{graving2019deepposekit,
title={DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning},
author={Graving, Jacob M and Chae, Daniel and Naik, Hemal and Li, Liang and Koger, Benjamin and Costelloe, Blair R and Couzin, Iain D},
journal={Elife},
volume={8},
pages={e47994},
year={2019},
publisher={eLife Sciences Publications Limited}
}
Results on Grévy’s Zebra test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 160x160 | 1.000 | 0.914 | 1.86 | ckpt | log |
pose_resnet_101 | 160x160 | 1.000 | 0.916 | 1.82 | ckpt | log |
pose_resnet_152 | 160x160 | 1.000 | 0.921 | 1.66 | ckpt | log |
Topdown Heatmap + Resnet on Aic¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
Topdown Heatmap + Resnet + Fp16 on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_fp16 | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Swin on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
Swin (ICCV'2021)
@inproceedings{liu2021swin,
title={Swin transformer: Hierarchical vision transformer using shifted windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10012--10022},
year={2021}
}
FPN (CVPR'2017)
@inproceedings{lin2017feature,
title={Feature pyramid networks for object detection},
author={Lin, Tsung-Yi and Doll{\'a}r, Piotr and Girshick, Ross and He, Kaiming and Hariharan, Bharath and Belongie, Serge},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2117--2125},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_swin_t | 256x192 | 0.724 | 0.901 | 0.806 | 0.782 | 0.940 | ckpt | log |
pose_swin_b | 256x192 | 0.737 | 0.904 | 0.820 | 0.798 | 0.946 | ckpt | log |
pose_swin_b | 384x288 | 0.759 | 0.910 | 0.832 | 0.811 | 0.946 | ckpt | log |
pose_swin_l | 256x192 | 0.743 | 0.906 | 0.821 | 0.798 | 0.943 | ckpt | log |
pose_swin_l | 384x288 | 0.763 | 0.912 | 0.830 | 0.814 | 0.949 | ckpt | log |
pose_swin_b_fpn | 256x192 | 0.741 | 0.907 | 0.821 | 0.798 | 0.946 | ckpt | log |
Topdown Heatmap + Resnet on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
Topdown Heatmap + Resnet on Crowdpose¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
Topdown Heatmap + Resnet on JHMDB¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on MPII dataset only. NO test-time augmentation (multi-scale /rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
Topdown Heatmap + Resnet on MHP¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 val set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that, the evaluation metric used here is mAP (adapted from COCO), which may be different from the official evaluation codes. Please be cautious if you use the results in papers.
Topdown Heatmap + Resnet on Mpii¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
Topdown Heatmap + Resnet + Mpii on Mpii_trb¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
MPII-TRB (ICCV'2019)
@inproceedings{duan2019trb,
title={TRB: A Novel Triplet Representation for Understanding 2D Human Body},
author={Duan, Haodong and Lin, Kwan-Yee and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9479--9488},
year={2019}
}
Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
Topdown Heatmap + Resnet on Ochuman¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on COCO train dataset, and evaluate on OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
Topdown Heatmap + Resnet on Posetrack18¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on COCO dataset, and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on COCO dataset, and then fine-tuned on PoseTrack18.
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_face¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_res50 | 256x256 | 0.0566 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion (CVPR'2016)
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},
title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}
DeepFashion (ECCV'2016)
@inproceedings{liuYLWTeccv16FashionLandmark,
author = {Liu, Ziwei and Yan, Sijie and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Fashion Landmark Detection in the Wild},
booktitle = {European Conference on Computer Vision (ECCV)},
month = {October},
year = {2016}
}
Results on DeepFashion val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
upper | pose_resnet_50 | 256x256 | 0.954 | 0.578 | 16.8 | ckpt | log |
lower | pose_resnet_50 | 256x256 | 0.965 | 0.744 | 10.5 | ckpt | log |
full | pose_resnet_50 | 256x256 | 0.977 | 0.664 | 12.7 | ckpt | log |
Topdown Heatmap + Resnet on Deepfashion2¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DeepFashion2 (CVPR'2019)
@article{DeepFashion2,
author = {Yuying Ge and Ruimao Zhang and Lingyun Wu and Xiaogang Wang and Xiaoou Tang and Ping Luo},
title={A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images},
journal={CVPR},
year={2019}
}
Results on DeepFashion2 val set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
short_sleeved_shirt | pose_resnet_50 | 256x256 | 0.988 | 0.703 | 10.2 | ckpt | log |
long_sleeved_shirt | pose_resnet_50 | 256x256 | 0.973 | 0.587 | 16.5 | ckpt | log |
short_sleeved_outwear | pose_resnet_50 | 256x256 | 0.966 | 0.408 | 24.0 | ckpt | log |
long_sleeved_outwear | pose_resnet_50 | 256x256 | 0.987 | 0.517 | 18.1 | ckpt | log |
vest | pose_resnet_50 | 256x256 | 0.981 | 0.643 | 12.7 | ckpt | log |
sling | pose_resnet_50 | 256x256 | 0.940 | 0.557 | 21.6 | ckpt | log |
shorts | pose_resnet_50 | 256x256 | 0.975 | 0.682 | 12.4 | ckpt | log |
trousers | pose_resnet_50 | 256x256 | 0.973 | 0.625 | 14.8 | ckpt | log |
skirt | pose_resnet_50 | 256x256 | 0.952 | 0.653 | 16.6 | ckpt | log |
short_sleeved_dress | pose_resnet_50 | 256x256 | 0.980 | 0.603 | 15.6 | ckpt | log |
long_sleeved_dress | pose_resnet_50 | 256x256 | 0.976 | 0.518 | 20.1 | ckpt | log |
vest_dress | pose_resnet_50 | 256x256 | 0.980 | 0.600 | 16.0 | ckpt | log |
sling_dress | pose_resnet_50 | 256x256 | 0.967 | 0.544 | 19.5 | ckpt | log |
Topdown Heatmap + Resnet + Coco + Wholebody on Coco_wholebody_hand¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.800 | 0.833 | 4.64 | ckpt | log |
Topdown Heatmap + Resnet on Freihand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
FreiHand (ICCV'2019)
@inproceedings{zimmermann2019freihand,
title={Freihand: A dataset for markerless capture of hand pose and shape from single rgb images},
author={Zimmermann, Christian and Ceylan, Duygu and Yang, Jimei and Russell, Bryan and Argus, Max and Brox, Thomas},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={813--822},
year={2019}
}
Results on FreiHand val & test set
Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|
val | pose_resnet_50 | 224x224 | 0.993 | 0.868 | 3.25 | ckpt | log |
test | pose_resnet_50 | 224x224 | 0.992 | 0.868 | 3.27 | ckpt | log |
Topdown Heatmap + Resnet on Interhand2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
InterHand2.6M (ECCV'2020)
@InProceedings{Moon_2020_ECCV_InterHand2.6M,
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Results on InterHand2.6M val & test set
Train Set | Set | Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|---|---|
Human_annot | val(M) | pose_resnet_50 | 256x256 | 0.973 | 0.828 | 5.15 | ckpt | log |
Human_annot | test(H) | pose_resnet_50 | 256x256 | 0.973 | 0.826 | 5.27 | ckpt | log |
Human_annot | test(M) | pose_resnet_50 | 256x256 | 0.975 | 0.841 | 4.90 | ckpt | log |
Human_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.975 | 0.839 | 4.97 | ckpt | log |
Machine_annot | val(M) | pose_resnet_50 | 256x256 | 0.970 | 0.824 | 5.39 | ckpt | log |
Machine_annot | test(H) | pose_resnet_50 | 256x256 | 0.969 | 0.821 | 5.52 | ckpt | log |
Machine_annot | test(M) | pose_resnet_50 | 256x256 | 0.972 | 0.838 | 5.03 | ckpt | log |
Machine_annot | test(H+M) | pose_resnet_50 | 256x256 | 0.972 | 0.837 | 5.11 | ckpt | log |
All | val(M) | pose_resnet_50 | 256x256 | 0.977 | 0.840 | 4.66 | ckpt | log |
All | test(H) | pose_resnet_50 | 256x256 | 0.979 | 0.839 | 4.65 | ckpt | log |
All | test(M) | pose_resnet_50 | 256x256 | 0.979 | 0.838 | 4.42 | ckpt | log |
All | test(H+M) | pose_resnet_50 | 256x256 | 0.979 | 0.851 | 4.46 | ckpt | log |
Topdown Heatmap + Resnet on Onehand10k¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.989 | 0.555 | 25.19 | ckpt | log |
Topdown Heatmap + Resnet on Panoptic2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.999 | 0.713 | 9.00 | ckpt | log |
Topdown Heatmap + Resnet on Rhd2d¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet50 | 256x256 | 0.991 | 0.898 | 2.33 | ckpt | log |
Topdown Heatmap + Resnet on Coco-Wholebody¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.652 | 0.739 | 0.614 | 0.746 | 0.608 | 0.716 | 0.460 | 0.584 | 0.520 | 0.633 | ckpt | log |
pose_resnet_50 | 384x288 | 0.666 | 0.747 | 0.635 | 0.763 | 0.732 | 0.812 | 0.537 | 0.647 | 0.573 | 0.671 | ckpt | log |
pose_resnet_101 | 256x192 | 0.670 | 0.754 | 0.640 | 0.767 | 0.611 | 0.723 | 0.463 | 0.589 | 0.533 | 0.647 | ckpt | log |
pose_resnet_101 | 384x288 | 0.692 | 0.770 | 0.680 | 0.798 | 0.747 | 0.822 | 0.549 | 0.658 | 0.597 | 0.692 | ckpt | log |
pose_resnet_152 | 256x192 | 0.682 | 0.764 | 0.662 | 0.788 | 0.624 | 0.728 | 0.482 | 0.606 | 0.548 | 0.661 | ckpt | log |
pose_resnet_152 | 384x288 | 0.703 | 0.780 | 0.693 | 0.813 | 0.751 | 0.825 | 0.559 | 0.667 | 0.610 | 0.705 | ckpt | log |
PoseWarper (NeurIPS’2019)¶
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph, and Tran, Du and Shi, Jianbo, and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper can be split into two stages.
The first-stage is trained with the pre-trained checkpoint on COCO dataset, and the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
The second-stage is trained with the last checkpoint from the first stage, and the warping offsets are learned in a multi-frame setting while the backbone is frozen.
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from PoseWarper supplementary data files from this link1.
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it here: $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more detail about data preparation.
SimpleBaseline3D (ICCV’2017)¶
Pose Lift + Simplebaseline3d on H36m¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | ckpt | log |
---|---|---|---|---|
simple_baseline_3d_tcn1 | 43.4 | 34.3 | ckpt | log |
1 Differing from the original paper, we didn’t apply the max-norm constraint
because we found this led to a better convergence and performance.
Pose Lift + Simplebaseline3d on Mpi_inf_3dhp¶
SimpleBaseline3D (ICCV'2017)
@inproceedings{martinez_2017_3dbaseline,
title={A simple yet effective baseline for 3d human pose estimation},
author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
booktitle={ICCV},
year={2017}
}
MPI-INF-3DHP (3DV'2017)
@inproceedings{mono-3dhp2017,
author = {Mehta, Dushyant and Rhodin, Helge and Casas, Dan and Fua, Pascal and Sotnychenko, Oleksandr and Xu, Weipeng and Theobalt, Christian},
title = {Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision},
booktitle = {3D Vision (3DV), 2017 Fifth International Conference on},
url = {http://gvv.mpi-inf.mpg.de/3dhp_dataset},
year = {2017},
organization={IEEE},
doi={10.1109/3dv.2017.00064},
}
Results on MPI-INF-3DHP dataset with ground truth 2D detections
Arch | MPJPE | P-MPJPE | 3DPCK | 3DAUC | ckpt | log |
---|---|---|---|---|---|---|
simple_baseline_3d_tcn1 | 84.3 | 53.2 | 85.0 | 52.0 | ckpt | log |
1 Differing from the original paper, we didn’t apply the max-norm constraint
because we found this led to a better convergence and performance.
HMR (CVPR’2018)¶
HMR + Resnet on Mixed¶
HMR (CVPR'2018)
@inProceedings{kanazawaHMR18,
title={End-to-end Recovery of Human Shape and Pose},
author = {Angjoo Kanazawa
and Michael J. Black
and David W. Jacobs
and Jitendra Malik},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M with ground-truth bounding box having MPJPE-PA of 52.60 mm on Protocol2
Arch | Input Size | MPJPE (P1) | MPJPE-PA (P1) | MPJPE (P2) | MPJPE-PA (P2) | ckpt | log |
---|---|---|---|---|---|---|---|
hmr_resnet_50 | 224x224 | 80.75 | 55.08 | 80.35 | 52.60 | ckpt | log |
UDP (CVPR’2020)¶
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that, UDP also adopts the unbiased encoding/decoding algorithm of DARK.
Topdown Heatmap + Hrnetv2 + Udp on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.990 | 0.572 | 23.87 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.998 | 0.742 | 7.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Udp on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_udp | 256x256 | 0.992 | 0.902 | 2.21 | ckpt | log |
ViPNAS (CVPR’2021)¶
Topdown Heatmap + Vipnas on Coco¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.700 | 0.887 | 0.778 | 0.757 | 0.929 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.711 | 0.893 | 0.789 | 0.769 | 0.934 | ckpt | log |
Topdown Heatmap + Vipnas on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3 | 256x192 | 0.619 | 0.700 | 0.477 | 0.608 | 0.585 | 0.689 | 0.386 | 0.505 | 0.473 | 0.578 | ckpt | log |
S-ViPNAS-Res50 | 256x192 | 0.643 | 0.726 | 0.553 | 0.694 | 0.587 | 0.698 | 0.410 | 0.529 | 0.495 | 0.607 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Wingloss (CVPR’2018)¶
Deeppose + Resnet + Wingloss on WFLW¶
DeepPose (CVPR'2014)
@inproceedings{toshev2014deeppose,
title={Deeppose: Human pose estimation via deep neural networks},
author={Toshev, Alexander and Szegedy, Christian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1653--1660},
year={2014}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
Wingloss (CVPR'2018)
@inproceedings{feng2018wing,
title={Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks},
author={Feng, Zhen-Hua and Kittler, Josef and Awais, Muhammad and Huber, Patrik and Wu, Xiao-Jun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},
year={2018},
pages ={2235-2245},
organization={IEEE}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
deeppose_res50_wingloss | 256x256 | 4.64 | 8.25 | 4.59 | 5.56 | 5.26 | 4.59 | 5.07 | ckpt | log |
DarkPose (CVPR’2020)¶
Topdown Heatmap + Resnet + Dark on Coco¶
SimpleBaseline2D (ECCV'2018)
@inproceedings{xiao2018simple,
title={Simple baselines for human pose estimation and tracking},
author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
booktitle={Proceedings of the European conference on computer vision (ECCV)},
pages={466--481},
year={2018}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_hand¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Hand (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Hand val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.814 | 0.840 | 4.37 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Onehand10k¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
OneHand10K (TCSVT'2019)
@article{wang2018mask,
title={Mask-pose cascaded cnn for 2d hand pose estimation from single color image},
author={Wang, Yangang and Peng, Cong and Liu, Yebin},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={29},
number={11},
pages={3258--3268},
year={2018},
publisher={IEEE}
}
Results on OneHand10K val set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.990 | 0.573 | 23.84 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Panoptic2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
CMU Panoptic HandDB (CVPR'2017)
@inproceedings{simon2017hand,
title={Hand keypoint detection in single images using multiview bootstrapping},
author={Simon, Tomas and Joo, Hanbyul and Matthews, Iain and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={1145--1153},
year={2017}
}
Results on CMU Panoptic (MPII+NZSL val set)
Arch | Input Size | PCKh@0.7 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.999 | 0.745 | 7.77 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Rhd2d¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
RHD (ICCV'2017)
@TechReport{zb2017hand,
author={Christian Zimmermann and Thomas Brox},
title={Learning to Estimate 3D Hand Pose from Single RGB Images},
institution={arXiv:1705.01389},
year={2017},
note="https://arxiv.org/abs/1705.01389",
url="https://lmb.informatik.uni-freiburg.de/projects/hand3d/"
}
Results on RHD test set
Arch | Input Size | PCK@0.2 | AUC | EPE | ckpt | log |
---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.992 | 0.903 | 2.17 | ckpt | log |
Topdown Heatmap + Vipnas + Dark on Coco-Wholebody¶
ViPNAS (CVPR'2021)
@article{xu2021vipnas,
title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
year={2021}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S-ViPNAS-MobileNetV3_dark | 256x192 | 0.632 | 0.710 | 0.530 | 0.660 | 0.672 | 0.771 | 0.404 | 0.519 | 0.508 | 0.607 | ckpt | log |
S-ViPNAS-Res50_dark | 256x192 | 0.650 | 0.732 | 0.550 | 0.686 | 0.684 | 0.784 | 0.437 | 0.554 | 0.528 | 0.632 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.527 | 0.620 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on Halpe dataset. We find this will lead to better performance.
Associative Embedding (NIPS’2017)¶
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
Associative Embedding + Higherhrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
Associative Embedding + Higherhrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |
Associative Embedding + Mobilenetv2 on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
MobilenetV2 (CVPR'2018)
@inproceedings{sandler2018mobilenetv2,
title={Mobilenetv2: Inverted residuals and linear bottlenecks},
author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={4510--4520},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
Associative Embedding + Higherhrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
Associative Embedding + Resnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
ResNet (CVPR'2016)
@inproceedings{he2016deep,
title={Deep residual learning for image recognition},
author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={770--778},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Associative Embedding + Hourglass + Ae on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HourglassAENet (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.613 | 0.833 | 0.667 | 0.659 | 0.850 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_ae | 512x512 | 0.667 | 0.855 | 0.723 | 0.707 | 0.877 | ckpt | log |
Associative Embedding + Higherhrnet on Crowdpose¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
Associative Embedding + Higherhrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HigherHRNet (CVPR'2020)
@inproceedings{cheng2020higherhrnet,
title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5386--5395},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32+ | 512x512 | 0.590 | 0.672 | 0.185 | 0.335 | 0.676 | 0.721 | 0.212 | 0.298 | 0.401 | 0.493 | ckpt | log |
HigherHRNet-w48+ | 512x512 | 0.630 | 0.706 | 0.440 | 0.573 | 0.730 | 0.777 | 0.389 | 0.477 | 0.487 | 0.574 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
VoxelPose (ECCV’2020)¶
Voxelpose + Voxelpose on Campus¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
Campus (CVPR'2014)
@inproceedings {belagian14multi,
title = {{3D} Pictorial Structures for Multiple Human Pose Estimation},
author = {Belagiannis, Vasileios and Amin, Sikandar and Andriluka, Mykhaylo and Schiele, Bernt and Navab
Nassir and Ilic, Slobodan},
booktitle = {IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June},
organization={IEEE}
}
Results on Campus dataset.
Arch | Actor 1 | Actor 2 | Actor 3 | Average | ckpt | log |
---|---|---|---|---|---|---|
prn32_cpn80_res50 | 97.76 | 93.92 | 98.48 | 96.72 | ckpt | log |
prn64_cpn80_res50 | 97.76 | 93.33 | 98.77 | 96.62 | ckpt | log |
Voxelpose + Voxelpose + Prn64x64x64 + Cpn80x80x20 + Panoptic on Panoptic¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
CMU Panoptic (ICCV'2015)
@Article = {joo_iccv_2015,
author = {Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh},
title = {Panoptic Studio: A Massively Multiview System for Social Motion Capture},
booktitle = {ICCV},
year = {2015}
}
Results on CMU Panoptic dataset.
Arch | mAP | mAR | MPJPE | Recall@500mm | ckpt | log |
---|---|---|---|---|---|---|
prn64_cpn80_res50 | 97.31 | 97.99 | 17.57 | 99.85 | ckpt | log |
Voxelpose + Voxelpose on Shelf¶
VoxelPose (ECCV'2020)
@inproceedings{tumultipose,
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
booktitle={ECCV},
year={2020}
}
Shelf (CVPR'2014)
@inproceedings {belagian14multi,
title = {{3D} Pictorial Structures for Multiple Human Pose Estimation},
author = {Belagiannis, Vasileios and Amin, Sikandar and Andriluka, Mykhaylo and Schiele, Bernt and Navab
Nassir and Ilic, Slobo
booktitle = {IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June},
organization={IEEE}
}
Results on Shelf dataset.
Arch | Actor 1 | Actor 2 | Actor 3 | Average | ckpt | log |
---|---|---|---|---|---|---|
prn32_cpn48_res50 | 99.10 | 94.86 | 97.52 | 97.16 | ckpt | log |
prn64_cpn80_res50 | 99.00 | 94.59 | 97.64 | 97.08 | ckpt | log |
RSN (ECCV’2020)¶
Topdown Heatmap + RSN on Coco¶
RSN (ECCV'2020)
@misc{cai2020learning,
title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
year={2020},
eprint={2003.04030},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
CID (CVPR’2022)¶
Cid + Hrnet on Coco¶
CID (CVPR'2022)
@InProceedings{Wang_2022_CVPR,
author = {Wang, Dongkai and Zhang, Shiliang},
title = {Contextual Instance Decoupling for Robust Multi-Person Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {11060-11068}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
CID | 512x512 | 0.702 | 0.887 | 0.768 | 0.755 | 0.926 | ckpt | log |
CID | 512x512 | 0.715 | 0.895 | 0.780 | 0.768 | 0.932 | ckpt | log |
CPM (CVPR’2016)¶
Topdown Heatmap + CPM on Coco¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
Topdown Heatmap + CPM on JHMDB¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
JHMDB (ICCV'2013)
@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}
Results on Sub-JHMDB dataset
The models are pre-trained on MPII dataset only. NO test-time augmentation (multi-scale /rotation testing) is used.
Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
Topdown Heatmap + CPM on Mpii¶
CPM (CVPR'2016)
@inproceedings{wei2016convolutional,
title={Convolutional pose machines},
author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
pages={4724--4732},
year={2016}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
cpm | 368x368 | 0.876 | 0.285 | ckpt | log |
HRNet (CVPR’2019)¶
Topdown Heatmap + Hrnet on Animalpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Animal-Pose (ICCV'2019)
@InProceedings{Cao_2019_ICCV,
author = {Cao, Jinkun and Tang, Hongyang and Fang, Hao-Shu and Shen, Xiaoyong and Lu, Cewu and Tai, Yu-Wing},
title = {Cross-Domain Adaptation for Animal Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}
Results on AnimalPose validation set (1117 instances)
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.736 | 0.959 | 0.832 | 0.775 | 0.966 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.737 | 0.959 | 0.823 | 0.778 | 0.962 | ckpt | log |
Topdown Heatmap + Hrnet on Ap10k¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AP-10K (NeurIPS'2021)
@misc{yu2021ap10k,
title={AP-10K: A Benchmark for Animal Pose Estimation in the Wild},
author={Hang Yu and Yufei Xu and Jing Zhang and Wei Zhao and Ziyu Guan and Dacheng Tao},
year={2021},
eprint={2108.12617},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results on AP-10K validation set
Arch | Input Size | AP | AP50 | AP75 | APM | APL | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.722 | 0.939 | 0.787 | 0.555 | 0.730 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.731 | 0.937 | 0.804 | 0.574 | 0.738 | ckpt | log |
Topdown Heatmap + Hrnet on Atrw¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
ATRW (ACM MM'2020)
@inproceedings{li2020atrw,
title={ATRW: A Benchmark for Amur Tiger Re-identification in the Wild},
author={Li, Shuyuan and Li, Jianguo and Tang, Hanlin and Qian, Rui and Lin, Weiyao},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
pages={2590--2598},
year={2020}
}
Results on ATRW validation set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.912 | 0.973 | 0.959 | 0.938 | 0.985 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.911 | 0.972 | 0.946 | 0.937 | 0.985 | ckpt | log |
Topdown Heatmap + Hrnet on Horse10¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Horse-10 (WACV'2021)
@inproceedings{mathis2021pretraining,
title={Pretraining boosts out-of-domain robustness for pose estimation},
author={Mathis, Alexander and Biasi, Thomas and Schneider, Steffen and Yuksekgonul, Mert and Rogers, Byron and Bethge, Matthias and Mathis, Mackenzie W},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1859--1868},
year={2021}
}
Results on Horse-10 test set
Set | Arch | Input Size | PCK@0.3 | NME | ckpt | log |
---|---|---|---|---|---|---|
split1 | pose_hrnet_w32 | 256x256 | 0.951 | 0.122 | ckpt | log |
split2 | pose_hrnet_w32 | 256x256 | 0.949 | 0.116 | ckpt | log |
split3 | pose_hrnet_w32 | 256x256 | 0.939 | 0.153 | ckpt | log |
split1 | pose_hrnet_w48 | 256x256 | 0.973 | 0.095 | ckpt | log |
split2 | pose_hrnet_w48 | 256x256 | 0.969 | 0.101 | ckpt | log |
split3 | pose_hrnet_w48 | 256x256 | 0.961 | 0.128 | ckpt | log |
Topdown Heatmap + Hrnet on Macaque¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MacaquePose (bioRxiv'2020)
@article{labuguen2020macaquepose,
title={MacaquePose: A novel ‘in the wild’macaque monkey pose dataset for markerless motion capture},
author={Labuguen, Rollyn and Matsumoto, Jumpei and Negrete, Salvador and Nishimaru, Hiroshi and Nishijo, Hisao and Takada, Masahiko and Go, Yasuhiro and Inoue, Ken-ichi and Shibata, Tomohiro},
journal={bioRxiv},
year={2020},
publisher={Cold Spring Harbor Laboratory}
}
Results on MacaquePose with ground-truth detection bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.814 | 0.953 | 0.918 | 0.851 | 0.969 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.818 | 0.963 | 0.917 | 0.855 | 0.971 | ckpt | log |
Associative Embedding + Hrnet on Aic¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.303 | 0.697 | 0.225 | 0.373 | 0.755 | ckpt | log |
Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.318 | 0.717 | 0.246 | 0.379 | 0.764 | ckpt | log |
Topdown Heatmap + Hrnet on Aic¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
AI Challenger (ArXiv'2017)
@article{wu2017ai,
title={Ai challenger: A large-scale dataset for going deeper in image understanding},
author={Wu, Jiahong and Zheng, He and Zhao, Bo and Li, Yixin and Yan, Baoming and Liang, Rui and Wang, Wenjia and Zhou, Shipei and Lin, Guosen and Fu, Yanwei and others},
journal={arXiv preprint arXiv:1711.06475},
year={2017}
}
Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
Associative Embedding + Hrnet on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
Associative Embedding + Hrnet + Udp on Coco¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
Dekr + Hrnet on Coco¶
DEKR (CVPR'2021)
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.680 | 0.868 | 0.745 | 0.728 | 0.897 | ckpt | log |
HRNet-w48 | 640x640 | 0.709 | 0.876 | 0.773 | 0.758 | 0.909 | ckpt | log |
Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|
HRNet-w32* | 512x512 | 0.705 | 0.878 | 0.767 | 0.759 | 0.921 | ckpt |
HRNet-w48* | 640x640 | 0.722 | 0.882 | 0.785 | 0.778 | 0.928 | ckpt |
* these configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
The results of models provided by the authors on COCO val2017 using the same evaluation protocol
Arch | Input Size | Setting | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | single-scale | 0.678 | 0.868 | 0.744 | 0.728 | 0.897 | see official implementation |
HRNet-w48 | 640x640 | single-scale | 0.707 | 0.876 | 0.773 | 0.757 | 0.909 | see official implementation |
HRNet-w32 | 512x512 | multi-scale | 0.708 | 0.880 | 0.773 | 0.763 | 0.921 | see official implementation |
HRNet-w48 | 640x640 | multi-scale | 0.721 | 0.881 | 0.786 | 0.779 | 0.927 | see official implementation |
The discrepancy between these results and that shown in paper is attributed to the differences in implementation details in evaluation process.
Topdown Heatmap + Hrnet + Fp16 on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
FP16 (ArXiv'2017)
@article{micikevicius2017mixed,
title={Mixed precision training},
author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others},
journal={arXiv preprint arXiv:1710.03740},
year={2017}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_fp16 | 256x192 | 0.746 | 0.905 | 0.88 | 0.800 | 0.943 | ckpt | log |
Topdown Heatmap + Hrnet on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
Topdown Heatmap + Hrnet + Augmentation on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Albumentations (Information'2020)
@article{buslaev2020albumentations,
title={Albumentations: fast and flexible image augmentations},
author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
journal={Information},
volume={11},
number={2},
pages={125},
year={2020},
publisher={Multidisciplinary Digital Publishing Institute}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
Topdown Heatmap + Hrnet + Udp on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
UDP (CVPR'2020)
@InProceedings{Huang_2020_CVPR,
author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |
Note that, UDP also adopts the unbiased encoding/decoding algorithm of DARK.
Topdown Heatmap + Hrnet + Dark on Coco¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
Dekr + Hrnet on Crowdpose¶
DEKR (CVPR'2021)
@inproceedings{geng2021bottom,
title={Bottom-up human pose estimation via disentangled keypoint regression},
author={Geng, Zigang and Sun, Ke and Xiao, Bin and Zhang, Zhaoxiang and Wang, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14676--14686},
year={2021}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.663 | 0.857 | 0.715 | 0.719 | 0.893 | ckpt | log |
HRNet-w48 | 640x640 | 0.682 | 0.869 | 0.736 | 0.742 | 0.911 | ckpt | log |
Results on CrowdPose test with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt |
---|---|---|---|---|---|---|---|
HRNet-w32* | 512x512 | 0.692 | 0.874 | 0.748 | 0.755 | 0.926 | ckpt |
HRNet-w48* | 640x640 | 0.696 | 0.869 | 0.749 | 0.769 | 0.933 | ckpt |
* these configs are generally used for evaluation. The training settings are identical to their single-scale counterparts.
Topdown Heatmap + Hrnet on Crowdpose¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
CrowdPose (CVPR'2019)
@article{li2018crowdpose,
title={CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
journal={arXiv preprint arXiv:1812.00324},
year={2018}
}
Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
Topdown Heatmap + Hrnet on H36m¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
Human3.6M (TPAMI'2014)
@article{h36m_pami,
author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher = {IEEE Computer Society},
volume = {36},
number = {7},
pages = {1325-1339},
month = {jul},
year = {2014}
}
Results on Human3.6M test set with ground truth 2D detections
Arch | Input Size | EPE | PCK | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 9.43 | 0.911 | ckpt | log |
pose_hrnet_w48 | 256x256 | 7.36 | 0.932 | ckpt | log |
Associative Embedding + Hrnet on MHP¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MHP (ACM MM'2018)
@inproceedings{zhao2018understanding,
title={Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing},
author={Zhao, Jian and Li, Jianshu and Cheng, Yu and Sim, Terence and Yan, Shuicheng and Feng, Jiashi},
booktitle={Proceedings of the 26th ACM international conference on Multimedia},
pages={792--800},
year={2018}
}
Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
Topdown Heatmap + Hrnet on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Mpii¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
MPII (CVPR'2014)
@inproceedings{andriluka14cvpr,
author = {Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt},
title = {2D Human Pose Estimation: New Benchmark and State of the Art Analysis},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2014},
month = {June}
}
Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
Topdown Heatmap + Hrnet on Ochuman¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
OCHuman (CVPR'2019)
@inproceedings{zhang2019pose2seg,
title={Pose2seg: Detection free human instance segmentation},
author={Zhang, Song-Hai and Li, Ruilong and Dong, Xin and Rosin, Paul and Cai, Zixi and Han, Xi and Yang, Dingcheng and Huang, Haozhi and Hu, Shi-Min},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={889--898},
year={2019}
}
Results on OCHuman test dataset with ground-truth bounding boxes
Following the common setting, the models are trained on COCO train dataset, and evaluate on OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.591 | 0.748 | 0.641 | 0.631 | 0.775 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.606 | 0.748 | 0.650 | 0.647 | 0.776 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.611 | 0.752 | 0.663 | 0.648 | 0.778 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.616 | 0.749 | 0.663 | 0.653 | 0.773 | ckpt | log |
Topdown Heatmap + Hrnet on Posetrack18¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
pose_hrnet_w32 | 384x288 | 87.0 | 88.8 | 85.0 | 80.1 | 80.5 | 82.6 | 79.4 | 83.6 | ckpt | log |
pose_hrnet_w48 | 256x192 | 88.2 | 90.1 | 85.8 | 80.8 | 80.7 | 83.3 | 80.3 | 84.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 87.8 | 90.0 | 85.9 | 81.3 | 81.1 | 83.3 | 80.9 | 84.5 | ckpt | log |
The models are first pre-trained on COCO dataset, and then fine-tuned on PoseTrack18.
Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
pose_hrnet_w32 | 384x288 | 79.9 | 83.6 | 80.4 | 74.5 | 74.8 | 76.1 | 70.5 | 77.3 | ckpt | log |
pose_hrnet_w48 | 256x192 | 80.1 | 83.4 | 80.6 | 74.8 | 74.3 | 76.8 | 70.4 | 77.4 | ckpt | log |
pose_hrnet_w48 | 384x288 | 80.2 | 83.8 | 80.9 | 75.2 | 74.7 | 76.7 | 71.7 | 77.8 | ckpt | log |
The models are first pre-trained on COCO dataset, and then fine-tuned on PoseTrack18.
Posewarper + Hrnet + Posetrack18 on Posetrack18¶
PoseWarper (NeurIPS'2019)
@inproceedings{NIPS2019_gberta,
title = {Learning Temporal Pose Estimation from Sparsely Labeled Videos},
author = {Bertasius, Gedas and Feichtenhofer, Christoph, and Tran, Du and Shi, Jianbo, and Torresani, Lorenzo},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2019},
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
PoseTrack18 (CVPR'2018)
@inproceedings{andriluka2018posetrack,
title={Posetrack: A benchmark for human pose estimation and tracking},
author={Andriluka, Mykhaylo and Iqbal, Umar and Insafutdinov, Eldar and Pishchulin, Leonid and Milan, Anton and Gall, Juergen and Schiele, Bernt},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={5167--5176},
year={2018}
}
COCO (ECCV'2014)
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
Note that the training of PoseWarper can be split into two stages.
The first-stage is trained with the pre-trained checkpoint on COCO dataset, and the main backbone is fine-tuned on PoseTrack18 in a single-frame setting.
The second-stage is trained with the last checkpoint from the first stage, and the warping offsets are learned in a multi-frame setting while the backbone is frozen.
Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 88.2 | 90.3 | 86.1 | 81.6 | 81.8 | 83.8 | 81.5 | 85.0 | ckpt | log |
Results on PoseTrack2018 val with precomputed human bounding boxes from PoseWarper supplementary data files from this link1.
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w48 | 384x288 | 81.8 | 85.6 | 82.7 | 77.2 | 76.8 | 79.0 | 74.4 | 79.8 | ckpt | log |
1 Please download the precomputed human bounding boxes on PoseTrack2018 val from $PoseWarper_supp_files/posetrack18_precomputed_boxes/val_boxes.json
and place it here: $mmpose/data/posetrack18/posetrack18_precomputed_boxes/val_boxes.json
to be consistent with the config. Please refer to DATA Preparation for more detail about data preparation.
Associative Embedding + Hrnet on Coco-Wholebody¶
Associative Embedding (NIPS'2017)
@inproceedings{newell2017associative,
title={Associative embedding: End-to-end learning for joint detection and grouping},
author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
booktitle={Advances in neural information processing systems},
pages={2277--2287},
year={2017}
}
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val without multi-scale test
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HRNet-w32+ | 512x512 | 0.551 | 0.650 | 0.271 | 0.451 | 0.564 | 0.618 | 0.159 | 0.238 | 0.342 | 0.453 | ckpt | log |
HRNet-w48+ | 512x512 | 0.592 | 0.686 | 0.443 | 0.595 | 0.619 | 0.674 | 0.347 | 0.438 | 0.422 | 0.532 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
Topdown Heatmap + Hrnet on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.700 | 0.746 | 0.567 | 0.645 | 0.637 | 0.688 | 0.473 | 0.546 | 0.553 | 0.626 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.701 | 0.773 | 0.586 | 0.692 | 0.727 | 0.783 | 0.516 | 0.604 | 0.586 | 0.674 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.700 | 0.776 | 0.672 | 0.785 | 0.656 | 0.743 | 0.534 | 0.639 | 0.579 | 0.681 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.722 | 0.790 | 0.694 | 0.799 | 0.777 | 0.834 | 0.587 | 0.679 | 0.631 | 0.716 | ckpt | log |
Topdown Heatmap + Hrnet + Dark on Coco-Wholebody¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x192 | 0.694 | 0.764 | 0.565 | 0.674 | 0.736 | 0.808 | 0.503 | 0.602 | 0.582 | 0.671 | ckpt | log |
pose_hrnet_w48_dark+ | 384x288 | 0.742 | 0.807 | 0.705 | 0.804 | 0.840 | 0.892 | 0.602 | 0.694 | 0.661 | 0.743 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on COCO-WholeBody dataset. We find this will lead to better performance.
Topdown Heatmap + Hrnet + Dark on Halpe¶
HRNet (CVPR'2019)
@inproceedings{sun2019deep,
title={Deep high-resolution representation learning for human pose estimation},
author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5693--5703},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
Halpe (CVPR'2020)
@inproceedings{li2020pastanet,
title={PaStaNet: Toward Human Activity Knowledge Engine},
author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
booktitle={CVPR},
year={2020}
}
Results on Halpe v1.0 val with detector having human AP of 56.4 on COCO val2017 dataset
Arch | Input Size | Whole AP | Whole AR | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w48_dark+ | 384x288 | 0.527 | 0.620 | ckpt | log |
Note: +
means the model is first pre-trained on original COCO dataset, and then fine-tuned on Halpe dataset. We find this will lead to better performance.
HRNetv2 (TPAMI’2019)¶
Topdown Heatmap + Hrnetv2 on 300w¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
300W (IMAVIS'2016)
@article{sagonas2016300,
title={300 faces in-the-wild challenge: Database and results},
author={Sagonas, Christos and Antonakos, Epameinondas and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja},
journal={Image and vision computing},
volume={47},
pages={3--18},
year={2016},
publisher={Elsevier}
}
Results on 300W dataset
The model is trained on 300W train.
Arch | Input Size | NMEcommon | NMEchallenge | NMEfull | NMEtest | ckpt | log |
---|---|---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 2.86 | 5.45 | 3.37 | 3.97 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 1.34 | 1.20 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Aflw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AFLW (ICCVW'2011)
@inproceedings{koestinger2011annotated,
title={Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization},
author={Koestinger, Martin and Wohlhart, Paul and Roth, Peter M and Bischof, Horst},
booktitle={2011 IEEE international conference on computer vision workshops (ICCV workshops)},
pages={2144--2151},
year={2011},
organization={IEEE}
}
Results on AFLW dataset
The model is trained on AFLW train and evaluated on AFLW full and frontal.
Arch | Input Size | NMEfull | NMEfrontal | ckpt | log |
---|---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 1.41 | 1.27 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 0.0569 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark + Coco + Wholebody on Coco_wholebody_face¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
COCO-WholeBody-Face (ECCV'2020)
@inproceedings{jin2020whole,
title={Whole-Body Human Pose Estimation in the Wild},
author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2020}
}
Results on COCO-WholeBody-Face val set
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 0.0513 | ckpt | log |
Topdown Heatmap + Hrnetv2 on Cofw¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
COFW (ICCV'2013)
@inproceedings{burgos2013robust,
title={Robust face landmark estimation under occlusion},
author={Burgos-Artizzu, Xavier P and Perona, Pietro and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1513--1520},
year={2013}
}
Results on COFW dataset
The model is trained on COFW train.
Arch | Input Size | NME | ckpt | log |
---|---|---|---|---|
pose_hrnetv2_w18 | 256x256 | 3.40 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Awing on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
AdaptiveWingloss (ICCV'2019)
@inproceedings{wang2019adaptive,
title={Adaptive wing loss for robust face alignment via heatmap regression},
author={Wang, Xinyao and Bo, Liefeng and Fuxin, Li},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={6971--6981},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_awing | 256x256 | 4.02 | 6.94 | 3.96 | 4.78 | 4.59 | 3.85 | 4.28 | ckpt | log |
Topdown Heatmap + Hrnetv2 + Dark on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
DarkPose (CVPR'2020)
@inproceedings{zhang2020distribution,
title={Distribution-aware coordinate representation for human pose estimation},
author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7093--7102},
year={2020}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},
year={2018}
}
Results on WFLW dataset
The model is trained on WFLW train.
Arch | Input Size | NMEtest | NMEpose | NMEillumination | NMEocclusion | NMEblur | NMEmakeup | NMEexpression | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|
pose_hrnetv2_w18_dark | 256x256 | 3.98 | 6.99 | 3.96 | 4.78 | 4.57 | 3.87 | 4.30 | ckpt | log |
Topdown Heatmap + Hrnetv2 on WFLW¶
HRNetv2 (TPAMI'2019)
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal={TPAMI},
year={2019}
}
WFLW (CVPR'2018)
@inproceedings{wu2018look,
title={Look at boundary: A boundary-aware face alignment algorithm},
author={Wu, Wayne and Qian, Chen and Yang, Shuo and Wang, Quan and Cai, Yici and Zhou, Qiang},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2129--2138},