# Bottom-Up Models

## HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

### Introduction
```bibtex
@inproceedings{cheng2020higherhrnet,
  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
  author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5386--5395},
  year={2020}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
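Multi-scale testing runs inference at several image resolutions and averages the resulting heatmaps before decoding keypoints. The helper below is an illustrative sketch of that aggregation, not MMPose's implementation; `multi_scale_heatmaps` and `infer_fn` are made-up names, and a real pipeline would use bilinear resizing and flip augmentation as well.

```python
import numpy as np

def multi_scale_heatmaps(image, infer_fn, scales=(2, 1, 0.5)):
    """Average heatmaps predicted at several image scales.

    image:    (H, W, C) array.
    infer_fn: callable mapping an image to a (K, h, w) heatmap array
              (a stand-in for a real pose model's forward pass).
    """
    base = None   # heatmap resolution of the first (largest) scale
    acc = None
    for s in scales:
        h = int(round(image.shape[0] * s))
        w = int(round(image.shape[1] * s))
        # nearest-neighbour resize via index sampling (keeps the sketch
        # dependency-free; a real pipeline would resize bilinearly)
        ys = np.arange(h) * image.shape[0] // h
        xs = np.arange(w) * image.shape[1] // w
        scaled = image[ys][:, xs]
        heat = infer_fn(scaled)                      # (K, h', w')
        if base is None:
            base = heat.shape[1:]
            acc = np.zeros((heat.shape[0],) + base)
        # resample this scale's heatmaps onto the base resolution
        ys = np.arange(base[0]) * heat.shape[1] // base[0]
        xs = np.arange(base[1]) * heat.shape[2] // base[1]
        acc += heat[:, ys][:, :, xs]
    return acc / len(scales)
```

Averaging over scales is what lifts, e.g., HigherHRNet-w32 from 0.677 AP to 0.706 AP in the tables above, at the cost of roughly one extra forward pass per additional scale.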
##### Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
##### Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
##### Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
##### Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
## Associative Embedding (AE) + HRNet

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}
```
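Associative embedding predicts, alongside each keypoint heatmap, a per-keypoint "tag" value; keypoints whose tags are close are grouped into the same person. The sketch below illustrates a greedy grouping pass under that idea; `group_by_tags`, the data layout, and the fixed `threshold` are all hypothetical simplifications of the paper's grouping procedure, which also weighs detection scores and matches joints in a fixed order.

```python
def group_by_tags(detections, threshold=1.0):
    """Greedily group keypoint detections by associative-embedding tags.

    detections: list over joint types; each entry is a list of
                (x, y, tag) tuples detected for that joint.
    Returns a list of persons, each a dict {joint_id: (x, y)}.
    """
    persons = []  # each: {'mean_tag': float, 'joints': {joint_id: (x, y)}}
    for joint_id, candidates in enumerate(detections):
        for x, y, tag in candidates:
            # find the existing person whose mean tag is nearest
            best, best_dist = None, threshold
            for person in persons:
                if joint_id in person['joints']:
                    continue  # at most one keypoint of each type per person
                dist = abs(tag - person['mean_tag'])
                if dist < best_dist:
                    best, best_dist = person, dist
            if best is None:
                # no person is close enough: start a new one
                persons.append({'mean_tag': tag, 'joints': {joint_id: (x, y)}})
            else:
                # assign the keypoint and update the running mean tag
                n = len(best['joints'])
                best['mean_tag'] = (best['mean_tag'] * n + tag) / (n + 1)
                best['joints'][joint_id] = (x, y)
    return [person['joints'] for person in persons]
```

Because grouping relies only on tag distances, the number of people need not be known in advance, which is what makes AE a bottom-up method.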
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
##### Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
##### Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
## Associative Embedding (AE) + MobileNetV2

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
## Associative Embedding (AE) + ResNet

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{he2016deep,
  title={Deep residual learning for image recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={770--778},
  year={2016}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
## The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

### Introduction
```bibtex
@inproceedings{Huang_2020_CVPR,
  author={Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
  title={The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2020}
}
```
Note that UDP also adopts the unbiased keypoint encoding/decoding algorithm of DARK (Distribution-Aware coordinate Representation of Keypoints).
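A core observation of UDP is that resizing between heatmap and image space should measure distances in unit lengths (pixel spacings), so a grid of n pixels spans n − 1 units; scaling by the biased factor `image_size / heatmap_size` systematically shifts keypoints. The helper below is a minimal sketch of that idea under these assumptions; `heatmap_to_image_coords` is a made-up name, not MMPose's API.

```python
import numpy as np

def heatmap_to_image_coords(coords, heatmap_size, image_size):
    """Map heatmap-space keypoint coordinates to image space the
    unbiased way: a grid of n pixels spans n - 1 unit lengths, so the
    scale factor is (n_img - 1) / (n_heat - 1) per axis, rather than
    the biased n_img / n_heat.
    """
    coords = np.asarray(coords, dtype=float)
    scale_x = (image_size[0] - 1) / (heatmap_size[0] - 1)
    scale_y = (image_size[1] - 1) / (heatmap_size[1] - 1)
    return coords * np.array([scale_x, scale_y])
```

With a 128×128 heatmap and a 512×512 input, the corner pixel (127, 127) maps exactly onto the image corner (511, 511); the biased factor of 4 would land it at (508, 508), a systematic ~3-pixel error that grows with the downsampling ratio.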
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |