# Bottom-Up Models

## HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

### Introduction
```bibtex
@inproceedings{cheng2020higherhrnet,
  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
  author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5386--5395},
  year={2020}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.677 | 0.870 | 0.738 | 0.723 | 0.890 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.686 | 0.871 | 0.747 | 0.733 | 0.898 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.686 | 0.873 | 0.741 | 0.731 | 0.892 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.706 | 0.881 | 0.771 | 0.747 | 0.901 | ckpt | log |
HigherHRNet-w32 | 640x640 | 0.706 | 0.880 | 0.770 | 0.749 | 0.902 | ckpt | log |
HigherHRNet-w48 | 512x512 | 0.716 | 0.884 | 0.775 | 0.755 | 0.901 | ckpt | log |
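Multi-scale testing runs inference at several image resolutions and averages the resulting heatmaps before decoding keypoints. The helper below is an illustrative sketch of that aggregation, not MMPose's implementation; `multi_scale_heatmaps` and `infer_fn` are made-up names, and a real pipeline would use bilinear resizing and flip augmentation as well.

```python
import numpy as np

def multi_scale_heatmaps(image, infer_fn, scales=(2, 1, 0.5)):
    """Average heatmaps predicted at several image scales.

    image:    (H, W, C) array.
    infer_fn: callable mapping an image to a (K, h, w) heatmap array
              (a stand-in for a real pose model's forward pass).
    """
    base = None   # heatmap resolution of the first (largest) scale
    acc = None
    for s in scales:
        h = int(round(image.shape[0] * s))
        w = int(round(image.shape[1] * s))
        # nearest-neighbour resize via index sampling (keeps the sketch
        # dependency-free; a real pipeline would resize bilinearly)
        ys = np.arange(h) * image.shape[0] // h
        xs = np.arange(w) * image.shape[1] // w
        scaled = image[ys][:, xs]
        heat = infer_fn(scaled)                      # (K, h', w')
        if base is None:
            base = heat.shape[1:]
            acc = np.zeros((heat.shape[0],) + base)
        # resample this scale's heatmaps onto the base resolution
        ys = np.arange(base[0]) * heat.shape[1] // base[0]
        xs = np.arange(base[1]) * heat.shape[2] // base[1]
        acc += heat[:, ys][:, :, xs]
    return acc / len(scales)
```

Averaging over scales is what lifts, e.g., HigherHRNet-w32 from 0.677 AP to 0.706 AP in the tables above, at the cost of roughly one extra forward pass per additional scale.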
##### Results on CrowdPose test without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.655 | 0.859 | 0.705 | 0.728 | 0.660 | 0.577 | ckpt | log |
##### Results on CrowdPose test with multi-scale test. 2 scales ([2, 1]) are used
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.661 | 0.864 | 0.710 | 0.742 | 0.670 | 0.566 | ckpt | log |
##### Results on AIC validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.315 | 0.710 | 0.243 | 0.379 | 0.757 | ckpt | log |
##### Results on AIC validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HigherHRNet-w32 | 512x512 | 0.323 | 0.718 | 0.254 | 0.379 | 0.758 | ckpt | log |
## Associative Embedding (AE) + HRNet

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}
```
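Associative embedding predicts, alongside each keypoint heatmap, a per-keypoint "tag" value; keypoints whose tags are close are grouped into the same person. The sketch below illustrates a greedy grouping pass under that idea; `group_by_tags`, the data layout, and the fixed `threshold` are all hypothetical simplifications of the paper's grouping procedure, which also weighs detection scores and matches joints in a fixed order.

```python
def group_by_tags(detections, threshold=1.0):
    """Greedily group keypoint detections by associative-embedding tags.

    detections: list over joint types; each entry is a list of
                (x, y, tag) tuples detected for that joint.
    Returns a list of persons, each a dict {joint_id: (x, y)}.
    """
    persons = []  # each: {'mean_tag': float, 'joints': {joint_id: (x, y)}}
    for joint_id, candidates in enumerate(detections):
        for x, y, tag in candidates:
            # find the existing person whose mean tag is nearest
            best, best_dist = None, threshold
            for person in persons:
                if joint_id in person['joints']:
                    continue  # at most one keypoint of each type per person
                dist = abs(tag - person['mean_tag'])
                if dist < best_dist:
                    best, best_dist = person, dist
            if best is None:
                # no person is close enough: start a new one
                persons.append({'mean_tag': tag, 'joints': {joint_id: (x, y)}})
            else:
                # assign the keypoint and update the running mean tag
                n = len(best['joints'])
                best['mean_tag'] = (best['mean_tag'] * n + tag) / (n + 1)
                best['joints'][joint_id] = (x, y)
    return [person['joints'] for person in persons]
```

Because grouping relies only on tag distances, the number of people need not be known in advance, which is what makes AE a bottom-up method.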
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.654 | 0.863 | 0.720 | 0.710 | 0.892 | ckpt | log |
HRNet-w48 | 512x512 | 0.665 | 0.860 | 0.727 | 0.716 | 0.889 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32 | 512x512 | 0.698 | 0.877 | 0.760 | 0.748 | 0.907 | ckpt | log |
HRNet-w48 | 512x512 | 0.712 | 0.880 | 0.771 | 0.757 | 0.909 | ckpt | log |
##### Results on MHP v2.0 validation set without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.583 | 0.895 | 0.666 | 0.656 | 0.931 | ckpt | log |
##### Results on MHP v2.0 validation set with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w48 | 512x512 | 0.592 | 0.898 | 0.673 | 0.664 | 0.932 | ckpt | log |
## Associative Embedding (AE) + MobileNetV2

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.380 | 0.671 | 0.368 | 0.473 | 0.741 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 512x512 | 0.442 | 0.696 | 0.422 | 0.517 | 0.766 | ckpt | log |
## Associative Embedding (AE) + ResNet

### Introduction
```bibtex
@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

@inproceedings{he2016deep,
  title={Deep residual learning for image recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={770--778},
  year={2016}
}
```
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.466 | 0.742 | 0.479 | 0.552 | 0.797 | ckpt | log |
pose_resnet_50 | 640x640 | 0.479 | 0.757 | 0.487 | 0.566 | 0.810 | ckpt | log |
pose_resnet_101 | 512x512 | 0.554 | 0.807 | 0.599 | 0.622 | 0.841 | ckpt | log |
pose_resnet_152 | 512x512 | 0.595 | 0.829 | 0.648 | 0.651 | 0.856 | ckpt | log |
##### Results on COCO val2017 with multi-scale test. 3 default scales ([2, 1, 0.5]) are used
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 512x512 | 0.503 | 0.765 | 0.521 | 0.591 | 0.821 | ckpt | log |
pose_resnet_50 | 640x640 | 0.525 | 0.784 | 0.542 | 0.610 | 0.832 | ckpt | log |
pose_resnet_101 | 512x512 | 0.603 | 0.831 | 0.641 | 0.668 | 0.870 | ckpt | log |
pose_resnet_152 | 512x512 | 0.660 | 0.860 | 0.713 | 0.709 | 0.889 | ckpt | log |
## The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

### Introduction
```bibtex
@inproceedings{Huang_2020_CVPR,
  author={Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
  title={The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2020}
}
```
Note that UDP also adopts the unbiased keypoint encoding/decoding algorithm of DARK (Distribution-Aware coordinate Representation of Keypoints).
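A core observation of UDP is that resizing between heatmap and image space should measure distances in unit lengths (pixel spacings), so a grid of n pixels spans n − 1 units; scaling by the biased factor `image_size / heatmap_size` systematically shifts keypoints. The helper below is a minimal sketch of that idea under these assumptions; `heatmap_to_image_coords` is a made-up name, not MMPose's API.

```python
import numpy as np

def heatmap_to_image_coords(coords, heatmap_size, image_size):
    """Map heatmap-space keypoint coordinates to image space the
    unbiased way: a grid of n pixels spans n - 1 unit lengths, so the
    scale factor is (n_img - 1) / (n_heat - 1) per axis, rather than
    the biased n_img / n_heat.
    """
    coords = np.asarray(coords, dtype=float)
    scale_x = (image_size[0] - 1) / (heatmap_size[0] - 1)
    scale_y = (image_size[1] - 1) / (heatmap_size[1] - 1)
    return coords * np.array([scale_x, scale_y])
```

With a 128×128 heatmap and a 512×512 input, the corner pixel (127, 127) maps exactly onto the image corner (511, 511); the biased factor of 4 would land it at (508, 508), a systematic ~3-pixel error that grows with the downsampling ratio.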
### Results and models

#### 2D Human Pose Estimation

##### Results on COCO val2017 without multi-scale test
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
HRNet-w32_udp | 512x512 | 0.671 | 0.863 | 0.729 | 0.717 | 0.889 | ckpt | log |
HRNet-w48_udp | 512x512 | 0.681 | 0.872 | 0.741 | 0.725 | 0.892 | ckpt | log |
HigherHRNet-w32_udp | 512x512 | 0.678 | 0.862 | 0.736 | 0.724 | 0.890 | ckpt | log |
HigherHRNet-w48_udp | 512x512 | 0.690 | 0.872 | 0.750 | 0.734 | 0.891 | ckpt | log |