# Top-Down Models
## ImageNet classification with deep convolutional neural networks

### Introduction

```bibtex
@inproceedings{krizhevsky2012imagenet,
  title={Imagenet classification with deep convolutional neural networks},
  author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},
  booktitle={Advances in neural information processing systems},
  pages={1097--1105},
  year={2012}
}
```
## Deep high-resolution representation learning for human pose estimation

### Introduction

```bibtex
@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}

@article{buslaev2020albumentations,
  title={Albumentations: fast and flexible image augmentations},
  author={Buslaev, Alexander and Iglovikov, Vladimir I and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A},
  journal={Information},
  volume={11},
  number={2},
  pages={125},
  year={2020},
  publisher={Multidisciplinary Digital Publishing Institute}
}
```
### Results and models

#### 2D Human Pose Estimation

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
coarsedropout | 256x192 | 0.753 | 0.908 | 0.822 | 0.806 | 0.946 | ckpt | log |
gridmask | 256x192 | 0.752 | 0.906 | 0.825 | 0.804 | 0.943 | ckpt | log |
photometric | 256x192 | 0.753 | 0.909 | 0.825 | 0.805 | 0.943 | ckpt | log |
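The rows above compare the same pose model trained with different data augmentations (coarse dropout, GridMask-style masking, photometric distortion). As a rough illustration of what coarse dropout does, here is a minimal numpy sketch that zeroes random rectangles in an image; the actual experiments use the albumentations library, and the hole counts/sizes here are hypothetical.

```python
import numpy as np

def coarse_dropout(img, num_holes=8, max_h=40, max_w=40, rng=None):
    """Zero out `num_holes` random rectangles, in the spirit of CoarseDropout.

    A simplified sketch; parameter names and defaults are illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(num_holes):
        hh = int(rng.integers(1, max_h + 1))          # hole height
        ww = int(rng.integers(1, max_w + 1))          # hole width
        y = int(rng.integers(0, max(h - hh, 1)))      # top-left corner
        x = int(rng.integers(0, max(w - ww, 1)))
        out[y:y + hh, x:x + ww] = 0                   # erase the region
    return out

img = np.ones((256, 192, 3), dtype=np.float32)
aug = coarse_dropout(img)
```

The original image is left untouched; the augmented copy has a few rectangular regions erased, which forces the network to rely on context rather than any single body part.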
## Convolutional pose machines

### Introduction

```bibtex
@inproceedings{wei2016convolutional,
  title={Convolutional pose machines},
  author={Wei, Shih-En and Ramakrishna, Varun and Kanade, Takeo and Sheikh, Yaser},
  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
  pages={4724--4732},
  year={2016}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
cpm | 256x192 | 0.623 | 0.859 | 0.704 | 0.686 | 0.903 | ckpt | log |
cpm | 384x288 | 0.650 | 0.864 | 0.725 | 0.708 | 0.905 | ckpt | log |
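All COCO results in this document follow the two-stage top-down protocol: a person detector proposes boxes, each box is cropped and resized to the input size, the pose network predicts keypoints in crop/heatmap coordinates, and those coordinates are mapped back to the original image. A minimal sketch of that final mapping step, with a hypothetical helper (not the MMPose API) and a simple axis-aligned box without the affine/rotation handling a real pipeline uses:

```python
import numpy as np

def heatmap_to_image_coords(coords_hm, bbox, heatmap_size):
    """Map keypoints predicted in heatmap space back into the full image.

    coords_hm: (..., 2) keypoint coordinates in heatmap space.
    bbox: (x0, y0, w, h) person box the crop came from.
    heatmap_size: (W, H) of the network's output heatmap.
    """
    x0, y0, w, h = bbox
    W, H = heatmap_size
    scale = np.array([w / W, h / H])       # heatmap pixel -> image pixels
    return coords_hm * scale + np.array([x0, y0])

# A keypoint at heatmap position (24, 32) in a 48x64 heatmap, from a person
# box at (100, 50) of size 96x128, maps back to the box center.
pt = heatmap_to_image_coords(np.array([24.0, 32.0]), (100, 50, 96, 128), (48, 64))
```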
#### Results on Sub-JHMDB dataset

The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.

##### Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 96.1 | 91.9 | 81.0 | 78.9 | 96.6 | 90.8 | 87.3 | 89.5 | ckpt | log |
Sub2 | cpm | 368x368 | 98.1 | 93.6 | 77.1 | 70.9 | 94.0 | 89.1 | 84.7 | 87.4 | ckpt | log |
Sub3 | cpm | 368x368 | 97.9 | 94.9 | 87.3 | 84.0 | 98.6 | 94.4 | 86.2 | 92.4 | ckpt | log |
Average | cpm | 368x368 | 97.4 | 93.5 | 81.5 | 77.9 | 96.4 | 91.4 | 86.1 | 89.8 | - | - |
##### Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | cpm | 368x368 | 89.0 | 63.0 | 54.0 | 54.9 | 68.2 | 63.1 | 61.2 | 66.0 | ckpt | log |
Sub2 | cpm | 368x368 | 90.3 | 57.9 | 46.8 | 44.3 | 60.8 | 58.2 | 62.4 | 61.1 | ckpt | log |
Sub3 | cpm | 368x368 | 91.0 | 72.6 | 59.9 | 54.0 | 73.2 | 68.5 | 65.8 | 70.3 | ckpt | log |
Average | cpm | 368x368 | 90.1 | 64.5 | 53.6 | 51.1 | 67.4 | 63.3 | 63.1 | 65.7 | - | - |
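The Sub-JHMDB tables report PCK: the fraction of predicted keypoints whose distance to ground truth, divided by a per-sample normalization size (person size or torso size), falls below a threshold. A minimal numpy sketch of the metric, with visibility masking omitted and the 0.2 threshold chosen only for illustration:

```python
import numpy as np

def pck(pred, gt, norm_size, thr=0.2):
    """Percentage of Correct Keypoints.

    pred, gt: (N, K, 2) keypoint arrays; norm_size: (N,) per-sample scales
    (e.g. person or torso size). A simplified sketch without visibility flags.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)      # (N, K) pixel errors
    norm_dist = dist / norm_size[:, None]          # normalized errors
    return float((norm_dist <= thr).mean())

# Three keypoints with errors 0, 5, and 30 px, normalized by size 100:
gt = np.zeros((1, 3, 2))
pred = np.array([[[0.0, 0.0], [5.0, 0.0], [30.0, 0.0]]])
score = pck(pred, gt, norm_size=np.array([100.0]), thr=0.2)
```

Normalizing by torso size is a stricter criterion than normalizing by person size, which is why the torso-normalized numbers above are uniformly lower.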
## Distribution-aware coordinate representation for human pose estimation

### Introduction

```bibtex
@inproceedings{zhang2020distribution,
  title={Distribution-aware coordinate representation for human pose estimation},
  author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7093--7102},
  year={2020}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50_dark | 256x192 | 0.724 | 0.898 | 0.800 | 0.777 | 0.936 | ckpt | log |
pose_resnet_50_dark | 384x288 | 0.735 | 0.900 | 0.801 | 0.785 | 0.937 | ckpt | log |
pose_resnet_101_dark | 256x192 | 0.732 | 0.899 | 0.808 | 0.786 | 0.938 | ckpt | log |
pose_resnet_101_dark | 384x288 | 0.749 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnet_152_dark | 256x192 | 0.745 | 0.905 | 0.821 | 0.797 | 0.942 | ckpt | log |
pose_resnet_152_dark | 384x288 | 0.757 | 0.909 | 0.826 | 0.806 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 256x192 | 0.757 | 0.907 | 0.823 | 0.808 | 0.943 | ckpt | log |
pose_hrnet_w32_dark | 384x288 | 0.766 | 0.907 | 0.831 | 0.815 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 256x192 | 0.764 | 0.907 | 0.830 | 0.814 | 0.943 | ckpt | log |
pose_hrnet_w48_dark | 384x288 | 0.772 | 0.910 | 0.836 | 0.820 | 0.946 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32_dark | 256x256 | 0.904 | 0.354 | ckpt | log |
pose_hrnet_w48_dark | 256x256 | 0.905 | 0.360 | ckpt | log |
## DeepPose: Human pose estimation via deep neural networks

### Introduction

```bibtex
@inproceedings{toshev2014deeppose,
  title={Deeppose: Human pose estimation via deep neural networks},
  author={Toshev, Alexander and Szegedy, Christian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={1653--1660},
  year={2014}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
deeppose_resnet_50 | 256x192 | 0.526 | 0.816 | 0.586 | 0.638 | 0.887 | ckpt | log |
deeppose_resnet_101 | 256x192 | 0.560 | 0.832 | 0.628 | 0.668 | 0.900 | ckpt | log |
deeppose_resnet_152 | 256x192 | 0.583 | 0.843 | 0.659 | 0.686 | 0.907 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
deeppose_resnet_50 | 256x256 | 0.825 | 0.174 | ckpt | log |
deeppose_resnet_101 | 256x256 | 0.841 | 0.193 | ckpt | log |
deeppose_resnet_152 | 256x256 | 0.850 | 0.198 | ckpt | log |
## Stacked hourglass networks for human pose estimation

### Introduction

```bibtex
@inproceedings{newell2016stacked,
  title={Stacked hourglass networks for human pose estimation},
  author={Newell, Alejandro and Yang, Kaiyu and Deng, Jia},
  booktitle={European conference on computer vision},
  pages={483--499},
  year={2016},
  organization={Springer}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.726 | 0.896 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.746 | 0.900 | 0.813 | 0.797 | 0.939 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hourglass_52 | 256x256 | 0.889 | 0.317 | ckpt | log |
pose_hourglass_52 | 384x384 | 0.894 | 0.366 | ckpt | log |
## Deep high-resolution representation learning for human pose estimation

### Introduction

```bibtex
@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log |
pose_hrnet_w32 | 384x288 | 0.760 | 0.906 | 0.829 | 0.810 | 0.943 | ckpt | log |
pose_hrnet_w48 | 256x192 | 0.756 | 0.907 | 0.825 | 0.806 | 0.942 | ckpt | log |
pose_hrnet_w48 | 384x288 | 0.767 | 0.910 | 0.831 | 0.816 | 0.946 | ckpt | log |
pose_hrnet_w32_fp16¹ | 256x192 | 0.746 | 0.905 | 0.88 | 0.800 | 0.943 | ckpt | log |

¹ Please refer to fp16/README.md for the mixed-precision training method we use.
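The core idea behind mixed-precision training is to keep fp32 master weights while computing in fp16, and to scale the loss so that small fp16 gradients do not underflow. A numpy illustration of why loss scaling matters (this is only a sketch of the principle, not the fp16/README.md implementation, and the scale factor 1024 is illustrative):

```python
import numpy as np

# A gradient value that is representable in fp32 but below the smallest
# positive fp16 subnormal (~6e-8), so a naive fp16 cast underflows to zero.
grad_fp32 = np.float32(1e-8)
naive_fp16 = np.float16(grad_fp32)            # underflows to 0

# Loss scaling: multiply the loss (and hence its gradients) by a large
# constant before the fp16 cast, then unscale in fp32 afterwards.
scale = np.float32(1024.0)
scaled_fp16 = np.float16(grad_fp32 * scale)   # now representable in fp16
recovered = np.float32(scaled_fp16) / scale   # close to the original 1e-8
```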
#### Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.323 | 0.762 | 0.219 | 0.366 | 0.789 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_hrnet_w32 | 256x256 | 0.900 | 0.334 | ckpt | log |
pose_hrnet_w48 | 256x256 | 0.901 | 0.337 | ckpt | log |
#### Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 0.675 | 0.825 | 0.729 | 0.770 | 0.687 | 0.553 | ckpt | log |
#### Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 87.4 | 88.6 | 84.3 | 78.5 | 79.7 | 81.8 | 78.8 | 83.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
#### Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_hrnet_w32 | 256x192 | 78.0 | 82.9 | 79.5 | 73.8 | 76.9 | 76.6 | 70.2 | 76.9 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
## MobileNetV2: Inverted residuals and linear bottlenecks

### Introduction

```bibtex
@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_mobilenetv2 | 256x192 | 0.646 | 0.874 | 0.723 | 0.707 | 0.917 | ckpt | log |
pose_mobilenetv2 | 384x288 | 0.673 | 0.879 | 0.743 | 0.729 | 0.916 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_mobilenetv2 | 256x256 | 0.854 | 0.235 | ckpt | log |
## Rethinking on multi-stage networks for human pose estimation

### Introduction

```bibtex
@article{li2019rethinking,
  title={Rethinking on Multi-Stage Networks for Human Pose Estimation},
  author={Li, Wenbo and Wang, Zhicheng and Yin, Binyi and Peng, Qixiang and Du, Yuming and Xiao, Tianzi and Yu, Gang and Lu, Hongtao and Wei, Yichen and Sun, Jian},
  journal={arXiv preprint arXiv:1901.00148},
  year={2019}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
mspn_50 | 256x192 | 0.723 | 0.895 | 0.794 | 0.788 | 0.933 | ckpt | log |
2xmspn_50 | 256x192 | 0.754 | 0.903 | 0.825 | 0.815 | 0.941 | ckpt | log |
3xmspn_50 | 256x192 | 0.758 | 0.904 | 0.830 | 0.821 | 0.943 | ckpt | log |
4xmspn_50 | 256x192 | 0.764 | 0.906 | 0.835 | 0.826 | 0.944 | ckpt | log |
## ResNeSt: Split-Attention Networks

### Introduction

```bibtex
@article{zhang2020resnest,
  title={ResNeSt: Split-Attention Networks},
  author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
  journal={arXiv preprint arXiv:2004.08955},
  year={2020}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnest_50 | 256x192 | 0.721 | 0.899 | 0.802 | 0.776 | 0.938 | ckpt | log |
pose_resnest_50 | 384x288 | 0.737 | 0.900 | 0.811 | 0.789 | 0.938 | ckpt | log |
pose_resnest_101 | 256x192 | 0.725 | 0.899 | 0.807 | 0.781 | 0.939 | ckpt | log |
pose_resnest_101 | 384x288 | 0.746 | 0.906 | 0.820 | 0.798 | 0.943 | ckpt | log |
## Simple baselines for human pose estimation and tracking

### Introduction

```bibtex
@inproceedings{xiao2018simple,
  title={Simple baselines for human pose estimation and tracking},
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={466--481},
  year={2018}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.718 | 0.898 | 0.795 | 0.773 | 0.937 | ckpt | log |
pose_resnet_50 | 384x288 | 0.731 | 0.900 | 0.799 | 0.783 | 0.931 | ckpt | log |
pose_resnet_101 | 256x192 | 0.726 | 0.899 | 0.806 | 0.781 | 0.939 | ckpt | log |
pose_resnet_101 | 384x288 | 0.748 | 0.905 | 0.817 | 0.798 | 0.940 | ckpt | log |
pose_resnet_152 | 256x192 | 0.735 | 0.905 | 0.812 | 0.790 | 0.943 | ckpt | log |
pose_resnet_152 | 384x288 | 0.750 | 0.908 | 0.821 | 0.800 | 0.942 | ckpt | log |
pose_resnet_50_fp16¹ | 256x192 | 0.717 | 0.898 | 0.793 | 0.772 | 0.936 | ckpt | log |

¹ Please refer to fp16/README.md for the mixed-precision training method we use.
#### Results on OCHuman test dataset with ground-truth bounding boxes

Following the common setting, the models are trained on the COCO train dataset and evaluated on the OCHuman dataset.
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.546 | 0.726 | 0.593 | 0.592 | 0.755 | ckpt | log |
pose_resnet_50 | 384x288 | 0.539 | 0.723 | 0.574 | 0.588 | 0.756 | ckpt | log |
pose_resnet_101 | 256x192 | 0.559 | 0.724 | 0.606 | 0.605 | 0.751 | ckpt | log |
pose_resnet_101 | 384x288 | 0.571 | 0.715 | 0.615 | 0.615 | 0.748 | ckpt | log |
pose_resnet_152 | 256x192 | 0.570 | 0.725 | 0.617 | 0.616 | 0.754 | ckpt | log |
pose_resnet_152 | 384x288 | 0.582 | 0.723 | 0.627 | 0.627 | 0.752 | ckpt | log |
#### Results on AIC val set with ground-truth bounding boxes
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.294 | 0.736 | 0.174 | 0.337 | 0.763 | ckpt | log |
#### Results on MHP v2.0 val set
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnet_101 | 256x192 | 0.583 | 0.897 | 0.669 | 0.636 | 0.918 | ckpt | log |
Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code. Please be cautious when citing these results in papers.
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.882 | 0.286 | ckpt | log |
pose_resnet_101 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_resnet_152 | 256x256 | 0.889 | 0.303 | ckpt | log |
#### Results on MPII-TRB val set
Arch | Input Size | Skeleton Acc | Contour Acc | Mean Acc | ckpt | log |
---|---|---|---|---|---|---|
pose_resnet_50 | 256x256 | 0.887 | 0.858 | 0.868 | ckpt | log |
pose_resnet_101 | 256x256 | 0.890 | 0.863 | 0.873 | ckpt | log |
pose_resnet_152 | 256x256 | 0.897 | 0.868 | 0.879 | ckpt | log |
#### Results on CrowdPose test with YOLOv3 human detector
Arch | Input Size | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | ckpt | log |
---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 0.637 | 0.808 | 0.692 | 0.739 | 0.650 | 0.506 | ckpt | log |
pose_resnet_101 | 256x192 | 0.647 | 0.810 | 0.703 | 0.744 | 0.658 | 0.522 | ckpt | log |
pose_resnet_101 | 320x256 | 0.661 | 0.821 | 0.714 | 0.759 | 0.671 | 0.536 | ckpt | log |
pose_resnet_152 | 256x192 | 0.656 | 0.818 | 0.712 | 0.754 | 0.666 | 0.532 | ckpt | log |
#### Results on PoseTrack2018 val with ground-truth bounding boxes
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 86.5 | 87.5 | 82.3 | 75.6 | 79.9 | 78.6 | 74.0 | 81.0 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
#### Results on PoseTrack2018 val with MMDetection pre-trained Cascade R-CNN (X-101-64x4d-FPN) human detector
Arch | Input Size | Head | Shou | Elb | Wri | Hip | Knee | Ankl | Total | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|
pose_resnet_50 | 256x192 | 78.9 | 81.9 | 77.8 | 70.8 | 75.3 | 73.2 | 66.4 | 75.2 | ckpt | log |
The models are first pre-trained on the COCO dataset and then fine-tuned on PoseTrack18.
#### Results on Sub-JHMDB dataset

The models are pre-trained on the MPII dataset only. No test-time augmentation (multi-scale / rotation testing) is used.

##### Normalized by Person Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 99.1 | 98.0 | 93.8 | 91.3 | 99.4 | 96.5 | 92.8 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 99.3 | 97.1 | 90.6 | 87.0 | 98.9 | 96.3 | 94.1 | 95.0 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 99.0 | 97.9 | 94.0 | 91.6 | 99.7 | 98.0 | 94.7 | 96.7 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 99.2 | 97.7 | 92.8 | 90.0 | 99.3 | 96.9 | 93.9 | 96.0 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.5 | 94.6 | 92.0 | 99.4 | 94.6 | 92.5 | 96.1 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 99.3 | 97.8 | 91.0 | 87.0 | 99.1 | 96.5 | 93.8 | 95.2 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 98.8 | 98.4 | 94.3 | 92.1 | 99.8 | 97.5 | 93.8 | 96.7 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 99.1 | 98.2 | 93.3 | 90.4 | 99.4 | 96.2 | 93.4 | 96.0 | - | - |
##### Normalized by Torso Size
Split | Arch | Input Size | Head | Sho | Elb | Wri | Hip | Knee | Ank | Mean | ckpt | log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sub1 | pose_resnet_50 | 256x256 | 93.3 | 83.2 | 74.4 | 72.7 | 85.0 | 81.2 | 78.9 | 81.9 | ckpt | log |
Sub2 | pose_resnet_50 | 256x256 | 94.1 | 74.9 | 64.5 | 62.5 | 77.9 | 71.9 | 78.6 | 75.5 | ckpt | log |
Sub3 | pose_resnet_50 | 256x256 | 97.0 | 82.2 | 74.9 | 70.7 | 84.7 | 83.7 | 84.2 | 82.9 | ckpt | log |
Average | pose_resnet_50 | 256x256 | 94.8 | 80.1 | 71.3 | 68.6 | 82.5 | 78.9 | 80.6 | 80.1 | - | - |
Sub1 | pose_resnet_50 (2 Deconv.) | 256x256 | 92.4 | 80.6 | 73.2 | 70.5 | 82.3 | 75.4 | 75.0 | 79.2 | ckpt | log |
Sub2 | pose_resnet_50 (2 Deconv.) | 256x256 | 93.4 | 73.6 | 63.8 | 60.5 | 75.1 | 68.4 | 75.5 | 73.7 | ckpt | log |
Sub3 | pose_resnet_50 (2 Deconv.) | 256x256 | 96.1 | 81.2 | 72.6 | 67.9 | 83.6 | 80.9 | 81.5 | 81.2 | ckpt | log |
Average | pose_resnet_50 (2 Deconv.) | 256x256 | 94.0 | 78.5 | 69.9 | 66.3 | 80.3 | 74.9 | 77.3 | 78.0 | - | - |
## ResNetV1D

### Introduction

```bibtex
@inproceedings{he2019bag,
  title={Bag of tricks for image classification with convolutional neural networks},
  author={He, Tong and Zhang, Zhi and Zhang, Hang and Zhang, Zhongyue and Xie, Junyuan and Li, Mu},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={558--567},
  year={2019}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnetv1d_50 | 256x192 | 0.722 | 0.897 | 0.799 | 0.777 | 0.933 | ckpt | log |
pose_resnetv1d_50 | 384x288 | 0.730 | 0.900 | 0.799 | 0.780 | 0.934 | ckpt | log |
pose_resnetv1d_101 | 256x192 | 0.731 | 0.899 | 0.809 | 0.786 | 0.938 | ckpt | log |
pose_resnetv1d_101 | 384x288 | 0.748 | 0.902 | 0.816 | 0.799 | 0.939 | ckpt | log |
pose_resnetv1d_152 | 256x192 | 0.737 | 0.902 | 0.812 | 0.791 | 0.940 | ckpt | log |
pose_resnetv1d_152 | 384x288 | 0.752 | 0.909 | 0.821 | 0.802 | 0.944 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnetv1d_50 | 256x256 | 0.881 | 0.290 | ckpt | log |
pose_resnetv1d_101 | 256x256 | 0.883 | 0.295 | ckpt | log |
pose_resnetv1d_152 | 256x256 | 0.888 | 0.300 | ckpt | log |
## Aggregated residual transformations for deep neural networks

### Introduction

```bibtex
@inproceedings{xie2017aggregated,
  title={Aggregated residual transformations for deep neural networks},
  author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={1492--1500},
  year={2017}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_resnext_50 | 256x192 | 0.714 | 0.898 | 0.789 | 0.771 | 0.937 | ckpt | log |
pose_resnext_50 | 384x288 | 0.724 | 0.899 | 0.794 | 0.777 | 0.935 | ckpt | log |
pose_resnext_101 | 256x192 | 0.726 | 0.900 | 0.801 | 0.782 | 0.940 | ckpt | log |
pose_resnext_101 | 384x288 | 0.743 | 0.903 | 0.815 | 0.795 | 0.939 | ckpt | log |
pose_resnext_152 | 256x192 | 0.730 | 0.904 | 0.808 | 0.786 | 0.940 | ckpt | log |
pose_resnext_152 | 384x288 | 0.742 | 0.902 | 0.810 | 0.794 | 0.939 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_resnext_152 | 256x256 | 0.887 | 0.294 | ckpt | log |
## Learning delicate local representations for multi-person pose estimation

### Introduction

```bibtex
@misc{cai2020learning,
  title={Learning Delicate Local Representations for Multi-Person Pose Estimation},
  author={Yuanhao Cai and Zhicheng Wang and Zhengxiong Luo and Binyi Yin and Angang Du and Haoqian Wang and Xinyu Zhou and Erjin Zhou and Xiangyu Zhang and Jian Sun},
  year={2020},
  eprint={2003.04030},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rsn_18 | 256x192 | 0.704 | 0.887 | 0.779 | 0.771 | 0.926 | ckpt | log |
rsn_50 | 256x192 | 0.723 | 0.896 | 0.800 | 0.788 | 0.934 | ckpt | log |
2xrsn_50 | 256x192 | 0.745 | 0.899 | 0.818 | 0.809 | 0.939 | ckpt | log |
3xrsn_50 | 256x192 | 0.750 | 0.900 | 0.823 | 0.813 | 0.940 | ckpt | log |
## Improving Convolutional Networks with Self-Calibrated Convolutions

### Introduction

```bibtex
@inproceedings{liu2020improving,
  title={Improving Convolutional Networks with Self-Calibrated Convolutions},
  author={Liu, Jiang-Jiang and Hou, Qibin and Cheng, Ming-Ming and Wang, Changhu and Feng, Jiashi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10096--10105},
  year={2020}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_scnet_50 | 256x192 | 0.728 | 0.899 | 0.807 | 0.784 | 0.938 | ckpt | log |
pose_scnet_50 | 384x288 | 0.751 | 0.906 | 0.818 | 0.802 | 0.943 | ckpt | log |
pose_scnet_101 | 256x192 | 0.733 | 0.903 | 0.813 | 0.790 | 0.941 | ckpt | log |
pose_scnet_101 | 384x288 | 0.752 | 0.906 | 0.823 | 0.804 | 0.943 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_scnet_50 | 256x256 | 0.888 | 0.290 | ckpt | log |
pose_scnet_101 | 256x256 | 0.886 | 0.293 | ckpt | log |
## Squeeze-and-excitation networks

### Introduction

```bibtex
@inproceedings{hu2018squeeze,
  title={Squeeze-and-excitation networks},
  author={Hu, Jie and Shen, Li and Sun, Gang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={7132--7141},
  year={2018}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_seresnet_50 | 256x192 | 0.728 | 0.900 | 0.809 | 0.784 | 0.940 | ckpt | log |
pose_seresnet_50 | 384x288 | 0.748 | 0.905 | 0.819 | 0.799 | 0.941 | ckpt | log |
pose_seresnet_101 | 256x192 | 0.734 | 0.904 | 0.815 | 0.790 | 0.942 | ckpt | log |
pose_seresnet_101 | 384x288 | 0.753 | 0.907 | 0.823 | 0.805 | 0.943 | ckpt | log |
pose_seresnet_152* | 256x192 | 0.730 | 0.899 | 0.810 | 0.786 | 0.940 | ckpt | log |
pose_seresnet_152* | 384x288 | 0.753 | 0.906 | 0.823 | 0.806 | 0.945 | ckpt | log |
Note that * means the model is trained without ImageNet pre-training.
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_seresnet_50 | 256x256 | 0.884 | 0.292 | ckpt | log |
pose_seresnet_101 | 256x256 | 0.884 | 0.295 | ckpt | log |
pose_seresnet_152* | 256x256 | 0.884 | 0.287 | ckpt | log |
Note that * means the model is trained without ImageNet pre-training.
## ShuffleNet: An extremely efficient convolutional neural network for mobile devices

### Introduction

```bibtex
@inproceedings{zhang2018shufflenet,
  title={Shufflenet: An extremely efficient convolutional neural network for mobile devices},
  author={Zhang, Xiangyu and Zhou, Xinyu and Lin, Mengxiao and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={6848--6856},
  year={2018}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv1 | 256x192 | 0.585 | 0.845 | 0.650 | 0.651 | 0.894 | ckpt | log |
pose_shufflenetv1 | 384x288 | 0.622 | 0.859 | 0.685 | 0.684 | 0.901 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv1 | 256x256 | 0.823 | 0.195 | ckpt | log |
## ShuffleNet V2: Practical guidelines for efficient CNN architecture design

### Introduction

```bibtex
@inproceedings{ma2018shufflenet,
  title={Shufflenet v2: Practical guidelines for efficient cnn architecture design},
  author={Ma, Ningning and Zhang, Xiangyu and Zheng, Hai-Tao and Sun, Jian},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={116--131},
  year={2018}
}
```
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_shufflenetv2 | 256x192 | 0.599 | 0.854 | 0.663 | 0.664 | 0.899 | ckpt | log |
pose_shufflenetv2 | 384x288 | 0.636 | 0.865 | 0.705 | 0.697 | 0.909 | ckpt | log |
#### Results on MPII val set
Arch | Input Size | Mean | Mean@0.1 | ckpt | log |
---|---|---|---|---|---|
pose_shufflenetv2 | 256x256 | 0.828 | 0.205 | ckpt | log |
## The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

### Introduction

```bibtex
@InProceedings{Huang_2020_CVPR,
  author = {Huang, Junjie and Zhu, Zheng and Guo, Feng and Huang, Guan},
  title = {The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
```
Note that UDP also adopts the unbiased encoding/decoding algorithm of DARK.
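The unbiased decoding idea from DARK refines the integer argmax of a predicted heatmap with a second-order Taylor expansion of the log-heatmap, giving a sub-pixel peak as offset = -H⁻¹∇ at the maximum. A minimal numpy sketch of that step (the full method also modulates the heatmap with Gaussian smoothing before decoding, which is omitted here):

```python
import numpy as np

def dark_decode(heatmap, eps=1e-10):
    """Sub-pixel peak localization in the spirit of DARK decoding.

    Takes the log of the heatmap (exact for a Gaussian response), then
    refines the integer argmax via a Taylor expansion: offset = -H^{-1} g.
    """
    logh = np.log(np.maximum(heatmap, eps))
    h, w = logh.shape
    y, x = divmod(int(np.argmax(logh)), w)
    if 0 < x < w - 1 and 0 < y < h - 1:
        # First derivatives (central differences) and Hessian at the peak.
        dx = 0.5 * (logh[y, x + 1] - logh[y, x - 1])
        dy = 0.5 * (logh[y + 1, x] - logh[y - 1, x])
        dxx = logh[y, x + 1] - 2.0 * logh[y, x] + logh[y, x - 1]
        dyy = logh[y + 1, x] - 2.0 * logh[y, x] + logh[y - 1, x]
        dxy = 0.25 * (logh[y + 1, x + 1] - logh[y + 1, x - 1]
                      - logh[y - 1, x + 1] + logh[y - 1, x - 1])
        hess = np.array([[dxx, dxy], [dxy, dyy]])
        if abs(np.linalg.det(hess)) > eps:
            ox, oy = -np.linalg.solve(hess, np.array([dx, dy]))
            x, y = x + ox, y + oy
    return float(x), float(y)

# A Gaussian heatmap whose true peak lies at the sub-pixel point (10.3, 7.6):
yy, xx = np.mgrid[0:32, 0:32]
hm = np.exp(-((xx - 10.3) ** 2 + (yy - 7.6) ** 2) / (2 * 2.0 ** 2))
px, py = dark_decode(hm)
```

Because the log of a Gaussian is exactly quadratic, this recovers the sub-pixel center that a plain integer argmax would miss, which is the source of the quantization bias DARK and UDP address.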
### Results and models

#### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32_udp | 256x192 | 0.760 | 0.907 | 0.827 | 0.811 | 0.945 | ckpt | log |
pose_hrnet_w32_udp | 384x288 | 0.769 | 0.908 | 0.833 | 0.817 | 0.944 | ckpt | log |
pose_hrnet_w48_udp | 256x192 | 0.767 | 0.906 | 0.834 | 0.817 | 0.945 | ckpt | log |
pose_hrnet_w48_udp | 384x288 | 0.772 | 0.910 | 0.835 | 0.820 | 0.945 | ckpt | log |
pose_hrnet_w32_udp_regress | 256x192 | 0.758 | 0.908 | 0.823 | 0.812 | 0.943 | ckpt | log |