Wholebody

COCO-WholeBody Dataset

Associative Embedding + HRNet on COCO-WholeBody

Associative Embedding (NIPS'2017)

@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

HRNet (CVPR'2019)

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val without multi-scale test

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
HRNet-w32+	512x512	0.551	0.650	0.271	0.451	0.564	0.618	0.159	0.238	0.342	0.453	ckpt	log
HRNet-w48+	512x512	0.592	0.686	0.443	0.595	0.619	0.674	0.347	0.438	0.422	0.532	ckpt	log

Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset. We find that this leads to better performance.

Associative Embedding + HigherHRNet on COCO-WholeBody

Associative Embedding (NIPS'2017)

@inproceedings{newell2017associative,
  title={Associative embedding: End-to-end learning for joint detection and grouping},
  author={Newell, Alejandro and Huang, Zhiao and Deng, Jia},
  booktitle={Advances in neural information processing systems},
  pages={2277--2287},
  year={2017}
}

HigherHRNet (CVPR'2020)

@inproceedings{cheng2020higherhrnet,
  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
  author={Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5386--5395},
  year={2020}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val without multi-scale test

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
HigherHRNet-w32+	512x512	0.590	0.672	0.185	0.335	0.676	0.721	0.212	0.298	0.401	0.493	ckpt	log
HigherHRNet-w48+	512x512	0.630	0.706	0.440	0.573	0.730	0.777	0.389	0.477	0.487	0.574	ckpt	log

Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset. We find that this leads to better performance.

Topdown Heatmap + ResNet on COCO-WholeBody

SimpleBaseline2D (ECCV'2018)

@inproceedings{xiao2018simple,
  title={Simple baselines for human pose estimation and tracking},
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={466--481},
  year={2018}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
pose_resnet_50	256x192	0.652	0.739	0.614	0.746	0.608	0.716	0.460	0.584	0.520	0.633	ckpt	log
pose_resnet_50	384x288	0.666	0.747	0.635	0.763	0.732	0.812	0.537	0.647	0.573	0.671	ckpt	log
pose_resnet_101	256x192	0.670	0.754	0.640	0.767	0.611	0.723	0.463	0.589	0.533	0.647	ckpt	log
pose_resnet_101	384x288	0.692	0.770	0.680	0.798	0.747	0.822	0.549	0.658	0.597	0.692	ckpt	log
pose_resnet_152	256x192	0.682	0.764	0.662	0.788	0.624	0.728	0.482	0.606	0.548	0.661	ckpt	log
pose_resnet_152	384x288	0.703	0.780	0.693	0.813	0.751	0.825	0.559	0.667	0.610	0.705	ckpt	log
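
The table above reports each ResNet backbone at two input sizes. As a minimal sketch of how the effect of input resolution on whole-body AP can be read off, the snippet below copies the Whole AP column from the table and computes the gain per backbone. The dictionary layout and the helper name `resolution_gain` are illustrative assumptions and are not part of any released tooling.

```python
# Whole AP values copied from the ResNet table above.
# The data layout and the helper name are illustrative assumptions.
WHOLE_AP = {
    "pose_resnet_50": {"256x192": 0.520, "384x288": 0.573},
    "pose_resnet_101": {"256x192": 0.533, "384x288": 0.597},
    "pose_resnet_152": {"256x192": 0.548, "384x288": 0.610},
}


def resolution_gain(scores, low="256x192", high="384x288"):
    """Return the Whole AP gain from the low to the high input size, per backbone."""
    return {arch: round(ap[high] - ap[low], 3) for arch, ap in scores.items()}


if __name__ == "__main__":
    for arch, gain in resolution_gain(WHOLE_AP).items():
        print(f"{arch}: +{gain:.3f} Whole AP at 384x288")
```

For these three backbones the larger input size raises Whole AP by roughly 0.05 to 0.06; the same comparison can be repeated for any other column by swapping in its values.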

Topdown Heatmap + HRNet on COCO-WholeBody

HRNet (CVPR'2019)

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
pose_hrnet_w32	256x192	0.700	0.746	0.567	0.645	0.637	0.688	0.473	0.546	0.553	0.626	ckpt	log
pose_hrnet_w32	384x288	0.701	0.773	0.586	0.692	0.727	0.783	0.516	0.604	0.586	0.674	ckpt	log
pose_hrnet_w48	256x192	0.700	0.776	0.672	0.785	0.656	0.743	0.534	0.639	0.579	0.681	ckpt	log
pose_hrnet_w48	384x288	0.722	0.790	0.694	0.799	0.777	0.834	0.587	0.679	0.631	0.716	ckpt	log

Topdown Heatmap + TCFormer on COCO-WholeBody

TCFormer (CVPR'2022)

@inproceedings{zeng2022not,
  title={Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer},
  author={Zeng, Wang and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11101--11111},
  year={2022}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
tcformer	256x192	0.691	0.769	0.690	0.809	0.650	0.747	0.534	0.647	0.574	0.678	ckpt	log

Topdown Heatmap + ViPNAS on COCO-WholeBody

ViPNAS (CVPR'2021)

@inproceedings{xu2021vipnas,
  title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
  author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2021}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
S-ViPNAS-MobileNetV3	256x192	0.619	0.700	0.477	0.608	0.585	0.689	0.386	0.505	0.473	0.578	ckpt	log
S-ViPNAS-Res50	256x192	0.643	0.726	0.553	0.694	0.587	0.698	0.410	0.529	0.495	0.607	ckpt	log

Topdown Heatmap + ViPNAS + DarkPose on COCO-WholeBody

ViPNAS (CVPR'2021)

@inproceedings{xu2021vipnas,
  title={ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search},
  author={Xu, Lumin and Guan, Yingda and Jin, Sheng and Liu, Wentao and Qian, Chen and Luo, Ping and Ouyang, Wanli and Wang, Xiaogang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2021}
}

DarkPose (CVPR'2020)

@inproceedings{zhang2020distribution,
  title={Distribution-aware coordinate representation for human pose estimation},
  author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7093--7102},
  year={2020}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
S-ViPNAS-MobileNetV3_dark	256x192	0.632	0.710	0.530	0.660	0.672	0.771	0.404	0.519	0.508	0.607	ckpt	log
S-ViPNAS-Res50_dark	256x192	0.650	0.732	0.550	0.686	0.684	0.784	0.437	0.554	0.528	0.632	ckpt	log

Topdown Heatmap + HRNet + DarkPose on COCO-WholeBody

HRNet (CVPR'2019)

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}

DarkPose (CVPR'2020)

@inproceedings{zhang2020distribution,
  title={Distribution-aware coordinate representation for human pose estimation},
  author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7093--7102},
  year={2020}
}

COCO-WholeBody (ECCV'2020)

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO-WholeBody v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Body AP	Body AR	Foot AP	Foot AR	Face AP	Face AR	Hand AP	Hand AR	Whole AP	Whole AR	ckpt	log
pose_hrnet_w32_dark	256x192	0.694	0.764	0.565	0.674	0.736	0.808	0.503	0.602	0.582	0.671	ckpt	log
pose_hrnet_w48_dark+	384x288	0.742	0.807	0.705	0.804	0.840	0.892	0.602	0.694	0.661	0.743	ckpt	log

Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the COCO-WholeBody dataset. We find that this leads to better performance.

Halpe Dataset

Topdown Heatmap + HRNet + DarkPose on Halpe

HRNet (CVPR'2019)

@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}

DarkPose (CVPR'2020)

@inproceedings{zhang2020distribution,
  title={Distribution-aware coordinate representation for human pose estimation},
  author={Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7093--7102},
  year={2020}
}

Halpe (CVPR'2020)

@inproceedings{li2020pastanet,
  title={PaStaNet: Toward Human Activity Knowledge Engine},
  author={Li, Yong-Lu and Xu, Liang and Liu, Xinpeng and Huang, Xijie and Xu, Yue and Wang, Shiyi and Fang, Hao-Shu and Ma, Ze and Chen, Mingyang and Lu, Cewu},
  booktitle={CVPR},
  year={2020}
}

Results on Halpe v1.0 val with a detector having human AP of 56.4 on the COCO val2017 dataset

Arch	Input Size	Whole AP	Whole AR	ckpt	log
pose_hrnet_w48_dark+	384x288	0.527	0.620	ckpt	log

Note: + means the model is first pre-trained on the original COCO dataset and then fine-tuned on the Halpe dataset. We find that this leads to better performance.