API Documentation


mmpose.apis.get_track_id(results, results_last, next_id, min_keypoints=3, use_oks=False, tracking_thr=0.3, use_one_euro=False, fps=None)[source]

Get track id for each person instance on the current frame.

  • results (list[dict]) – The bbox & pose results of the current frame (bbox_result, pose_result).

  • results_last (list[dict]) – The bbox & pose & track_id info of the last frame (bbox_result, pose_result, track_id).

  • next_id (int) – The track id for the new person instance.

  • min_keypoints (int) – Minimum number of keypoints recognized as person. default: 3.

  • use_oks (bool) – Flag to use OKS tracking. default: False.

  • tracking_thr (float) – The threshold for tracking.

  • use_one_euro (bool) – Option to use one-euro-filter. default: False.

  • fps (optional) – Frame rate of the video input, used to set d_cutoff when the one-euro filter is enabled.


The bbox & pose & track_id info of the current frame (bbox_result, pose_result, track_id).

int: The track id for the new person instance.

Return type
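The matching idea can be illustrated with a minimal sketch. This is not the mmpose implementation: the helpers `iou_xyxy` and `assign_track_ids` are hypothetical, boxes are assumed to be (left, top, right, bottom), matching is assumed to be greedy bounding-box IoU, and the keypoint count check, OKS matching and one-euro filtering are ignored.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def assign_track_ids(results, results_last, next_id, tracking_thr=0.3):
    """Greedily reuse the id of the best-overlapping box from the last frame."""
    unmatched = list(results_last)
    for res in results:
        best, best_iou = None, tracking_thr
        for prev in unmatched:
            iou = iou_xyxy(res['bbox'], prev['bbox'])
            if iou > best_iou:
                best, best_iou = prev, iou
        if best is not None:
            # Overlap above the threshold: keep the previous track id.
            res['track_id'] = best['track_id']
            unmatched.remove(best)
        else:
            # No match: this is a new person instance.
            res['track_id'] = next_id
            next_id += 1
    return results, next_id


last = [{'bbox': [0, 0, 10, 10], 'track_id': 7}]
current = [{'bbox': [1, 1, 11, 11]}, {'bbox': [50, 50, 60, 60]}]
tracked, new_next_id = assign_track_ids(current, last, next_id=8)
```

In this toy input the first box overlaps the previous one and inherits its id, while the second box has no overlap and receives `next_id`.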


mmpose.apis.inference_bottom_up_pose_model(model, img_or_path, pose_nms_thr=0.9, return_heatmap=False, outputs=None)[source]

Run inference on a single image.

num_people: P num_keypoints: K bbox height: H bbox width: W

  • model (nn.Module) – The loaded pose model.

  • img_or_path (str| np.ndarray) – Image filename or loaded image.

  • pose_nms_thr (float) – retain oks overlap < pose_nms_thr, default: 0.9.

  • return_heatmap (bool) – Flag to return heatmap, default: False.

  • outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned, default: None.


The predicted pose info.

The length of the list is the number of people (P). Each item in the list is a ndarray, containing each person’s pose (ndarray[Kx3]): x, y, score.

list[dict[np.ndarray[N, K, H, W] | torch.tensor[N, K, H, W]]]:

Output feature maps from layers specified in outputs. Includes ‘heatmap’ if return_heatmap is True.

Return type


mmpose.apis.inference_top_down_pose_model(model, img_or_path, person_results, bbox_thr=None, format='xywh', dataset='TopDownCocoDataset', return_heatmap=False, outputs=None)[source]

Run inference on a single image with a list of person bounding boxes.

num_people: P num_keypoints: K bbox height: H bbox width: W

  • model (nn.Module) – The loaded pose model.

  • img_or_path (str| np.ndarray) – Image filename or loaded image.

  • person_results (List(dict)) – the item in the dict may contain ‘bbox’ and/or ‘track_id’. ‘bbox’ (4, ) or (5, ): The person bounding box, which contains 4 box coordinates (and score). ‘track_id’ (int): The unique id for each human instance.

  • bbox_thr – Threshold for bounding boxes. Only bboxes with higher scores will be fed into the pose detector. If bbox_thr is None, ignore it.

  • format – bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’. ‘xyxy’ means (left, top, right, bottom), ‘xywh’ means (left, top, width, height).

  • dataset (str) – Dataset name, e.g. ‘TopDownCocoDataset’.

  • return_heatmap (bool) – Flag to return heatmap, default: False

  • outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned, default: None


The bbox & pose info,

Each item in the list is a dictionary, containing the bbox: (left, top, right, bottom, [score]) and the pose (ndarray[Kx3]): x, y, score

list[dict[np.ndarray[N, K, H, W] | torch.tensor[N, K, H, W]]]:

Output feature maps from layers specified in outputs. Includes ‘heatmap’ if return_heatmap is True.

Return type
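The two bbox formats described above differ only in how the last two coordinates are stored. The sketch below converts between them using hypothetical helpers `xywh_to_xyxy` and `xyxy_to_xywh`; it assumes plain continuous coordinates and does not model any one-pixel offset convention that a particular implementation may apply. An optional fifth element (the score) is passed through unchanged.

```python
import numpy as np


def xywh_to_xyxy(bbox):
    """(left, top, width, height[, score]) -> (left, top, right, bottom[, score])."""
    out = np.asarray(bbox, dtype=float).copy()
    out[2] = out[0] + out[2]  # right = left + width
    out[3] = out[1] + out[3]  # bottom = top + height
    return out


def xyxy_to_xywh(bbox):
    """(left, top, right, bottom[, score]) -> (left, top, width, height[, score])."""
    out = np.asarray(bbox, dtype=float).copy()
    out[2] = out[2] - out[0]  # width = right - left
    out[3] = out[3] - out[1]  # height = bottom - top
    return out


box_xywh = [10, 20, 30, 40, 0.9]
box_xyxy = xywh_to_xyxy(box_xywh)
round_trip = xyxy_to_xywh(box_xyxy)
```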


mmpose.apis.init_pose_model(config, checkpoint=None, device='cuda:0')[source]

Initialize a pose model from config file.

  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.


The constructed detector.

Return type


mmpose.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]

Test model with multiple gpus.

This method tests the model with multiple gpus and collects the results under one of two modes: gpu and cpu. With ‘gpu_collect=True’, it encodes results to gpu tensors and uses gpu communication for results collection. In cpu mode, it saves the results from different gpus to ‘tmpdir’ and the rank 0 worker collects them.

  • model (nn.Module) – Model to be tested.

  • data_loader (DataLoader) – PyTorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.

  • gpu_collect (bool) – Option to use either gpu or cpu to collect results.


The prediction results.

Return type


mmpose.apis.single_gpu_test(model, data_loader)[source]

Test model with a single gpu.

This method tests model with a single gpu and displays test progress bar.

  • model (nn.Module) – Model to be tested.

  • data_loader (DataLoader) – PyTorch data loader.


The prediction results.

Return type


mmpose.apis.train_model(model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None)[source]

Train model entry function.

  • model (nn.Module) – The model to be trained.

  • dataset (Dataset) – Train dataset.

  • cfg (dict) – The config dict for training.

  • distributed (bool) – Whether to use distributed training. Default: False.

  • validate (bool) – Whether to do evaluation. Default: False.

  • timestamp (str | None) – Local time for runner. Default: None.

  • meta (dict | None) – Meta dict to record some important information. Default: None

mmpose.apis.vis_pose_result(model, img, result, kpt_score_thr=0.3, dataset='TopDownCocoDataset', show=False, out_file=None)[source]

Visualize the detection results on the image.

  • model (nn.Module) – The loaded detector.

  • img (str | np.ndarray) – Image filename or loaded image.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • kpt_score_thr (float) – The threshold to visualize the keypoints.

  • skeleton (list[tuple()]) – Default None.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str|None) – The filename of the output visualization image.

mmpose.apis.vis_pose_tracking_result(model, img, result, kpt_score_thr=0.3, dataset='TopDownCocoDataset', show=False, out_file=None)[source]

Visualize the pose tracking results on the image.

  • model (nn.Module) – The loaded detector.

  • img (str | np.ndarray) – Image filename or loaded image.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • kpt_score_thr (float) – The threshold to visualize the keypoints.

  • skeleton (list[tuple()]) – Default None.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str|None) – The filename of the output visualization image.



class mmpose.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='AP', rule=None, **eval_kwargs)[source]

Distributed evaluation hook.

This hook will regularly perform evaluation in a given interval when performing in distributed environment.

  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.

  • key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics to the test dataset. e.g., acc, AP, PCK. Default: AP.

  • rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: ‘None’.

  • eval_kwargs (dict, optional) – Arguments for evaluation.


Called after each training epoch to evaluate the model.

class mmpose.core.evaluation.EvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='AP', rule=None, **eval_kwargs)[source]

Non-Distributed evaluation hook.

This hook will regularly perform evaluation in a given interval when performing in non-distributed environment.

  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.

  • key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics to the test dataset. e.g., acc, AP, PCK. Default: AP.

  • rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: ‘None’.

  • eval_kwargs (dict, optional) – Arguments for evaluation.


Called after every training epoch to evaluate the results.

evaluate(runner, results)[source]

Evaluate the results.

  • runner (mmcv.Runner) – The underlying training runner.

  • results (list) – Output results.

mmpose.core.evaluation.aggregate_results(scale, aggregated_heatmaps, tags_list, heatmaps, tags, test_scale_factor, project2image, flip_test, align_corners=False)[source]

Aggregate multi-scale outputs.


batch size: N keypoints num : K heatmap width: W heatmap height: H

  • scale (int) – current scale

  • aggregated_heatmaps (torch.Tensor | None) – Aggregated heatmaps.

  • tags_list (list(torch.Tensor)) – Tags list of previous scale.

  • heatmaps (List(torch.Tensor[NxKxWxH])) – A batch of heatmaps.

  • tags (List(torch.Tensor[NxKxWxH])) – A batch of tag maps.

  • test_scale_factor (List(int)) – Multi-scale factor for testing.

  • project2image (bool) – Option to resize to base scale.

  • flip_test (bool) – Option to use flip test.

  • align_corners (bool) – Align corners when performing interpolation.


a tuple containing aggregated results.

  • aggregated_heatmaps (torch.Tensor): Heatmaps with multi scale.

  • tags_list (list(torch.Tensor)): Tag list of multi scale.

Return type


mmpose.core.evaluation.compute_similarity_transform(source_points, target_points)[source]

Computes a similarity transform (sR, t) that maps a set of 3D points source_points (N x 3) as close as possible to a set of 3D points target_points, where R is a 3x3 rotation matrix, t is a 3x1 translation and s is a scale, and returns the transformed 3D points source_points_hat (N x 3); i.e., it solves the orthogonal Procrustes problem.


Points number: N

  • source_points (np.ndarray([N, 3])) – Source point set.

  • target_points (np.ndarray([N, 3])) – Target point set.


Transformed source point set.

Return type

source_points_hat (np.ndarray([N, 3]))
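A standard SVD-based solution to this problem is sketched below. It is an illustration under stated assumptions rather than the library's code: the function name `similarity_transform_sketch` is hypothetical, and the point sets are assumed to be non-degenerate (not all points coincident or collinear).

```python
import numpy as np


def similarity_transform_sketch(source_points, target_points):
    """Align source_points [N, 3] to target_points [N, 3] with (sR, t)."""
    mu_s = source_points.mean(axis=0)
    mu_t = target_points.mean(axis=0)
    x = source_points - mu_s
    y = target_points - mu_t

    # Cross-covariance between the centered point sets (3x3).
    k = x.T @ y
    u, _, vt = np.linalg.svd(k)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    z = np.eye(3)
    z[-1, -1] = np.sign(np.linalg.det(u @ vt))
    r = vt.T @ z @ u.T

    # Least-squares scale and translation for the chosen rotation.
    scale = np.trace(r @ k) / (x ** 2).sum()
    t = mu_t - scale * (r @ mu_s)
    return scale * source_points @ r.T + t


# Build a target from a known rotation, scale and translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(10, 3))
angle = 0.3
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
dst = 2.5 * src @ rot_z.T + np.array([1.0, -2.0, 0.5])
src_hat = similarity_transform_sketch(src, dst)
```

Because the target here is an exact similarity transform of the source, the aligned points should coincide with the target up to floating-point error.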

mmpose.core.evaluation.get_group_preds(grouped_joints, center, scale, heatmap_size, use_udp=False)[source]

Transform the grouped joints back to the image.

  • grouped_joints (list) – Grouped person joints.

  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • heatmap_size (np.ndarray[2, ]) – Size of the destination heatmaps.

  • use_udp (bool) – Unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).


List of the pose result for each person.

Return type


mmpose.core.evaluation.get_multi_stage_outputs(outputs, outputs_flip, num_joints, with_heatmaps, with_ae, tag_per_joint=True, flip_index=None, project2image=True, size_projected=None, align_corners=False)[source]

Inference the model to get multi-stage outputs (heatmaps & tags), and resize them to base sizes.

  • outputs (list(torch.Tensor)) – Outputs of network

  • outputs_flip (list(torch.Tensor)) – Flip outputs of network

  • num_joints (int) – Number of joints

  • with_heatmaps (list[bool]) – Option to output heatmaps for different stages.

  • with_ae (list[bool]) – Option to output ae tags for different stages.

  • tag_per_joint (bool) – Option to use one tag map per joint.

  • flip_index (list[int]) – Keypoint flip index.

  • project2image (bool) – Option to resize to base scale.

  • size_projected ([w, h]) – Base size of heatmaps.

  • align_corners (bool) – Align corners when performing interpolation.


A tuple containing multi-stage outputs.

  • outputs (list(torch.Tensor)): List of simple outputs and flip outputs.

  • heatmaps (torch.Tensor): Multi-stage heatmaps that are resized to the base size.

  • tags (torch.Tensor): Multi-stage tags that are resized to the base size.

Return type


mmpose.core.evaluation.keypoint_auc(pred, gt, mask, normalize, num_step=20)[source]

Calculate the area under the curve (AUC) of the PCK accuracy, averaged across all keypoints, for coordinates.


batch_size: N num_keypoints: K

  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • normalize (float) – Normalization factor.


Area under curve.

Return type


mmpose.core.evaluation.keypoint_epe(pred, gt, mask)[source]

Calculate the end-point error.


batch_size: N num_keypoints: K

  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.


Average end-point error.

Return type
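As a minimal sketch of the metric, the end-point error can be read as the mean Euclidean distance between predicted and ground-truth keypoints over the visible joints. The function name `epe_sketch` is hypothetical, and returning 0.0 when no joint is visible is an assumption of this sketch.

```python
import numpy as np


def epe_sketch(pred, gt, mask):
    """Mean Euclidean distance over visible keypoints.

    pred, gt: [N, K, 2] arrays; mask: boolean [N, K] array.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # [N, K]
    valid = dist[mask]                         # drop invisible joints
    return float(valid.mean()) if valid.size else 0.0


pred = np.array([[[3.0, 4.0], [100.0, 100.0]]])
gt = np.zeros((1, 2, 2))
mask = np.array([[True, False]])  # second joint is invisible and ignored
epe = epe_sketch(pred, gt, mask)
```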


mmpose.core.evaluation.keypoint_mpjpe(pred, gt, mask, alignment='none')[source]

Calculate the mean per-joint position error (MPJPE) and the error after rigid alignment with the ground truth (P-MPJPE).

batch_size: N num_keypoints: K keypoint_dims: C

  • pred (np.ndarray[N, K, C]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, C]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • alignment (str, optional) – Method to align the prediction with the groundtruth. Supported options are:

    • 'none': no alignment will be applied.

    • 'scale': align in the least-square sense in scale.

    • 'procrustes': align in the least-square sense in scale, rotation and translation.


A tuple containing joint position errors

  • mpjpe (float|np.ndarray[N]): mean per-joint position error.

  • p-mpjpe (float|np.ndarray[N]): mpjpe after rigid alignment with the ground truth.

Return type
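The effect of the alignment option can be sketched for the 'none' and 'scale' cases. This is an illustration only: `mpjpe_sketch` is a hypothetical name, it returns a single scalar for the chosen alignment, and the 'procrustes' case is left out.

```python
import numpy as np


def mpjpe_sketch(pred, gt, mask, alignment='none'):
    """Mean per-joint position error over visible joints.

    pred, gt: [N, K, C] arrays; mask: boolean [N, K] array.
    """
    if alignment == 'scale':
        # Per-sample least-squares scale: argmin_s ||s * pred - gt||^2.
        pred_dot_gt = np.einsum('nkc,nkc->n', pred, gt)
        pred_dot_pred = np.einsum('nkc,nkc->n', pred, pred)
        pred = pred * (pred_dot_gt / pred_dot_pred)[:, None, None]
    elif alignment != 'none':
        raise ValueError(f'unsupported alignment in this sketch: {alignment}')
    return float(np.linalg.norm(pred - gt, axis=-1)[mask].mean())


pred = np.array([[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]])
gt = 2.0 * pred  # ground truth is the prediction at twice the scale
mask = np.ones((1, 2), dtype=bool)
err_none = mpjpe_sketch(pred, gt, mask, alignment='none')
err_scale = mpjpe_sketch(pred, gt, mask, alignment='scale')
```

With a prediction that differs from the ground truth only by a global scale, the unaligned error is non-zero while the scale-aligned error vanishes.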


mmpose.core.evaluation.keypoint_pck_accuracy(pred, gt, mask, thr, normalize)[source]

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.


PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

batch_size: N num_keypoints: K

  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • thr (float) – Threshold of PCK calculation.

  • normalize (np.ndarray[N, 2]) – Normalization factor for H&W.


A tuple containing keypoint accuracy.

  • acc (np.ndarray[K]): Accuracy of each keypoint.

  • avg_acc (float): Averaged accuracy across all keypoints.

  • cnt (int): Number of valid keypoints.

Return type
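The PCK computation described above can be sketched in a few lines of NumPy. The name `pck_sketch` is hypothetical, and two details are assumptions of this sketch rather than confirmed behavior: a keypoint that is never visible gets an accuracy of -1, and cnt counts the keypoints that are visible in at least one sample.

```python
import numpy as np


def pck_sketch(pred, gt, mask, thr, normalize):
    """PCK for coordinates.

    pred, gt: [N, K, 2]; mask: boolean [N, K]; normalize: [N, 2].
    """
    # Normalized distance between prediction and ground truth, shape [N, K].
    dist = np.linalg.norm((pred - gt) / normalize[:, None, :], axis=-1)
    correct = (dist < thr) & mask

    visible = mask.sum(axis=0)           # visible samples per keypoint, [K]
    valid_kpts = visible > 0
    acc = np.full(pred.shape[1], -1.0)   # -1 marks a never-visible keypoint
    acc[valid_kpts] = correct.sum(axis=0)[valid_kpts] / visible[valid_kpts]

    cnt = int(valid_kpts.sum())
    avg_acc = float(acc[valid_kpts].mean()) if cnt else 0.0
    return acc, avg_acc, cnt


pred = np.array([[[1.0, 0.0], [0.0, 2.0]],
                 [[8.0, 0.0], [0.0, 0.0]]])
gt = np.zeros((2, 2, 2))
mask = np.array([[True, True], [True, False]])
normalize = np.full((2, 2), 10.0)
acc, avg_acc, cnt = pck_sketch(pred, gt, mask, thr=0.5, normalize=normalize)
```

Here keypoint 0 is within the threshold in one of two samples, and keypoint 1 is correct in its only visible sample.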


mmpose.core.evaluation.keypoints_from_heatmaps(heatmaps, center, scale, unbiased=False, post_process='default', kernel=11, valid_radius_factor=0.0546875, use_udp=False, target_type='GaussianHeatMap')[source]

Get final keypoint predictions from heatmaps and transform them back to the image.


batch size: N num keypoints: K heatmap height: H heatmap width: W

  • heatmaps (np.ndarray[N, K, H, W]) – model predicted heatmaps.

  • center (np.ndarray[N, 2]) – Center of the bounding box (x, y).

  • scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.

  • post_process (str/None) – Choice of methods to post-process heatmaps. Currently supported: None, ‘default’, ‘unbiased’, ‘megvii’.

  • unbiased (bool) – Option to use unbiased decoding. Mutually exclusive with megvii. Note: this arg is deprecated; unbiased=True can be replaced by post_process=’unbiased’. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).

  • kernel (int) – Gaussian kernel size (K) for modulation, which should match the heatmap gaussian sigma used in training. K=17 for sigma=3 and K=11 for sigma=2.

  • valid_radius_factor (float) – The radius factor of the positive area in classification heatmap for UDP.

  • use_udp (bool) – Use unbiased data processing.

  • target_type (str) – ‘GaussianHeatMap’ or ‘CombinedTarget’. GaussianHeatMap: Classification target with gaussian distribution. CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).


A tuple containing keypoint predictions and scores.

  • preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.

  • maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.

Return type
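The first step of heatmap decoding, locating the peak of each heatmap, can be sketched as follows. The name `max_preds_sketch` is hypothetical; the sketch returns coordinates in heatmap space only and omits the post-processing options and the mapping back to the image via center and scale.

```python
import numpy as np


def max_preds_sketch(heatmaps):
    """Peak location and value for each heatmap.

    heatmaps: [N, K, H, W] -> preds [N, K, 2] as (x, y), maxvals [N, K, 1].
    """
    n, k, _, w = heatmaps.shape
    flat = heatmaps.reshape(n, k, -1)
    idx = flat.argmax(axis=2)                # flattened index of each peak
    maxvals = flat.max(axis=2)[..., None]    # peak score, [N, K, 1]
    # Unravel the flat index: column is x, row is y.
    preds = np.stack([idx % w, idx // w], axis=-1).astype(np.float32)
    return preds, maxvals


heatmaps = np.zeros((1, 2, 4, 5), dtype=np.float32)
heatmaps[0, 0, 1, 3] = 0.9  # keypoint 0 peaks at row y=1, column x=3
heatmaps[0, 1, 2, 0] = 0.5  # keypoint 1 peaks at row y=2, column x=0
preds, maxvals = max_preds_sketch(heatmaps)
```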


mmpose.core.evaluation.keypoints_from_regression(regression_preds, center, scale, img_size)[source]

Get final keypoint predictions from regression vectors and transform them back to the image.


batch_size: N num_keypoints: K

  • regression_preds (np.ndarray[N, K, 2]) – model prediction.

  • center (np.ndarray[N, 2]) – Center of the bounding box (x, y).

  • scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.

  • img_size (list(img_width, img_height)) – model input image size.


A tuple containing keypoint predictions and scores.

  • preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.

  • maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.

mmpose.core.evaluation.pose_pck_accuracy(output, target, mask, thr=0.05, normalize=None)[source]

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.


PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • output (np.ndarray[N, K, H, W]) – Model output heatmaps.

  • target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • thr (float) – Threshold of PCK calculation. Default 0.05.

  • normalize (np.ndarray[N, 2]) – Normalization factor for H&W.


A tuple containing keypoint accuracy.

  • np.ndarray[K]: Accuracy of each keypoint.

  • float: Averaged accuracy across all keypoints.

  • int: Number of valid keypoints.

Return type


mmpose.core.evaluation.post_dark_udp(coords, batch_heatmaps, kernel=3)[source]

DARK post-processing, implemented by UDP. Paper refs: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020); Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).


batch size: B num keypoints: K num persons: N height of heatmaps: H width of heatmaps: W. B=1 for the bottom_up paradigm, where all persons share the same heatmap. B=N for the top_down paradigm, where each person has its own heatmaps.

  • coords (np.ndarray[N, K, 2]) – Initial coordinates of human pose.

  • batch_heatmaps (np.ndarray[B, K, H, W]) – batch_heatmaps

  • kernel (int) – Gaussian kernel size (K) for modulation.


Refined coordinates.

Return type

res (np.ndarray[N, K, 2])


class mmpose.core.fp16.Fp16OptimizerHook(grad_clip=None, coalesce=True, bucket_size_mb=- 1, loss_scale=512.0, distributed=True)[source]

FP16 optimizer hook.

The steps of the fp16 optimizer are as follows. 1. Scale the loss value. 2. BP in the fp16 model. 3. Copy gradients from fp16 model to fp32 weights. 4. Update fp32 weights. 5. Copy updated parameters from fp32 weights to fp16 model.

Refer to https://arxiv.org/abs/1710.03740 for more details.
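Why the loss is scaled in the first step can be seen with a small NumPy experiment. This is a toy illustration, not hook code: the gradient value 1e-8 is an arbitrary assumption chosen to fall below the fp16 range, and 512.0 matches the default loss_scale in the signature above.

```python
import numpy as np

loss_scale = 512.0
true_grad = 1e-8  # assumed tiny gradient, below the smallest fp16 subnormal

# Without scaling, the gradient underflows to zero in fp16.
unscaled_fp16 = np.float16(true_grad)

# Scaling the loss scales the gradient too, which keeps it representable.
scaled_fp16 = np.float16(true_grad * loss_scale)

# Copy to fp32 and scale back before updating the fp32 weights.
recovered = np.float32(scaled_fp16) / np.float32(loss_scale)
```

The recovered value is close to, but not exactly, the original gradient, since fp16 still rounds the scaled value.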


loss_scale (float) – Scale factor multiplied with loss.


Backward optimization steps for Mixed Precision Training.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients (fp16).

  3. Copy gradients from the model to the fp32 weight copy.

  4. Scale the gradients back and update the fp32 weight copy.

  5. Copy back the params from fp32 weight copy to the fp16 model.


runner (mmcv.Runner) – The underlying training runner.


Preparing steps before Mixed Precision Training.

  1. Make a master copy of fp32 weights for optimization.

  2. Convert the main model from fp32 to fp16.


runner (mmcv.Runner) – The underlying training runner.

static copy_grads_to_fp32(fp16_net, fp32_weights)[source]

Copy gradients from fp16 model to fp32 weight copy.

static copy_params_to_fp16(fp16_net, fp32_weights)[source]

Copy updated params from fp32 weight copy to fp16 model.

mmpose.core.fp16.auto_fp16(apply_to=None, out_fp32=False)[source]

Decorator to enable fp16 training automatically.

This decorator is useful when you write custom modules and want to support mixed precision training. If input arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored.

  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp32 (bool) – Whether to convert the output back to fp32.


>>> import torch.nn as nn
>>> from mmpose.core.fp16 import auto_fp16
>>> class MyModule1(nn.Module):
...     # Convert x and y to fp16
...     @auto_fp16()
...     def forward(self, x, y):
...         pass
>>> class MyModule2(nn.Module):
...     # Convert pred to fp16
...     @auto_fp16(apply_to=('pred', ))
...     def do_something(self, pred, others):
...         pass

mmpose.core.fp16.cast_tensor_type(inputs, src_type, dst_type)[source]

Recursively convert Tensor in inputs from src_type to dst_type.

  • inputs – Inputs to be cast.

  • src_type (torch.dtype) – Source type.

  • dst_type (torch.dtype) – Destination type.


The same type with inputs, but all contained Tensors have been cast.
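The recursive structure of such a cast can be sketched with NumPy arrays standing in for tensors, so that the example runs without PyTorch. The name `cast_array_type` is hypothetical, and the set of container types handled here (mappings, lists, tuples) is an assumption of the sketch.

```python
from collections.abc import Mapping

import numpy as np


def cast_array_type(inputs, src_type, dst_type):
    """Recursively cast arrays of dtype src_type to dst_type."""
    if isinstance(inputs, np.ndarray):
        # Only arrays of the source dtype are converted.
        return inputs.astype(dst_type) if inputs.dtype == src_type else inputs
    if isinstance(inputs, str):
        return inputs
    if isinstance(inputs, Mapping):
        return type(inputs)({
            key: cast_array_type(value, src_type, dst_type)
            for key, value in inputs.items()
        })
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(
            cast_array_type(item, src_type, dst_type) for item in inputs)
    # Anything else (numbers, None, ...) is returned unchanged.
    return inputs


batch = {
    'img': np.ones((2, 3), dtype=np.float32),
    'ids': [np.arange(3, dtype=np.int64)],
    'name': 'sample',
}
out = cast_array_type(batch, np.float32, np.float16)
```

The container types are preserved, and values that are not of the source dtype pass through untouched.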

mmpose.core.fp16.force_fp32(apply_to=None, out_fp16=False)[source]

Decorator to convert input arguments to fp32 in force.

This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If input arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored.

  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp16 (bool) – Whether to convert the output back to fp16.


>>> import torch.nn as nn
>>> from mmpose.core.fp16 import force_fp32
>>> class MyModule1(nn.Module):
...     # Convert x and y to fp32
...     @force_fp32()
...     def loss(self, x, y):
...         pass
>>> class MyModule2(nn.Module):
...     # Convert pred to fp32
...     @force_fp32(apply_to=('pred', ))
...     def post_process(self, pred, others):
...         pass

Wrap the FP32 model to FP16.

  1. Convert FP32 model to FP16.

  2. Keep some necessary layers in FP32, e.g., normalization layers.


model (nn.Module) – Model in FP32.


class mmpose.core.utils.WeightNormClipHook(max_norm=1.0, module_param_names='weight')[source]

Apply weight norm clip regularization.

The module’s parameter will be clipped to a given maximum norm before each forward pass.

  • max_norm (float) – The maximum norm of the parameter.

  • module_param_names (str|list) – The parameter name (or name list) to apply weight norm clip.
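The clipping step can be sketched as a rescaling of the parameter whenever its norm exceeds the limit. This is an assumption-laden illustration: `clip_weight_norm` is a hypothetical helper, the norm is assumed to be the L2 norm over the whole parameter, and a NumPy array stands in for the module parameter.

```python
import numpy as np


def clip_weight_norm(weight, max_norm=1.0):
    """Rescale weight so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(weight)
    if norm > max_norm:
        weight = weight * (max_norm / norm)
    return weight


large = clip_weight_norm(np.array([3.0, 4.0]), max_norm=1.0)  # norm 5 -> 1
small = clip_weight_norm(np.array([0.3, 0.4]), max_norm=1.0)  # unchanged
```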

hook(module, _input)[source]

Hook function.

property hook_type

Hook type. Subclasses should override this property to return a string value in {forward, forward_pre, backward}.

mmpose.core.utils.allreduce_grads(params, coalesce=True, bucket_size_mb=- 1)[source]

Allreduce gradients.

  • params (list[torch.Parameters]) – List of parameters of a model

  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Default: True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.


mmpose.core.post_processing.affine_transform(pt, trans_mat)[source]

Apply an affine transformation to the points.

  • pt (np.ndarray) – a 2 dimensional point to be transformed

  • trans_mat (np.ndarray) – 2x3 matrix of an affine transform


Transformed points.

Return type
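Applying a 2x3 affine matrix to a 2D point amounts to a matrix product with the point in homogeneous coordinates. The sketch below uses the hypothetical name `affine_transform_sketch` and an example matrix chosen for illustration.

```python
import numpy as np


def affine_transform_sketch(pt, trans_mat):
    """Apply a 2x3 affine matrix to a 2D point."""
    homogeneous = np.array([pt[0], pt[1], 1.0])  # (x, y, 1)
    return np.asarray(trans_mat, dtype=float) @ homogeneous


# Scale by 2, then translate by (5, -1).
trans_mat = np.array([[2.0, 0.0, 5.0],
                      [0.0, 2.0, -1.0]])
new_pt = affine_transform_sketch([1.0, 3.0], trans_mat)
```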


mmpose.core.post_processing.flip_back(output_flipped, flip_pairs, target_type='GaussianHeatMap')[source]

Flip the flipped heatmaps back to the original form.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • output_flipped (np.ndarray[N, K, H, W]) – The output heatmaps obtained from the flipped images.

  • flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).

  • target_type (str) – GaussianHeatMap or CombinedTarget


Heatmaps flipped back to the original image.

Return type


mmpose.core.post_processing.fliplr_joints(joints_3d, joints_3d_visible, img_width, flip_pairs)[source]

Flip human joints horizontally.


num_keypoints: K

  • joints_3d (np.ndarray([K, 3])) – Coordinates of keypoints.

  • joints_3d_visible (np.ndarray([K, 1])) – Visibility of keypoints.

  • img_width (int) – Image width.

  • flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).


Flipped human joints.

  • joints_3d_flipped (np.ndarray([K, 3])): Flipped joints.

  • joints_3d_visible_flipped (np.ndarray([K, 1])): Joint visibility.

Return type
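A horizontal flip has two parts: mirrored keypoints swap places, and x coordinates are reflected across the image. The sketch below illustrates this under an assumed zero-indexed pixel convention (`x -> img_width - 1 - x`); the name `fliplr_joints_sketch` is hypothetical.

```python
import numpy as np


def fliplr_joints_sketch(joints_3d, joints_3d_visible, img_width, flip_pairs):
    """Swap mirrored keypoints and reflect x coordinates."""
    joints = joints_3d.copy()
    visible = joints_3d_visible.copy()
    for left, right in flip_pairs:
        # Fancy indexing copies the right-hand side, so this is a safe swap.
        joints[[left, right]] = joints[[right, left]]
        visible[[left, right]] = visible[[right, left]]
    # Assumed convention: pixel columns run from 0 to img_width - 1.
    joints[:, 0] = img_width - 1 - joints[:, 0]
    return joints, visible


joints = np.array([[10.0, 5.0, 0.0],   # left ear
                   [2.0, 5.0, 0.0],    # right ear
                   [4.0, 1.0, 0.0]])   # nose (no mirrored partner)
visible = np.array([[1.0], [0.0], [1.0]])
flipped, flipped_visible = fliplr_joints_sketch(
    joints, visible, img_width=20, flip_pairs=[(0, 1)])
```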


mmpose.core.post_processing.fliplr_regression(regression, flip_pairs, center_mode='static', center_x=0.5, center_index=0)[source]

Flip human joints horizontally.


batch_size: N num_keypoints: K

  • regression (np.ndarray([..., K, C])) – Coordinates of keypoints, where K is the joint number and C is the dimension. Example shapes are:

    • [N, K, C]: a batch of keypoints, where N is the batch size.

    • [N, T, K, C]: a batch of pose sequences, where T is the number of frames.


  • flip_pairs (list[tuple()]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).

  • center_mode (str) – The mode to set the center location on the x-axis to flip around. Options are:

    • static: use a static x value (see center_x also).

    • root: use a root joint (see center_index also).

  • center_x (float) – Set the x-axis location of the flip center. Only used when center_mode=static.

  • center_index (int) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode=root.


Flipped human joints.

  • regression_flipped (np.ndarray([…, K, C])): Flipped joints.

Return type


mmpose.core.post_processing.get_affine_transform(center, scale, rot, output_size, shift=(0.0, 0.0), inv=False)[source]

Get the affine transform matrix, given the center/scale/rot/output_size.

  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • rot (float) – Rotation angle (degree).

  • output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.

  • shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).

  • inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)


The transform matrix.

Return type


mmpose.core.post_processing.get_warp_matrix(theta, size_input, size_dst, size_target)[source]

Calculate the transformation matrix under the constraint of unbiased. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

  • theta (float) – Rotation angle in degrees.

  • size_input (np.ndarray) – Size of input image [w, h].

  • size_dst (np.ndarray) – Size of output image [w, h].

  • size_target (np.ndarray) – Size of ROI in input plane [w, h].


A matrix for transformation.

Return type

matrix (np.ndarray)

mmpose.core.post_processing.oks_iou(g, d, a_g, a_d, sigmas=None, vis_thr=None)[source]

Calculate OKS IoUs.

  • g – Ground truth keypoints.

  • d – Detected keypoints.

  • a_g – Area of the ground truth object.

  • a_d – Area of the detected object.

  • sigmas – standard deviation of keypoint labelling.

  • vis_thr – threshold of the keypoint visibility.


The oks ious.

Return type
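For orientation, the object keypoint similarity between one ground-truth pose and one detection can be sketched in the COCO style. The signature is simplified and hypothetical (`oks_sketch` takes [K, 2] keypoints and a single object area), visibility thresholds are ignored, and the uniform sigma used in the example is an arbitrary assumption.

```python
import numpy as np


def oks_sketch(g, d, area, sigmas):
    """COCO-style OKS between two [K, 2] keypoint sets."""
    g = np.asarray(g, dtype=float)
    d = np.asarray(d, dtype=float)
    k = 2.0 * np.asarray(sigmas, dtype=float)   # per-keypoint constant
    dist_sq = ((g - d) ** 2).sum(axis=-1)       # squared distances, [K]
    # Distances are normalized by object area and keypoint constant.
    e = dist_sq / (2.0 * area * k ** 2 + np.spacing(1))
    return float(np.exp(-e).mean())


gt_kpts = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]])
sigmas = np.full(3, 0.05)
oks_same = oks_sketch(gt_kpts, gt_kpts, area=400.0, sigmas=sigmas)
oks_shifted = oks_sketch(gt_kpts, gt_kpts + 2.0, area=400.0, sigmas=sigmas)
```

Identical poses give a similarity of 1, and the value decays toward 0 as the detection moves away from the ground truth.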


mmpose.core.post_processing.oks_nms(kpts_db, thr, sigmas=None, vis_thr=None)[source]

OKS NMS implementations.

  • kpts_db – keypoints.

  • thr – Retain overlap < thr.

  • sigmas – standard deviation of keypoint labelling.

  • vis_thr – threshold of the keypoint visibility.


indexes to keep.

Return type


mmpose.core.post_processing.rotate_point(pt, angle_rad)[source]

Rotate a point by an angle.

  • pt (list[float]) – 2 dimensional point to be rotated

  • angle_rad (float) – rotation angle by radian


Rotated point.

Return type
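A 2D rotation by an angle in radians can be sketched with the usual rotation formulas. The name `rotate_point_sketch` is hypothetical, and the direction convention (counter-clockwise for axes with y pointing up) is an assumption of this sketch.

```python
import math


def rotate_point_sketch(pt, angle_rad):
    """Rotate a 2D point about the origin by angle_rad."""
    sn, cs = math.sin(angle_rad), math.cos(angle_rad)
    return [pt[0] * cs - pt[1] * sn, pt[0] * sn + pt[1] * cs]


rotated = rotate_point_sketch([1.0, 0.0], math.pi / 2)
```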


mmpose.core.post_processing.soft_oks_nms(kpts_db, thr, max_dets=20, sigmas=None, vis_thr=None)[source]

Soft OKS NMS implementations.

  • kpts_db – keypoints.

  • thr – retain oks overlap < thr.

  • max_dets – max number of detections to keep.

  • sigmas – Keypoint labelling uncertainty.


indexes to keep.

Return type


mmpose.core.post_processing.transform_preds(coords, center, scale, output_size, use_udp=False)[source]

Get final keypoint predictions from heatmaps and apply scaling and translation to map them back to the image.


num_keypoints: K

  • coords (np.ndarray[K, ndims]) –

    • If ndims=2, coords are predicted keypoint locations.

    • If ndims=4, coords are composed of (x, y, scores, tags).

    • If ndims=5, coords are composed of (x, y, scores, tags, flipped_tags).

  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.

  • use_udp (bool) – Use unbiased data processing


Predicted coordinates in the images.

Return type


mmpose.core.post_processing.warp_affine_joints(joints, mat)[source]

Apply affine transformation defined by the transform matrix on the joints.

  • joints (np.ndarray[..., 2]) – Origin coordinate of joints.

  • mat (np.ndarray[3, 2]) – The affine matrix.


Result coordinate of joints.

Return type

matrix (np.ndarray[…, 2])



class mmpose.models.backbones.AlexNet(num_classes=- 1)[source]

AlexNet backbone.

The input for AlexNet is a 224x224 RGB image.


num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.


Forward function.


x (tensor | tuple[tensor]) – x could be a Torch.tensor or a tuple of Torch.tensor, containing input data for forward computation.

class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]

CPM backbone.

Convolutional Pose Machines. More details can be found in the paper.

  • in_channels (int) – The input channels of the CPM.

  • out_channels (int) – The output channels of the CPM.

  • feat_channels (int) – Feature channel of each CPM stage.

  • middle_channels (int) – Feature channel of conv after the middle stage.

  • num_stages (int) – Number of stages.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.


>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)

Model forward function.


Initialize the weights in backbone.


pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False)[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions

  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.


>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)

Forward function.


Initialize the weights in backbone.


pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

property norm1

the normalization layer named “norm1”



property norm2

the normalization layer named “norm2”




Convert the model into training mode.

class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]

HourglassNet backbone.

Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.

  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channel (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.


>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)

Model forward function.


Initialize the weights in backbone.


pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64)[source]

MSPN backbone. Paper ref: Li et al. “Rethinking on Multi-Stage Networks for Human Pose Estimation” (CVPR 2020).

  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4. Note: Make sure num_units == len(self.num_blocks).

  • num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2]

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type='BN')

  • res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.


>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2,num_units=2,num_blocks=[2,2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)

Model forward function.


Initialize model weights.

class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False)[source]

MobileNetV2 backbone.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU6').

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
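
To make the effect of a width multiplier such as widen_factor concrete, here is a minimal sketch of scaling a channel count. Rounding the result to a multiple of 8 is a common convention in mobile architectures, but whether this class rounds in exactly this way is an assumption; the helper name `scale_channels_sketch` and its `divisor` and `min_ratio` parameters are hypothetical.

```python
def scale_channels_sketch(channels, widen_factor, divisor=8, min_ratio=0.9):
    """Scale a channel count and round it to a multiple of `divisor`."""
    target = channels * widen_factor
    # Round to the nearest multiple of `divisor`, never below `divisor`.
    rounded = max(divisor, int(target + divisor / 2) // divisor * divisor)
    # Avoid shrinking the layer by more than (1 - min_ratio).
    if rounded < min_ratio * target:
        rounded += divisor
    return rounded
```

Under these assumptions a 32-channel layer becomes 16 channels with a factor of 0.5, while a 24-channel layer with a factor of 0.75 is rounded up to 24 rather than down to 16.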


Forward function.


x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.


Init backbone weights.


pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.


Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.


mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.



Return type


class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(10), frozen_stages=- 1, norm_eval=False, with_cp=False)[source]

MobileNetV3 backbone.

  • arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (10, ), which means output tensors from final stage.

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.


Forward function.


x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.


Init backbone weights.


pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.


Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.


mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.



Return type


class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26)[source]

Residual Steps Network backbone. Paper ref: Cai et al. “Learning Delicate Local Representations for Multi-Person Pose Estimation” (ECCV 2020).

  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage RSN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4. Note: Make sure num_units == len(self.num_blocks).

  • num_blocks (list) – Number of RSBs (Residual Steps Block) in each downsample unit. Default: [2, 2, 2, 2]

  • num_steps (int) – Number of steps in a RSB. Default: 4.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type='BN')

  • res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.

  • expand_times (int) – Times by which the in_channels are expanded in RSB. Default: 26.


>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2,num_units=2,num_blocks=[2,2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)

Model forward function.


Initialize model weights.

class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]

RegNet backbone.

More details can be found in the paper.

  • arch (dict) – The parameters of RegNets:

    • w0 (int): initial width

    • wa (float): slope of width

    • wm (float): quantization parameter to quantize the width

    • depth (int): depth of the backbone

    • group_w (int): width of group

    • bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type='BN', requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


>>> from mmpose.models import RegNet
>>> import torch
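>>> # The arch values are assumed here to be the RegNetX-3.2GF setting,
>>> # which is consistent with the output channels printed below.
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0),
...     out_indices=(0, 1, 2, 3))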
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
adjust_width_group(widths, bottleneck_ratio, groups)[source]

Adjusts the compatibility of widths and groups.

  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – number of groups in each stage


The adjusted widths and groups of each stage.

Return type
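
One way to read "adjusting the compatibility of widths and groups" is that each stage's bottleneck width is snapped to a multiple of the group width. The sketch below follows that reading with a scalar bottleneck_ratio and groups, as in the documented parameter types; it is an illustration, not the library code, and the name `adjust_width_group_sketch` is hypothetical.

```python
def adjust_width_group_sketch(widths, bottleneck_ratio, groups):
    """Make each stage's bottleneck width divisible by its group width."""
    adjusted_widths, adjusted_groups = [], []
    for width in widths:
        bottleneck_width = int(width * bottleneck_ratio)
        # A group cannot be wider than the bottleneck itself.
        group = min(groups, bottleneck_width)
        # Snap the bottleneck width to the nearest multiple of the group width.
        bottleneck_width = int(round(bottleneck_width / group) * group)
        adjusted_widths.append(int(bottleneck_width / bottleneck_ratio))
        adjusted_groups.append(group)
    return adjusted_widths, adjusted_groups
```

For illustrative stage widths of 88, 200, 448 and 1000 with a group width of 48 and a bottleneck ratio of 1.0, the sketch yields widths that are all multiples of 48.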



Forward function.

static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]

Generates per block width from RegNet parameters.

  • initial_width ([int]) – Initial width of the backbone

  • width_slope ([float]) – Slope of the quantized linear function

  • width_parameter ([int]) – Parameter used to quantize the width.

  • depth ([int]) – Depth of the backbone.

  • divisor (int, optional) – The divisor of channels. Defaults to 8.


return a list of widths of each stage and the number of


Return type

list, int
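
The width generation can be sketched as a quantized linear function, following the parameterization described in the RegNet paper: widths grow linearly with block index and are then snapped to powers of width_parameter and to multiples of divisor. This is a plain-Python illustration that may differ from the library in details; the name `generate_widths_sketch` is hypothetical.

```python
import math


def generate_widths_sketch(initial_width, width_slope, width_parameter,
                           depth, divisor=8):
    """Return per-block widths and the number of distinct stages."""
    widths = []
    for block in range(depth):
        # Linear width before quantization.
        continuous = initial_width + width_slope * block
        # Nearest power of `width_parameter` relative to the initial width.
        exponent = round(math.log(continuous / initial_width)
                         / math.log(width_parameter))
        width = initial_width * width_parameter ** exponent
        # Snap to a multiple of `divisor`.
        widths.append(int(round(width / divisor) * divisor))
    return widths, len(set(widths))
```

Blocks that end up with the same quantized width form one stage, which is why the number of distinct widths is returned alongside the list.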


Gets widths/stage_blocks of network at each stage.


widths (list[int]) – Width in each stage.


width and depth of each stage

Return type
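
Going from per-block widths to per-stage widths and depths amounts to run-length encoding the width list. A minimal sketch, using a hypothetical helper name `stages_from_blocks_sketch`:

```python
from itertools import groupby


def stages_from_blocks_sketch(widths):
    """Collapse per-block widths into (stage_widths, stage_blocks)."""
    stage_widths, stage_blocks = [], []
    # Consecutive blocks sharing a width belong to the same stage.
    for width, run in groupby(widths):
        stage_widths.append(width)
        stage_blocks.append(len(list(run)))
    return stage_widths, stage_blocks
```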


static quantize_float(number, divisor)[source]

Converts a float to the closest non-zero int divisible by divisor.

  • number (int) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.


quantized number that is divisible by divisor.

Return type
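
A minimal sketch of this kind of quantization is shown below. The clamp that keeps the result from becoming zero is an assumption drawn from the "non-zero" wording above and may not match the library's behavior for very small inputs; the name `quantize_float_sketch` is hypothetical.

```python
def quantize_float_sketch(number, divisor):
    """Round `number` to the nearest multiple of `divisor`, never zero."""
    quantized = int(round(number / divisor) * divisor)
    # Clamping to `divisor` is an assumption drawn from "non-zero" above.
    return max(divisor, quantized)
```

Note that Python's round() resolves exact ties to the nearest even integer, so a value that sits exactly halfway between two multiples may round down.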


class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

Please refer to the paper for details.

  • depth (int) – Network depth, from {50, 101, 152, 200}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 1.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2.

  • reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


Make a ResLayer.

class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

ResNeXt backbone.

Please refer to the paper for details.

  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


Make a ResLayer.

class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]

ResNet backbone.

Please refer to the paper for details.

  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • base_channels (int) – Middle channels of the first stage. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)

Forward function.


Initialize the weights in backbone.


pretrained (str, optional) – Path to pre-trained weights. Defaults to None.


Make a ResLayer.

property norm1

the normalization layer named “norm1”




Convert the model into training mode.

class mmpose.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmpose.models.backbones.SCNet(depth, **kwargs)[source]

SCNet backbone.

Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf

  • depth (int) – Depth of scnet, from {50, 101}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • base_channels (int) – Number of base channels of hidden layer.

  • num_stages (int) – SCNet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.


>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)
class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

SEResNeXt backbone.

Please refer to the paper for details.

  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


Make a ResLayer.

class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]

SEResNet backbone.

Please refer to the paper for details.

  • depth (int) – Network depth, from {50, 101, 152}.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.


>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)

Make a ResLayer.

class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]

ShuffleNetV1 backbone.

  • groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.

  • widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, )

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.


Forward function.


x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.


Init backbone weights.


pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

make_layer(out_channels, num_blocks, first_block=False)[source]

Stack ShuffleUnit blocks to make a layer.

  • out_channels (int) – out_channels of the block.

  • num_blocks (int) – Number of blocks.

  • first_block (bool, optional) – Whether this is the first ShuffleUnit in a sequence of ShuffleUnits. Default: False, which means not using the grouped 1x1 convolution.


Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.


mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.



Return type


class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]

ShuffleNetV2 backbone.

  • widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.


Forward function.


x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.


Init backbone weights.


pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.


Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.


mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.



Return type


class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None)[source]

TCN backbone.

Temporal Convolutional Networks. More details can be found in the paper .

  • in_channels (int) – Number of input channels, equal to num_keypoints * num_features.

  • stem_channels (int) – Number of feature channels. Default: 1024.

  • num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.

  • kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).

  • dropout (float) – Dropout rate. Default: 0.25.

  • causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.

  • residual (bool) – Use residual connection. Default: True.

  • use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False

  • conv_cfg (dict) – dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN1d’).

  • max_norm (float|None) – if not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.


>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
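
The output lengths in the example above can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the default non-strided, unpadded variant in which the stem convolution uses kernel_sizes[0] with dilation 1 and block i uses kernel_sizes[i] with a dilation equal to the product of all earlier kernel sizes. The helper name tcn_output_lengths is hypothetical and not part of mmpose.

```python
from math import prod


def tcn_output_lengths(seq_len, kernel_sizes=(3, 3, 3)):
    """Temporal length after each block for unpadded, dilated 1D convolutions."""
    # Stem convolution: kernel_sizes[0], dilation 1, no padding (assumed).
    length = seq_len - (kernel_sizes[0] - 1)
    lengths = []
    for i in range(1, len(kernel_sizes)):
        # Assumed: block i dilates by the product of all previous kernel sizes.
        dilation = prod(kernel_sizes[:i])
        length -= (kernel_sizes[i] - 1) * dilation
        lengths.append(length)
    return lengths


print(tcn_output_lengths(243))  # [235, 217]
```

With an input of 243 frames this yields 235 and 217, matching the shapes printed in the example.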

Forward function.


Initialize the weights.

class mmpose.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True)[source]

VGG backbone.

  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_norm (bool) – Use BatchNorm or not.

  • num_classes (int) – number of classes for classification.

  • num_stages (int) – VGG stages, normally 5.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before the classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.

  • with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
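
The out_indices rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's code; resolve_out_indices and pack_outputs are hypothetical helper names, and stage outputs are stood in for by arbitrary objects.

```python
def resolve_out_indices(out_indices=None, num_classes=-1):
    """Pick the default out_indices as described in the parameter list."""
    if out_indices is not None:
        return tuple(out_indices)
    # num_classes > 0: classification score; otherwise last feature map.
    return (5,) if num_classes > 0 else (4,)


def pack_outputs(stage_outputs, out_indices):
    """Return a single item for one stage, a tuple for several stages."""
    selected = [stage_outputs[i] for i in out_indices]
    return selected[0] if len(selected) == 1 else tuple(selected)


stages = ["s0", "s1", "s2", "s3", "s4", "cls"]  # placeholder outputs
print(pack_outputs(stages, resolve_out_indices()))         # s4
print(pack_outputs(stages, resolve_out_indices((1, 3))))   # ('s1', 's3')
```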


Forward function.


x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.


Init backbone weights.


pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.


Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.


mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.



Return type



class mmpose.models.detectors.BottomUp(backbone, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]

Bottom-up pose detectors.

  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.

forward(img=None, targets=None, masks=None, joints=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss is True.

Note:

batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
max_num_people: M

  • img (torch.Tensor[NxCximgHximgW]) – Input image.

  • targets (List(torch.Tensor[NxKxHxW])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[NxHxW])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[NxMxKx2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) – Information about validation and test. By default this includes:

    • “image_file”: image path

    • “aug_data”: input

    • “test_scale_factor”: test scale factor

    • “base_size”: base size of input

    • “center”: center of image

    • “scale”: scale of image

    • “flip_index”: flip index of keypoints

  • return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.


If return_loss is True, return losses. Otherwise, return predicted poses, scores, image paths and heatmaps.
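
The dispatch on return_loss can be pictured with a small stand-in class. This is a sketch under stated assumptions: DispatchSketch and the dictionary keys it returns are hypothetical placeholders chosen for illustration, and no real model computation is performed.

```python
class DispatchSketch:
    """Toy detector that only demonstrates the return_loss dispatch."""

    def forward_train(self, img, **kwargs):
        # A real detector would compute losses from img and the targets.
        return {"loss": 0.0}

    def forward_test(self, img, return_heatmap=False, **kwargs):
        # Hypothetical result layout: poses, scores, paths, optional heatmap.
        return {
            "preds": [],
            "scores": [],
            "image_paths": [],
            "output_heatmap": [] if return_heatmap else None,
        }

    def forward(self, img=None, return_loss=True, return_heatmap=False, **kwargs):
        if return_loss:
            return self.forward_train(img, **kwargs)
        return self.forward_test(img, return_heatmap=return_heatmap, **kwargs)


model = DispatchSketch()
print(sorted(model.forward(return_loss=True)))   # ['loss']
print(model.forward(return_loss=False)["output_heatmap"])  # None
```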

Return type



Used for computing network FLOPs.

See tools/get_flops.py.


img (torch.Tensor) – Input image.



Return type


forward_test(img, img_metas, return_heatmap=False, **kwargs)[source]

Inference the bottom-up model.


Note:

batch_size: N (currently only batch_size = 1 is supported)
num_img_channel: C
img_width: imgW
img_height: imgH

  • flip_index (List(int)) – Flip index of keypoints.

  • aug_data (List(Tensor[NxCximgHximgW])) – Multi-scale image

  • test_scale_factor (List(float)) – Multi-scale factor

  • base_size (Tuple(int)) – Base size of image when scale is 1

  • center (np.ndarray) – center of image

  • scale (np.ndarray) – the scale of image

forward_train(img, targets, masks, joints, img_metas, **kwargs)[source]

Forward the bottom-up model and calculate the loss.


Note:

batch_size: N
num_keypoints: K
num_img_channel: C
img_width: imgW
img_height: imgH
heatmaps width: W
heatmaps height: H
max_num_people: M

  • img (torch.Tensor[NxCximgHximgW]) – Input image.

  • targets (List(torch.Tensor[NxKxHxW])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[NxHxW])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[NxMxKx2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) – Information about validation and test. By default this includes:

    • “image_file”: image path

    • “aug_data”: input

    • “test_scale_factor”: test scale factor

    • “base_size”: base size of input

    • “center”: center of image

    • “scale”: scale of image

    • “flip_index”: flip index of keypoints


The total loss for bottom-up

Return type



Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color=None, pose_kpt_color=None, pose_limb_color=None, radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[source]

Draw result over img.

  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.

  • pose_limb_color (np.array[Mx3]) – Color of M limbs. If None, do not draw limbs.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.
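
A minimal sketch of how kpt_score_thr could gate what gets drawn, assuming keypoints are (x, y, score) triples, that skeleton entries are 0-based index pairs, and that a limb is drawn only when both of its endpoints pass the threshold. The helper names are hypothetical and not part of mmpose.

```python
def visible_keypoints(keypoints, kpt_score_thr=0.3):
    """Indices of keypoints whose score reaches the threshold."""
    return [i for i, (_x, _y, score) in enumerate(keypoints)
            if score >= kpt_score_thr]


def drawable_limbs(keypoints, skeleton, kpt_score_thr=0.3):
    """Limbs whose two endpoints are both visible (0-based indices assumed)."""
    keep = set(visible_keypoints(keypoints, kpt_score_thr))
    return [(a, b) for a, b in skeleton if a in keep and b in keep]


kpts = [(10, 10, 0.9), (20, 20, 0.1), (30, 30, 0.3)]
print(visible_keypoints(kpts))                   # [0, 2]
print(drawable_limbs(kpts, [[0, 1], [0, 2]]))    # [(0, 2)]
```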


Visualized image, only if not show or out_file.

Return type


property with_keypoint

Check if has keypoint_head.

class mmpose.models.detectors.MultiTask(backbone, heads, necks=None, head2neck=None, pretrained=None)[source]

Multi-task detectors.

  • backbone (dict) – Backbone modules to extract feature.

  • heads (List[dict]) – heads to output predictions.

  • necks (List[dict] | None) – necks to process feature.

  • head2neck (dict{int: int}) – Head index to neck index.

  • pretrained (str) – Path to the pretrained models.
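
The head2neck mapping can be read as a routing table from each head to the neck that preprocesses its input. The sketch below assumes that a head without an entry receives the backbone features unchanged; run_multitask is a hypothetical helper, and plain callables stand in for heads and necks.

```python
def run_multitask(features, heads, necks=None, head2neck=None):
    """Feed each head with features routed through its mapped neck."""
    necks = necks or []
    head2neck = head2neck or {}
    outputs = []
    for head_idx, head in enumerate(heads):
        neck_idx = head2neck.get(head_idx)
        # Assumption: heads without a mapped neck see the raw features.
        x = features if neck_idx is None else necks[neck_idx](features)
        outputs.append(head(x))
    return outputs


heads = [lambda x: x + 1, lambda x: x * 10]
necks = [lambda x: x * 2]
print(run_multitask(3, heads, necks, head2neck={1: 0}))  # [4, 60]
```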

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss=True. Note that this setting changes the expected inputs. When return_loss=True, img and img_metas are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_metas should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.


Note:

batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W

  • img (torch.Tensor[NxCximgHximgW]) – Input images.

  • target (List[torch.Tensor]) – Targets.

  • target_weight (List[torch.Tensor]) – Weights.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.


If return_loss is True, return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.

Return type



Used for computing network FLOPs.

See tools/get_flops.py.


img (torch.Tensor) – Input image.



Return type


forward_test(img, img_metas, **kwargs)[source]

Defines the computation performed at every call when testing.

forward_train(img, target, target_weight, img_metas, **kwargs)[source]

Defines the computation performed at every call when training.


Weight initialization for model.

property with_necks

Check if has necks.

class mmpose.models.detectors.ParametricMesh(backbone, mesh_head, smpl, disc=None, loss_gan=None, loss_mesh=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Model-based 3D human mesh detector. Take a single color image as input and output 3D joints, SMPL parameters and camera parameters.

  • backbone (dict) – Backbone modules to extract feature.

  • mesh_head (dict) – Mesh head to process feature.

  • smpl (dict) – Config for SMPL model.

  • disc (dict) – Discriminator for SMPL parameters. Default: None.

  • loss_gan (dict) – Config for adversarial loss. Default: None.

  • loss_mesh (dict) – Config for mesh loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

forward(img, img_metas=None, return_loss=False, **kwargs)[source]

Forward function.

Calls either forward_train or forward_test depending on whether return_loss=True.


Note:

batch_size: N
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW

  • img (torch.Tensor[N x C x imgH x imgW]) – Input images.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.


Return predicted 3D joints, SMPL parameters, boxes and image paths.


Used for computing network FLOPs.

See tools/get_flops.py.


img (torch.Tensor) – Input image.



Return type


forward_test(img, img_metas, **kwargs)[source]

Defines the computation performed at every call when testing.

forward_train(*args, **kwargs)[source]

Forward function for training.

For ParametricMesh, we do not use this interface.


Get 3D joints from 3D mesh using predefined joints regressor.


Weight initialization for model.


Visualize the results.

train_step(data_batch, optimizer, **kwargs)[source]

Train step function.

In this function, the detector will finish the train step following the pipeline:

1. Get fake and real SMPL parameters.
2. Optimize the discriminator (if present).
3. Optimize the generator.

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
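
One way to picture the disc_step behavior is as an update schedule over successive batches. The sketch below assumes a discriminator is present, that every batch triggers one discriminator update, and that the generator is updated once after each group of disc_step discriminator updates. update_schedule is a hypothetical helper and does not reflect the library's internal bookkeeping.

```python
def update_schedule(num_batches, disc_step=1):
    """List which network is optimized on which batch index."""
    schedule = []
    for it in range(num_batches):
        # Each batch provides fresh input data for the discriminator.
        schedule.append(("disc", it))
        if (it + 1) % disc_step == 0:
            # One generator update after disc_step discriminator updates.
            schedule.append(("gen", it))
    return schedule


print(update_schedule(4, disc_step=2))
# [('disc', 0), ('disc', 1), ('gen', 1), ('disc', 2), ('disc', 3), ('gen', 3)]
```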

  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if present).


Dict with loss, information for logger, the number of samples.

Return type

outputs (dict)

val_step(data_batch, **kwargs)[source]

Forward function for evaluation.


data_batch (dict) – Contain data for forward.


Contain the results from model.

Return type


class mmpose.models.detectors.PoseLifter(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Pose lifter that lifts 2D pose to 3D pose.

forward(input, target=None, target_weight=None, metas=None, return_loss=True, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss=True.


Note:

batch_size: N
num_input_keypoints: Ki
input_keypoint_dim: Ci
input_sequence_len: Ti
num_output_keypoints: Ko
output_keypoint_dim: Co
output_sequence_len: To

  • input (torch.Tensor[NxKixCixTi]) – Input keypoint coordinates.

  • target (torch.Tensor[NxKoxCoxTo]) – Output keypoint coordinates. Defaults to None.

  • target_weight (torch.Tensor[NxKox1]) – Weights across different joint types. Defaults to None.

  • metas (list(dict)) – Information about data augmentation

  • return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.


If return_loss is True, return losses. Otherwise, return predicted poses.

Return type



Used for computing network FLOPs.

See tools/get_flops.py.


input (torch.Tensor) – Input pose


Model output

Return type


forward_test(input, metas, **kwargs)[source]

Defines the computation performed at every call when testing.

forward_train(input, target, target_weight, metas, **kwargs)[source]

Defines the computation performed at every call when training.


Weight initialization for model.


Visualize the results.

property with_keypoint

Check if has keypoint_head.

property with_neck

Check if has neck.

class mmpose.models.detectors.TopDown(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[source]

Top-down pose detectors.

  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated arguments. Please use loss_keypoint for heads instead.

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss=True. Note that this setting changes the expected inputs. When return_loss=True, img and img_metas are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_metas should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.


Note:

batch_size: N
num_keypoints: K
num_img_channel: C (Default: 3)
img height: imgH
img width: imgW
heatmaps height: H
heatmaps width: W

  • img (torch.Tensor[NxCximgHximgW]) – Input images.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return_loss=True for training, return_loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.


If return_loss is True, return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.

Return type



Used for computing network FLOPs.

See tools/get_flops.py.


img (torch.Tensor) – Input image.


Output heatmaps.

Return type


forward_test(img, img_metas, return_heatmap=False, **kwargs)[source]

Defines the computation performed at every call when testing.

forward_train(img, target, target_weight, img_metas, **kwargs)[source]

Defines the computation performed at every call when training.


Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color='green', pose_kpt_color=None, pose_limb_color=None, text_color=(255, 0, 0), radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[source]

Draw result over img.

  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • pose_kpt_color (np.array[Nx3]) – Color of N keypoints. If None, do not draw keypoints.

  • pose_limb_color (np.array[Mx3]) – Color of M limbs. If None, do not draw limbs.

  • text_color (str or tuple or Color) – Color of texts.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.


Visualized img, only if not show or out_file.

Return type


property with_keypoint

Check if has keypoint_head.

property with_neck

Check if has neck.


class mmpose.models.keypoint_heads.BottomUpHigherResolutionHead(in_channels, num_joints, tag_per_joint=True, extra=None, num_deconv_layers=1, num_deconv_filters=(32, ), num_deconv_kernels=(4, ), num_basic_blocks=4, cat_output=None, with_ae_loss=None, loss_keypoint=None)[source]

Bottom-up head for Higher Resolution.

  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints

  • tag_per_joint (bool) – If tag_per_joint is True, the dimension of tags equals num_joints; otherwise the dimension of tags is 1. Default: True.

  • extra

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • cat_output (list[bool]) – Option to concat outputs.

  • with_ae_loss (list[bool]) – Option to use ae loss.

  • loss_keypoint (dict) – Config for loss. Default: None.
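
The two constraints stated in the parameter list, the tag dimension rule and the non-negative deconv layer count, can be sketched as a small configuration check. summarize_head_config is a hypothetical helper written for illustration; the head itself may validate its arguments differently.

```python
def summarize_head_config(num_joints, tag_per_joint=True, num_deconv_layers=1):
    """Derive the tag dimension and whether any deconv layers are built."""
    if num_deconv_layers < 0:
        raise ValueError("num_deconv_layers should be >= 0")
    return {
        # One tag channel per joint, or a single shared tag channel.
        "tag_dim": num_joints if tag_per_joint else 1,
        # 0 means no deconv layers.
        "has_deconv": num_deconv_layers > 0,
    }


print(summarize_head_config(17))
# {'tag_dim': 17, 'has_deconv': True}
print(summarize_head_config(17, tag_per_joint=False, num_deconv_layers=0))
# {'tag_dim': 1, 'has_deconv': False}
```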


Forward function.

get_loss(output, targets, masks, joints)[source]

Calculate bottom-up keypoint loss.


Note:

batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • targets (List(torch.Tensor[NxKxHxW])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[NxHxW])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[NxMxKx2])) – Joints of multi-scale target heatmaps for ae loss


Initialize model weights.

class mmpose.models.keypoint_heads.BottomUpSimpleHead(in_channels, num_joints, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), tag_per_joint=True, with_ae_loss=None, extra=None, loss_keypoint=None)[source]

Bottom-up simple head.

  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • tag_per_joint (bool) – If tag_per_joint is True, the dimension of tags equals num_joints; otherwise the dimension of tags is 1. Default: True.

  • with_ae_loss (list[bool]) – Option to use ae loss or not.

  • loss_keypoint (dict) – Config for loss. Default: None.


Forward function.

get_loss(output, targets, masks, joints)[source]

Calculate bottom-up keypoint loss.


Note:

batch_size: N
num_keypoints: K
num_outputs: O
heatmaps height: H
heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • targets (List(torch.Tensor[NxKxHxW])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[NxHxW])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[NxMxKx2])) – Joints of multi-scale target heatmaps for ae loss


Initialize model weights.

class mmpose.models.keypoint_heads.FcHead(in_channels, num_joints, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Regression head with fully connected layers.

Paper ref: Alexander Toshev and Christian Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks”.

  • in_channels (int) – Number of input channels

  • num_joints (int) – Number of joints

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

decode(img_metas, output, **kwargs)[source]

Decode the keypoints from output regression.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of bbox

  • output (np.ndarray[N, K, 2]) – predicted regression vector.

  • kwargs – dict contains ‘img_size’. img_size (tuple(img_width, img_height)): input image size.


Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 2]) – Output keypoints.

  • target (torch.Tensor[N, K, 2]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 2]) – Output keypoints.

  • target (torch.Tensor[N, K, 2]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output regression.

Return type

output_regression (np.ndarray)

  • x (torch.Tensor[N, K, 2]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.
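
To illustrate what flip_pairs encodes, the sketch below swaps the entries of each mirrored pair, which is the relabeling a prediction from a horizontally flipped input would need before it can be compared with the unflipped one. swap_flip_pairs is a hypothetical helper; any coordinate mirroring the head may also apply is not shown.

```python
def swap_flip_pairs(keypoints, flip_pairs=None):
    """Return a copy of keypoints with each mirrored pair exchanged."""
    swapped = list(keypoints)
    for left, right in flip_pairs or []:
        swapped[left], swapped[right] = swapped[right], swapped[left]
    return swapped


names = ["nose", "left_eye", "right_eye"]
print(swap_flip_pairs(names, [(1, 2)]))
# ['nose', 'right_eye', 'left_eye']
```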

class mmpose.models.keypoint_heads.HeatMap3DHead(in_channels, out_channels, depth_size=64, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

3D heatmap head. Paper ref: Gyeongsik Moon. “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”. HeatMap3DHead is a variant of TopDownSimpleHead, and is composed of a number (>= 0) of deconv layers and a simple conv2d layer.

  • in_channels (int) – Number of input channels

  • out_channels (int) – Number of output channels

  • depth_size (int) – Number of depth discretization size

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) – Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ‘resize_concat’: Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.

    • ‘multiple_select’: Multiple feature maps will be bundled into a list and passed into decode head.

    • None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

decode(img_metas, output, **kwargs)[source]

Decode keypoints from heatmaps.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of bbox

  • output (np.ndarray[N, K, D, H, W]) – model predicted 3D heatmaps.
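
As a rough picture of decoding a single keypoint from a 3D heatmap, the sketch below finds the location of the maximum response in a volume indexed as [depth][height][width]. It is an illustration only: argmax_3d is a hypothetical helper, nested lists stand in for arrays, and the mapping back to image coordinates via center and scale is not shown.

```python
def argmax_3d(heatmap):
    """Return (x, y, z, score) of the strongest response in a D x H x W volume."""
    best = (None, None, None, float("-inf"))
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > best[3]:
                    best = (x, y, z, value)
    return best


volume = [
    [[0.1, 0.2, 0.0], [0.0, 0.3, 0.1]],
    [[0.0, 0.1, 0.9], [0.2, 0.0, 0.4]],
]
print(argmax_3d(volume))  # (2, 0, 1, 0.9)
```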


Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.


Note:

batch_size: N
num_keypoints: K
heatmaps height: H
heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate 3D heatmap loss.


Note:

batch size: N
num keypoints: K
heatmaps depth size: D
heatmaps height: H
heatmaps width: W

  • output (torch.Tensor[NxKxDxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxDxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output heatmaps.

Return type

output_heatmap (np.ndarray)

  • x (torch.Tensor[NxKxHxW]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.

class mmpose.models.keypoint_heads.Heatmap1DHead(in_channels=2048, heatmap_size=64, hidden_dims=(512, ), loss_value=None, train_cfg=None, test_cfg=None)[source]

Root depth head. Paper ref: Gyeongsik Moon. “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”.

  • in_channels (int) – Number of input channels

  • heatmap_size (int) – Heatmap size

  • hidden_dims (list|tuple) – Number of feature dimension of FC layers.

  • loss_value (dict) – Config for heatmap 1d loss. Default: None.

decode(img_metas, output, **kwargs)[source]

Decode heatmap 1d values.

  • img_metas (list(dict)) – Information about data augmentation. By default this includes: - “image_file”: path to the image file

  • output (np.ndarray[N, 1]) – model predicted values.


Forward function.

get_loss(output, target, target_weight)[source]

Calculate regression loss of heatmap.


batch size: N

  • output (torch.Tensor[N, 1]) – Output depth.

  • target (torch.Tensor[N, 1]) – Target depth.

  • target_weight (torch.Tensor[N, 1]) – Weights across different data.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output labels.

Return type

output_labels (np.ndarray)

  • x (torch.Tensor[NxC]) – Input features vector.

  • flip_pairs (None | list[tuple()]) – Pairs of labels which are mirrored.

class mmpose.models.keypoint_heads.MultilabelClassificationHead(in_channels=2048, num_labels=2, hidden_dims=(512, ), loss_classification=None, train_cfg=None, test_cfg=None)[source]

Multi-label classification head. Paper ref: Gyeongsik Moon. “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”.

  • in_channels (int) – Number of input channels

  • num_labels (int) – Number of labels

  • hidden_dims (list|tuple) – Number of hidden dimension of FC layers.

  • loss_classification (dict) – Config for classification loss. Default: None.

decode(img_metas, output, **kwargs)[source]

Decode keypoints from heatmaps.

  • img_metas (list(dict)) – Information about data augmentation By default this includes: - “image_file”: path to the image file

  • output (np.ndarray[N, L]) – model predicted labels.


Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for classification.


batch size: N number labels: L

  • output (torch.Tensor[N, L]) – Output hand visibility.

  • target (torch.Tensor[N, L]) – Target hand visibility.

  • target_weight (torch.Tensor[N, L]) – Weights across different labels.

get_loss(output, target, target_weight)[source]

Calculate regression loss of root depth.


batch_size: N

  • output (torch.Tensor[N, 1]) – Output depth.

  • target (torch.Tensor[N, 1]) – Target depth.

  • target_weight (torch.Tensor[N, 1]) – Weights across different data.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output labels.

Return type

output_labels (np.ndarray)

  • x (torch.Tensor[NxC]) – Input features vector.

  • flip_pairs (None | list[tuple()]) – Pairs of labels which are mirrored.

class mmpose.models.keypoint_heads.TemporalRegressionHead(in_channels, num_joints, max_norm=None, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Regression head of VideoPose3D.

Paper ref: Dario Pavllo. ``3D human pose estimation in video with temporal convolutions and

semi-supervised training``


  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

  • max_norm (float|None) – If not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.
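The effect of max_norm can be pictured with a small sketch. This is a pure-Python illustration rather than the head's own code: the helper name clip_to_max_norm is hypothetical, and it assumes the norm in question is the L2 norm of one flattened weight vector.

```python
import math


def clip_to_max_norm(weights, max_norm):
    """Rescale `weights` so that its L2 norm does not exceed `max_norm`.

    Hypothetical helper for illustration; `None` disables clipping.
    """
    if max_norm is None:
        return list(weights)
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        # Already inside the allowed ball: leave the weights untouched.
        return list(weights)
    scale = max_norm / norm
    return [w * scale for w in weights]


clipped = clip_to_max_norm([3.0, 4.0], max_norm=1.0)    # norm 5 is scaled down to 1
unchanged = clip_to_max_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5 is kept as is
```

Weights that are already within the bound are returned unchanged, so the clipping only acts on layers whose weights have grown too large.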

decode(metas, output)[source]

Decode the keypoints from output regression.

  • metas (list(dict)) –

    Information about data augmentation, including:

    • target_image_path (str): Optional, path to the image file.

    • target_mean (float): Optional, normalization parameter of the target pose.

    • target_std (float): Optional, normalization parameter of the target pose.

    • root_position (np.ndarray[3,1]): Optional, global position of the root joint.

    • root_index (torch.ndarray[1,]): Optional, original index of the root joint before root-centering.

  • output (np.ndarray[N, K, 3]) – Predicted regression vector.


Forward function.

get_accuracy(output, target, target_weight, metas)[source]

Calculate accuracy for keypoint loss.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 3]) – Output keypoints.

  • target (torch.Tensor[N, K, 3]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types.

  • metas (list(dict)) –

    Information about data augmentation, including:

    • target_image_path (str): Optional, path to the image file.

    • target_mean (float): Optional, normalization parameter of the target pose.

    • target_std (float): Optional, normalization parameter of the target pose.

    • root_position (np.ndarray[3,1]): Optional, global position of the root joint.

    • root_index (torch.ndarray[1,]): Optional, original index of the root joint before root-centering.

get_loss(output, target, target_weight)[source]

Calculate keypoint loss.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 3]) – Output keypoints.

  • target (torch.Tensor[N, K, 3]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output regression.

Return type

output_regression (np.ndarray)

  • x (torch.Tensor[N, K, 2]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.


Initialize the weights.

class mmpose.models.keypoint_heads.TopDownMSMUHead(out_shape, unit_channels=256, out_channels=17, num_stages=4, num_units=4, use_prm=False, norm_cfg={'type': 'BN'}, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Heads for multi-stage multi-unit heads used in Multi-Stage Pose estimation Network (MSPN), and Residual Steps Networks (RSN).

  • unit_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • out_shape (tuple) – Shape of the output heatmap.

  • num_stages (int) – Number of stages.

  • num_units (int) – Number of units in each stage.

  • use_prm (bool) – Whether to use pose refine machine (PRM). Default: False.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.


Forward function.


A list of heatmaps from multiple stages and units.

Return type

out (list[Tensor])

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.


batch_size: N num_keypoints: K heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.


batch_size: N num_keypoints: K num_outputs: O heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxOxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxOxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxOxKx1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output heatmaps.

Return type

output_heatmap (np.ndarray)

  • x (List[torch.Tensor[NxKxHxW]]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.


Initialize model weights.

class mmpose.models.keypoint_heads.TopDownMultiStageHead(in_channels=512, out_channels=17, num_stages=1, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Heads for multi-stage pose models.

TopDownMultiStageHead consists of multiple branches, each of which has num_deconv_layers (>= 0) deconv layers and a simple conv2d layer.

  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_stages (int) – Number of stages.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.


Forward function.


a list of heatmaps from multiple stages.

Return type

out (list[Tensor])

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.


batch_size: N num_keypoints: K heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.


batch_size: N num_keypoints: K num_outputs: O heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output heatmaps.

Return type

output_heatmap (np.ndarray)

  • x (List[torch.Tensor[NxKxHxW]]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.


Initialize model weights.

class mmpose.models.keypoint_heads.TopDownSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Top-down model head of simple baseline. Paper ref: Bin Xiao. “Simple Baselines for Human Pose Estimation and Tracking”.

TopDownSimpleHead consists of num_deconv_layers (>= 0) deconv layers and a simple conv2d layer.

  • in_channels (int) – Number of input channels

  • out_channels (int) – Number of output channels

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    ‘resize_concat’: Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.

    ‘multiple_select’: Multiple feature maps will be bundled into a list and passed into decode head.

    None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.


Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.


batch_size: N num_keypoints: K heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.


batch_size: N num_keypoints: K heatmaps height: H heatmaps width: W

  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.


Output heatmaps.

Return type

output_heatmap (np.ndarray)

  • x (torch.Tensor[NxKxHxW]) – Input features.

  • flip_pairs (None | list[tuple()]) – Pairs of keypoints which are mirrored.


Initialize model weights.


class mmpose.models.losses.AELoss(loss_type)[source]

Associative Embedding loss.

Associative Embedding: End-to-End Learning for Joint Detection and Grouping <https://arxiv.org/abs/1611.05424v2>

forward(tags, joints)[source]

Accumulate the tag loss for each image in the batch.


batch_size: N heatmaps width: W heatmaps height: H max_num_people: M num_keypoints: K

  • tags (torch.Tensor[Nx(KxHxW)x1]) – tag channels of output.

  • joints (torch.Tensor[NxMxKx2]) – joints information.

singleTagLoss(pred_tag, joints)[source]

Associative embedding loss for one image.


heatmaps width: W heatmaps height: H max_num_people: M num_keypoints: K

  • pred_tag (torch.Tensor[(KxHxW)x1]) – tag of output for one image.

  • joints (torch.Tensor[MxKx2]) – joints information for one image.
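The grouping idea behind the per-image loss can be sketched as follows. This is a simplified pure-Python illustration, not the library's implementation: the function name single_tag_loss_sketch is hypothetical, the input is assumed to be already gathered into one list of tag values per person, and the push term uses an exponential penalty with a normalization chosen for the example.

```python
import math


def single_tag_loss_sketch(person_tags):
    """Return (push, pull) for one image.

    `person_tags` holds, per person, the tag values read at that person's
    visible keypoints (a simplification of the pred_tag / joints inputs).
    """
    people = [tags for tags in person_tags if tags]
    if not people:
        return 0.0, 0.0
    means = [sum(tags) / len(tags) for tags in people]

    # Pull: squared distance of every tag to the mean tag of its own person.
    pull = sum(
        sum((t - m) ** 2 for t in tags) / len(tags)
        for tags, m in zip(people, means)
    ) / len(people)

    # Push: penalize pairs of people whose mean tags are close together.
    push = 0.0
    num = len(means)
    if num > 1:
        for i in range(num):
            for j in range(num):
                if i != j:
                    push += math.exp(-((means[i] - means[j]) ** 2))
        push /= num * (num - 1)
    return push, pull


push_two, pull_two = single_tag_loss_sketch([[1.0, 1.0], [3.0, 3.0]])
push_one, pull_one = single_tag_loss_sketch([[0.0, 2.0]])
```

In the first call each person's tags agree exactly, so only the push term is non-zero; in the second call there is a single person, so only the pull term contributes.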

class mmpose.models.losses.BCELoss(use_target_weight=False, loss_weight=1.0)[source]

Binary Cross Entropy loss.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_labels: K

  • output (torch.Tensor[N, K]) – Output classification.

  • target (torch.Tensor[N, K]) – Target classification.

  • target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.
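As a rough illustration of how the two accepted target_weight shapes behave, the sketch below computes a weighted binary cross-entropy on nested lists. It is not the module's code: the function name weighted_bce is hypothetical, the outputs are assumed to be probabilities strictly between 0 and 1, and the final reduction (a mean over all N x K entries) is an assumption.

```python
import math


def weighted_bce(output, target, target_weight=None):
    """Mean binary cross-entropy over N x K probabilities, optionally weighted."""
    total, count = 0.0, 0
    for n, (out_row, tgt_row) in enumerate(zip(output, target)):
        for k, (p, t) in enumerate(zip(out_row, tgt_row)):
            loss = -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            if target_weight is not None:
                w = target_weight[n]
                # Accept per-label weights [N, K] or one weight per sample [N].
                loss *= w[k] if isinstance(w, (list, tuple)) else w
            total += loss
            count += 1
    return total / count


output = [[0.9, 0.2], [0.4, 0.7]]
target = [[1.0, 0.0], [0.0, 1.0]]
unweighted = weighted_bce(output, target)
# A per-sample weight of 0.0 removes the second sample from the loss.
masked = weighted_bce(output, target, target_weight=[1.0, 0.0])
```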

class mmpose.models.losses.GANLoss(gan_type, real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0)[source]

Define GAN loss.

  • gan_type (str) – Support ‘vanilla’, ‘lsgan’, ‘wgan’, ‘hinge’.

  • real_label_val (float) – The value for real label. Default: 1.0.

  • fake_label_val (float) – The value for fake label. Default: 0.0.

  • loss_weight (float) – Loss weight. Default: 1.0. Note that loss_weight is only for generators; and it is always 1.0 for discriminators.

forward(input, target_is_real, is_disc=False)[source]
  • input (Tensor) – The input for the loss module, i.e., the network prediction.

  • target_is_real (bool) – Whether the target is real or fake.

  • is_disc (bool) – Whether the loss is for discriminators or not. Default: False.


GAN loss value.

Return type


get_target_label(input, target_is_real)[source]

Get target label.

  • input (Tensor) – Input tensor.

  • target_is_real (bool) – Whether the target is real or fake.


Target tensor. Return bool for wgan, otherwise return Tensor.

Return type

(bool | Tensor)
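The return behaviour described above can be sketched without any tensor library. The function below is a hypothetical stand-in, not the method itself: it takes a (rows, cols) shape instead of an input tensor and returns a nested list where the real method returns a Tensor.

```python
def target_label_sketch(shape, target_is_real, gan_type,
                        real_label_val=1.0, fake_label_val=0.0):
    """Mimic the documented return type: bool for 'wgan', filled grid otherwise."""
    if gan_type == 'wgan':
        # For wgan the caller only needs to know which side the sample is on.
        return target_is_real
    value = real_label_val if target_is_real else fake_label_val
    rows, cols = shape
    # Stand-in for a tensor of the input's shape filled with the label value.
    return [[value] * cols for _ in range(rows)]


real_grid = target_label_sketch((2, 3), True, 'vanilla')
fake_grid = target_label_sketch((2, 3), False, 'lsgan')
wgan_flag = target_label_sketch((2, 3), True, 'wgan')
```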

class mmpose.models.losses.HeatmapLoss[source]

Accumulate the heatmap loss for each image in the batch.

static forward(pred, gt, mask)[source]


batch_size: N heatmaps width: W heatmaps height: H max_num_people: M num_keypoints: K

  • pred (torch.Tensor[NxKxHxW]) – heatmap of output.

  • gt (torch.Tensor[NxKxHxW]) – target heatmap.

  • mask (torch.Tensor[NxHxW]) – mask of target.

class mmpose.models.losses.JointsMSELoss(use_target_weight=False, loss_weight=1.0)[source]

MSE loss for heatmaps.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.

class mmpose.models.losses.JointsOHKMMSELoss(use_target_weight=False, topk=8, loss_weight=1.0)[source]

MSE loss with online hard keypoint mining.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • topk (int) – Only top k joint losses are kept.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.
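Online hard keypoint mining with topk can be sketched as below. This is an illustrative pure-Python version under stated assumptions, not the module's code: the function name ohkm_sketch is hypothetical, the per-joint losses are assumed to be already computed, and the reduction (mean of the kept losses per sample, then mean over samples) is an assumption.

```python
def ohkm_sketch(per_joint_losses, topk=8):
    """Keep the `topk` largest joint losses of each sample and average them."""
    sample_losses = []
    for losses in per_joint_losses:
        hardest = sorted(losses, reverse=True)[:topk]
        sample_losses.append(sum(hardest) / topk)
    return sum(sample_losses) / len(sample_losses)


# Two samples with four joints each; only the two hardest joints count.
losses = [[0.1, 0.9, 0.5, 0.2], [0.4, 0.3, 0.8, 0.0]]
result = ohkm_sketch(losses, topk=2)
```

Joints that are already predicted well are dropped from the loss, so the gradient concentrates on the hardest keypoints of each sample.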

class mmpose.models.losses.L1Loss(use_target_weight=False, loss_weight=1.0)[source]

L1 loss.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MPJPELoss(use_target_weight=False, loss_weight=1.0)[source]

MPJPE (Mean Per Joint Position Error) loss.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_keypoints: K dimension of keypoints: D (D=2 or D=3)

  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
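The metric behind this loss is the mean Euclidean distance between predicted and target joints. The sketch below shows it on nested lists; the function name mpjpe_sketch is hypothetical and target weights are left out for brevity.

```python
import math


def mpjpe_sketch(output, target):
    """Mean Euclidean distance between joints, for N x K x D nested lists."""
    total, count = 0.0, 0
    for out_pose, tgt_pose in zip(output, target):
        for out_joint, tgt_joint in zip(out_pose, tgt_pose):
            total += math.dist(out_joint, tgt_joint)
            count += 1
    return total / count


output = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
target = [[[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]]
error = mpjpe_sketch(output, target)  # joint errors are 5 and 0
```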

class mmpose.models.losses.MSELoss(use_target_weight=False, loss_weight=1.0)[source]

MSE loss for coordinate regression.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_keypoints: K

  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MeshLoss(joints_2d_loss_weight, joints_3d_loss_weight, vertex_loss_weight, smpl_pose_loss_weight, smpl_beta_loss_weight, img_res, focal_length=5000)[source]

Mix loss for 3D human mesh. It is composed of losses on 2D joints, 3D joints, mesh vertices and SMPL parameters (if any).

  • joints_2d_loss_weight (float) – Weight for loss on 2D joints.

  • joints_3d_loss_weight (float) – Weight for loss on 3D joints.

  • vertex_loss_weight (float) – Weight for loss on 3D vertices.

  • smpl_pose_loss_weight (float) – Weight for loss on SMPL pose parameters.

  • smpl_beta_loss_weight (float) – Weight for loss on SMPL shape parameters.

  • img_res (int) – Input image resolution.

  • focal_length (float) – Focal length of camera model. Default=5000.

forward(output, target)[source]

Forward function.

  • output (dict) – dict of network predicted results. Keys: ‘vertices’, ‘joints_3d’, ‘camera’, ‘pose’(optional), ‘beta’(optional)

  • target (dict) – dict of ground-truth labels. Keys: ‘vertices’, ‘joints_3d’, ‘joints_3d_visible’, ‘joints_2d’, ‘joints_2d_visible’, ‘pose’, ‘beta’, ‘has_smpl’


dict of losses.

Return type

losses (dict)

joints_2d_loss(pred_joints_2d, gt_joints_2d, joints_2d_visible)[source]

Compute 2D reprojection loss on the joints.

The loss is weighted by joints_2d_visible.

joints_3d_loss(pred_joints_3d, gt_joints_3d, joints_3d_visible)[source]

Compute 3D joints loss for the examples for which 3D joint annotations are available.

The loss is weighted by joints_3d_visible.

project_points(points_3d, camera)[source]

Perform orthographic projection of 3D points using the camera parameters, return projected 2D points in image plane.


batch size: B point number: N

  • points_3d (Tensor([B, N, 3])) – 3D points.

  • camera (Tensor([B, 3])) – camera parameters with the 3 channels as (scale, translation_x, translation_y)


Projected 2D points in image space.

Return type

points_2d (Tensor([B, N, 2]))
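One common way to read a (scale, translation_x, translation_y) camera is the weak-perspective convention sketched below: drop the depth, shift by the translation, then multiply by the scale. This convention is an assumption made for illustration, and the function name project_points_sketch is hypothetical; the method's actual formula is not stated here and may differ, for example in how the result is mapped to pixel coordinates.

```python
def project_points_sketch(points_3d, camera):
    """Project N 3D points of one sample with a (scale, tx, ty) camera."""
    scale, trans_x, trans_y = camera
    # The depth coordinate is ignored in an orthographic projection.
    return [
        (scale * (x + trans_x), scale * (y + trans_y))
        for x, y, _z in points_3d
    ]


points = [(1.0, 2.0, 5.0), (0.0, 0.0, 9.0)]
projected = project_points_sketch(points, camera=(2.0, 0.5, -1.0))
```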

smpl_losses(pred_rotmat, pred_betas, gt_pose, gt_betas, has_smpl)[source]

Compute SMPL parameters loss for the examples for which SMPL parameter annotations are available.

The loss is weighted by has_smpl.

vertex_loss(pred_vertices, gt_vertices, has_smpl)[source]

Compute 3D vertex loss for the examples for which 3D human mesh annotations are available.

The loss is weighted by has_smpl.

class mmpose.models.losses.MultiLossFactory(num_joints, num_stages, ae_loss_type, with_ae_loss, push_loss_factor, pull_loss_factor, with_heatmaps_loss, heatmaps_loss_factor)[source]

Loss for bottom-up models.

  • num_joints (int) – Number of keypoints.

  • num_stages (int) – Number of stages.

  • ae_loss_type (str) – Type of ae loss.

  • with_ae_loss (list[bool]) – Use ae loss or not in multi-heatmap.

  • push_loss_factor (list[float]) – Parameter of push loss in multi-heatmap.

  • pull_loss_factor (list[float]) – Parameter of pull loss in multi-heatmap.

  • with_heatmaps_loss (list[bool]) – Use heatmap loss or not in multi-heatmap.

  • heatmaps_loss_factor (list[float]) – Parameter of heatmap loss in multi-heatmap.

forward(outputs, heatmaps, masks, joints)[source]

Forward function to calculate losses.


batch_size: N heatmaps width: W heatmaps height: H max_num_people: M num_keypoints: K output_channel: C (C=2K if AE loss is used, else K)

  • outputs (List(torch.Tensor[NxCxHxW])) – outputs of stages.

  • heatmaps (List(torch.Tensor[NxKxHxW])) – target of heatmaps.

  • masks (List(torch.Tensor[NxHxW])) – masks of heatmaps.

  • joints (List(torch.Tensor[NxMxKx2])) – joints of ae loss.

class mmpose.models.losses.SmoothL1Loss(use_target_weight=False, loss_weight=1.0)[source]

Smooth L1 loss.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_keypoints: K dimension of keypoints: D (D=2 or D=3)

  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
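For reference, the element-wise smooth L1 function is quadratic for small errors and linear for large ones. The sketch below uses the common threshold beta = 1.0 as an assumption; the function names are hypothetical, and the mean over all coordinates is an assumed reduction.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 for one coordinate: quadratic below `beta`, linear above."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def smooth_l1_mean(output, target):
    """Average the element-wise loss over two flat lists of coordinates."""
    values = [smooth_l1(o, t) for o, t in zip(output, target)]
    return sum(values) / len(values)


small = smooth_l1(0.5, 0.0)   # quadratic branch
large = smooth_l1(3.0, 0.0)   # linear branch
mean_loss = smooth_l1_mean([0.5, 3.0], [0.0, 0.0])
```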

class mmpose.models.losses.WingLoss(omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0)[source]

Wing Loss. Paper ref: ‘Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks’ Feng et al. CVPR’2018.

  • omega (float), epsilon (float) – Hyper-parameters of the wing loss.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[source]

Criterion of wingloss.


batch_size: N num_keypoints: K dimension of keypoints: D (D=2 or D=3)

  • pred (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

forward(output, target, target_weight)[source]

Forward function.


batch_size: N num_keypoints: K dimension of keypoints: D (D=2 or D=3)

  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
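The wing function from the cited paper is logarithmic for errors below omega and linear above it, with a constant C chosen so that the two pieces join continuously. The sketch below evaluates it for a single absolute error; the function name wing is hypothetical, and how the values are reduced over N, K and D is left out.

```python
import math


def wing(abs_error, omega=10.0, epsilon=2.0):
    """Wing loss for a single non-negative absolute error."""
    # C makes the logarithmic and linear pieces meet at abs_error == omega.
    c = omega - omega * math.log(1.0 + omega / epsilon)
    if abs_error < omega:
        return omega * math.log(1.0 + abs_error / epsilon)
    return abs_error - c


at_zero = wing(0.0)
at_omega = wing(10.0)
beyond = wing(12.0)
```

Compared with L1 or L2, the logarithmic part amplifies the influence of small and medium errors, which is the motivation given in the paper for landmark localisation.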


class mmpose.datasets.AnimalATRWDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

ATRW dataset for animal pose estimation.

‘ATRW: A Benchmark for Amur Tiger Re-identification in the Wild’ ACM MM’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

ATRW keypoint indexes:

0: "left_ear",
1: "right_ear",
2: "nose",
3: "right_shoulder",
4: "right_front_paw",
5: "left_shoulder",
6: "left_front_paw",
7: "right_hip",
8: "right_knee",
9: "right_back_paw",
10: "left_hip",
11: "left_knee",
12: "left_back_paw",
13: "tail",
14: "center"
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(dict)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘data/coco/val2017/000000393226.jpg’]

    heatmap (np.ndarray[N, K, H, W])

    model output heatmap

    bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.AnimalFlyDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AnimalFlyDataset for animal pose estimation.

‘Fast animal pose estimation using deep neural networks’ Nature Methods’2019. More details can be found in the paper <https://www.biorxiv.org/content/biorxiv/early/2018/05/25/331181.full.pdf>.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Vinegar Fly keypoint indexes:

0: "head",
1: "eyeL",
2: "eyeR",
3: "neck",
4: "thorax",
5: "abdomen",
6: "forelegR1",
7: "forelegR2",
8: "forelegR3",
9: "forelegR4",
10: "midlegR1",
11: "midlegR2",
12: "midlegR3",
13: "midlegR4",
14: "hindlegR1",
15: "hindlegR2",
16: "hindlegR3",
17: "hindlegR4",
18: "forelegL1",
19: "forelegL2",
20: "forelegL3",
21: "forelegL4",
22: "midlegL1",
23: "midlegL2",
24: "midlegL3",
25: "midlegL4",
26: "hindlegL1",
27: "hindlegL2",
28: "hindlegL3",
29: "hindlegL4",
30: "wingL",
31: "wingR"
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate Fly keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Test/source/0.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type
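Two of the metric options listed above, ‘PCK’ and ‘EPE’, can be illustrated on plain coordinate lists. The sketch is only meant to convey the definitions: the function names are hypothetical, and the threshold and the normalization length used for PCK are assumptions chosen for the example, not the dataset's defaults.

```python
import math


def pck_sketch(pred, gt, normalize, thr):
    """Fraction of keypoints whose normalized distance to ground truth is below `thr`."""
    correct = sum(
        1 for p, g in zip(pred, gt) if math.dist(p, g) / normalize < thr
    )
    return correct / len(gt)


def epe_sketch(pred, gt):
    """Mean end-point error: average pixel distance to the ground truth."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


pred = [(10.0, 10.0), (20.0, 20.0), (30.0, 34.0)]
gt = [(10.0, 10.0), (23.0, 24.0), (30.0, 30.0)]
# Distances are 0, 5 and 4 pixels; with normalize=10 they become 0, 0.5 and 0.4.
pck = pck_sketch(pred, gt, normalize=10.0, thr=0.45)
epe = epe_sketch(pred, gt)
```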


class mmpose.datasets.AnimalHorse10Dataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AnimalHorse10Dataset for animal pose estimation.

‘Pretraining boosts out-of-domain robustness for pose estimation’ WACV’2021. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Horse-10 keypoint indexes:

0: 'Nose',
1: 'Eye',
2: 'Nearknee',
3: 'Nearfrontfetlock',
4: 'Nearfrontfoot',
5: 'Offknee',
6: 'Offfrontfetlock',
7: 'Offfrontfoot',
8: 'Shoulder',
9: 'Midshoulder',
10: 'Elbow',
11: 'Girth',
12: 'Wither',
13: 'Nearhindhock',
14: 'Nearhindfetlock',
15: 'Nearhindfoot',
16: 'Hip',
17: 'Stifle',
18: 'Offhindhock',
19: 'Offhindfetlock',
20: 'Offhindfoot',
21: 'Ischium'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate horse-10 keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Test/source/0.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘NME’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.AnimalLocustDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AnimalLocustDataset for animal pose estimation.

‘DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning’ Elife’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Desert Locust keypoint indexes:

0: "head",
1: "neck",
2: "thorax",
3: "abdomen1",
4: "abdomen2",
5: "anttipL",
6: "antbaseL",
7: "eyeL",
8: "forelegL1",
9: "forelegL2",
10: "forelegL3",
11: "forelegL4",
12: "midlegL1",
13: "midlegL2",
14: "midlegL3",
15: "midlegL4",
16: "hindlegL1",
17: "hindlegL2",
18: "hindlegL3",
19: "hindlegL4",
20: "anttipR",
21: "antbaseR",
22: "eyeR",
23: "forelegR1",
24: "forelegR2",
25: "forelegR3",
26: "forelegR4",
27: "midlegR1",
28: "midlegR2",
29: "midlegR3",
30: "midlegR4",
31: "hindlegR1",
32: "hindlegR2",
33: "hindlegR3",
34: "hindlegR4"
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate Locust keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Test/source/0.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.AnimalMacaqueDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MacaquePose dataset for animal pose estimation.

‘MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture’ bioRxiv’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Macaque keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(dict)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘data/coco/val2017/000000393226.jpg’]

    heatmap (np.ndarray[N, K, H, W])

    model output heatmap

    bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.AnimalPoseDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

Animal-Pose dataset for animal pose estimation.

‘Cross-domain Adaptation For Animal Pose Estimation’ ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Animal-Pose keypoint indexes:

0: 'L_Eye',
1: 'R_Eye',
2: 'L_EarBase',
3: 'R_EarBase',
4: 'Nose',
5: 'Throat',
6: 'TailBase',
7: 'Withers',
8: 'L_F_Elbow',
9: 'R_F_Elbow',
10: 'L_B_Elbow',
11: 'R_B_Elbow',
12: 'L_F_Knee',
13: 'R_F_Knee',
14: 'L_B_Knee',
15: 'R_B_Knee',
16: 'L_F_Paw',
17: 'R_F_Paw',
18: 'L_B_Paw',
19: 'R_B_Paw'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(dict)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘data/coco/val2017/000000393226.jpg’]

    heatmap (np.ndarray[N, K, H, W])

    model output heatmap

    bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.AnimalZebraDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AnimalZebraDataset for animal pose estimation.

‘DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning’ Elife’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Zebra keypoint indexes:

0: "snout",
1: "head",
2: "neck",
3: "forelegL1",
4: "forelegR1",
5: "hindlegL1",
6: "hindlegR1",
7: "tailbase",
8: "tailtip"
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate zebra keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Test/source/0.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

COCO dataset for bottom-up pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
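
The left/right naming in the COCO keypoint listing above lends itself to deriving symmetric index pairs, which horizontal-flip augmentation typically needs. The following is a minimal sketch in plain Python; the helper `left_right_pairs` is hypothetical and is not part of mmpose.

```python
# Keypoint names copied from the COCO listing above, in index order.
COCO_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]


def left_right_pairs(names):
    """Return (left_index, right_index) pairs for names that differ only by side.

    Hypothetical helper for illustration; it relies purely on the
    'left_' / 'right_' prefixes used in the listing.
    """
    index = {name: i for i, name in enumerate(names)}
    pairs = []
    for name, i in index.items():
        if name.startswith('left_'):
            partner = 'right_' + name[len('left_'):]
            if partner in index:
                pairs.append((i, index[partner]))
    return pairs


coco_pairs = left_right_pairs(COCO_KEYPOINTS)
```

Keypoints without a counterpart, such as 'nose', are simply left out of the result.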

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


num_people: P num_keypoints: K

  • outputs (list(preds, scores, image_path, heatmap)) –

    • preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.

    • scores (list[P]):

    • image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CrowdPose dataset for bottom-up pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MHPv2.0 dataset for bottom-up pose estimation.

The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), aims to push the frontiers of fine-grained visual understanding of humans in crowded scenes. <https://lv-mhp.github.io/>

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.


transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
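
To illustrate what composing a pipeline means, here is a minimal sketch that chains plain callables. `SimpleCompose`, `add_center` and `add_area` are hypothetical names used only for this illustration; unlike the class documented above, the sketch does not build transforms from config dicts.

```python
class SimpleCompose:
    """Hypothetical stand-in that applies callables in order.

    This is not mmpose's implementation: config dicts are not supported,
    only objects that are already callable.
    """

    def __init__(self, transforms):
        for t in transforms:
            if not callable(t):
                raise TypeError(f'transform must be callable, got {type(t)}')
        self.transforms = list(transforms)

    def __call__(self, data):
        # Each transform receives the output of the previous one.
        for t in self.transforms:
            data = t(data)
        return data


def add_center(results):
    """Add the bbox center, assuming bbox is (x, y, w, h)."""
    x, y, w, h = results['bbox']
    return {**results, 'center': (x + w / 2, y + h / 2)}


def add_area(results):
    """Add the bbox area, assuming bbox is (x, y, w, h)."""
    _, _, w, h = results['bbox']
    return {**results, 'area': w * h}


pipeline = SimpleCompose([add_center, add_area])
out = pipeline({'bbox': (10, 20, 40, 60)})
```

Each transform takes a results dict and returns a new one, so the order of the list determines the order of processing.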

class mmpose.datasets.DeepFashionDataset(ann_file, img_prefix, subset, data_cfg, pipeline, test_mode=False)[source]

DeepFashion dataset (full-body clothes) for fashion landmark detection.

‘DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations’, CVPR’2016, and ‘Fashion Landmark Detection in the Wild’, ECCV’2016.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

The dataset contains 3 categories for full-body, upper-body and lower-body.

Fashion landmark indexes for upper-body clothes:

0: 'left collar',
1: 'right collar',
2: 'left sleeve',
3: 'right sleeve',
4: 'left hem',
5: 'right hem'

Fashion landmark indexes for lower-body clothes:

0: 'left waistline',
1: 'right waistline',
2: 'left hem',
3: 'right hem'

Fashion landmark indexes for full-body clothes:

0: 'left collar',
1: 'right collar',
2: 'left sleeve',
3: 'right sleeve',
4: 'left waistline',
5: 'right waistline',
6: 'left hem',
7: 'right hem'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • subset (str) – The FLD dataset has 3 subsets, ‘upper’, ‘lower’, and ‘full’, denoting different types of clothes.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
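
The three landmark listings above can be kept in a single lookup keyed by the `subset` argument. The sketch below copies the names from the listings; the table `FASHION_LANDMARKS` and the helper `landmarks_for` are hypothetical and only illustrate how the subset determines the number of landmarks.

```python
# Landmark names copied from the listings above, in index order.
FASHION_LANDMARKS = {
    'upper': ['left collar', 'right collar', 'left sleeve', 'right sleeve',
              'left hem', 'right hem'],
    'lower': ['left waistline', 'right waistline', 'left hem', 'right hem'],
    'full': ['left collar', 'right collar', 'left sleeve', 'right sleeve',
             'left waistline', 'right waistline', 'left hem', 'right hem'],
}


def landmarks_for(subset):
    """Return the landmark names for a subset ('upper', 'lower' or 'full')."""
    try:
        return FASHION_LANDMARKS[subset]
    except KeyError:
        raise ValueError(f'unknown subset: {subset!r}') from None


num_landmarks = {name: len(landmarks_for(name)) for name in FASHION_LANDMARKS}
```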

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate DeepFashion keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘img_00000001.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[source]

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

Older versions of PyTorch do not provide a shuffle argument. This child class adds one to DistributedSampler.
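
The general idea of a distributed sampler with optional shuffling can be sketched in plain Python: every replica derives the same (optionally shuffled) index list from a shared seed and then keeps every `num_replicas`-th entry starting at its own `rank`. The function `shard_indices` and its `epoch` parameter are assumptions made for this illustration, not the class's actual code.

```python
import math
import random


def shard_indices(dataset_len, num_replicas, rank, shuffle=True, seed=0, epoch=0):
    """Return the sample indices one replica would iterate over.

    Illustrative sketch only. All replicas must pass the same seed and
    epoch so that they agree on the shuffled order.
    """
    if dataset_len <= 0:
        return []
    indices = list(range(dataset_len))
    if shuffle:
        # Deterministic shuffle shared by all replicas.
        random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating from the start so the list divides evenly.
    num_samples = math.ceil(dataset_len / num_replicas)
    total = num_samples * num_replicas
    padding = total - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Strided split: rank r takes positions r, r + num_replicas, ...
    return indices[rank:total:num_replicas]


rank0 = shard_indices(5, num_replicas=2, rank=0, shuffle=False)
rank1 = shard_indices(5, num_replicas=2, rank=1, shuffle=False)
```

With five samples and two replicas, one index is repeated so that both replicas receive three samples each.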

class mmpose.datasets.Face300WDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

Face300W dataset for top-down face keypoint localization.

300 faces In-the-wild challenge: Database and results. Image and Vision Computing (IMAVIS) 2019.

The dataset loads raw images and applies the specified transforms to return a dict containing the image tensors and other information.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='NME', **kwargs)[source]

Evaluate 300W keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[1,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[1,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_path (list[str])

    For example, [‘300W/ibug/image_018.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.FreiHandDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

FreiHand dataset for top-down hand pose estimation.

‘FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images’, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

FreiHand keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate freihand keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘training/rgb/00031426.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.InterHand2DDataset(ann_file, camera_file, joint_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

InterHand2.6M 2D dataset for top-down hand pose estimation.

‘InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image’, Moon, Gyeongsik et al., ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

InterHand2.6M keypoint indexes:

0: 'thumb4',
1: 'thumb3',
2: 'thumb2',
3: 'thumb1',
4: 'forefinger4',
5: 'forefinger3',
6: 'forefinger2',
7: 'forefinger1',
8: 'middle_finger4',
9: 'middle_finger3',
10: 'middle_finger2',
11: 'middle_finger1',
12: 'ring_finger4',
13: 'ring_finger3',
14: 'ring_finger2',
15: 'ring_finger1',
16: 'pinky_finger4',
17: 'pinky_finger3',
18: 'pinky_finger2',
19: 'pinky_finger1',
20: 'wrist'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
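
The InterHand2.6M listing above starts at the fingertips and ends with the wrist, whereas the FreiHand, OneHand10K and Panoptic listings start with the wrist. A minimal sketch of reordering between the two conventions follows, assuming that an identical name denotes the same joint in both listings; the names `TO_WRIST_FIRST` and `to_wrist_first` are hypothetical.

```python
FINGERS = ['thumb', 'forefinger', 'middle_finger', 'ring_finger', 'pinky_finger']

# Order used by the InterHand2.6M listing: joints 4..1 per finger, wrist last.
INTERHAND_NAMES = [f'{f}{j}' for f in FINGERS for j in (4, 3, 2, 1)] + ['wrist']

# Order used by the wrist-first listings: wrist, then joints 1..4 per finger.
WRIST_FIRST_NAMES = ['wrist'] + [f'{f}{j}' for f in FINGERS for j in (1, 2, 3, 4)]

# For every wrist-first position, the InterHand2.6M index holding that joint.
TO_WRIST_FIRST = [INTERHAND_NAMES.index(name) for name in WRIST_FIRST_NAMES]


def to_wrist_first(keypoints):
    """Reorder a length-21 sequence from InterHand2.6M order to wrist-first order."""
    if len(keypoints) != len(TO_WRIST_FIRST):
        raise ValueError('expected 21 keypoints')
    return [keypoints[i] for i in TO_WRIST_FIRST]


reordered = to_wrist_first(INTERHAND_NAMES)
```

Because the mapping is built by name lookup rather than written out by hand, it stays consistent with the two listings it is derived from.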

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate interhand2d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Capture12/0390_dh_touchROM/cam410209/image62434.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.MeshAdversarialDataset(train_dataset, adversarial_dataset)[source]

Mix Dataset for the adversarial training in 3D human mesh estimation task.

The dataset combines data from two datasets and returns a dict containing data from both.

  • train_dataset (Dataset) – Dataset for 3D human mesh estimation.

  • adversarial_dataset (Dataset) – Dataset for adversarial learning, provides real SMPL parameters.
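
A rough sketch of how two datasets might be combined into one dict per sample is shown below. The class name `PairedDataset` and the wrap-around indexing of the adversarial dataset are assumptions made for illustration; the documentation above does not state how the adversarial sample is chosen.

```python
class PairedDataset:
    """Hypothetical sketch: merge one sample from each of two datasets.

    Both datasets are assumed to be indexable and to return dicts.
    """

    def __init__(self, train_dataset, adversarial_dataset):
        self.train_dataset = train_dataset
        self.adversarial_dataset = adversarial_dataset

    def __len__(self):
        # The length follows the training dataset.
        return len(self.train_dataset)

    def __getitem__(self, i):
        data = dict(self.train_dataset[i])
        # Assumption: wrap around when the adversarial dataset is shorter.
        j = i % len(self.adversarial_dataset)
        data.update(self.adversarial_dataset[j])
        return data


train = [{'img': 'a'}, {'img': 'b'}, {'img': 'c'}]
adversarial = [{'pose': 0}, {'pose': 1}]
paired = PairedDataset(train, adversarial)
sample = paired[2]
```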

class mmpose.datasets.MeshH36MDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

Human3.6M Dataset for 3D human mesh estimation. It inherits all functions from MeshBaseDataset and has its own evaluate function.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='joint_error', logger=None)[source]

Evaluate 3D keypoint results.

static evaluate_kernel(pred_joints_3d, joints_3d, joints_3d_visible)[source]

Evaluate one example.

class mmpose.datasets.MeshMixDataset(configs, partition)[source]

Mix Dataset for 3D human mesh estimation.

The dataset combines data from multiple datasets (MeshBaseDataset) and samples data from the different datasets according to the provided proportions. The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

  • configs (list) – List of configs for multiple datasets.

  • partition (list) – Sample proportions of the multiple datasets. The elements should be non-negative and should sum to 1.
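
One common way to sample from several datasets according to proportions is to draw a uniform random number and walk the cumulative sum of `partition`. The sketch below shows that technique under the constraints stated above (non-negative entries summing to 1); `pick_dataset` is a hypothetical helper and may differ from how the class selects a dataset internally.

```python
import random


def pick_dataset(partition, rng):
    """Return the index of the dataset to draw the next sample from.

    partition: proportions, non-negative and summing to 1.
    rng: a random.Random instance, passed in for reproducibility.
    """
    if any(p < 0 for p in partition) or abs(sum(partition) - 1.0) > 1e-6:
        raise ValueError('partition must be non-negative and sum to 1')
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(partition):
        cumulative += p
        if r < cumulative:
            return i
    # Guard against floating-point round-off in the cumulative sum.
    return len(partition) - 1


rng = random.Random(0)
counts = [0, 0]
for _ in range(10000):
    counts[pick_dataset([0.7, 0.3], rng)] += 1
```

Over many draws, the share of samples taken from each dataset approaches the corresponding entry of `partition`.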

class mmpose.datasets.MoshDataset(ann_file, pipeline, test_mode=False)[source]

Mosh Dataset for the adversarial training in 3D human mesh estimation task.

The dataset returns a dict containing real-world SMPL parameters.

  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.OneHand10KDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

OneHand10K dataset for top-down hand pose estimation.

‘Mask-pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Images’, TCSVT’2019. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

OneHand10K keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate onehand10k keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘Test/source/0.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.PanopticDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

Panoptic dataset for top-down hand pose estimation.

‘Hand Keypoint Detection in Single Images using Multiview Bootstrapping’, CVPR’2017. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

Panoptic keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]

Evaluate panoptic keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘hand_labels/manual_test/000648952_02_l.jpg’]

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCKh’, ‘AUC’, ‘EPE’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AI Challenger (AIC) dataset for top-down pose estimation.

AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

COCO dataset for top-down pose estimation.

‘Microsoft COCO: Common Objects in Context’, ECCV’2014. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(dict)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_paths (list[str])

    For example, [‘data/coco/val2017/000000393226.jpg’]

    heatmap (np.ndarray[N, K, H, W])

    model output heatmap

    bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.
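
To make the documented shapes concrete, the sketch below builds a placeholder `outputs` list with NumPy. The key names are taken from the listing above and should be checked against the installed version; all values are dummy zeros, and `make_dummy_output` is a hypothetical helper.

```python
import numpy as np

# Illustrative sizes: batch size, keypoints, heatmap height and width.
N, K, H, W = 2, 17, 64, 48


def make_dummy_output(n=N, k=K, h=H, w=W):
    """Build one placeholder batch with the shapes listed above."""
    return {
        # (x, y, score) for every keypoint of every instance.
        'preds': np.zeros((n, k, 3), dtype=np.float32),
        # [center[0], center[1], scale[0], scale[1], area, score]
        'boxes': np.zeros((n, 6), dtype=np.float32),
        'image_paths': ['data/coco/val2017/000000393226.jpg'] * n,
        'heatmap': np.zeros((n, k, h, w), dtype=np.float32),
        'bbox_id': list(range(n)),
    }


outputs = [make_dummy_output()]
```

A list of such dicts, one per batch, matches the `list(dict)` type given for `outputs`.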


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

COCO-WholeBody dataset for top-down pose estimation.

‘Whole-Body Human Pose Estimation in the Wild’, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

In total, there are 133 keypoints for whole-body pose estimation.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints,
17-22: 6 foot keypoints,
23-90: 68 face keypoints,
91-132: 42 hand keypoints

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
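
The index ranges listed above translate directly into slices. As a small sketch, the hypothetical helper `split_wholebody` below splits a length-133 keypoint sequence into its four parts; the half-open ranges in `WHOLEBODY_PARTS` are derived from the inclusive ranges in the listing.

```python
# Half-open (start, stop) ranges derived from the inclusive index ranges above.
WHOLEBODY_PARTS = {
    'body': (0, 17),     # indexes 0-16
    'foot': (17, 23),    # indexes 17-22
    'face': (23, 91),    # indexes 23-90
    'hand': (91, 133),   # indexes 91-132
}


def split_wholebody(keypoints):
    """Split a sequence of 133 keypoints into body, foot, face and hand parts."""
    if len(keypoints) != 133:
        raise ValueError(f'expected 133 keypoints, got {len(keypoints)}')
    return {name: keypoints[start:stop]
            for name, (start, stop) in WHOLEBODY_PARTS.items()}


parts = split_wholebody(list(range(133)))
part_sizes = {name: len(values) for name, values in parts.items()}
```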

class mmpose.datasets.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CrowdPose dataset for top-down pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.TopDownFreiHandDataset(*args, **kwargs)[source]

Deprecated TopDownFreiHandDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

JHMDB dataset for top-down pose estimation.

‘Towards understanding action recognition’ <https://openaccess.thecvf.com/content_iccv_2013/papers/Jhuang_Towards_Understanding_Action_2013_ICCV_paper.pdf>

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

sub-JHMDB keypoint indexes:

0: "neck",
1: "belly",
2: "head",
3: "right_shoulder",
4: "left_shoulder",
5: "right_hip",
6: "left_hip",
7: "right_elbow",
8: "left_elbow",
9: "right_knee",
10: "left_knee",
11: "right_wrist",
12: "left_wrist",
13: "right_ankle",
14: "left_ankle"

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    preds (np.ndarray[N,K,3])

    The first two dimensions are coordinates, score is the third dimension of the array.

    boxes (np.ndarray[N,6])

    [center[0], center[1], scale[0] , scale[1],area, score]

    image_path (list[str])

    output_heatmap (np.ndarray[N, K, H, W])

    model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK means normalized by the bounding boxes, while tPCK means normalized by the torso size.
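
The difference between the two options lies only in the normalisation length. The following simplified NumPy sketch computes the fraction of visible keypoints whose normalised error falls under a threshold; the function `pck`, the scalar per-instance normaliser and the threshold value 0.2 are assumptions for illustration and do not reproduce the library's exact computation.

```python
import numpy as np


def pck(pred, gt, visible, norm, thr=0.2):
    """Fraction of visible keypoints with normalised error <= thr.

    pred, gt: arrays of shape (N, K, 2) with x, y coordinates.
    visible:  boolean array of shape (N, K).
    norm:     array of shape (N,), one normalisation length per instance
              (e.g. a bounding-box size for PCK, a torso size for tPCK).
    """
    dist = np.linalg.norm(pred - gt, axis=-1) / norm[:, None]
    correct = (dist <= thr) & visible
    n_visible = int(visible.sum())
    return float(correct.sum()) / n_visible if n_visible > 0 else 0.0


pred = np.array([[[10.0, 10.0], [50.0, 50.0]]])
gt = np.array([[[12.0, 10.0], [65.0, 50.0]]])
visible = np.ones((1, 2), dtype=bool)

pck_bbox = pck(pred, gt, visible, norm=np.array([100.0]))   # bbox size 100 px
pck_torso = pck(pred, gt, visible, norm=np.array([20.0]))   # torso size 20 px
```

In this toy example the second keypoint is 15 px off: that is within 20% of the 100 px box but far outside 20% of the 20 px torso, so the torso-normalised score is stricter.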


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MHPv2.0 dataset for top-down pose estimation.

The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), aims to push the frontiers of fine-grained visual understanding of humans in crowded scenes. <https://lv-mhp.github.io/>

Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code at https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose. Please be cautious if you use the results in papers.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MPII Dataset for top-down pose estimation.

‘2D Human Pose Estimation: New Benchmark and State of the Art Analysis’, CVPR’2014. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MPII keypoint indexes:

0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch Copyright (c) Microsoft, under the MIT License.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, heatmap)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0] , scale[1],area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.


PCKh for each joint

Return type


class mmpose.datasets.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MPII-TRB dataset for top-down pose estimation.

‘TRB: A Novel Triplet Representation for Understanding 2D Human Body’, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MPII-TRB keypoint indexes:

0: 'left_shoulder'
1: 'right_shoulder'
2: 'left_elbow'
3: 'right_elbow'
4: 'left_wrist'
5: 'right_wrist'
6: 'left_hip'
7: 'right_hip'
8: 'left_knee'
9: 'right_knee'
10: 'left_ankle'
11: 'right_ankle'
12: 'head'
13: 'neck'

14: 'right_neck'
15: 'left_neck'
16: 'medial_right_shoulder'
17: 'lateral_right_shoulder'
18: 'medial_right_bow'
19: 'lateral_right_bow'
20: 'medial_right_wrist'
21: 'lateral_right_wrist'
22: 'medial_left_shoulder'
23: 'lateral_left_shoulder'
24: 'medial_left_bow'
25: 'lateral_left_bow'
26: 'medial_left_wrist'
27: 'lateral_left_wrist'
28: 'medial_right_hip'
29: 'lateral_right_hip'
30: 'medial_right_knee'
31: 'lateral_right_knee'
32: 'medial_right_ankle'
33: 'lateral_right_ankle'
34: 'medial_left_hip'
35: 'lateral_left_hip'
36: 'medial_left_knee'
37: 'lateral_left_knee'
38: 'medial_left_ankle'
39: 'lateral_left_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII-TRB dataset.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_paths, heatmap)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_ids (list[str]): For example, [‘27407’]

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.


PCKh for each joint

Return type
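As a rough illustration of the metric, PCKh counts a predicted keypoint as correct when its distance to the ground truth, divided by the head size, is within a threshold. The sketch below is a simplified stand-in, not the library's implementation: the function name `pckh_per_joint`, the `head_sizes` argument and the 0.5 threshold are assumptions, and visibility masks are ignored.

```python
import math


def pckh_per_joint(preds, gts, head_sizes, thr=0.5):
    """Return, for each joint, the fraction of instances predicted correctly.

    preds, gts: N instances, each a list of K (x, y) tuples.
    head_sizes: N head lengths used to normalize the distances (assumed input).
    """
    num_joints = len(gts[0])
    hits = [0] * num_joints
    for pred, gt, head in zip(preds, gts, head_sizes):
        for k, ((px, py), (gx, gy)) in enumerate(zip(pred, gt)):
            # A joint is correct when the normalized error is within thr.
            if math.hypot(px - gx, py - gy) / head <= thr:
                hits[k] += 1
    return [h / len(gts) for h in hits]
```

Calling it on two instances with two joints each returns one accuracy value per joint, which mirrors the "PCKh for each joint" shape of the result described above.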


class mmpose.datasets.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

OChuman dataset for top-down pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images. It is designed for validation and testing only: to evaluate on OCHuman, train the model on the COCO training set, then use OCHuman to test its robustness to occlusion.

OCHuman keypoint indexes (same as COCO):

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.TopDownOneHand10KDataset(*args, **kwargs)[source]

Deprecated TopDownOneHand10KDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownPanopticDataset(*args, **kwargs)[source]

Deprecated TopDownPanopticDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


num_keypoints: K

  • outputs (list(preds, boxes, image_paths)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type
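Each row of boxes packs a center, a scale, an area and a score. A minimal sketch of turning one row back into an [x, y, w, h] box follows. It assumes that scale is expressed in units of 200 pixels, a common convention in top-down pose code; the helper name `box_to_xywh` and the `pixel_std` parameter are hypothetical and should be checked against the dataset code in use.

```python
def box_to_xywh(box, pixel_std=200.0):
    """Convert [cx, cy, sx, sy, area, score] to ([x, y, w, h], score).

    pixel_std is an assumed normalization constant, not taken from the docs.
    """
    cx, cy, sx, sy, _area, score = box
    w, h = sx * pixel_std, sy * pixel_std
    # The top-left corner is the center shifted by half the box size.
    return [cx - w / 2, cy - h / 2, w, h], score
```

Any padding applied when the scale was computed is still included in the recovered width and height.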


mmpose.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=True, pin_memory=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: True

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True

  • kwargs – any keyword argument to be used to initialize DataLoader


A PyTorch dataloader.

Return type
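The distinction between the two modes can be sketched as follows. This is an illustration under the assumption that, in non-distributed training, the single dataloader serves all GPUs and therefore scales both the batch size and the worker count by num_gpus; the helper name `loader_sizes` is hypothetical.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return the (batch_size, num_workers) a DataLoader would be built with."""
    if dist:
        # One dataloader per process: per-GPU values are used directly.
        return samples_per_gpu, workers_per_gpu
    # One dataloader for all GPUs: scale both values by the GPU count.
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu
```

For example, with samples_per_gpu=32 and four GPUs, the distributed loader yields batches of 32 per process, while the non-distributed loader yields batches of 128 that are later split across devices.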


mmpose.datasets.build_dataset(cfg, default_args=None)[source]

Build a dataset from config dict.

  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • default_args (dict, optional) – Default initialization arguments. Default: None.


The constructed dataset.

Return type
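Building from a config dict usually means looking up the class named by “type” and passing the remaining keys as constructor arguments. The sketch below shows that pattern with a toy registry; `REGISTRY`, `register`, `build_from_cfg` as written here, and `ToyDataset` are all hypothetical stand-ins rather than the library's own objects.

```python
REGISTRY = {}


def register(cls):
    """Record a class under its own name so configs can refer to it."""
    REGISTRY[cls.__name__] = cls
    return cls


def build_from_cfg(cfg, registry, default_args=None):
    """Instantiate registry[cfg['type']] with the remaining cfg entries."""
    args = dict(cfg)  # copy so the caller's config is not modified
    for key, value in (default_args or {}).items():
        args.setdefault(key, value)  # explicit cfg values take precedence
    obj_type = args.pop('type')
    return registry[obj_type](**args)


@register
class ToyDataset:
    def __init__(self, ann_file, test_mode=False):
        self.ann_file = ann_file
        self.test_mode = test_mode
```

A call such as `build_from_cfg(dict(type='ToyDataset', ann_file='a.json'), REGISTRY, dict(test_mode=True))` returns a `ToyDataset` whose `test_mode` comes from the default arguments.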



class mmpose.datasets.datasets.top_down.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

AicDataset dataset for top-down pose estimation.

AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CocoDataset dataset for top-down pose estimation.

“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(dict)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type
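The COCO-style mAP metric is built on object keypoint similarity (OKS), which scores one predicted pose against one ground-truth pose. A minimal sketch of the standard COCO definition follows; it is not taken from this library's code, and the function name `oks` and its argument layout are assumptions. The per-keypoint sigmas are the values published with the COCO keypoint evaluation.

```python
import math

# Per-keypoint falloff constants from the COCO keypoint evaluation.
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visible, area, sigmas=COCO_SIGMAS):
    """Average keypoint similarity over the visible ground-truth keypoints.

    pred, gt: lists of (x, y); visible: list of truthy flags; area: object area.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        var = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * var * area))
        count += 1
    return total / count if count else 0.0
```

A perfect prediction scores 1.0, and the score decays faster for keypoints with small sigmas (such as the eyes) than for those with large ones (such as the hips).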


class mmpose.datasets.datasets.top_down.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CocoWholeBodyDataset dataset for top-down pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

In total, we have 133 keypoints for wholebody pose estimation.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints
17-22: 6 foot keypoints
23-90: 68 face keypoints
91-132: 42 hand keypoints

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
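The index ranges listed for COCO-WholeBody can be used to slice a 133-keypoint prediction into its parts. The sketch below is only an illustration of that layout; the names `WHOLEBODY_PARTS` and `split_wholebody` are hypothetical.

```python
# Half-open index ranges matching the listing: body, foot, face, hand.
WHOLEBODY_PARTS = {
    'body': (0, 17),
    'foot': (17, 23),
    'face': (23, 91),
    'hand': (91, 133),
}


def split_wholebody(keypoints):
    """Split a sequence of 133 keypoints into named parts."""
    if len(keypoints) != 133:
        raise ValueError('expected 133 keypoints, got %d' % len(keypoints))
    return {name: keypoints[start:end]
            for name, (start, end) in WHOLEBODY_PARTS.items()}
```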

class mmpose.datasets.datasets.top_down.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CrowdPoseDataset dataset for top-down pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

JhmdbDataset dataset for top-down pose estimation.

“Towards understanding action recognition” <https://openaccess.thecvf.com/content_iccv_2013/papers/Jhuang_Towards_Understanding_Action_2013_ICCV_paper.pdf>

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

sub-JHMDB keypoint indexes:

0: "neck",
1: "belly",
2: "head",
3: "right_shoulder",
4: "left_shoulder",
5: "right_hip",
6: "left_hip",
7: "right_elbow",
8: "left_elbow",
9: "right_knee",
10: "left_knee",
11: "right_wrist",
12: "left_wrist",
13: "right_ankle",
14: "left_ankle"

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCK', **kwargs)[source]

Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_path, output_heatmap)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str])

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK means normalized by the bounding boxes, while tPCK means normalized by the torso size.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.datasets.top_down.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MHPv2.0 dataset for top-down pose estimation.

The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), was proposed to push the frontiers of fine-grained visual understanding of humans in crowd scenes. <https://lv-mhp.github.io/>

Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code at https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose. Please be cautious if you use the results in papers.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MPII Dataset for top-down pose estimation.

“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MPII keypoint indexes:

0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch Copyright (c) Microsoft, under the MIT License.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_paths, heatmap)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.


PCKh for each joint

Return type


class mmpose.datasets.datasets.top_down.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MPII-TRB dataset for top-down pose estimation.

“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MPII-TRB keypoint indexes:

0: 'left_shoulder'
1: 'right_shoulder'
2: 'left_elbow'
3: 'right_elbow'
4: 'left_wrist'
5: 'right_wrist'
6: 'left_hip'
7: 'right_hip'
8: 'left_knee'
9: 'right_knee'
10: 'left_ankle'
11: 'right_ankle'
12: 'head'
13: 'neck'

14: 'right_neck'
15: 'left_neck'
16: 'medial_right_shoulder'
17: 'lateral_right_shoulder'
18: 'medial_right_bow'
19: 'lateral_right_bow'
20: 'medial_right_wrist'
21: 'lateral_right_wrist'
22: 'medial_left_shoulder'
23: 'lateral_left_shoulder'
24: 'medial_left_bow'
25: 'lateral_left_bow'
26: 'medial_left_wrist'
27: 'lateral_left_wrist'
28: 'medial_right_hip'
29: 'lateral_right_hip'
30: 'medial_right_knee'
31: 'lateral_right_knee'
32: 'medial_right_ankle'
33: 'lateral_right_ankle'
34: 'medial_left_hip'
35: 'lateral_left_hip'
36: 'medial_left_knee'
37: 'lateral_left_knee'
38: 'medial_left_ankle'
39: 'lateral_left_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII-TRB dataset.


batch_size: N num_keypoints: K heatmap height: H heatmap width: W

  • outputs (list(preds, boxes, image_paths, heatmap)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_ids (list[str]): For example, [‘27407’]

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metrics to be performed. Defaults: ‘PCKh’.


PCKh for each joint

Return type


class mmpose.datasets.datasets.top_down.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

OChuman dataset for top-down pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images. It is designed for validation and testing only: to evaluate on OCHuman, train the model on the COCO training set, then use OCHuman to test its robustness to occlusion.

OCHuman keypoint indexes (same as COCO):

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


num_keypoints: K

  • outputs (list(preds, boxes, image_paths)) –

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.datasets.bottom_up.BottomUpAicDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

Aic dataset for bottom-up pose estimation.

AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"

  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.bottom_up.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

COCO dataset for bottom-up pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='mAP', **kwargs)[source]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.


num_people: P num_keypoints: K

  • outputs (list(preds, scores, image_path, heatmap)) –

    • preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.

    • scores (list[P])

    • image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str) – Path of directory to save the results.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.


Evaluation results for evaluation metric.

Return type


class mmpose.datasets.datasets.bottom_up.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

CrowdPose dataset for bottom-up pose estimation.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.datasets.bottom_up.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[source]

MHPv2.0 dataset for bottom-up pose estimation.

The Multi-Human Parsing project of the Learning and Vision (LV) Group, National University of Singapore (NUS), was proposed to push the frontiers of fine-grained visual understanding of humans in crowd scenes. <https://lv-mhp.github.io/>

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.


class mmpose.datasets.pipelines.loading.LoadImageFromFile(to_float32=False, color_type='color', channel_order='rgb')[source]

Loading image from file.

  • color_type (str) – Flags specifying the color type of a loaded image, candidates are ‘color’, ‘grayscale’ and ‘unchanged’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’.

class mmpose.datasets.pipelines.shared_transform.Albumentation(transforms, keymap=None)[source]

Albumentation augmentation (pixel-level transforms only). Adds custom pixel-level transformations from Albumentations library. Please visit https://albumentations.readthedocs.io to get more information.

Note: only pixel-level transforms are supported. Please visit https://github.com/albumentations-team/albumentations#pixel-level-transforms for more information about pixel-level transforms.

An example of transforms is as follows:

    [
        dict(type='RandomBrightnessContrast',
             brightness_limit=[0.1, 0.3],
             contrast_limit=[0.1, 0.3]),
        dict(type='ChannelShuffle', p=0.1),
        dict(type='OneOf', transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ]),
    ]

  • transforms (list[dict]) – A list of Albumentation transformations

  • keymap (dict) – Contains {‘input key’:’albumentation-style key’}, e.g., {‘img’: ‘image’}.


Import a module from albumentations.

It resembles some of the build_from_cfg() logic.

  • cfg (dict) – Config dict. It should at least contain the key “type”.


The constructed object.

Return type


static mapper(d, keymap)[source]

Dictionary mapper.

Renames keys according to the keymap provided.

  • d (dict) – old dict

  • keymap (dict) – {‘old_key’: ‘new_key’}


new dict.

Return type
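The renaming behaviour described for mapper can be sketched in one expression: keys found in keymap are replaced, and all other keys pass through unchanged. This is a minimal stand-alone illustration, not the class's static method itself.

```python
def mapper(d, keymap):
    """Return a new dict with keys renamed according to keymap."""
    # Keys missing from keymap are kept as they are.
    return {keymap.get(key, key): value for key, value in d.items()}
```

With keymap={'img': 'image'}, a results dict holding 'img' is handed to Albumentations under the key 'image'; applying the inverse map afterwards restores the original key.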


class mmpose.datasets.pipelines.shared_transform.Collect(keys, meta_keys, meta_name='img_metas')[source]

Collect data from the loader relevant to the specific task.

This keeps the items in keys as they are and collects the items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=‘imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=‘img_metas’, the result will be a dict with keys ‘imgs’ and ‘img_metas’, where ‘img_metas’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.

  • keys (Sequence[str|tuple]) – Required keys to be collected. If a tuple (key, key_new) is given as an element, the item retrieved by key will be renamed as key_new in the collected data.

  • meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.

  • meta_keys (Sequence[str|tuple]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys.
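The collection step can be sketched with plain dicts. This is a simplified illustration: the function name `collect` and the helper `_pair` are hypothetical, and the meta dict is returned as-is rather than wrapped in a DataContainer as the description states.

```python
def _pair(key):
    """Expand 'k' to ('k', 'k'); leave (key, key_new) tuples unchanged."""
    return key if isinstance(key, tuple) else (key, key)


def collect(results, keys, meta_keys, meta_name='img_metas'):
    """Keep the items in keys and gather the items in meta_keys under meta_name."""
    data = {}
    for key in keys:
        src, dst = _pair(key)
        data[dst] = results[src]
    meta = {}
    for key in meta_keys:
        src, dst = _pair(key)
        meta[dst] = results[src]
    data[meta_name] = meta  # always populated, even when meta_keys is empty
    return data
```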

class mmpose.datasets.pipelines.shared_transform.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.


transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
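A pipeline of this kind simply feeds the output of each transform into the next. The sketch below covers only callables; building transforms from config dicts is omitted. Stopping early when a transform returns None is an assumption borrowed from similar pipeline implementations, not something stated above.

```python
class Compose:
    """Chain callables so that each one receives the previous one's output."""

    def __init__(self, transforms):
        for transform in transforms:
            if not callable(transform):
                raise TypeError('each transform must be callable')
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
            if data is None:
                # Assumed convention: a transform may drop a sample.
                return None
        return data
```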

class mmpose.datasets.pipelines.shared_transform.MultitaskGatherTarget(pipeline_list, pipeline_indices)[source]

Gather the targets for multitask heads.

  • pipeline_list (list[list]) – List of pipelines for all heads.

  • pipeline_indices (list[int]) – Pipeline index of each head.

class mmpose.datasets.pipelines.shared_transform.NormalizeTensor(mean, std)[source]

Normalize the Tensor image (CxHxW), with mean and std.

Required key: ‘img’. Modifies key: ‘img’.

  • mean (list[float]) – Mean values of 3 channels.

  • std (list[float]) – Std values of 3 channels.

class mmpose.datasets.pipelines.shared_transform.PhotometricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Apply photometric distortion to the image sequentially; each transformation is applied with a probability of 0.5. Random contrast is applied either second (mode 0) or second to last (mode 1).

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.


Brightness distortion.


Contrast distortion.

convert(img, alpha=1, beta=0)[source]

Multiply by alpha and add beta, then clip.
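Two details of this transform lend themselves to a short sketch: the clipped linear map performed by convert, and the position of random contrast in the step order. The code below works on a single pixel value and on step names only. The 0 to 255 clipping range and the helper name `step_order` are assumptions, and the color-space conversions are left out.

```python
def convert(value, alpha=1.0, beta=0.0):
    """Scale by alpha, shift by beta, and clip to the assumed 8-bit range."""
    return int(min(255, max(0, value * alpha + beta)))


def step_order(contrast_first):
    """List the distortion steps for contrast mode 0 (first) or mode 1 (late)."""
    steps = ['brightness']
    if contrast_first:
        steps.append('contrast')
    steps += ['saturation', 'hue']
    if not contrast_first:
        steps.append('contrast')
    steps.append('swap_channels')
    return steps
```

In the real transform each listed step is itself applied only with probability 0.5, so a given image typically receives a subset of these operations.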

class mmpose.datasets.pipelines.shared_transform.RenameKeys(key_pairs)[source]

Rename the keys.

  • key_pairs (Sequence[tuple]) – Required keys to be renamed. If a tuple (key_src, key_tgt) is given as an element, the item retrieved by key_src will be renamed as key_tgt.

class mmpose.datasets.pipelines.shared_transform.ToTensor[source]

Transform image to Tensor.

Required key: ‘img’. Modifies key: ‘img’.


results (dict) – contains all information about training.

class mmpose.datasets.pipelines.top_down_transform.TopDownAffine(use_udp=False)[source]

Affine transform the image to make input.

Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’, ‘scale’, ‘rotation’ and ‘center’. Modified keys: ‘img’, ‘joints_3d’, and ‘joints_3d_visible’.


use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTarget(sigma=2, kernel=(11, 11), valid_radius_factor=0.0546875, target_type='GaussianHeatMap', encoding='MSRA', unbiased_encoding=False)[source]

Generate the target heatmap.

Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.

  • sigma – Sigma of heatmap gaussian for ‘MSRA’ approach.

  • kernel – Kernel of heatmap gaussian for ‘Megvii’ approach.

  • encoding (str) – Approach to generate target heatmaps. Currently supported approaches: ‘MSRA’, ‘Megvii’, ‘UDP’. Default: ‘MSRA’.

  • unbiased_encoding (bool) – Option to use unbiased encoding methods. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).

  • keypoint_pose_distance – Keypoint pose distance for UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

  • target_type (str) – Supported targets: ‘GaussianHeatMap’, ‘CombinedTarget’. Default: ‘GaussianHeatMap’. ‘CombinedTarget’ is the combination of a classification target (response map) and a regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
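To illustrate what a Gaussian heatmap target with a given sigma looks like, here is a generic sketch. It is not the library's exact ‘MSRA’, ‘Megvii’ or ‘UDP’ encoding; the function name and the heatmap size are assumptions chosen for the example.

```python
import numpy as np

def gaussian_heatmap(width, height, joint_xy, sigma=2):
    """Return a (height, width) map with a Gaussian peak at joint_xy."""
    xs = np.arange(width, dtype=np.float32)            # shape (W,)
    ys = np.arange(height, dtype=np.float32)[:, None]  # shape (H, 1)
    mu_x, mu_y = joint_xy
    # Broadcasting (H, 1) against (W,) yields an (H, W) grid of distances.
    return np.exp(-((xs - mu_x) ** 2 + (ys - mu_y) ** 2) / (2 * sigma ** 2))

heatmap = gaussian_heatmap(width=48, height=64, joint_xy=(20, 30), sigma=2)
print(heatmap.shape)
```

A larger sigma spreads the peak over more pixels, which makes the target more tolerant of small localization errors.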

class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTargetRegression[source]

Generate the target regression vector (coordinates).

Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.

class mmpose.datasets.pipelines.top_down_transform.TopDownGetRandomScaleRotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6)[source]

Data augmentation with random scaling & rotating.

Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.

  • rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].

  • rot_prob (float) – Probability of random rotation.
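The documented ranges can be sketched as below. The documentation above fixes only the clipping ranges and the rotation probability; drawing from a Gaussian before clipping, and the helper name, are assumptions of this sketch.

```python
import random

def sample_scale_rotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6,
                          rng=random):
    """Sample a scale multiplier and a rotation angle (hypothetical helper)."""
    # Scale multiplier clipped to [1 - scale_factor, 1 + scale_factor].
    s = rng.gauss(1.0, scale_factor)
    s = min(max(s, 1 - scale_factor), 1 + scale_factor)
    # Rotate only with probability rot_prob; otherwise keep the angle at 0.
    if rng.random() <= rot_prob:
        r = rng.gauss(0.0, rot_factor)
        r = min(max(r, -2 * rot_factor), 2 * rot_factor)
    else:
        r = 0.0
    return s, r

scale_mult, rotation = sample_scale_rotation(rng=random.Random(0))
print(scale_mult, rotation)
```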

class mmpose.datasets.pipelines.top_down_transform.TopDownHalfBodyTransform(num_joints_half_body=8, prob_half_body=0.3)[source]

Data augmentation with half-body transform. Keep only the upper body or the lower body at random.

Required keys: ‘joints_3d’, ‘joints_3d_visible’, and ‘ann_info’. Modifies key: ‘scale’ and ‘center’.

  • num_joints_half_body (int) – Threshold for performing the half-body transform. If the body has fewer joints than this (< num_joints_half_body), this step is skipped.

  • prob_half_body (float) – Probability of half-body transform.

static half_body_transform(cfg, joints_3d, joints_3d_visible)[source]

Get the center and scale for the half-body transform.

class mmpose.datasets.pipelines.top_down_transform.TopDownRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip.

Required keys: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘ann_info’. Modifies key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘flipped’.

  • flip (bool) – Option to perform random flip.

  • flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.top_down_transform.TopDownRandomTranslation(trans_factor=0.15)[source]

Data augmentation with random translation.

Required key: ‘scale’ and ‘center’. Modifies key: ‘center’.


bbox height: H, bbox width: W

trans_factor (float) – Translating center to ``[-trans_factor, trans_factor] * [W, H] + center``.
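The translation rule, center plus an offset in [-trans_factor, trans_factor] times the bbox size [W, H], can be sketched as follows. Sampling the offset uniformly and independently per axis is an assumption, as is the helper name.

```python
import random

def random_translate(center, bbox_size, trans_factor=0.15, rng=random):
    """Shift center by a random fraction of the bbox size (hypothetical helper)."""
    cx, cy = center
    w, h = bbox_size
    dx = rng.uniform(-trans_factor, trans_factor) * w
    dy = rng.uniform(-trans_factor, trans_factor) * h
    return (cx + dx, cy + dy)

new_center = random_translate((100.0, 50.0), (200.0, 80.0),
                              rng=random.Random(0))
print(new_center)
```

With trans_factor=0.15 and a 200x80 box, the center moves by at most 30 pixels horizontally and 12 pixels vertically.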

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateHeatmapTarget(sigma, use_udp=False)[source]

Generate multi-scale heatmap target for bottom-up.

  • sigma (int) – Sigma of heatmap Gaussian

  • max_num_people (int) – Maximum number of people in an image

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGeneratePAFTarget(limb_width, skeleton=None)[source]

Generate multi-scale heatmaps and part affinity fields (PAF) target for bottom-up. Paper ref: Cao et al. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields (CVPR 2017).


limb_width (int) – Limb width of part affinity fields

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateTarget(sigma, max_num_people, use_udp=False)[source]

Generate multi-scale heatmap target for bottom-up.

  • sigma (int) – Sigma of heatmap Gaussian

  • max_num_people (int) – Maximum number of people in an image

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGetImgSize(test_scale_factor, current_scale=1, use_udp=False)[source]

Get multi-scale image sizes for bottom-up, including base_size and test_scale_factor. The aspect ratio is kept, and the image is resized to results[‘ann_info’][‘image_size’]×current_scale.

  • test_scale_factor (List[float]) – Multi scale

  • current_scale (int) – default 1

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomAffine(rot_factor, scale_factor, scale_type, trans_factor, use_udp=False)[source]

Data augmentation with random scaling & rotating.

  • rot_factor (int) – Rotating to [-rot_factor, rot_factor]

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor]

  • scale_type – wrt long or short length of the image.

  • trans_factor – Translation factor.

  • scale_aware_sigma – Option to use scale-aware sigma

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip for bottom-up.


flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpResizeAlign(transforms, use_udp=False)[source]

Resize multi-scale size and align transform for bottom-up.

  • transforms (List) – ToTensor & Normalize

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.HeatmapGenerator(output_size, num_joints, sigma=- 1, use_udp=False)[source]

Generate heatmaps for bottom-up models.

  • num_joints (int) – Number of keypoints

  • output_size (int) – Size of feature map

  • sigma (int) – Sigma of the heatmaps.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.JointsEncoder(max_num_people, num_joints, output_size, tag_per_joint)[source]

Encodes the visible joints into (coordinates, score) pairs. Both the coordinate of a joint and its score are of int type. Each pair is (idx * output_size**2 + y * output_size + x, 1) for a visible joint, or (0, 0) otherwise.

  • max_num_people (int) – Max number of people in an image

  • num_joints (int) – Number of keypoints

  • output_size (int) – Size of feature map

  • tag_per_joint (bool) – Option to use one tag map per joint.
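The flat-index formula idx * output_size**2 + y * output_size + x can be checked with a small pure-Python sketch. The helper names and the explicit bounds check are assumptions made for illustration.

```python
def encode_joint(idx, x, y, output_size, visible=True):
    """Return (flat_index, 1) for a visible joint, else (0, 0)."""
    inside = 0 <= x < output_size and 0 <= y < output_size
    if not visible or not inside:
        return (0, 0)
    return (idx * output_size ** 2 + y * output_size + x, 1)

def decode_index(flat_index, output_size):
    """Invert the formula to recover (idx, x, y)."""
    idx, rest = divmod(flat_index, output_size ** 2)
    y, x = divmod(rest, output_size)
    return idx, x, y

code, score = encode_joint(idx=2, x=5, y=3, output_size=128)
print(code, score, decode_index(code, 128))
```

Because each joint index idx owns a contiguous block of output_size**2 positions, a single integer identifies both the joint map and the pixel within it.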

class mmpose.datasets.pipelines.bottom_up_transform.PAFGenerator(output_size, limb_width, skeleton)[source]

Generate part affinity fields.

  • output_size (int) – Size of feature map.

  • limb_width (int) – Limb width of part affinity fields.

  • skeleton (list[list]) – connections of joints.

class mmpose.datasets.pipelines.mesh_transform.IUVToTensor[source]

Transform IUV image to part index mask and uv coordinates image. The 3 channels of the IUV image are: part index, u coordinates, v coordinates.

Required key: ‘iuv’, ‘ann_info’. Modifies key: ‘part_index’, ‘uv_coordinates’.


results (dict) – contains all information about training.

class mmpose.datasets.pipelines.mesh_transform.LoadIUVFromFile(to_float32=False)[source]

Loading IUV image from file.

class mmpose.datasets.pipelines.mesh_transform.MeshAffine[source]

Affine transform the image to get the input image. The 2D keypoints, 3D keypoints and IUV image are affine transformed too.

Required keys: ‘img’, ‘joints_2d’, ‘joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘pose’, ‘iuv’, ‘ann_info’, ‘scale’, ‘rotation’ and ‘center’. Modifies key: ‘img’, ‘joints_2d’, ‘joints_2d_visible’, ‘joints_3d’, ‘pose’, ‘iuv’.

class mmpose.datasets.pipelines.mesh_transform.MeshGetRandomScaleRotation(rot_factor=30, scale_factor=0.25, rot_prob=0.6)[source]

Data augmentation with random scaling & rotating.

Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.

  • rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].

  • rot_prob (float) – Probability of random rotation.

class mmpose.datasets.pipelines.mesh_transform.MeshRandomChannelNoise(noise_factor=0.4)[source]

Data augmentation with random channel noise.

Required keys: ‘img’. Modifies key: ‘img’.


noise_factor (float) – Multiply each channel with a factor between ``[1-noise_factor, 1+noise_factor]``.
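A sketch of per-channel noise is given below, assuming an H x W x 3 image, one uniformly sampled factor per channel, and clipping to [0, 255]. The uniform distribution, the clipping and the helper name are assumptions, not confirmed details of the library.

```python
import numpy as np

def random_channel_noise(img, noise_factor=0.4, rng=None):
    """Scale each of the 3 channels by its own random factor."""
    rng = np.random.default_rng() if rng is None else rng
    # One factor per channel, broadcast over height and width.
    factors = rng.uniform(1 - noise_factor, 1 + noise_factor, size=(1, 1, 3))
    noisy = np.clip(img.astype(np.float32) * factors, 0, 255)
    return noisy, factors

image = np.full((2, 2, 3), 100, dtype=np.uint8)
noisy, factors = random_channel_noise(image, rng=np.random.default_rng(0))
print(factors.ravel())
```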

class mmpose.datasets.pipelines.mesh_transform.MeshRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip.

Required keys: ‘img’, ‘joints_2d’, ‘joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’, ‘pose’, ‘iuv’ and ‘ann_info’. Modifies key: ‘img’, ‘joints_2d’, ‘joints_2d_visible’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’, ‘pose’, ‘iuv’.


flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.pose3d_transform.CameraProjection(item, mode, output_name=None, camera_type='SimpleCamera', camera_param=None)[source]

Apply camera projection to joint coordinates.

  • item (str) – The name of the pose to apply camera projection.

  • mode (str) – The type of camera projection. Supported options are: world_to_camera, world_to_pixel, camera_to_world, camera_to_pixel.

  • output_name (str|None) – The name of the projected pose. If None (default) is given, the projected pose will be stored in place.

  • camera_type (str) – The camera class name (should be registered in CAMERA).

  • camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.

Required keys:

item, camera_param (if camera parameters are not given in initialization)

Modified keys:


class mmpose.datasets.pipelines.pose3d_transform.Generate3DHeatmapTarget(sigma=2, joint_indices=None)[source]

Generate the target 3d heatmap.

Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.

  • sigma – Sigma of heatmap gaussian.

  • joint_indices (list) – Indices of joints used for heatmap generation. Default: None.

class mmpose.datasets.pipelines.pose3d_transform.GetRootCenteredPose(item, root_index, visible_item=None, remove_root=False, root_name=None)[source]

Zero-center the pose around a given root joint. Optionally, the root joint can be removed from the original pose and stored as a separate item.

Note that the root-centered joints may no longer align with some annotation information (e.g. flip_pairs, num_joints, inference_channel, etc.) due to the removal of the root joint.

  • item (str) – The name of the pose to apply root-centering.

  • root_index (int) – Root joint index in the pose.

  • visible_item (str) – The name of the visibility item.

  • remove_root (bool) – If true, remove the root joint from the pose

  • root_name (str) – Optional. If not none, it will be used as the key to store the root position separated from the original pose.

Required keys:


Modified keys:

item, visible_item, root_name
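Root-centering can be sketched with NumPy as follows, assuming a pose array of shape [K, C] (or [T, K, C]). The helper name is hypothetical, and the visibility item handling is omitted.

```python
import numpy as np

def root_center(pose, root_index, remove_root=False):
    """Subtract the root joint from every joint; optionally drop the root."""
    root = pose[..., root_index:root_index + 1, :]  # keep the joint axis
    centered = pose - root
    if remove_root:
        centered = np.delete(centered, root_index, axis=-2)
    return centered, root

pose = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [0.0, 0.0, 0.0]])
centered, root = root_center(pose, root_index=0, remove_root=True)
print(centered.shape, root.shape)
```

Removing the root changes the number of joints from K to K-1, which is why annotation fields such as flip_pairs or num_joints may no longer line up with the centered pose.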

class mmpose.datasets.pipelines.pose3d_transform.NormalizeJointCoordinate(item, mean=None, std=None, norm_param_file=None)[source]

Normalize the joint coordinate with given mean and std.

  • item (str) – The name of the pose to normalize.

  • mean (array) – Mean values of joint coordinates in shape [K, C].

  • std (array) – Std values of joint coordinates in shape [K, C].

  • norm_param_file (str) – Optionally load a dict containing mean and std from a file using mmcv.load.
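The normalization itself is the usual (pose - mean) / std applied element-wise. A small NumPy sketch, with made-up values and a hypothetical helper name:

```python
import numpy as np

def normalize_joints(pose, mean, std):
    """Normalize joints; mean and std have shape [K, C] and broadcast over T."""
    return (pose - mean) / std

pose = np.array([[2.0, 4.0], [6.0, 8.0]])   # K=2 joints, C=2 coordinates
mean = np.array([[1.0, 2.0], [3.0, 4.0]])
std = np.array([[1.0, 2.0], [3.0, 4.0]])
normalized = normalize_joints(pose, mean, std)
print(normalized)
```

Because mean and std have shape [K, C], the same call also works for a sequence of shape [T, K, C] through broadcasting.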

Required keys:


Modified keys:


class mmpose.datasets.pipelines.pose3d_transform.PoseSequenceToTensor(item)[source]

Convert pose sequence from numpy array to Tensor.

The original pose sequence should have a shape of [T,K,C] or [K,C], where T is the sequence length, K and C are keypoint number and dimension. The converted pose sequence will have a shape of [K*C, T].


item (str) – The name of the pose sequence
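The shape change from [T, K, C] (or [K, C]) to [K*C, T] can be sketched with NumPy as below. The row ordering (row k*C + c holds coordinate c of joint k) is an assumption, and the final conversion to a Tensor is left out so that the sketch depends only on NumPy.

```python
import numpy as np

def flatten_pose_sequence(seq):
    """Reshape [T, K, C] or [K, C] into [K*C, T] (hypothetical helper)."""
    if seq.ndim == 2:
        seq = seq[None, ...]  # treat a single pose as a sequence with T=1
    num_frames = seq.shape[0]
    # Move time to the last axis, then merge the joint and coordinate axes.
    return seq.transpose(1, 2, 0).reshape(-1, num_frames)

seq = np.arange(2 * 3 * 2).reshape(2, 3, 2)  # T=2, K=3, C=2
flat = flatten_pose_sequence(seq)
print(flat.shape)
```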

Required keys:


Modified keys:


class mmpose.datasets.pipelines.pose3d_transform.RelativeJointRandomFlip(item, root_index, visible_item=None, flip_prob=0.5)[source]

Data augmentation with random horizontal joint flip around a root joint.

  • item (str) – The name of the pose to flip.

  • root_index (int) – Root joint index in the pose.

  • visible_item (str) – The name of the visibility item which will be flipped accordingly along with the pose.

  • flip_prob (float) – Probability of flip.
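A horizontal flip around a root joint can be sketched as mirroring every x coordinate about the root's x coordinate and then swapping left/right joints. This is an illustrative sketch only: the flip_pairs argument and the helper name are hypothetical, and the flip_prob gating and the visibility item are omitted.

```python
import numpy as np

def flip_around_root(pose, root_index, flip_pairs):
    """Mirror x about the root joint, then swap each (left, right) pair."""
    flipped = pose.copy()
    root_x = pose[root_index, 0]
    flipped[:, 0] = 2 * root_x - pose[:, 0]
    for left, right in flip_pairs:
        flipped[[left, right]] = flipped[[right, left]]
    return flipped

pose = np.array([[1.0, 0.0],   # root
                 [0.0, 1.0],   # left joint
                 [3.0, 2.0]])  # right joint
flipped = flip_around_root(pose, root_index=0, flip_pairs=[(1, 2)])
print(flipped)
```

The root joint stays in place, so the relative layout of the pose is preserved under the flip.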

Required keys:


Modified keys:



class mmpose.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[source]

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

In older versions of PyTorch, DistributedSampler has no shuffle argument. This child class adds one.


mmpose.utils.get_root_logger(log_file=None, log_level=20)[source]

Use get_logger method in mmcv to get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmpose”.

  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.

  • log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.


The root logger.

Return type
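The described behaviour can be approximated with the standard logging module alone. In the sketch below, make_root_logger and its rank parameter are hypothetical stand-ins: the real function obtains the process rank from the distributed environment rather than from an argument. Note that the default log_level=20 equals logging.INFO.

```python
import logging

def make_root_logger(name="mmpose", log_file=None, log_level=logging.INFO,
                     rank=0):
    """Stdlib-only approximation of the documented behaviour."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # initialize only once per logger name
        logger.addHandler(logging.StreamHandler())
        if log_file is not None:
            logger.addHandler(logging.FileHandler(log_file))
    # Only rank 0 uses the requested level; other ranks stay at ERROR.
    logger.setLevel(log_level if rank == 0 else logging.ERROR)
    return logger

main_logger = make_root_logger(name="sketch_rank0")
worker_logger = make_root_logger(name="sketch_rank1", rank=1)
main_logger.info("visible on rank 0")
worker_logger.info("suppressed on other ranks")
```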
