mmpose.apis¶
mmpose.codecs¶
- class mmpose.codecs.AssociativeEmbedding(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[float] = None, use_udp: bool = False, decode_keypoint_order: List[int] = [], decode_nms_kernel: int = 5, decode_gaussian_kernel: int = 3, decode_keypoint_thr: float = 0.1, decode_tag_thr: float = 1.0, decode_topk: int = 20, decode_max_instances: Optional[int] = None)[source]¶
Encode/decode keypoints with the method introduced in “Associative Embedding”. This is an asymmetric codec, where the keypoints are represented as gaussian heatmaps and position indices during encoding, and restored from predicted heatmaps and group tags.
See the paper “Associative Embedding: End-to-End Learning for Joint Detection and Grouping” by Newell et al. (2017) for details.
Note
instance number: N
keypoint number: K
keypoint dimension: D
embedding tag dimension: L
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size
- keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*W+x\), with W the heatmap width) and v is the visibility
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
decode_keypoint_order (List[int]) – The grouping order of the keypoint indices. The grouping usually starts from keypoints around the head and torso, and gradually moves out to the limbs
decode_keypoint_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.1
decode_tag_thr (float) – The maximum allowed tag distance when matching a keypoint to a group. A keypoint whose tag distance to every existing group exceeds this threshold will initialize a new group. Defaults to 1.0
decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5
decode_gaussian_kernel (int) – The kernel size of the Gaussian blur during decoding, which should be an odd integer. It is only used when self.use_udp==True. Defaults to 3
decode_topk (int) – The number of top-k candidates of each keypoint that will be retrieved from the heatmaps during decoding. Defaults to 20
decode_max_instances (int, optional) – The maximum number of instances to decode. None means no limitation on the instance number. Defaults to None
Associative Embedding: End-to-End Learning for Joint Detection and Grouping: https://arxiv.org/abs/1611.05424
UDP (CVPR 2020): https://arxiv.org/abs/1911.07524
- batch_decode(batch_heatmaps: torch.Tensor, batch_tags: torch.Tensor) Tuple[List[numpy.ndarray], List[numpy.ndarray]] [source]¶
Decode the keypoint coordinates from a batch of heatmaps and tagging heatmaps. The decoded keypoint coordinates are in the input image space.
- Parameters
batch_heatmaps (Tensor) – Keypoint detection heatmaps in shape (B, K, H, W)
batch_tags (Tensor) – Tagging heatmaps in shape (B, C, H, W), where \(C=L*K\)
- Returns
- batch_keypoints (List[np.ndarray]): Decoded keypoint coordinates of the batch, each in shape (N, K, D)
- batch_scores (List[np.ndarray]): Decoded keypoint scores of the batch, each in shape (N, K). It usually represents the confidence of the keypoint prediction
- Return type
tuple
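The role of decode_tag_thr can be illustrated with a simplified sketch. It assumes one-dimensional tags (L = 1) and a greedy nearest-group rule; the function name `group_by_tag` and the data layout are hypothetical, and the library's actual matching procedure may differ.

```python
def group_by_tag(candidates, tag_thr=1.0):
    """Greedily assign (keypoint_id, tag) candidates to instance groups.

    A candidate joins the group whose mean tag is closest, provided that
    distance is below tag_thr; otherwise it starts a new group.
    This is an illustrative simplification, not the mmpose implementation.
    """
    groups = []  # each group: {"tags": [...], "keypoints": [...]}
    for kpt_id, tag in candidates:
        best, best_dist = None, None
        for group in groups:
            mean_tag = sum(group["tags"]) / len(group["tags"])
            dist = abs(tag - mean_tag)
            if best_dist is None or dist < best_dist:
                best, best_dist = group, dist
        if best is not None and best_dist < tag_thr:
            best["tags"].append(tag)
            best["keypoints"].append(kpt_id)
        else:
            groups.append({"tags": [tag], "keypoints": [kpt_id]})
    return groups


# Two people with tags near 0 and near 5, plus one outlier keypoint.
candidates = [(0, 0.1), (0, 5.2), (1, 0.3), (1, 4.9), (2, 9.0)]
groups = group_by_tag(candidates, tag_thr=1.0)
print([g["keypoints"] for g in groups])  # [[0, 1], [0, 1], [2]]
```

Raising tag_thr merges more keypoints into existing groups, while lowering it produces more, smaller groups.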
- decode(encoded: Any) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoints.
- Parameters
encoded (any) – Encoded keypoint representation using the codec
- Returns
- keypoints (np.ndarray): Keypoint coordinates in shape (N, K, D)
- keypoints_visible (np.ndarray): Keypoint visibility in shape (N, K)
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] [source]¶
Encode keypoints into heatmaps and position indices. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size
- keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*W+x\), with W the heatmap width) and v is the visibility
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Return type
dict
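The position index stored in keypoint_indices is a row-major flattening of a heatmap pixel. The following minimal sketch assumes the width in the formula is the heatmap width; the helper names are hypothetical and not part of mmpose.

```python
def to_position_index(x: int, y: int, heatmap_width: int) -> int:
    """Flatten an (x, y) heatmap pixel into a row-major position index."""
    return y * heatmap_width + x


def from_position_index(index: int, heatmap_width: int) -> tuple:
    """Recover (x, y) from a row-major position index."""
    y, x = divmod(index, heatmap_width)
    return x, y


W, H = 48, 64  # hypothetical heatmap_size in [W, H]
i = to_position_index(x=5, y=2, heatmap_width=W)
print(i)                          # 101
print(from_position_index(i, W))  # (5, 2)
```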
- class mmpose.codecs.DecoupledHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], root_type: str = 'kpt_center', heatmap_min_overlap: float = 0.7, encode_max_instances: int = 30)[source]¶
Encode/decode keypoints with the method introduced in the paper CID.
See the paper “Contextual Instance Decoupling for Robust Multi-Person Pose Estimation” by Wang et al. (2022) for details.
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W), where [W, H] is the heatmap_size.
- instance_heatmaps (np.ndarray): The decoupled heatmap in shape (M*K, H, W), where M is the number of instances.
- keypoint_weights (np.ndarray): The weight for heatmaps in shape (M*K).
- instance_coords (np.ndarray): The coordinates of instance roots in shape (M, 2)
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
root_type (str) – The method to generate the instance root. Options are:
- 'kpt_center': Average coordinate of all visible keypoints.
- 'bbox_center': Center point of the bounding box outlined by all visible keypoints.
Defaults to 'kpt_center'
heatmap_min_overlap (float) – Minimum overlap rate among instances. Used when calculating sigmas for instances. Defaults to 0.7
background_weight (float) – Loss weight of background pixels. Defaults to 0.1
encode_max_instances (int) – The maximum number of instances to encode for each sample. Defaults to 30
Contextual_Instance_Decoupling_for_Robust_Multi-Person_Pose_Estimation_ CVPR_2022_paper.html
- decode(instance_heatmaps: numpy.ndarray, instance_scores: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from decoupled heatmaps. The decoded keypoint coordinates are in the input image space.
- Parameters
instance_heatmaps (np.ndarray) – Heatmaps in shape (N, K, H, W)
instance_scores (np.ndarray) – Confidence of instance roots prediction in shape (N, 1)
- Returns
- keypoints (np.ndarray): Decoded keypoint coordinates in shape
(N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It
usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, bbox: Optional[numpy.ndarray] = None) dict [source]¶
Encode keypoints into heatmaps.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
bbox (np.ndarray) – Bounding box in shape (N, 8) which includes coordinates of 4 corners.
- Returns
- heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W), where [W, H] is the heatmap_size.
- instance_heatmaps (np.ndarray): The decoupled heatmap in shape (N*K, H, W), where N is the number of instances.
- keypoint_weights (np.ndarray): The weight for heatmaps in shape (N*K).
- instance_coords (np.ndarray): The coordinates of instance roots in shape (N, 2)
- Return type
dict
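The two root_type options differ only in how the root point is derived from the visible keypoints. The sketch below illustrates that difference under the definitions given in the parameter list; `instance_root` is a hypothetical helper, not the library function.

```python
def instance_root(keypoints, visible, root_type="kpt_center"):
    """Compute an instance root from the visible keypoints.

    keypoints: list of (x, y) tuples; visible: list of 0/1 flags.
    Returns None when no keypoint is visible.
    """
    pts = [p for p, v in zip(keypoints, visible) if v > 0]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    if root_type == "kpt_center":
        # Average coordinate of all visible keypoints
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    if root_type == "bbox_center":
        # Center of the tightest box around the visible keypoints
        return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    raise ValueError(f"unknown root_type: {root_type!r}")


kpts = [(0, 0), (4, 0), (2, 6), (100, 100)]
vis = [1, 1, 1, 0]  # the last keypoint is invisible and ignored
print(instance_root(kpts, vis, "kpt_center"))   # (2.0, 2.0)
print(instance_root(kpts, vis, "bbox_center"))  # (2.0, 3.0)
```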
- class mmpose.codecs.IntegralRegressionLabel(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11, normalize: bool = True)[source]¶
Generate keypoint coordinates and normalized heatmaps. See the paper DSNT by Nibali et al. (2018).
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
Encoded:
- keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
- heatmaps (np.ndarray): The generated heatmap in shape (K, H, W) where
[W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Input image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11
normalize (bool) – Whether to normalize the heatmaps. Defaults to True.
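As a quick check of the relation between blur_kernel_size and sigma, the formula can be evaluated directly. The helper name `sigma_for_kernel` is hypothetical; the odd-size check is an added assumption, since Gaussian blur kernels are conventionally odd.

```python
def sigma_for_kernel(ks: int) -> float:
    """Sigma matching a Gaussian blur kernel of size ks (empirical formula)."""
    if ks % 2 == 0 or ks < 1:
        raise ValueError("kernel size is assumed to be a positive odd integer")
    return 0.3 * ((ks - 1) * 0.5 - 1) + 0.8


for ks in (3, 7, 11):
    print(ks, round(sigma_for_kernel(ks), 4))
# 3 0.8
# 7 1.4
# 11 2.0
```

With the default blur_kernel_size of 11, the formula gives a sigma of 2.0.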
- decode(encoded: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from normalized space to input image space.
- Parameters
encoded (np.ndarray) – Coordinates in shape (N, K, D)
- Returns
- keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encoding keypoints to regression labels and heatmaps.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
- heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape
(N, K)
- Return type
dict
- class mmpose.codecs.MSRAHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11)[source]¶
Represent keypoints as heatmaps via “MSRA” approach. See the paper: Simple Baselines for Human Pose Estimation and Tracking by Xiao et al (2018) for details.
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmaps (np.ndarray): The generated heatmap in shape (K, H, W)
where [W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11
- decode(encoded: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.
- Parameters
encoded (np.ndarray) – Heatmaps in shape (K, H, W)
- Returns
- keypoints (np.ndarray): Decoded keypoint coordinates in shape
(N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It
usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape
(N, K)
- Return type
dict
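The encode/decode round trip of a Gaussian heatmap codec can be sketched in plain Python. This is an illustration only: it handles a single keypoint, places the Gaussian at the exact sub-pixel center, and decodes by a plain argmax, all of which are simplifying assumptions rather than the library's implementation.

```python
import math


def encode_heatmap(keypoint, input_size, heatmap_size, sigma):
    """Render one keypoint (input-image coordinates) as a Gaussian heatmap.

    Returns a nested list indexed as heatmap[y][x] with H rows and W columns.
    """
    w, h = input_size
    W, H = heatmap_size
    # Map the keypoint from input-image space to heatmap space
    mu_x = keypoint[0] * W / w
    mu_y = keypoint[1] * H / h
    return [[math.exp(-((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2 * sigma ** 2))
             for x in range(W)] for y in range(H)]


def decode_heatmap(heatmap, input_size):
    """Take the argmax of a heatmap and map it back to input-image space."""
    H, W = len(heatmap), len(heatmap[0])
    w, h = input_size
    score, x, y = max((heatmap[y][x], x, y)
                      for y in range(H) for x in range(W))
    return (x * w / W, y * h / H), score


input_size, heatmap_size = (192, 256), (48, 64)
hm = encode_heatmap((96, 128), input_size, heatmap_size, sigma=2.0)
print(decode_heatmap(hm, input_size))  # ((96.0, 128.0), 1.0)
```

Because the heatmap is coarser than the input image, a plain argmax can only recover coordinates up to the stride between the two; refinements such as DarkPose exist to reduce that quantization error.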
- class mmpose.codecs.MegviiHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], kernel_size: int)[source]¶
Represent keypoints as heatmaps via “Megvii” approach. See MSPN (2019) and CPN (2018) for details.
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmaps (np.ndarray): The generated heatmap in shape (K, H, W)
where [W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
kernel_size (tuple) – The kernel size of the heatmap gaussian in [ks_x, ks_y]
- decode(encoded: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.
- Parameters
encoded (np.ndarray) – Heatmaps in shape (K, H, W)
- Returns
- keypoints (np.ndarray): Decoded keypoint coordinates in shape
(K, D)
- scores (np.ndarray): The keypoint scores in shape (K,). It
usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
- keypoint_weights (np.ndarray): The target weights in shape
(N, K)
- Return type
dict
- class mmpose.codecs.RegressionLabel(input_size: Tuple[int, int])[source]¶
Generate keypoint coordinates.
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
Encoded:
- keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Input image size in [w, h]
- decode(encoded: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from normalized space to input image space.
- Parameters
encoded (np.ndarray) – Coordinates in shape (N, K, D)
- Returns
- keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encoding keypoints from input image space to normalized space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
- keypoint_weights (np.ndarray): The target weights in shape
(N, K)
- Return type
dict
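Mapping between input image space and normalized space amounts to dividing by, or multiplying by, the image size. The sketch below assumes plain division by [w, h] for 2D keypoints; the function names are hypothetical and the library may apply additional steps.

```python
def encode_regression_label(keypoints, input_size):
    """Normalize (N, K, 2) keypoints by the input size [w, h]."""
    w, h = input_size
    return [[(x / w, y / h) for x, y in instance] for instance in keypoints]


def decode_regression_label(labels, input_size):
    """Map normalized (N, K, 2) labels back to input-image space."""
    w, h = input_size
    return [[(x * w, y * h) for x, y in instance] for instance in labels]


input_size = (192, 256)
keypoints = [[(96.0, 64.0), (48.0, 256.0)]]  # N=1, K=2, D=2
labels = encode_regression_label(keypoints, input_size)
print(labels)                                       # [[(0.5, 0.25), (0.25, 1.0)]]
print(decode_regression_label(labels, input_size))  # [[(96.0, 64.0), (48.0, 256.0)]]
```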
- class mmpose.codecs.SPR(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[Union[float, Tuple[float]]] = None, generate_keypoint_heatmaps: bool = False, root_type: str = 'kpt_center', minimal_diagonal_length: Union[int, float] = 5, background_weight: float = 0.1, decode_nms_kernel: int = 5, decode_max_instances: int = 30, decode_thr: float = 0.01)[source]¶
Encode/decode keypoints with Structured Pose Representation (SPR).
See the paper Single-stage multi-person pose machines by Nie et al (2017) for details
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmaps (np.ndarray): The generated heatmap in shape (1, H, W)
where [W, H] is the heatmap_size. If the keypoint heatmap is generated together, the output heatmap shape is (K+1, H, W)
- heatmap_weights (np.ndarray): The target weights for heatmaps, with the same shape as heatmaps.
- displacements (np.ndarray): The dense keypoint displacement in shape (K*2, H, W).
- displacement_weights (np.ndarray): The target weights for displacements, with the same shape as displacements.
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float or tuple, optional) – The sigma values of the Gaussian heatmaps. If sigma is a tuple, it includes both sigmas for root and keypoint heatmaps. None means the sigmas are computed automatically from the heatmap size. Defaults to None
generate_keypoint_heatmaps (bool) – Whether to generate Gaussian heatmaps for each keypoint. Defaults to False
root_type (str) – The method to generate the instance root. Options are:
- 'kpt_center': Average coordinate of all visible keypoints.
- 'bbox_center': Center point of the bounding box outlined by all visible keypoints.
Defaults to 'kpt_center'
minimal_diagonal_length (int or float) – The threshold of diagonal length of instance bounding box. Small instances will not be used in training. Defaults to 5
background_weight (float) – Loss weight of background pixels. Defaults to 0.1
decode_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.01
decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5
decode_max_instances (int) – The maximum number of instances to decode. Defaults to 30
- decode(heatmaps: torch.Tensor, displacements: torch.Tensor) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode the keypoint coordinates from heatmaps and displacements. The decoded keypoint coordinates are in the input image space.
- Parameters
heatmaps (Tensor) – Encoded root and keypoints (optional) heatmaps in shape (1, H, W) or (K+1, H, W)
displacements (Tensor) – Encoded keypoints displacement fields in shape (K*D, H, W)
- Returns
- keypoints (Tensor): Decoded keypoint coordinates in shape (N, K, D)
- scores (tuple):
  - root_scores (Tensor): The root scores in shape (N, )
  - keypoint_scores (Tensor): The keypoint scores in shape (N, K). If keypoint heatmaps are not generated, keypoint_scores will be None
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encode keypoints into root heatmaps and keypoint displacement fields. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- heatmaps (np.ndarray): The generated heatmap in shape
(1, H, W) where [W, H] is the heatmap_size. If keypoint heatmaps are generated together, the shape is (K+1, H, W)
- heatmap_weights (np.ndarray): The pixel-wise weight for heatmaps, with the same shape as heatmaps
- displacements (np.ndarray): The generated displacement fields in shape (K*D, H, W). The vector at each pixel represents the displacements of the keypoints belonging to the associated instance, relative to this pixel.
- displacement_weights (np.ndarray): The pixel-wise weight for displacements, with the same shape as displacements
- Return type
dict
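The relation between a root pixel, its displacement vector, and the keypoints can be illustrated for a single pixel. The sketch assumes that the displacement is the keypoint position minus the pixel position and that the K*D values are interleaved as [dx0, dy0, dx1, dy1, ...]; both the sign convention and the channel layout are assumptions, and the helper names are hypothetical.

```python
def encode_displacement(pixel, keypoints):
    """Displacements from one pixel to each keypoint, flattened to K*2 values."""
    px, py = pixel
    flat = []
    for kx, ky in keypoints:
        flat.extend([kx - px, ky - py])
    return flat


def decode_keypoints(pixel, displacement):
    """Recover keypoints by adding the displacements back to the pixel."""
    px, py = pixel
    return [(px + displacement[i], py + displacement[i + 1])
            for i in range(0, len(displacement), 2)]


root = (10, 20)
keypoints = [(12, 18), (7, 25)]  # K=2, D=2
disp = encode_displacement(root, keypoints)
print(disp)                          # [2, -2, -3, 5]
print(decode_keypoints(root, disp))  # [(12, 18), (7, 25)]
```

In the dense field, the same computation is repeated for every pixel assigned to an instance, which is why the displacement map has K*D channels.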
- get_keypoint_scores(heatmaps: torch.Tensor, keypoints: torch.Tensor)[source]¶
Calculate the keypoint scores with keypoints heatmaps and coordinates.
- Parameters
heatmaps (Tensor) – Keypoint heatmaps in shape (K, H, W)
keypoints (Tensor) – Keypoint coordinates in shape (N, K, D)
- Returns
Keypoint scores in [N, K]
- Return type
Tensor
- class mmpose.codecs.SimCCLabel(input_size: Tuple[int, int], smoothing_type: str = 'gaussian', sigma: Union[float, int, Tuple[float]] = 6.0, simcc_split_ratio: float = 2.0, label_smooth_weight: float = 0.0, normalize: bool = True, use_dark: bool = False)[source]¶
Generate keypoint representation via “SimCC” approach. See the paper “SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation” by Li et al. (2022) for more details. Old name: SimDR
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
Encoded:
- keypoint_x_labels (np.ndarray): The generated SimCC label for x-axis. The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)
- keypoint_y_labels (np.ndarray): The generated SimCC label for y-axis. The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)
- keypoint_weights (np.ndarray): The target weights in shape (N, K)
- Parameters
input_size (tuple) – Input image size in [w, h]
smoothing_type (str) – The SimCC label smoothing strategy. Options are 'gaussian' and 'standard'. Defaults to 'gaussian'
sigma (float | int | tuple) – The sigma value in the Gaussian SimCC label. Defaults to 6.0
simcc_split_ratio (float) – The ratio of the label size to the input size. For example, if the input width is w, the x label size will be \(w*simcc_split_ratio\). Defaults to 2.0
label_smooth_weight (float) – Label smoothing weight. Defaults to 0.0
normalize (bool) – Whether to normalize the heatmaps. Defaults to True.
SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation: https://arxiv.org/abs/2107.03332
- decode(simcc_x: numpy.ndarray, simcc_y: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from SimCC representations. The decoded coordinates are in the input image space.
- Parameters
encoded (Tuple[np.ndarray, np.ndarray]) – SimCC labels for x-axis and y-axis
simcc_x (np.ndarray) – SimCC label for x-axis
simcc_y (np.ndarray) – SimCC label for y-axis
- Returns
- keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encoding keypoints into SimCC labels. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- keypoint_x_labels (np.ndarray): The generated SimCC label for x-axis. The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)
- keypoint_y_labels (np.ndarray): The generated SimCC label for y-axis. The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)
- keypoint_weights (np.ndarray): The target weights in shape
(N, K)
- Return type
dict
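The effect of simcc_split_ratio on label size and decoding precision can be sketched for a single axis. This is a simplified illustration that assumes an unnormalized Gaussian centered on the scaled coordinate and an argmax decode; the function names are hypothetical and the library's rounding and normalization may differ.

```python
import math


def simcc_label_1d(coord, length, split_ratio=2.0, sigma=6.0):
    """Gaussian SimCC label along one axis with int(length * split_ratio) bins."""
    n_bins = int(length * split_ratio)
    mu = coord * split_ratio  # coordinate expressed in bin units
    return [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2))
            for i in range(n_bins)]


def simcc_decode_1d(label, split_ratio=2.0):
    """Decode a 1-D SimCC label: argmax bin mapped back to input-image space."""
    idx = max(range(len(label)), key=label.__getitem__)
    return idx / split_ratio, label[idx]


w = 192  # hypothetical input width
x_label = simcc_label_1d(50.5, w, split_ratio=2.0)
print(len(x_label))              # 384
print(simcc_decode_1d(x_label))  # (50.5, 1.0)
```

With a split ratio of 2.0, each bin covers half an input pixel, so the half-pixel coordinate 50.5 is recovered exactly.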
- class mmpose.codecs.UDPHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], heatmap_type: str = 'gaussian', sigma: float = 2.0, radius_factor: float = 0.0546875, blur_kernel_size: int = 11)[source]¶
Generate keypoint heatmaps by Unbiased Data Processing (UDP). See the paper “The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation” by Huang et al. (2020) for details.
Note
instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]
Encoded:
- heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W), where [W, H] is the heatmap_size and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)
- keypoint_weights (np.ndarray): The target weights in shape (K,)
- Parameters
input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
heatmap_type (str) – The heatmap type to encode the keypoints. Options are:
- 'gaussian': Gaussian heatmap
- 'combined': Combination of a binary label map and offset maps for X and Y axes.
sigma (float) – The sigma value of the Gaussian heatmap when heatmap_type=='gaussian'. Defaults to 2.0
radius_factor (float) – The radius factor of the binary label map when heatmap_type=='combined'. The positive region is defined as the neighborhood of the keypoint with the radius \(r=radius_factor*max(W, H)\). Defaults to 0.0546875
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. Defaults to 11
The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation: https://arxiv.org/abs/1911.07524
- decode(encoded: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.
- Parameters
encoded (np.ndarray) – Heatmaps in shape (K, H, W)
- Returns
- keypoints (np.ndarray): Decoded keypoint coordinates in shape
(N, K, D)
- scores (np.ndarray): The keypoint scores in shape (N, K). It
usually represents the confidence of the keypoint prediction
- Return type
tuple
- encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) dict [source]¶
Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.
- Parameters
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
- Returns
- heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W), where [W, H] is the heatmap_size and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)
- keypoint_weights (np.ndarray): The target weights in shape
(K,)
- Return type
dict
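Two quantities in the description above follow directly from the parameters: the output channel count C_out and the positive-region radius for the 'combined' type. A small sketch, with hypothetical helper names:

```python
def udp_output_channels(num_keypoints: int, heatmap_type: str) -> int:
    """C_out as described: K for 'gaussian', K*3 for 'combined'."""
    if heatmap_type == "gaussian":
        return num_keypoints
    if heatmap_type == "combined":
        return num_keypoints * 3
    raise ValueError(f"unknown heatmap_type: {heatmap_type!r}")


def combined_radius(heatmap_size, radius_factor=0.0546875):
    """Radius r = radius_factor * max(W, H) of the positive region."""
    W, H = heatmap_size
    return radius_factor * max(W, H)


print(udp_output_channels(17, "gaussian"))  # 17
print(udp_output_channels(17, "combined"))  # 51
print(combined_radius((48, 64)))            # 3.5
```

For a hypothetical 48x64 heatmap, the default radius_factor yields a radius of 3.5 heatmap pixels.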
mmpose.models¶
backbones¶
- class mmpose.models.backbones.AlexNet(num_classes=- 1, init_cfg=None)[source]¶
AlexNet backbone.
The input for AlexNet is a 224x224 RGB image.
- Parameters
num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
CPM backbone.
Convolutional Pose Machines. More details can be found in the paper.
- Parameters
in_channels (int) – The input channels of the CPM.
out_channels (int) – The output channels of the CPM.
feat_channels (int) – Feature channel of each CPM stage.
middle_channels (int) – Feature channel of conv after the middle stage.
num_stages (int) – Number of stages.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
- class mmpose.models.backbones.HRFormer(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, transformer_norm_cfg={'eps': 1e-06, 'type': 'LN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=- 1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
HRFormer backbone.
This backbone is the implementation of HRFormer: High-Resolution Transformer for Dense Prediction.
- Parameters
extra (dict) – Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
- num_modules (int): The number of HRModule in this stage.
- num_branches (int): The number of branches in the HRModule.
- block (str): The type of block.
- num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
- num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Normally 3.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Config of norm layer. Defaults to dict(type='BN', requires_grad=True).
transformer_norm_cfg (dict) – Config of transformer norm layer. Use LN by default.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import HRFormer
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(2, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='HRFORMER',
...         window_sizes=(7, 7),
...         num_heads=(1, 2),
...         mlp_ratios=(4, 4),
...         num_blocks=(2, 2),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='HRFORMER',
...         window_sizes=(7, 7, 7),
...         num_heads=(1, 2, 4),
...         mlp_ratios=(4, 4, 4),
...         num_blocks=(2, 2, 2),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=2,
...         num_branches=4,
...         block='HRFORMER',
...         window_sizes=(7, 7, 7, 7),
...         num_heads=(1, 2, 4, 8),
...         mlp_ratios=(4, 4, 4, 4),
...         num_blocks=(2, 2, 2, 2),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRFormer(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
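The constraints on the extra dict (four stages, five required keys per stage, and tuple lengths equal to num_branches) can be checked before building the backbone. The helper below is a hypothetical sketch and not part of mmpose; the stage names stage1 to stage4 follow the example above.

```python
REQUIRED_KEYS = ("num_modules", "num_branches", "block",
                 "num_blocks", "num_channels")


def check_extra(extra: dict) -> bool:
    """Validate a stage configuration dict against the documented rules."""
    for name in ("stage1", "stage2", "stage3", "stage4"):
        if name not in extra:
            raise KeyError(f"missing stage: {name}")
        stage = extra[name]
        missing = [key for key in REQUIRED_KEYS if key not in stage]
        if missing:
            raise KeyError(f"{name} is missing keys: {missing}")
        # Per-branch tuples must have one entry per branch
        for key in ("num_blocks", "num_channels"):
            if len(stage[key]) != stage["num_branches"]:
                raise ValueError(
                    f"{name}.{key} must have length num_branches")
    return True


def make_stage(branches, block):
    """Build a small, valid stage config for the demonstration."""
    return dict(num_modules=1, num_branches=branches, block=block,
                num_blocks=(2,) * branches,
                num_channels=tuple(32 * 2 ** i for i in range(branches)))


extra = dict(stage1=make_stage(1, "BOTTLENECK"),
             stage2=make_stage(2, "HRFORMER"),
             stage3=make_stage(3, "HRFORMER"),
             stage4=make_stage(4, "HRFORMER"))
print(check_extra(extra))  # True
```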
- class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=- 1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
HRNet backbone.
High-Resolution Representations for Labeling Pixels and Regions
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmpose.models.backbones.HourglassAENet(downsample_times=4, num_stacks=1, out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
Hourglass-AE Network proposed by Newell et al.
Associative Embedding: End-to-End Learning for Joint Detection and Grouping.
More details can be found in the paper.
- Parameters
downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channels (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import HourglassAENet
>>> import torch
>>> self = HourglassAENet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 512)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 34, 128, 128)
- class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
HourglassNet backbone.
Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.
- Parameters
downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channel (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
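The HourglassNet example reduces a 511 x 511 input to 128 x 128 feature maps. A minimal sketch of the standard convolution output-size formula reproduces this, assuming a stem made of a 7x7 stride-2 convolution (padding 3) followed by a 3x3 stride-2 convolution (padding 1). The stem layout is an assumption made for illustration and is not stated in the parameter list; the helper name conv_out_size is hypothetical.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 511
after_first = conv_out_size(size, kernel=7, stride=2, padding=3)
after_second = conv_out_size(after_first, kernel=3, stride=2, padding=1)
print(after_first, after_second)
```

Under these assumptions an odd input size such as 511 still lands on an even 128 x 128 map, because the padding rounds the first stage up to 256.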
- class mmpose.models.backbones.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
Lite-HRNet backbone.
Lite-HRNet: A Lightweight High-Resolution Network.
Code adapted from ‘https://github.com/HRNet/Lite-HRNet’.
- Parameters
extra (dict) – detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – dictionary to construct and config conv layer.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra = dict(
...     stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
...     num_stages=3,
...     stages_spec=dict(
...         num_modules=(2, 4, 2),
...         num_branches=(2, 3, 4),
...         num_blocks=(2, 2, 2),
...         module_type=('LITE', 'LITE', 'LITE'),
...         with_fuse=(True, True, True),
...         reduce_ratios=(8, 8, 8),
...         num_channels=(
...             (40, 80),
...             (40, 80, 160),
...             (40, 80, 160, 320),
...         )),
...     with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)
- class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]¶
MSPN backbone. Paper ref: Li et al. “Rethinking on Multi-Stage Networks for Human Pose Estimation” (CVPR 2020).
- Parameters
unit_channels (int) – Number of Channels in an upsample unit. Default: 256
num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4
num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4 Note: Make sure num_units == len(self.num_blocks)
num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2]
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]
Example
>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
- class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7,), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
MobileNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- make_layer(out_channels, num_blocks, stride, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
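The widen_factor parameter of MobileNetV2 multiplies the number of channels in each layer. MobileNet implementations commonly round the scaled value to a multiple of 8 so that channel counts stay hardware friendly. The sketch below shows that rounding; the helper make_divisible, its min_ratio guard, and the list of base widths (taken from the MobileNetV2 paper) are written here for illustration and are assumptions rather than a documented mmpose signature.

```python
def make_divisible(value, divisor=8, min_value=None, min_ratio=0.9):
    """Round `value` to the nearest multiple of `divisor`.

    The result never drops below `min_value`, and it is bumped up by one
    `divisor` if rounding would lose more than (1 - min_ratio) of `value`.
    """
    if min_value is None:
        min_value = divisor
    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if new_value < min_ratio * value:
        new_value += divisor
    return new_value


# Base layer widths for widen_factor=1.0 (assumed, per the MobileNetV2 paper).
base_channels = [32, 16, 24, 32, 64, 96, 160, 320]
scaled = [make_divisible(c * 0.5) for c in base_channels]
print(scaled)
```

With widen_factor=0.5 most widths simply halve, while small layers are kept at a multiple of 8 instead of shrinking to 12 or fewer channels.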
- class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(- 1,), frozen_stages=- 1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm']}])[source]¶
MobileNetV3 backbone.
- Parameters
arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
out_indices (None or Sequence[int]) – Output from which stages. Default: (-1, ), which means output tensors from final stage.
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm'])]
- class mmpose.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, convert_weights=True, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}, {'type': 'Kaiming', 'layer': ['Conv2d']}])[source]¶
Pyramid Vision Transformer (PVT)
Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
- Parameters
pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 64.
num_stages (int) – The number of stages. Default: 4.
num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].
num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].
patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].
strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].
paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].
sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].
out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).
mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.
drop_path_rate (float) – stochastic depth rate. Default 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.
use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).
pretrained (str, optional) – model pretrained path. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='TruncNormal', std=.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm']), dict(type='Kaiming', layer=['Conv2d'])]
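The patch_sizes, strides, paddings and sr_ratios arguments together determine how many tokens each stage processes. The following is a minimal sketch of that bookkeeping, assuming each patch embedding behaves like a convolution with the given kernel, stride and padding, and that spatial reduction divides the side length of the key/value map by sr_ratio. The helper name pvt_stage_sizes is hypothetical and not part of mmpose.

```python
def pvt_stage_sizes(img_size=224,
                    patch_sizes=(4, 2, 2, 2),
                    strides=(4, 2, 2, 2),
                    paddings=(0, 0, 0, 0),
                    sr_ratios=(8, 4, 2, 1)):
    """Return (side, query_tokens, key_value_tokens) for every stage."""
    stages = []
    side = img_size
    for kernel, stride, padding, sr in zip(patch_sizes, strides, paddings,
                                           sr_ratios):
        # Patch embedding: standard convolution output-size formula.
        side = (side + 2 * padding - kernel) // stride + 1
        # Spatial reduction shrinks only the key/value map (assumption).
        kv_side = side // sr
        stages.append((side, side * side, kv_side * kv_side))
    return stages


stages = pvt_stage_sizes()
for side, q_tokens, kv_tokens in stages:
    print(side, q_tokens, kv_tokens)
```

With the default arguments the feature map side shrinks from 56 to 7 across the four stages, while the reduced key/value map stays at 49 tokens in every stage, which is what keeps attention affordable at high resolution.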
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmpose.models.backbones.PyramidVisionTransformerV2(**kwargs)[source]¶
Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.
- class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]¶
Residual Steps Network backbone. Paper ref: Cai et al. “Learning Delicate Local Representations for Multi-Person Pose Estimation” (ECCV 2020).
- Parameters
unit_channels (int) – Number of Channels in an upsample unit. Default: 256
num_stages (int) – Number of stages in a multi-stage RSN. Default: 4
num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4. Note: make sure num_units == len(self.num_blocks).
num_blocks (list) – Number of RSBs (Residual Steps Block) in each downsample unit. Default: [2, 2, 2, 2]
num_steps (int) – Number of steps in an RSB. Default: 4
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.
expand_times (int) – Factor by which in_channels is expanded in an RSB. Default: 26.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]
Example
>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
- class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
RegNet backbone.
More details can be found in the paper.
- Parameters
arch (dict) – The parameters of RegNets.
- w0 (int): initial width
- wa (float): slope of width
- wm (float): quantization parameter to quantize the width
- depth (int): depth of the backbone
- group_w (int): width of group
- bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0),
...     out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
- adjust_width_group(widths, bottleneck_ratio, groups)[source]¶
Adjusts the compatibility of widths and groups.
- Parameters
widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – number of groups in each stage
- Returns
The adjusted widths and groups of each stage.
- Return type
tuple(list)
- static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]¶
Generates per block width from RegNet parameters.
- Parameters
initial_width ([int]) – Initial width of the backbone
width_slope ([float]) – Slope of the quantized linear function
width_parameter ([int]) – Parameter used to quantize the width.
depth ([int]) – Depth of the backbone.
divisor (int, optional) – The divisor of channels. Defaults to 8.
- Returns
A list of widths of each stage and the number of stages.
- Return type
list, int
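The width generation and group adjustment described for RegNet can be sketched in plain Python. This is a reimplementation for illustration, following the quantized linear rule from the RegNet paper. The function names generate_widths and adjust_to_groups are hypothetical, and the exact rounding used by the library is an assumption.

```python
import math


def generate_widths(initial_width, width_slope, width_parameter, depth,
                    divisor=8):
    """Per-block widths from the quantized linear parameterization."""
    widths = []
    for j in range(depth):
        continuous = initial_width + width_slope * j
        # Snap each block to the nearest power of width_parameter.
        k = round(math.log(continuous / initial_width)
                  / math.log(width_parameter))
        width = initial_width * width_parameter ** k
        widths.append(int(round(width / divisor) * divisor))
    return widths, len(set(widths))


def adjust_to_groups(widths, bottleneck_ratio, group_width):
    """Make each bottleneck width a multiple of its group width."""
    adjusted = []
    for width in widths:
        bottleneck = int(width * bottleneck_ratio)
        groups = min(group_width, bottleneck)
        bottleneck = int(round(bottleneck / groups) * groups)
        adjusted.append(int(bottleneck / bottleneck_ratio))
    return adjusted


widths, num_stages = generate_widths(88, 26.31, 2.25, 25)
stage_widths = sorted(set(widths))
final_widths = adjust_to_groups(stage_widths, 1.0, 48)
print(stage_widths, num_stages, final_widths)
```

With the arch values from the RegNet example (w0=88, wa=26.31, wm=2.25, depth=25, group_w=48, bot_mul=1.0), the adjusted stage widths match the channel counts printed there: 96, 192, 432 and 1008.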
- class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152, 200}.
groups (int) – Groups of conv2 in Bottleneck. Default: 1.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
radix (int) – Radix of SplitAttentionConv2d. Default: 2.
reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
- class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
ResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
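The groups and width_per_group arguments of ResNeXt jointly set the channel count of the grouped conv2 in each Bottleneck. Here is a minimal sketch of how the two combine, assuming the usual "32x4d" convention from the ResNeXt paper, where the per-group width doubles along with the stage's base width. The function name and the exact formula are assumptions for illustration, not a documented mmpose API.

```python
import math


def resnext_conv2_widths(groups=32, width_per_group=4, base_channels=64,
                         num_stages=4):
    """Channel count of the grouped 3x3 conv in each stage's Bottleneck."""
    widths = []
    for stage in range(num_stages):
        mid_channels = base_channels * 2 ** stage
        # Per-group width scales with the stage, then is multiplied by groups.
        per_group = math.floor(mid_channels * width_per_group / base_channels)
        widths.append(per_group * groups)
    return widths


conv2_widths = resnext_conv2_widths()
print(conv2_widths)
```

Under these assumptions the default groups=32 and width_per_group=4 give a 128-channel grouped convolution in the first stage, doubling at every later stage.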
- class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
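The output shapes in the ResNet example can be predicted from depth, strides and base_channels alone. The sketch below is a minimal version of that calculation, assuming a stem that downsamples by 4 (a stride-2 convolution followed by a stride-2 pooling layer), an expansion of 1 for depths 18 and 34 and 4 otherwise, and an input size divisible by 32. The helper name resnet_output_shapes is hypothetical.

```python
def resnet_output_shapes(depth, input_hw, strides=(1, 2, 2, 2),
                         base_channels=64, batch=1):
    """Return the (N, C, H, W) shape produced by each stage."""
    # BasicBlock networks keep the channel count; Bottleneck expands by 4.
    expansion = 1 if depth in (18, 34) else 4
    height, width = input_hw
    # Assumed stem: overall stride of 4 before the first stage.
    height, width = height // 4, width // 4
    shapes = []
    for stage, stride in enumerate(strides):
        height, width = height // stride, width // stride
        channels = base_channels * 2 ** stage * expansion
        shapes.append((batch, channels, height, width))
    return shapes


shapes_18 = resnet_output_shapes(18, (32, 32))
shapes_50 = resnet_output_shapes(50, (224, 224))
print(shapes_18)
print(shapes_50)
```

The depth-18 result matches the shapes in the ResNet example, and the depth-50 result matches the 224 x 224 shapes shown for the depth-50 backbones in this section.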
- class mmpose.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant described in Bag of Tricks.
Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
- class mmpose.models.backbones.SCNet(depth, **kwargs)[source]¶
SCNet backbone.
Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Parameters
depth (int) – Depth of scnet, from {50, 101}.
in_channels (int) – Number of input image channels. Normally 3.
base_channels (int) – Number of base channels of hidden layer.
num_stages (int) – SCNet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
SEResNeXt backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
Example
>>> from mmpose.models import SEResNeXt
>>> import torch
>>> self = SEResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
- class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]¶
SEResNet backbone.
Please refer to the paper for details.
- Parameters
depth (int) – Network depth, from {50, 101, 152}.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``
Example
>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
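The shapes printed in the SEResNet example follow from the stride settings. A minimal sketch, assuming the stem (a stride-2 7x7 conv followed by stride-2 max pooling) reduces the resolution by 4, bottleneck blocks expand channels by 4, and the input size divides evenly; the helper `resnet_stage_shapes` is hypothetical and not part of mmpose:

```python
def resnet_stage_shapes(input_size=224, strides=(1, 2, 2, 2),
                        base_channels=64, expansion=4, stem_stride=4):
    """Return (channels, height, width) for the output of each stage.

    Assumptions (not taken from the mmpose source): the stem downsamples
    by `stem_stride`, and stage i outputs base_channels * 2**i * expansion
    channels.
    """
    size = input_size // stem_stride
    shapes = []
    for i, stride in enumerate(strides):
        size //= stride  # the first block of each stage applies the stride
        channels = base_channels * 2 ** i * expansion
        shapes.append((channels, size, size))
    return shapes


for shape in resnet_stage_shapes():
    print(shape)
```

With the defaults this reproduces the per-stage channel counts and spatial sizes listed in the example for a 224x224 input.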
- class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ShuffleNetV1 backbone.
- Parameters
groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.
widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, )
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])]``
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
- make_layer(out_channels, num_blocks, first_block=False)[source]¶
Stack ShuffleUnit blocks to make a layer.
- Parameters
out_channels (int) – out_channels of the block.
num_blocks (int) – Number of blocks.
first_block (bool, optional) – Whether this is the first ShuffleUnit in a sequence of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.
- class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ShuffleNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])]``
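In MMPose configs a backbone is usually selected through a plain dict whose `type` names the class. A minimal sketch for ShuffleNetV2, assuming the class is registered under its own name; the values are illustrative choices, not recommended settings:

```python
# Hypothetical backbone config fragment; values chosen for illustration only.
backbone = dict(
    type='ShuffleNetV2',   # assumed registry name, same as the class name
    widen_factor=1.0,      # width multiplier applied to every layer
    out_indices=(3, ),     # return only the last stage
    frozen_stages=1,       # assumption: freeze early stages when fine-tuning
    norm_eval=True,        # keep BatchNorm running statistics fixed
)

print(backbone['type'], backbone['out_indices'])
```

Setting `norm_eval=True` together with a non-negative `frozen_stages` is a common fine-tuning combination, since frozen weights paired with drifting normalization statistics can otherwise disagree.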
- class mmpose.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, convert_weights=False, frozen_stages=-1, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}])[source]¶
Swin Transformer backbone. A PyTorch implementation of “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”.
Inspired by https://github.com/microsoft/Swin-Transformer
- Parameters
pretrain_img_size (int | tuple[int]) – The size of the input image used during pretraining. Default: 224.
in_channels (int) – The number of input channels. Default: 3.
embed_dims (int) – The feature dimension. Default: 96.
patch_size (int | tuple[int]) – Patch size. Default: 4.
window_size (int) – Window size. Default: 7.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.
depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).
num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).
strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
patch_norm (bool) – Whether to add a norm layer for patch embed and patch merging. Default: True.
drop_rate (float) – Dropout rate. Default: 0.
attn_drop_rate (float) – Attention dropout rate. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Default: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Default: False.
act_cfg (dict) – Config dict for activation layer. Default: dict(type='GELU').
norm_cfg (dict) – Config dict for normalization layer at output of backbone. Default: dict(type='LN').
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='TruncNormal', std=0.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm'])]``
- forward(x)[source]¶
Forward function.
- Parameters
x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
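The SwinTransformer defaults imply a feature pyramid with predictable shapes. A minimal sketch, assuming a square input whose size divides evenly by every stride and a channel count that doubles at each stage (as in the original Swin design); `swin_stage_summary` is a hypothetical helper, not an mmpose function:

```python
def swin_stage_summary(img_size=224, embed_dims=96,
                       strides=(4, 2, 2, 2), num_heads=(3, 6, 12, 24)):
    """Summarize resolution, channels and attention scale per stage."""
    summary = []
    size = img_size
    for i, (stride, heads) in enumerate(zip(strides, num_heads)):
        size //= stride                  # patch embedding / patch merging
        channels = embed_dims * 2 ** i   # assumed doubling per stage
        head_dim = channels // heads
        summary.append(dict(
            stage=i,
            size=size,
            channels=channels,
            head_dim=head_dim,
            # Scale used when qk_scale is None, per the parameter description.
            qk_scale=head_dim ** -0.5,
        ))
    return summary


for row in swin_stage_summary():
    print(row)
```

Under these assumptions the per-head dimension is the same in every stage, because the number of heads doubles along with the channels.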
- class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None, init_cfg=[{'type': 'Kaiming', 'mode': 'fan_in', 'nonlinearity': 'relu', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
TCN backbone.
Temporal Convolutional Networks. More details can be found in the paper.
- Parameters
in_channels (int) – Number of input channels, which equals to num_keypoints * num_features.
stem_channels (int) – Number of feature channels. Default: 1024.
num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.
kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).
dropout (float) – Dropout rate. Default: 0.25.
causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.
residual (bool) – Use residual connection. Default: True.
use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False
conv_cfg (dict) – dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN1d’).
max_norm (float|None) – if not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', mode='fan_in', nonlinearity='relu', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``
Example
>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
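The temporal lengths in the TCN example (235 and 217 for an input of 243 frames) are consistent with unpadded dilated convolutions. A minimal sketch, assuming a non-causal network with `use_stride_conv=False`, a stem conv using the first kernel size with dilation 1, and a dilation in each block equal to the product of the preceding kernel sizes; this rule is inferred from the example output rather than taken from the source, and `tcn_output_lengths` is a hypothetical helper:

```python
def tcn_output_lengths(seq_len, kernel_sizes=(3, 3, 3)):
    """Return the temporal length after each basic block."""
    # Stem conv: kernel_sizes[0], dilation 1, no padding.
    length = seq_len - (kernel_sizes[0] - 1)
    dilation = kernel_sizes[0]
    lengths = []
    for k in kernel_sizes[1:]:
        # An unpadded dilated conv shortens the sequence by (k - 1) * dilation.
        length -= (k - 1) * dilation
        lengths.append(length)
        dilation *= k
    return lengths


print(tcn_output_lengths(243))
```

Under these assumptions the receptive field equals the product of the kernel sizes, so an input of exactly that many frames yields a final output of length 1.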
- class mmpose.models.backbones.V2VNet(input_channels, output_channels, mid_channels=32, init_cfg={'layer': ['Conv3d', 'ConvTranspose3d'], 'std': 0.001, 'type': 'Normal'})[source]¶
V2VNet.
Please refer to the paper (https://arxiv.org/abs/1711.07399) for details.
- Parameters
input_channels (int) – Number of channels of the input feature volume.
output_channels (int) – Number of channels of the output volume.
mid_channels (int) – Input and output channels of the encoder-decoder block.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``dict(type='Normal', std=0.001, layer=['Conv3d', 'ConvTranspose3d'])``
- class mmpose.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]¶
VGG backbone.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_norm (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before the classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.
with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]``
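The rule for the VGG `out_indices` default when it is left as None can be written down directly. A minimal sketch of that rule as stated in the parameter description; `resolve_vgg_out_indices` is a hypothetical helper name, not an mmpose function:

```python
def resolve_vgg_out_indices(out_indices=None, num_classes=-1):
    """Pick the effective out_indices following the documented rule."""
    if out_indices is not None:
        return tuple(out_indices)
    # num_classes > 0: return the classification score (index 5);
    # otherwise: return the last feature map before the classifier (index 4).
    return (5, ) if num_classes > 0 else (4, )


print(resolve_vgg_out_indices())                  # (4,)
print(resolve_vgg_out_indices(num_classes=1000))  # (5,)
```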
- class mmpose.models.backbones.ViPNAS_MobileNetV3(wid=[16, 16, 24, 40, 80, 112, 160], expan=[None, 1, 5, 4, 5, 5, 6], dep=[None, 1, 4, 4, 4, 4, 4], ks=[3, 3, 7, 7, 5, 7, 5], group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], conv_cfg=None, norm_cfg={'type': 'BN'}, frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ViPNAS_MobileNetV3 backbone.
“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.
- Parameters
wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
stride (list(int)) – Stride config for each stage.
act (list(str)) – Activation type for each stage.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``
- class mmpose.models.backbones.ViPNAS_ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, wid=[48, 80, 160, 304, 608], expan=[None, 1, 1, 1, 1], dep=[None, 4, 6, 7, 3], ks=[7, 3, 5, 5, 5], group=[None, 16, 16, 16, 16], att=[None, True, False, True, True], init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ViPNAS_ResNet backbone.
“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.
- Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
necks¶
- class mmpose.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=-1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'})[source]¶
Feature Pyramid Network.
This is an implementation of paper Feature Pyramid Networks for Object Detection.
- Parameters
in_channels (list[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
num_outs (int) – Number of output scales.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.
add_extra_convs (bool | str) – If bool, it decides whether to add conv layers on top of the original feature maps. Defaults to False. If True, it is equivalent to add_extra_convs='on_input'. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:
- 'on_input': Last feature map of neck inputs (i.e. backbone feature).
- 'on_lateral': Last feature map after lateral convs.
- 'on_output': The last output feature map after FPN convs.
relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.
no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode='nearest').
Example
>>> import torch
>>> from mmpose.models.necks import FPN
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
- class mmpose.models.necks.GlobalAveragePooling[source]¶
Global Average Pooling neck.
Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.
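The difference between the two approaches in GlobalAveragePooling can be seen from the shapes alone. A minimal sketch that emulates the shape semantics of an argument-free `squeeze()` and of `view(batch_size, -1)` on plain tuples, so no tensor library is needed; both helper functions are hypothetical:

```python
from math import prod


def squeeze_shape(shape):
    """Shape after dropping every size-1 dimension, as squeeze() would."""
    return tuple(d for d in shape if d != 1)


def view_shape(shape):
    """Shape after view(batch_size, -1): the batch dimension is kept."""
    return (shape[0], prod(shape[1:]))


pooled = (1, 2048, 1, 1)      # (N, C, 1, 1) after pooling, batch size 1
print(squeeze_shape(pooled))  # (2048,)   batch dimension lost
print(view_shape(pooled))     # (1, 2048) batch dimension preserved
```

For batch sizes larger than 1 both approaches give the same shape, which is why the problem only surfaces with single-sample batches.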
- class mmpose.models.necks.PoseWarperNeck(in_channels, out_channels, inner_channels, deform_groups=17, dilations=(3, 6, 12, 18, 24), trans_conv_kernel=1, res_blocks_cfg=None, offsets_kernel=3, deform_conv_kernel=3, in_index=0, input_transform=None, freeze_trans_layer=True, norm_eval=False, im2col_step=80)[source]¶
PoseWarper neck.
“Learning temporal pose estimation from sparsely-labeled videos”.
- Parameters
in_channels (int) – Number of input channels from backbone
out_channels (int) – Number of output channels
inner_channels (int) – Number of intermediate channels of the res block
deform_groups (int) – Number of groups in the deformable conv
dilations (list|tuple) – different dilations of the offset conv layers
trans_conv_kernel (int) – the kernel of the trans conv layer, which is used to get heatmap from the output of backbone. Default: 1
res_blocks_cfg (dict|None) – Config of residual blocks. If None, use the default values. If not None, it should contain the following keys:
- block (str): the type of residual block. Default: 'BASIC'.
- num_blocks (int): the number of blocks. Default: 20.
offsets_kernel (int) – the kernel of offset conv layer.
deform_conv_kernel (int) – the kernel of deformable conv layer.
in_index (int|Sequence[int]) – Input feature index. Default: 0
input_transform (str|None) – Transformation type of input features. Options: 'resize_concat', 'multiple_select', None. Default: None.
- 'resize_concat': Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.
- 'multiple_select': Multiple feature maps will be bundled into a list and passed into decode head.
- None: Only one selected feature map is allowed.
freeze_trans_layer (bool) – Whether to freeze the transition layer (stop grad and set eval mode). Default: True.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
im2col_step (int) – the argument im2col_step in deformable conv, Default: 80.
- forward(inputs, frame_weight)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
detectors¶
- class mmpose.models.pose_estimators.BottomupPoseEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Base class for bottom-up pose estimators.
- Parameters
backbone (dict) – The backbone config
neck (dict, optional) – The neck config. Defaults to None
head (dict, optional) – The head config. Defaults to None
train_cfg (dict, optional) – The runtime config for training process. Defaults to None
test_cfg (dict, optional) – The runtime config for testing process. Defaults to None
data_preprocessor (dict, optional) – The data preprocessing config to build the instance of BaseDataPreprocessor. Defaults to None
init_cfg (dict, optional) – The config to control the initialization. Defaults to None
- add_pred_to_datasample(batch_pred_instances: List[mmengine.structures.instance_data.InstanceData], batch_pred_fields: Optional[List[mmengine.structures.pixel_data.PixelData]], batch_data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample] [source]¶
Add predictions into data samples.
- Parameters
batch_pred_instances (List[InstanceData]) – The predicted instances of the input data batch
batch_pred_fields (List[PixelData], optional) – The predicted fields (e.g. heatmaps) of the input batch
batch_data_samples (List[PoseDataSample]) – The input data batch
- Returns
A list of data samples where the predictions are stored in the pred_instances field of each data sample. The length of the list is the batch size when merge==False, or 1 when merge==True.
- Return type
List[PoseDataSample]
- loss(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W).
data_samples (List[PoseDataSample]) – The batch data samples.
- Returns
A dictionary of losses.
- Return type
dict
- predict(inputs: Union[torch.Tensor, List[torch.Tensor]], data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample] [source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (Tensor | List[Tensor]) – Input image in tensor or image pyramid as a list of tensors. Each tensor is in shape [B, C, H, W]
data_samples (List[PoseDataSample]) – The batch data samples
- Returns
The pose estimation results of the input images. The return value consists of PoseDataSample instances with pred_instances and pred_fields (optional) fields, and pred_instances usually contains the following keys:
- keypoints (Tensor): predicted keypoint coordinates in shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (Tensor): predicted keypoint scores in shape (num_instances, K)
- Return type
list[PoseDataSample]
- class mmpose.models.pose_estimators.TopdownPoseEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = None, metainfo: Optional[dict] = None)[source]¶
Base class for top-down pose estimators.
- Parameters
backbone (dict) – The backbone config
neck (dict, optional) – The neck config. Defaults to None
head (dict, optional) – The head config. Defaults to None
train_cfg (dict, optional) – The runtime config for training process. Defaults to None
test_cfg (dict, optional) – The runtime config for testing process. Defaults to None
data_preprocessor (dict, optional) – The data preprocessing config to build the instance of BaseDataPreprocessor. Defaults to None
init_cfg (dict, optional) – The config to control the initialization. Defaults to None
metainfo (dict) – Meta information for dataset, such as keypoints definition and properties. If set, the metainfo of the input data batch will be overridden. For more details, please refer to https://mmpose.readthedocs.io/en/1.x/user_guides/prepare_datasets.html#create-a-custom-dataset-info-config-file-for-the-dataset. Defaults to None
- add_pred_to_datasample(batch_pred_instances: List[mmengine.structures.instance_data.InstanceData], batch_pred_fields: Optional[List[mmengine.structures.pixel_data.PixelData]], batch_data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample] [source]¶
Add predictions into data samples.
- Parameters
batch_pred_instances (List[InstanceData]) – The predicted instances of the input data batch
batch_pred_fields (List[PixelData], optional) – The predicted fields (e.g. heatmaps) of the input batch
batch_data_samples (List[PoseDataSample]) – The input data batch
- Returns
A list of data samples where the predictions are stored in the pred_instances field of each data sample.
- Return type
List[PoseDataSample]
- loss(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W).
data_samples (List[PoseDataSample]) – The batch data samples.
- Returns
A dictionary of losses.
- Return type
dict
- predict(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample] [source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W)
data_samples (List[PoseDataSample]) – The batch data samples
- Returns
The pose estimation results of the input images. The return value consists of PoseDataSample instances with pred_instances and pred_fields (optional) fields, and pred_instances usually contains the following keys:
- keypoints (Tensor): predicted keypoint coordinates in shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (Tensor): predicted keypoint scores in shape (num_instances, K)
- Return type
list[PoseDataSample]
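The constructor arguments of TopdownPoseEstimator map onto a nested config dict. A minimal sketch, assuming each component is registered under its class name; the head type `HeatmapHead`, the keypoint count, and the `flip_test` setting are illustrative assumptions rather than values taken from this page:

```python
# Hypothetical top-down estimator config; component names and values
# are assumptions made for illustration.
model = dict(
    type='TopdownPoseEstimator',
    backbone=dict(type='SEResNet', depth=50, out_indices=(3, )),
    head=dict(
        type='HeatmapHead',   # assumed head type
        in_channels=2048,     # channels of the last SEResNet-50 stage
        out_channels=17,      # assumed number of keypoints
    ),
    test_cfg=dict(flip_test=True),  # assumed test-time option
)

print(sorted(model))
```

The `neck`, `train_cfg`, `data_preprocessor`, `init_cfg` and `metainfo` entries are omitted here because they all default to None.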
heads¶
- class mmpose.models.heads.AssociativeEmbeddingHead(in_channels: Union[int, Sequence[int]], num_keypoints: int, tag_dim: int = 1, tag_per_keypoint: bool = True, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = -1, align_corners: bool = False, keypoint_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss'}, tag_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'AssociativeEmbeddingLoss'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
- forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor] [source]¶
Forward the network. The input is multi scale feature maps and the output is the heatmaps and tags.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
heatmaps (Tensor): output heatmaps
tags (Tensor): output tags
- Return type
tuple
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
feats (Tuple[Tensor]) – The multi-stage features
batch_data_samples (List[PoseDataSample]) – The batch data samples
train_cfg (dict) – The runtime config for training process. Defaults to {}
- Returns
A dictionary of losses.
- Return type
dict
- predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- Parameters
feats (Features) – The features, which can take one of the following forms:
- Tuple[Tensor]: multi-stage features from the backbone
- List[Tuple[Tensor]]: multiple features for TTA where either flip_test or multiscale_test is applied
- List[List[Tuple[Tensor]]]: multiple features for TTA where both flip_test and multiscale_test are applied
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
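The two conventions documented for predict (the nesting of feats under test-time augmentation, and the return value that depends on test_cfg['output_heatmap']) can be illustrated with plain Python containers standing in for tensors. This is only a sketch: the helper names describe_feats and split_predictions are hypothetical and are not part of MMPose.

```python
def describe_feats(feats):
    """Classify the nesting of ``feats`` as documented for ``predict``.

    Any non-list, non-tuple object stands in for a tensor here.
    """
    if isinstance(feats, tuple):
        return 'single'               # Tuple[Tensor]
    if isinstance(feats, list) and feats and isinstance(feats[0], tuple):
        return 'flip_or_multiscale'   # List[Tuple[Tensor]]
    if isinstance(feats, list) and feats and isinstance(feats[0], list):
        return 'flip_and_multiscale'  # List[List[Tuple[Tensor]]]
    raise TypeError('unexpected feats structure')


def split_predictions(result):
    """Return (poses, heatmaps); heatmaps is None when not requested."""
    if isinstance(result, tuple):
        poses, heatmaps = result
        return poses, heatmaps
    return result, None
```

Calling split_predictions on whatever predict returns lets downstream code handle both cases uniformly, at the cost of one extra None check.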
- class mmpose.models.heads.BaseHead(init_cfg: Optional[Union[dict, List[dict]]] = None)[source]¶
Base head. A subclass should override predict() and loss().
- Parameters
init_cfg (dict, optional) – The extra init config of layers. Defaults to None.
- decode(batch_outputs: Union[torch.Tensor, Tuple[torch.Tensor]]) List[mmengine.structures.instance_data.InstanceData] [source]¶
Decode keypoints from outputs.
- Parameters
batch_outputs (Tensor | Tuple[Tensor]) – The network outputs of a data batch
- Returns
A list of InstanceData, each contains the decoded pose information of the instances of one data sample.
- Return type
List[InstanceData]
- abstract loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- abstract predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- class mmpose.models.heads.CIDHead(in_channels: Union[int, Sequence[int]], gfd_channels: int, num_keypoints: int, prior_prob: float = 0.01, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, coupled_heatmap_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'FocalHeatmapLoss'}, decoupled_heatmap_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'FocalHeatmapLoss'}, contrastive_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'InfoNCELoss'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Contextual Instance Decoupling head introduced in Contextual Instance Decoupling for Robust Multi-Person Pose Estimation (CID) by Wang et al (2022). The head is composed of an Instance Information Abstraction (IIA) module and a Global Feature Decoupling (GFD) module.
- Parameters
in_channels (int | Sequence[int]) – Number of channels in the input feature map
num_keypoints (int) – Number of keypoints
gfd_channels (int) – Number of filters in GFD module
max_train_instances (int) – Maximum number of instances in a batch during training. Defaults to 200
input_transform (str) – Transformation of input features, which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | Sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
heatmap_loss (Config) – Config of the heatmap loss. Defaults to use KeypointMSELoss
coupled_heatmap_loss (Config) – Config of the loss for coupled heatmaps. Defaults to use FocalHeatmapLoss
decoupled_heatmap_loss (Config) – Config of the loss for decoupled heatmaps. Defaults to use FocalHeatmapLoss
contrastive_loss (Config) – Config of the contrastive loss for representation vectors of instances. Defaults to use InfoNCELoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
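The input_transform and input_index options described above can be sketched with plain Python sequences standing in for feature maps. The function names are hypothetical, and the sketch assumes that 'resize_concat' concatenates along the channel dimension, which the description implies but does not state outright. No resizing is performed here; only the selection logic and the resulting channel count are shown.

```python
def select_features(feats, input_index=-1):
    """Mimic the documented 'select' transform on a sequence of feature maps."""
    if isinstance(input_index, int):
        # A single index yields a single feature map.
        return feats[input_index]
    # Several indices: bundle the selected maps into a tuple.
    return tuple(feats[i] for i in input_index)


def resize_concat_channels(in_channels, input_index):
    """Channel count after 'resize_concat', assuming channel-wise concatenation."""
    return sum(in_channels[i] for i in input_index)


feats = ('c2', 'c3', 'c4', 'c5')             # placeholders for four feature maps
last = select_features(feats)                 # default input_index=-1
pair = select_features(feats, [0, -1])        # tuple of two maps
total = resize_concat_channels([32, 64, 128, 256], [0, 1, 2, 3])
```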
- forward(feats: Tuple[torch.Tensor]) torch.Tensor [source]¶
Forward the network. The input is multi scale feature maps and the output is the heatmap.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
output heatmap.
- Return type
Tensor
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
feats (Tuple[Tensor]) – The multi-stage features
batch_data_samples (List[PoseDataSample]) – The batch data samples
train_cfg (dict) – The runtime config for training process. Defaults to {}
- Returns
A dictionary of losses.
- Return type
dict
- predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
- class mmpose.models.heads.CPMHead(in_channels: Union[int, Sequence[int]], out_channels: int, num_stages: int, deconv_out_channels: Optional[Sequence[int]] = None, deconv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, loss: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Multi-stage heatmap head introduced in Convolutional Pose Machines by Wei et al (2016) and used by Stacked Hourglass Networks by Newell et al (2016). The head consists of multiple branches, each of which has some deconv layers and a simple conv2d layer.
- Parameters
in_channels (int | Sequence[int]) – Number of channels in the input feature maps.
out_channels (int) – Number of channels in the output heatmaps.
num_stages (int) – Number of stages.
deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to None
deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to None
has_final_layer (bool) – Whether to have the final 1x1 Conv2d layer. Defaults to True
loss (Config | List[Config]) – Config of the keypoint loss of different stages. Defaults to use KeypointMSELoss.
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
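Because loss accepts either one config or a list of configs, it can help to see how the two forms relate to num_stages. The sketch below assumes that a single config is shared by every stage and that a list must supply exactly one config per stage. The helper expand_stage_losses is hypothetical and only manipulates plain dicts.

```python
def expand_stage_losses(loss, num_stages):
    """Return one loss config per stage from a single config or a list."""
    if isinstance(loss, (list, tuple)):
        if len(loss) != num_stages:
            raise ValueError(
                f'got {len(loss)} loss configs for {num_stages} stages')
        return [dict(cfg) for cfg in loss]
    # A single config is copied once per stage.
    return [dict(loss) for _ in range(num_stages)]


# The default loss config shown in the CPMHead signature.
default_loss = dict(type='KeypointMSELoss', use_target_weight=True)
stage_losses = expand_stage_losses(default_loss, num_stages=6)
```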
- forward(feats: Sequence[torch.Tensor]) List[torch.Tensor] [source]¶
Forward the network. The input is multi-stage feature maps and the output is a list of heatmaps from multiple stages.
- Parameters
feats (Sequence[Tensor]) – Multi-stage feature maps.
- Returns
A list of output heatmaps from multiple stages.
- Return type
List[Tensor]
- loss(feats: Sequence[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
feats (Sequence[Tensor]) – Multi-stage feature maps.
batch_data_samples (List[PoseDataSample]) – The data samples. It usually includes information such as gt_instances.
train_cfg (Config, optional) – The training config.
- Returns
A dictionary of loss components.
- Return type
dict
- predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from multi-stage feature maps.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
- class mmpose.models.heads.DEKRHead(in_channels: Union[int, Sequence[int]], num_keypoints: int, num_heatmap_filters: int = 32, num_displacement_filters_per_keypoint: int = 15, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, heatmap_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, displacement_loss: Union[mmengine.config.config.ConfigDict, dict] = {'supervise_empty': False, 'type': 'SoftWeightSmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, rescore_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
DisEntangled Keypoint Regression head introduced in Bottom-up human pose estimation via disentangled keypoint regression by Geng et al (2021). The head is composed of a heatmap branch and a displacement branch.
- Parameters
in_channels (int | Sequence[int]) – Number of channels in the input feature map
num_keypoints (int) – Number of keypoints
num_heatmap_filters (int) – Number of filters for heatmap branch. Defaults to 32
num_displacement_filters_per_keypoint (int) – Number of filters for each keypoint in displacement branch. Defaults to 15
input_transform (str) – Transformation of input features, which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | Sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
heatmap_loss (Config) – Config of the heatmap loss. Defaults to use KeypointMSELoss
displacement_loss (Config) – Config of the displacement regression loss. Defaults to use SoftWeightSmoothL1Loss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
rescore_cfg (Config, optional) – The config for rescore net which estimates OKS via predicted keypoints and keypoint scores. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
- decode(heatmaps: Tuple[torch.Tensor], displacements: Tuple[torch.Tensor], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}, metainfo: dict = {}) List[mmengine.structures.instance_data.InstanceData] [source]¶
Decode keypoints from outputs.
- Parameters
heatmaps (Tuple[Tensor]) – The output heatmaps inferred from one image or multi-scale images.
displacements (Tuple[Tensor]) – The output displacement fields inferred from one image or multi-scale images.
test_cfg (dict) – The runtime config for testing process. Defaults to {}
metainfo (dict) – The metainfo of test dataset. Defaults to {}
- Returns
A list of InstanceData, each of which contains the decoded pose information of the instances of one data sample.
- Return type
List[InstanceData]
- forward(feats: Tuple[torch.Tensor]) torch.Tensor [source]¶
Forward the network. The input is multi scale feature maps and the output is a tuple of heatmap and displacement.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
output heatmap and displacement.
- Return type
Tuple[Tensor]
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
feats (Tuple[Tensor]) – The multi-stage features
batch_data_samples (List[PoseDataSample]) – The batch data samples
train_cfg (dict) – The runtime config for training process. Defaults to {}
- Returns
A dictionary of losses.
- Return type
dict
- predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-scale features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (1, h, w) or (K+1, h, w) if keypoint heatmaps are predicted
- displacements (Tensor): The predicted displacement fields in shape (K*2, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
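To make the documented PixelData shapes concrete, the following sketch computes them for a given keypoint count and heatmap size. The function dekr_field_shapes and its with_keypoint_heatmaps flag are hypothetical names used only for illustration. The sketch restates the shapes listed above and makes no claim about what the extra heatmap channel represents.

```python
def dekr_field_shapes(num_keypoints, h, w, with_keypoint_heatmaps=True):
    """Return the documented (C, h, w) shapes of the DEKR heatmap fields."""
    # (K+1, h, w) when keypoint heatmaps are predicted, otherwise (1, h, w).
    heatmap_channels = num_keypoints + 1 if with_keypoint_heatmaps else 1
    return {
        'heatmaps': (heatmap_channels, h, w),
        # Two displacement channels per keypoint: (K*2, h, w).
        'displacements': (num_keypoints * 2, h, w),
    }


shapes = dekr_field_shapes(num_keypoints=17, h=128, w=128)
```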
- class mmpose.models.heads.DSNTHead(in_channels: Union[int, Sequence[int]], in_featuremap_size: Tuple[int, int], num_joints: int, lambda_t: int = - 1, debias: bool = False, beta: float = 1.0, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'losses': [{'type': 'SmoothL1Loss', 'use_target_weight': True}, {'type': 'JSDiscretLoss', 'use_target_weight': True}], 'type': 'MultipleLossWrapper'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down integral regression head introduced in DSNT by Nibali et al (2018). The head contains a differentiable spatial to numerical transform (DSNT) layer that performs a soft-argmax operation on the predicted heatmaps to regress the coordinates.
This head is used for algorithms that require supervision of heatmaps in DSNT approach.
- Parameters
in_channels (int | sequence[int]) – Number of input channels
in_featuremap_size (int | sequence[int]) – Size of input feature map
num_joints (int) – Number of joints
lambda_t (int) – Discard heatmap-based loss when current epoch > lambda_t. Defaults to -1.
debias (bool) – Whether to remove the bias of Integral Pose Regression. See Removing the Bias of Integral Pose Regression by Gu et al (2021). Defaults to False.
beta (float) – A smoothing parameter in softmax. Defaults to 1.0.
deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)
deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)
conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None
conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None
input_transform (str) – Transformation of input features, which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config for keypoint loss. Defaults to a MultipleLossWrapper that combines SmoothL1Loss and JSDiscretLoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
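The role of lambda_t can be sketched as a simple gate on the heatmap-based term. This is an illustration under stated assumptions rather than the head's actual code: the function dsnt_total_loss is hypothetical, the two loss terms are taken as plain floats, and a non-positive lambda_t (such as the default of -1) is assumed to mean that the heatmap-based loss is never discarded.

```python
def dsnt_total_loss(coord_loss, heatmap_loss, epoch, lambda_t=-1):
    """Combine the coordinate and heatmap-based losses for one step."""
    # Assumption: a non-positive lambda_t disables the cutoff entirely.
    if lambda_t > 0 and epoch > lambda_t:
        # Past the cutoff epoch: keep only the coordinate regression loss.
        return coord_loss
    return coord_loss + heatmap_loss


early = dsnt_total_loss(1.0, 0.5, epoch=5, lambda_t=5)   # both terms kept
late = dsnt_total_loss(1.0, 0.5, epoch=10, lambda_t=5)   # heatmap term dropped
never = dsnt_total_loss(1.0, 0.5, epoch=10)              # default: no cutoff
```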
- loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- class mmpose.models.heads.HeatmapHead(in_channels: Union[int, Sequence[int]], out_channels: int, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, extra=None)[source]¶
Top-down heatmap head introduced in Simple Baselines by Xiao et al (2018). The head is composed of a few deconvolutional layers followed by a convolutional layer to generate heatmaps from low-resolution feature maps.
- Parameters
in_channels (int | Sequence[int]) – Number of channels in the input feature map
out_channels (int) – Number of channels in the output heatmap
deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to (256, 256, 256)
deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)
conv_out_channels (Sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None
conv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None
has_final_layer (bool) – Whether to have the final 1x1 Conv2d layer. Defaults to True
input_transform (str) – Transformation of input features, which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | Sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config of the keypoint loss. Defaults to use KeypointMSELoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
extra (dict, optional) – Extra configurations. Defaults to None
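A head of this kind is typically described by a plain config dict whose keys mirror the constructor arguments listed above. The sketch below builds such a dict and checks that the two deconv settings are consistent. The channel numbers are illustrative placeholders, the helper check_deconv_cfg is hypothetical, and the requirement that the two sequences have equal length is an assumption based on each deconv layer needing one entry from each.

```python
def check_deconv_cfg(cfg):
    """Verify that every deconv layer has both a channel count and a kernel size."""
    channels = cfg.get('deconv_out_channels') or ()
    kernels = cfg.get('deconv_kernel_sizes') or ()
    if len(channels) != len(kernels):
        raise ValueError('deconv_out_channels and deconv_kernel_sizes '
                         'must have the same length')
    return len(channels)


heatmap_head = dict(
    type='HeatmapHead',
    in_channels=2048,                     # illustrative backbone output width
    out_channels=17,                      # illustrative: one heatmap per keypoint
    deconv_out_channels=(256, 256, 256),  # documented default
    deconv_kernel_sizes=(4, 4, 4),        # documented default
    loss=dict(type='KeypointMSELoss', use_target_weight=True),
)
num_deconv_layers = check_deconv_cfg(heatmap_head)
```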
- forward(feats: Tuple[torch.Tensor]) torch.Tensor [source]¶
Forward the network. The input is multi scale feature maps and the output is the heatmap.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
output heatmap.
- Return type
Tensor
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
feats (Tuple[Tensor]) – The multi-stage features
batch_data_samples (List[PoseDataSample]) – The batch data samples
train_cfg (dict) – The runtime config for training process. Defaults to {}
- Returns
A dictionary of losses.
- Return type
dict
- predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
- class mmpose.models.heads.IntegralRegressionHead(in_channels: Union[int, Sequence[int]], in_featuremap_size: Tuple[int, int], num_joints: int, debias: bool = False, beta: float = 1.0, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down integral regression head introduced in IPR by Sun et al (2018). The head contains a differentiable spatial to numerical transform (DSNT) layer that performs a soft-argmax operation on the predicted heatmaps to regress the coordinates.
This head is used for algorithms that only supervise the coordinates.
- Parameters
in_channels (int | sequence[int]) – Number of input channels
in_featuremap_size (int | sequence[int]) – Size of input feature map
num_joints (int) – Number of joints
debias (bool) – Whether to remove the bias of Integral Pose Regression. See Removing the Bias of Integral Pose Regression by Gu et al (2021). Defaults to False.
beta (float) – A smoothing parameter in softmax. Defaults to 1.0.
deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)
deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)
conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None
conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None
input_transform (str) – Transformation of input features, which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config for keypoint loss. Defaults to use SmoothL1Loss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None
init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
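The soft-argmax operation mentioned in the class description, together with the beta smoothing parameter, can be sketched in plain Python for a single heatmap. This is a minimal illustration and not the head's implementation. It assumes that beta scales the heatmap values before the softmax and that coordinates are reported in pixel units. The debias option is not modelled, and soft_argmax_2d is a hypothetical name.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) location under softmax(beta * heatmap)."""
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Expected column index (x) and row index (y) under the softmax weights.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0],
]
sharp_x, sharp_y = soft_argmax_2d(heatmap, beta=10.0)  # close to the peak
soft_x, soft_y = soft_argmax_2d(heatmap, beta=0.0)     # uniform weights
```

A larger beta concentrates the softmax on the peak, so the estimate approaches the hard argmax. With beta equal to zero the weights are uniform and the estimate falls at the centre of the map.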
- forward(feats: Tuple[torch.Tensor]) Union[torch.Tensor, Tuple[torch.Tensor]] [source]¶
Forward the network. The input is multi scale feature maps and the output is the coordinates.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
Output coordinates (and, optionally, sigmas).
- Return type
Tensor
- loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each of which contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D), where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each of which contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
InstanceList | Tuple[InstanceList, PixelDataList]
- class mmpose.models.heads.MSPNHead(num_stages: int = 4, num_units: int = 4, out_shape: tuple = (64, 48), unit_channels: int = 256, out_channels: int = 17, use_prm: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, level_indices: Sequence[int] = [], loss: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Multi-stage multi-unit heatmap head introduced in `Multi-Stage Pose estimation Network (MSPN)`_ by Li et al (2019), and used by `Residual Steps Networks (RSN)`_ by Cai et al (2020). The head consists of multiple stages and each stage consists of multiple units. Each unit of each stage has some conv layers.
- Parameters
num_stages (int) – Number of stages.
num_units (int) – Number of units in each stage.
out_shape (tuple) – The output shape of the output heatmaps.
unit_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
out_shape – Shape of the output heatmaps.
use_prm (bool) – Whether to use pose refine machine (PRM). Defaults to False
norm_cfg (Config) – Config to construct the norm layer. Defaults to dict(type='BN')
loss (Config | List[Config]) – Config of the keypoint loss for different stages and different units. Defaults to use KeypointMSELoss
level_indices (Sequence[int]) – The indices that specify the level of target heatmaps.
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
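As a rough illustration of how these parameters fit together, the sketch below writes a head config as a plain dict. The numeric values mirror the documented defaults; the per-stage loss list and its length (num_stages * num_units) are assumptions made for the example, not something stated above.

```python
# Hypothetical MSPNHead config written as a plain dict.
num_stages, num_units = 4, 4

mspn_head_cfg = dict(
    type='MSPNHead',
    num_stages=num_stages,
    num_units=num_units,
    out_shape=(64, 48),
    unit_channels=256,
    out_channels=17,
    use_prm=False,
    norm_cfg=dict(type='BN'),
    # Assumption: one loss config per (stage, unit) pair.
    loss=[
        dict(type='KeypointMSELoss', use_target_weight=True)
        for _ in range(num_stages * num_units)
    ],
)

print(len(mspn_head_cfg['loss']))
```

A single loss config (the default) could be passed instead of the list when every stage and unit shares the same loss.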
- property default_init_cfg¶
Default config for weight initialization.
- forward(feats: Sequence[Sequence[torch.Tensor]]) List[torch.Tensor] [source]¶
Forward the network. The input is multi-stage multi-unit feature maps and the output is a list of heatmaps from multiple stages.
- Parameters
feats (Sequence[Sequence[Tensor]]) – Feature maps from multiple stages and units.
- Returns
- A list of output heatmaps from multiple stages
and units.
- Return type
List[Tensor]
- loss(feats: Sequence[Sequence[torch.Tensor]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
Note
batch_size: B
num_output_heatmap_levels: L
num_keypoints: K
heatmaps height: H
heatmaps width: W
num_instances: N (usually 1 in topdown heatmap heads)
- Parameters
feats (Sequence[Sequence[Tensor]]) – Feature maps from multiple stages and units
batch_data_samples (List[PoseDataSample]) – The Data Samples. It usually includes information such as gt_instance_labels and gt_fields.
train_cfg (Config, optional) – The training config
- Returns
A dictionary of loss components.
- Return type
dict
- predict(feats: Union[Sequence[Sequence[torch.Tensor]], List[Sequence[Sequence[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from multi-stage feature maps.
- Parameters
feats (Sequence[Sequence[Tensor]]) – Multi-stage multi-unit features (or multiple MSMU features for TTA)
batch_data_samples (List[PoseDataSample]) – The Data Samples. It usually includes information such as gt_instance_labels.
test_cfg (Config, optional) – The testing/inference config
- Returns
If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.
The pose prediction is a list of InstanceData, each contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in shape (num_instances, K)
The heatmap prediction is a list of PixelData, each contains the following fields:
- heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)
- Return type
Union[InstanceList | Tuple[InstanceList | PixelDataList]]
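Because the return value is either a list or a tuple of two lists, calling code may want to normalise it. The helper below is a minimal sketch; split_predictions is a hypothetical name, and the strings stand in for InstanceData and PixelData objects.

```python
def split_predictions(result):
    """Return (poses, heatmaps); heatmaps is None when only poses were returned."""
    if isinstance(result, tuple):
        poses, heatmaps = result
        return poses, heatmaps
    return result, None


# Stand-in values; real results would hold InstanceData / PixelData objects.
poses_only = ['pose_0', 'pose_1']
poses_and_heatmaps = (['pose_0', 'pose_1'], ['heatmap_0', 'heatmap_1'])

print(split_predictions(poses_only))
print(split_predictions(poses_and_heatmaps))
```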
- class mmpose.models.heads.RLEHead(in_channels: Union[int, Sequence[int]], num_joints: int, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RLELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down regression head introduced in RLE by Li et al (2021). The head is composed of fully-connected layers to predict the coordinates and sigma (the variance of the coordinates) together.
- Parameters
in_channels (int | sequence[int]) – Number of input channels
num_joints (int) – Number of joints
input_transform (str) – Transformation of input features which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config for keypoint loss. Defaults to use
RLELoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
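The 'select' behaviour described for input_transform and input_index can be pictured with a small sketch. select_inputs is a hypothetical helper written only for illustration, and the strings stand in for feature-map tensors.

```python
def select_inputs(feats, input_index=-1):
    """Pick one feature map, or bundle several into a tuple ('select' mode)."""
    if isinstance(input_index, int):
        return feats[input_index]
    return tuple(feats[i] for i in input_index)


# Stand-ins for multi-scale feature maps produced by a backbone.
feats = ('stride4', 'stride8', 'stride16', 'stride32')

print(select_inputs(feats))          # default index -1 picks the last map
print(select_inputs(feats, [0, 2]))  # several indices are bundled into a tuple
```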
- forward(feats: Tuple[torch.Tensor]) torch.Tensor [source]¶
Forward the network. The input is multi scale feature maps and the output is the coordinates.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
Output coordinates (and, optionally, sigmas).
- Return type
Tensor
- loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from outputs.
- class mmpose.models.heads.RTMCCHead(in_channels: Union[int, Sequence[int]], out_channels: int, input_size: Tuple[int, int], in_featuremap_size: Tuple[int, int], simcc_split_ratio: float = 2.0, final_layer_kernel_size: int = 1, gau_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_fn': 'ReLU', 'drop_path': 0.0, 'dropout_rate': 0.0, 'expansion_factor': 2, 'hidden_dims': 256, 'pos_enc': False, 's': 128, 'use_rel_bias': False}, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KLDiscretLoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down head introduced in RTMPose (2023). The head is composed of a large-kernel convolutional layer, a fully-connected layer and a Gated Attention Unit to generate 1d representation from low-resolution feature maps.
- Parameters
in_channels (int | sequence[int]) – Number of channels in the input feature map.
out_channels (int) – Number of channels in the output heatmap.
input_size (tuple) – Size of input image in shape [w, h].
in_featuremap_size (int | sequence[int]) – Size of input feature map.
simcc_split_ratio (float) – Split ratio of pixels. Default: 2.0.
final_layer_kernel_size (int) – Kernel size of the convolutional layer. Default: 1.
gau_cfg (Config) – Config dict for the Gated Attention Unit. Default: dict(hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0., drop_path=0., act_fn='ReLU', use_rel_bias=False, pos_enc=False).
input_transform (str) – Transformation of input features which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config of the keypoint loss. Defaults to use
KLDiscretLoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
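To make simcc_split_ratio concrete, the sketch below assumes the usual SimCC convention that each image axis is divided into size * ratio bins, so a larger ratio gives sub-pixel bins. simcc_vector_lengths is a hypothetical helper, and the 192x256 input size is only an example value.

```python
def simcc_vector_lengths(input_size, simcc_split_ratio=2.0):
    """Number of bins along x and y for an input of size [w, h]."""
    w, h = input_size
    return int(w * simcc_split_ratio), int(h * simcc_split_ratio)


# Example input size in [w, h] order, as documented for input_size.
print(simcc_vector_lengths((192, 256)))
```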
- forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor] [source]¶
Forward the network.
The input is multi scale feature maps and the output is the heatmap.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
- pred_x (Tensor): 1d representation of x.
- pred_y (Tensor): 1d representation of y.
- Return type
Tuple[Tensor, Tensor]
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) List[mmengine.structures.instance_data.InstanceData] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
The pose predictions, each contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in
shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in
shape (num_instances, K)
- keypoint_x_labels (np.ndarray, optional): The predicted 1-D
intensity distribution in the x direction
- keypoint_y_labels (np.ndarray, optional): The predicted 1-D
intensity distribution in the y direction
- Return type
List[InstanceData]
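The relation between the 1-D distributions and the keypoint coordinates can be sketched as an argmax followed by a division by the split ratio. This is a simplified illustration under that assumption, not the configured decoder; decode_simcc is a hypothetical name, and taking the smaller of the two peak values as the score is also an assumption.

```python
def decode_simcc(x_dist, y_dist, simcc_split_ratio=2.0):
    """Turn two 1-D distributions into an (x, y) coordinate and a score."""
    # Index of the strongest bin along each axis.
    x_bin = max(range(len(x_dist)), key=x_dist.__getitem__)
    y_bin = max(range(len(y_dist)), key=y_dist.__getitem__)
    # Assumption: the score is the weaker of the two peak responses.
    score = min(x_dist[x_bin], y_dist[y_bin])
    # Map bin indices back to input-image pixels.
    return (x_bin / simcc_split_ratio, y_bin / simcc_split_ratio), score


coords, score = decode_simcc([0.1, 0.2, 0.9, 0.3], [0.0, 0.8, 0.1])
print(coords, score)
```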
- class mmpose.models.heads.RegressionHead(in_channels: Union[int, Sequence[int]], num_joints: int, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down regression head introduced in DeepPose by Toshev et al (2014). The head is composed of fully-connected layers to predict the coordinates directly.
- Parameters
in_channels (int | sequence[int]) – Number of input channels
num_joints (int) – Number of joints
input_transform (str) – Transformation of input features which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config for keypoint loss. Defaults to use
SmoothL1Loss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
- forward(feats: Tuple[torch.Tensor]) torch.Tensor [source]¶
Forward the network. The input is multi scale feature maps and the output is the coordinates.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
Output coordinates (and, optionally, sigmas).
- Return type
Tensor
- loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]] [source]¶
Predict results from outputs.
- class mmpose.models.heads.SimCCHead(in_channels: Union[int, Sequence[int]], out_channels: int, input_size: Tuple[int, int], in_featuremap_size: Tuple[int, int], simcc_split_ratio: float = 2.0, deconv_type: str = 'heatmap', deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), deconv_num_groups: Optional[Sequence[int]] = (16, 16, 16), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KLDiscretLoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Top-down heatmap head introduced in SimCC by Li et al (2022). The head is composed of a few deconvolutional layers followed by a fully-connected layer to generate 1d representation from low-resolution feature maps.
- Parameters
in_channels (int | sequence[int]) – Number of channels in the input feature map
out_channels (int) – Number of channels in the output heatmap
input_size (tuple) – Input image size in shape [w, h]
in_featuremap_size (int | sequence[int]) – Size of input feature map
simcc_split_ratio (float) – Split ratio of pixels
deconv_type (str, optional) – The type of deconv head which should be one of the following options:
- 'heatmap': make deconv layers in HeatmapHead
- 'vipnas': make deconv layers in ViPNASHead
Defaults to 'heatmap'
deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)
deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)
deconv_num_groups (Sequence[int], optional) – The group number of each deconv layer. Defaults to (16, 16, 16)
conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None
conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None
input_transform (str) – Transformation of input features which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config of the keypoint loss. Defaults to use
KLDiscretLoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
- forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor] [source]¶
Forward the network. The input is multi scale feature maps and the output is the heatmap.
- Parameters
feats (Tuple[Tensor]) – Multi scale feature maps.
- Returns
- pred_x (Tensor): 1d representation of x.
- pred_y (Tensor): 1d representation of y.
- Return type
Tuple[Tensor, Tensor]
- loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict [source]¶
Calculate losses from a batch of inputs and data samples.
- predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) List[mmengine.structures.instance_data.InstanceData] [source]¶
Predict results from features.
- Parameters
feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)
batch_data_samples (List[PoseDataSample]) – The batch data samples
test_cfg (dict) – The runtime config for testing process. Defaults to {}
- Returns
The pose predictions, each contains the following fields:
- keypoints (np.ndarray): predicted keypoint coordinates in
shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension
- keypoint_scores (np.ndarray): predicted keypoint scores in
shape (num_instances, K)
- keypoint_x_labels (np.ndarray, optional): The predicted 1-D
intensity distribution in the x direction
- keypoint_y_labels (np.ndarray, optional): The predicted 1-D
intensity distribution in the y direction
- Return type
List[InstanceData]
- class mmpose.models.heads.ViPNASHead(in_channels: Union[int, Sequence[int]], out_channels: int, deconv_out_channels: Optional[Sequence[int]] = (144, 144, 144), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), deconv_num_groups: Optional[Sequence[int]] = (16, 16, 16), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, has_final_layer: bool = True, input_transform: str = 'select', input_index: Union[int, Sequence[int]] = - 1, align_corners: bool = False, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
ViPNAS heatmap head introduced in ViPNAS by Xu et al (2021). The head is composed of a few deconvolutional layers followed by a convolutional layer to generate heatmaps from low-resolution feature maps. Specifically, different from the HeatmapHead introduced by Simple Baselines, the group numbers in the deconvolutional layers are elastic and thus can be optimized by neural architecture search (NAS).
- Parameters
in_channels (int | Sequence[int]) – Number of channels in the input feature map
out_channels (int) – Number of channels in the output heatmap
deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to (144, 144, 144)
deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)
deconv_num_groups (Sequence[int], optional) – The group number of each deconv layer. Defaults to (16, 16, 16)
conv_out_channels (Sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None
conv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None
has_final_layer (bool) – Whether have the final 1x1 Conv2d layer. Defaults to True
input_transform (str) – Transformation of input features which should be one of the following options:
- 'resize_concat': Resize multiple feature maps specified by input_index to the same size as the first one and concat these feature maps
- 'select': Select feature map(s) specified by input_index. Multiple selected features will be bundled into a tuple
Defaults to 'select'
input_index (int | Sequence[int]) – The feature map index used in the input transformation. See also input_transform. Defaults to -1
align_corners (bool) – align_corners argument of torch.nn.functional.interpolate() used in the input transformation. Defaults to False
loss (Config) – Config of the keypoint loss. Defaults to use
KeypointMSELoss
decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to
None
init_cfg (Config, optional) – Config to control the initialization. See
default_init_cfg
for default settings
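A config for this head might look like the sketch below. The in_channels and out_channels values are assumptions (they depend on the backbone and the dataset); the remaining values are the documented defaults. The loop checks a general property of grouped convolutions: each layer's input and output channel counts must be divisible by its group number.

```python
# Hypothetical ViPNASHead config written as a plain dict.
vipnas_head_cfg = dict(
    type='ViPNASHead',
    in_channels=160,   # assumption: set by the backbone's output channels
    out_channels=17,   # assumption: one heatmap per keypoint, 17 keypoints
    deconv_out_channels=(144, 144, 144),
    deconv_kernel_sizes=(4, 4, 4),
    deconv_num_groups=(16, 16, 16),
    loss=dict(type='KeypointMSELoss', use_target_weight=True),
)

# Input channels of each deconv layer: the head input, then the previous output.
layer_inputs = (vipnas_head_cfg['in_channels'],
                *vipnas_head_cfg['deconv_out_channels'][:-1])
groups_ok = all(
    c_in % g == 0 and c_out % g == 0
    for c_in, c_out, g in zip(layer_inputs,
                              vipnas_head_cfg['deconv_out_channels'],
                              vipnas_head_cfg['deconv_num_groups'])
)
print(groups_ok)
```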
losses¶
- class mmpose.models.losses.AdaptiveWingLoss(alpha=2.1, omega=14, epsilon=1, theta=0.5, use_target_weight=False, loss_weight=1.0)[source]¶
Adaptive wing loss. paper ref: ‘Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression’ Wang et al. ICCV’2019.
- Parameters
alpha (float), omega (float), epsilon (float), theta (float) – are hyper-parameters.
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- criterion(pred, target)[source]¶
Criterion of wingloss.
Note
batch_size: N
num_keypoints: K
- Parameters
pred (torch.Tensor[NxKxHxW]) – Predicted heatmaps.
target (torch.Tensor[NxKxHxW]) – Target heatmaps.
- forward(output: torch.Tensor, target: torch.Tensor, target_weights: Optional[torch.Tensor] = None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
- Parameters
output (torch.Tensor[N, K, H, W]) – Output heatmaps.
target (torch.Tensor[N, K, H, W]) – Target heatmaps.
target_weights (torch.Tensor[N, K]) – Weights across different joint types.
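For intuition about the hyper-parameters, here is a per-pixel sketch of the Adaptive Wing formula as given in the cited paper, written in plain Python rather than with tensors. adaptive_wing is a hypothetical name; the library's batched implementation and its reduction are not reproduced here.

```python
import math


def adaptive_wing(pred, target, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Adaptive Wing loss for one heatmap pixel (target expected in [0, 1])."""
    delta = abs(target - pred)
    power = alpha - target
    if delta < theta:
        # Non-linear part, used for small errors.
        return omega * math.log(1.0 + (delta / epsilon) ** power)
    # Linear part; a and c make the two pieces meet smoothly at delta == theta.
    ratio = theta / epsilon
    a = (omega * (1.0 / (1.0 + ratio ** power)) * power
         * ratio ** (power - 1.0) / epsilon)
    c = theta * a - omega * math.log(1.0 + ratio ** power)
    return a * delta - c


print(adaptive_wing(0.3, 0.3))   # no error
print(adaptive_wing(0.0, 1.0))   # large error on a foreground pixel
```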
- class mmpose.models.losses.AssociativeEmbeddingLoss(loss_weight: float = 1.0, push_loss_factor: float = 0.5)[source]¶
Associative Embedding loss.
Details can be found in Associative Embedding
Note
batch size: B
instance number: N
keypoint number: K
keypoint dimension: D
embedding tag dimension: L
heatmap size: [W, H]
- Parameters
loss_weight (float) – Weight of the loss. Defaults to 1.0
push_loss_factor (float) – A factor that controls the weight between the push loss and the pull loss. Defaults to 0.5
- forward(tags: torch.Tensor, keypoint_indices: Union[List[torch.Tensor], torch.Tensor])[source]¶
Compute associative embedding loss on a batch of data.
- Parameters
tags (Tensor) – Tagging heatmaps in shape (B, L*K, H, W)
keypoint_indices (Tensor|List[Tensor]) – Ground-truth keypoint position indices represented by a Tensor in shape (B, N, K, 2), or a list of B Tensors in shape (N_i, K, 2). Each keypoint’s index is represented as [i, v], where i is the position index in the heatmap (\(i=y*w+x\)) and v is the visibility
- Returns
pull_loss (Tensor)
push_loss (Tensor)
- Return type
tuple
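The position index in keypoint_indices flattens a 2-D heatmap location into one integer. The sketch below assumes w in the formula is the heatmap width; both helper names are hypothetical.

```python
def to_position_index(x, y, heatmap_width):
    """Flatten an (x, y) heatmap location into i = y * w + x."""
    return y * heatmap_width + x


def from_position_index(i, heatmap_width):
    """Recover (x, y) from a flattened position index."""
    y, x = divmod(i, heatmap_width)
    return x, y


index = to_position_index(5, 2, 48)
print(index, from_position_index(index, 48))
```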
- class mmpose.models.losses.BCELoss(use_target_weight=False, loss_weight=1.0)[source]¶
Binary Cross Entropy loss.
- Parameters
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_labels: K
- Parameters
output (torch.Tensor[N, K]) – Output classification.
target (torch.Tensor[N, K]) – Target classification.
target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.
- class mmpose.models.losses.BoneLoss(joint_parents, use_target_weight=False, loss_weight=1.0)[source]¶
Bone length loss.
- Parameters
joint_parents (list) – Indices of each joint’s parent joint.
use_target_weight (bool) – Option to use weighted bone loss. Different bone types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K-1]) – Weights across different bone types.
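The idea of a bone length loss can be sketched for a single sample in plain Python. This is an illustration under stated assumptions, not the library code: the root joint is assumed to list itself as its own parent, the error is an unweighted mean absolute difference, and both function names are hypothetical.

```python
import math


def bone_lengths(joints, joint_parents):
    """Length of every bone; the root (its own parent) is skipped."""
    return [math.dist(joints[j], joints[p])
            for j, p in enumerate(joint_parents) if p != j]


def bone_loss(output, target, joint_parents):
    """Mean absolute difference between predicted and target bone lengths."""
    pred_len = bone_lengths(output, joint_parents)
    gt_len = bone_lengths(target, joint_parents)
    return sum(abs(a - b) for a, b in zip(pred_len, gt_len)) / len(pred_len)


joint_parents = [0, 0, 1]                     # joint 0 is the root
target = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
output = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
print(bone_loss(output, target, joint_parents))
```

With K joints there are K-1 bones, which matches the [N, K-1] shape documented for target_weight.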
- class mmpose.models.losses.CombinedLoss(losses: Dict[str, Union[mmengine.config.config.ConfigDict, dict]])[source]¶
A wrapper to combine multiple loss functions. These loss functions can have different input types (e.g. heatmaps or regression values), and can only be invoked individually and explicitly.
- Parameters
losses (Dict[str, ConfigType]) – The names and configs of loss functions to be wrapped
- Example::
>>> heatmap_loss_cfg = dict(type='KeypointMSELoss')
>>> ae_loss_cfg = dict(type='AssociativeEmbeddingLoss')
>>> loss_module = CombinedLoss(
...     losses=dict(
...         heatmap_loss=heatmap_loss_cfg,
...         ae_loss=ae_loss_cfg))
>>> loss_hm = loss_module.heatmap_loss(pred_heatmap, gt_heatmap)
>>> loss_ae = loss_module.ae_loss(pred_tags, keypoint_indices)
- class mmpose.models.losses.JSDiscretLoss(use_target_weight=True, size_average: bool = True)[source]¶
Discrete JS Divergence loss for DSNT with Gaussian Heatmap.
Modified from the official implementation.
- Parameters
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
size_average (bool) – Option to average the loss by the batch_size.
- forward(pred_hm, gt_hm, target_weight=None)[source]¶
Forward function.
- Parameters
pred_hm (torch.Tensor[N, K, H, W]) – Predicted heatmaps.
gt_hm (torch.Tensor[N, K, H, W]) – Target heatmaps.
target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.
- Returns
Loss value.
- Return type
torch.Tensor
- class mmpose.models.losses.KLDiscretLoss(beta=1.0, label_softmax=False, use_target_weight=True)[source]¶
Discrete KL Divergence loss for SimCC with Gaussian Label Smoothing. Modified from the official implementation (https://github.com/leeyegy/SimCC).
- Parameters
beta (float) – Temperature factor of Softmax.
label_softmax (bool) – Whether to use Softmax on labels.
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
- forward(pred_simcc, gt_simcc, target_weight)[source]¶
Forward function.
- Parameters
pred_simcc (Tuple[Tensor, Tensor]) – Predicted SimCC vectors of x-axis and y-axis.
gt_simcc (Tuple[Tensor, Tensor]) – Target representations.
target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.
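The core computation can be sketched for one keypoint and one axis: a softmax with temperature beta over the predicted vector, followed by the KL divergence from the target distribution. This plain-Python version is a simplification; the function names are hypothetical, and summing over bins is an assumption about the reduction.

```python
import math


def softmax(values, beta=1.0):
    """Softmax with temperature factor beta, computed stably."""
    peak = max(v * beta for v in values)
    exps = [math.exp(v * beta - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_discrete(pred_logits, target_probs, beta=1.0):
    """KL(target || softmax(beta * pred)) over the bins of one 1-D vector."""
    pred_probs = softmax(pred_logits, beta)
    return sum(t * math.log(t / p)
               for t, p in zip(target_probs, pred_probs) if t > 0)


target = softmax([1.0, 2.0, 3.0])
print(kl_discrete([1.0, 2.0, 3.0], target))  # matching prediction
print(kl_discrete([3.0, 2.0, 1.0], target))  # mismatched prediction
```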
- class mmpose.models.losses.KeypointMSELoss(use_target_weight: bool = False, skip_empty_channel: bool = False, loss_weight: float = 1.0)[source]¶
MSE loss for heatmaps.
- Parameters
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights. Defaults to False
skip_empty_channel (bool) – If True, heatmap channels with no non-zero value (which means no visible ground-truth keypoint in the image) will not be used to calculate the loss. Defaults to False
loss_weight (float) – Weight of the loss. Defaults to 1.0
- forward(output: torch.Tensor, target: torch.Tensor, target_weights: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None) torch.Tensor [source]¶
Forward function of loss.
Note
batch_size: B
num_keypoints: K
heatmaps height: H
heatmaps width: W
- Parameters
output (Tensor) – The output heatmaps with shape [B, K, H, W]
target (Tensor) – The target heatmaps with shape [B, K, H, W]
target_weights (Tensor, optional) – The target weights of different keypoints, with shape [B, K] (keypoint-wise) or [B, K, H, W] (pixel-wise).
mask (Tensor, optional) – The masks of valid heatmap pixels in shape [B, K, H, W] or [B, 1, H, W]. If None, no mask will be applied. Defaults to None
- Returns
The calculated loss.
- Return type
Tensor
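The effect of keypoint-wise target weights can be seen in a small sketch for one sample, with each heatmap flattened to a list of pixels. keypoint_mse is a hypothetical name, and averaging over all pixels (including zero-weighted ones) is an assumption about the reduction.

```python
def keypoint_mse(output, target, target_weights=None):
    """Weighted MSE over K heatmaps, each given as a flat list of pixels."""
    total, count = 0.0, 0
    for k, (out_k, tgt_k) in enumerate(zip(output, target)):
        weight = 1.0 if target_weights is None else target_weights[k]
        for o, t in zip(out_k, tgt_k):
            total += weight * (o - t) ** 2
            count += 1
    return total / count


output = [[1.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [1.0, 1.0]]
print(keypoint_mse(output, target))              # every keypoint counts
print(keypoint_mse(output, target, [1.0, 0.0]))  # second keypoint ignored
```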
- class mmpose.models.losses.KeypointOHKMMSELoss(use_target_weight: bool = False, topk: int = 8, loss_weight: float = 1.0)[source]¶
MSE loss with online hard keypoint mining.
- Parameters
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights. Defaults to
False
topk (int) – Only top k joint losses are kept. Defaults to 8
loss_weight (float) – Weight of the loss. Defaults to 1.0
- forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor) torch.Tensor [source]¶
Forward function of loss.
Note
batch_size: B
num_keypoints: K
heatmaps height: H
heatmaps width: W
- Parameters
output (Tensor) – The output heatmaps with shape [B, K, H, W].
target (Tensor) – The target heatmaps with shape [B, K, H, W].
target_weights (Tensor) – The target weights of different keypoints, with shape [B, K].
- Returns
The calculated loss.
- Return type
Tensor
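Online hard keypoint mining keeps only the largest per-keypoint losses. A minimal sketch for one sample follows; ohkm is a hypothetical name, and dividing by topk is an assumption about the normalisation.

```python
def ohkm(per_keypoint_losses, topk=8):
    """Average of the topk largest per-keypoint losses of one sample."""
    hardest = sorted(per_keypoint_losses, reverse=True)[:topk]
    return sum(hardest) / topk


losses = [0.1, 0.9, 0.5, 0.3]
print(ohkm(losses, topk=2))  # only the two hardest keypoints contribute
```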
- class mmpose.models.losses.MPJPELoss(use_target_weight=False, loss_weight=1.0)[source]¶
MPJPE (Mean Per Joint Position Error) loss.
- Parameters
use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N,K,D]) – Weights across different joint types.
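MPJPE is the Euclidean distance between predicted and target joints, averaged over joints. The unweighted single-sample sketch below uses a hypothetical function name and plain tuples instead of tensors.

```python
import math


def mpjpe(output, target):
    """Mean per joint position error for one sample of K joints."""
    distances = [math.dist(o, t) for o, t in zip(output, target)]
    return sum(distances) / len(distances)


output = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(output, target))
```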
- class mmpose.models.losses.MSELoss(use_target_weight=False, loss_weight=1.0)[source]¶
MSE loss for coordinate regression.
- class mmpose.models.losses.MultipleLossWrapper(losses: list)[source]¶
A wrapper to collect multiple loss functions together and return a list of losses in the same order.
- Parameters
losses (list) – List of Loss Config
- forward(input_list, target_list, keypoint_weights=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
input_list (List[Tensor]) – List of inputs.
target_list (List[Tensor]) – List of targets.
keypoint_weights (Tensor[N, K, D]) – Weights across different joint types.
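The wrapper's behaviour can be pictured with plain Python callables. This is a sketch under assumptions: pairing the i-th loss with the i-th input and target is inferred from "in the same order", and `apply_losses`, `l1` and `l2` are hypothetical stand-ins for configured loss modules.

```python
def apply_losses(loss_fns, input_list, target_list, keypoint_weights=None):
    """Apply the i-th loss to the i-th (input, target) pair, keeping order."""
    if not (len(loss_fns) == len(input_list) == len(target_list)):
        raise ValueError('losses, inputs and targets must have equal length')
    return [
        loss_fn(pred, gt, keypoint_weights)
        for loss_fn, pred, gt in zip(loss_fns, input_list, target_list)
    ]


def l1(pred, gt, weights=None):
    # Hypothetical stand-in for a configured loss module.
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def l2(pred, gt, weights=None):
    # Hypothetical stand-in for a second configured loss module.
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)


losses = apply_losses(
    [l1, l2],
    [[1.0, 3.0], [1.0, 3.0]],
    [[0.0, 0.0], [0.0, 0.0]],
)
```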
- class mmpose.models.losses.RLELoss(use_target_weight=False, size_average=True, residual=True, q_distribution='laplace')[source]¶
RLE Loss.
Human Pose Regression With Residual Log-Likelihood Estimation arXiv:.
Code is modified from the official implementation.
- Parameters
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
size_average (bool) – Option to average the loss by the batch_size.
residual (bool) – Option to add L1 loss and let the flow learn the residual error distribution.
q_distribution (str) – Option for the identity Q(error) distribution. Options: “laplace” or “gaussian”.
- forward(pred, sigma, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
pred (Tensor[N, K, D]) – Output regression.
sigma (Tensor[N, K, D]) – Output sigma.
target (Tensor[N, K, D]) – Target regression.
target_weight (Tensor[N, K, D]) – Weights across different joint types.
- class mmpose.models.losses.SemiSupervisionLoss(joint_parents, projection_loss_weight=1.0, bone_loss_weight=1.0, warmup_iterations=0)[source]¶
Semi-supervision loss for unlabeled data. It is composed of projection loss and bone loss.
Paper ref: 3D human pose estimation in video with temporal convolutions and semi-supervised training Dario Pavllo et al. CVPR’2019.
- Parameters
joint_parents (list) – Indices of each joint’s parent joint.
projection_loss_weight (float) – Weight for projection loss.
bone_loss_weight (float) – Weight for bone loss.
warmup_iterations (int) – Number of warmup iterations. During the first warmup_iterations iterations, the model is trained only on labeled data and the semi-supervision loss is 0. This is a workaround, since the epoch number is currently not accessible in loss functions. Note that the number of iterations per epoch changes with the number of GPUs in multi-GPU settings, so set this parameter carefully: warmup_iterations = dataset_size // samples_per_gpu // gpu_num * warmup_epochs
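The formula for warmup_iterations can be evaluated directly. The numbers below are hypothetical and only show how the value shrinks as the GPU count grows.

```python
def warmup_iterations(dataset_size, samples_per_gpu, gpu_num, warmup_epochs):
    """Evaluate the formula quoted in the parameter description."""
    return dataset_size // samples_per_gpu // gpu_num * warmup_epochs


# Hypothetical setup: 10,000 samples, 32 samples per GPU, 5 warmup epochs.
single_gpu = warmup_iterations(10_000, 32, 1, 5)  # 1560
four_gpus = warmup_iterations(10_000, 32, 4, 5)   # 390
```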
- forward(output, target)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmpose.models.losses.SmoothL1Loss(use_target_weight=False, loss_weight=1.0)[source]¶
Smooth L1 loss.
- Parameters
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
- class mmpose.models.losses.SoftWeightSmoothL1Loss(use_target_weight=False, supervise_empty=True, beta=1.0, loss_weight=1.0)[source]¶
Smooth L1 loss with soft weight for regression.
- Parameters
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
supervise_empty (bool) – Whether to supervise the output with zero weight.
beta (float) – Specifies the threshold at which to change between L1 and L2 loss.
loss_weight (float) – Weight of the loss. Default: 1.0.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
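The role of beta is easiest to see on a single residual. The sketch below uses the standard Smooth L1 definition (the one used by PyTorch's smooth_l1_loss) and does not reproduce the soft weighting of this class; the function name `smooth_l1` is hypothetical.

```python
def smooth_l1(diff, beta=1.0):
    """Element-wise Smooth L1 value for a scalar residual (illustrative)."""
    abs_diff = abs(diff)
    if abs_diff < beta:
        return 0.5 * abs_diff ** 2 / beta  # quadratic (L2-like) near zero
    return abs_diff - 0.5 * beta  # linear (L1-like) for large residuals


small = smooth_l1(0.5)  # 0.125, quadratic branch
large = smooth_l1(3.0)  # 2.5, linear branch
```

The two branches meet at |diff| = beta, so the loss stays continuous when beta is changed.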
- class mmpose.models.losses.SoftWingLoss(omega1=2.0, omega2=20.0, epsilon=0.5, use_target_weight=False, loss_weight=1.0)[source]¶
Soft Wing Loss. Paper ref: ‘Structure-Coherent Deep Feature Learning for Robust Face Alignment’ Lin et al. TIP’2021.
- Parameters
omega1 (float) – The first threshold.
omega2 (float) – The second threshold.
epsilon (float) – Also referred to as curvature.
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- criterion(pred, target)[source]¶
Criterion of Soft Wing Loss.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
pred (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
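How the two thresholds and the curvature interact can be sketched for a scalar residual. The piecewise form below (linear below omega1, logarithmic above, with an offset that keeps the curve continuous) is an assumption based on the cited paper and should be checked against the source before relying on it; `soft_wing` is a hypothetical name.

```python
import math


def soft_wing(x, omega1=2.0, omega2=20.0, epsilon=0.5):
    """Scalar Soft Wing value: L1 for small errors, log curve for large ones."""
    abs_x = abs(x)
    # Offset chosen so both branches agree at |x| == omega1.
    offset = omega1 - omega2 * math.log(1.0 + omega1 / epsilon)
    if abs_x < omega1:
        return abs_x
    return omega2 * math.log(1.0 + abs_x / epsilon) + offset


inner = soft_wing(1.0)  # linear branch
outer = soft_wing(4.0)  # logarithmic branch
```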
- class mmpose.models.losses.WingLoss(omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0)[source]¶
Wing Loss. Paper ref: ‘Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks’ Feng et al. CVPR’2018.
- Parameters
omega (float) – Also referred to as width.
epsilon (float) – Also referred to as curvature.
use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.
loss_weight (float) – Weight of the loss. Default: 1.0.
- criterion(pred, target)[source]¶
Criterion of wingloss.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
pred (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
- forward(output, target, target_weight=None)[source]¶
Forward function.
Note
batch_size: N
num_keypoints: K
dimension of keypoints: D (D=2 or D=3)
- Parameters
output (torch.Tensor[N, K, D]) – Output regression.
target (torch.Tensor[N, K, D]) – Target regression.
target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
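The meaning of the width (omega) and curvature (epsilon) can be illustrated on a scalar residual. This is a sketch of the piecewise form described in the cited paper, written under the assumption that the constant offset is chosen for continuity at |x| = omega; `wing` is a hypothetical name, not the mmpose API.

```python
import math


def wing(x, omega=10.0, epsilon=2.0):
    """Scalar Wing loss: log curve inside the width, L1 minus offset outside."""
    abs_x = abs(x)
    # Offset that makes the two branches meet at |x| == omega.
    offset = omega * (1.0 - math.log(1.0 + omega / epsilon))
    if abs_x < omega:
        return omega * math.log(1.0 + abs_x / epsilon)
    return abs_x - offset


at_zero = wing(0.0)
at_width = wing(10.0)
beyond = wing(12.0)
```

Outside the width the loss grows with slope 1, so `beyond - at_width` equals the extra residual of 2.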
misc¶
- class mmpose.models.utils.PatchEmbed(in_channels=3, embed_dims=768, conv_type='Conv2d', kernel_size=16, stride=16, padding='corner', dilation=1, bias=True, norm_cfg=None, input_size=None, init_cfg=None)[source]¶
Image to Patch Embedding.
We use a conv layer to implement PatchEmbed.
- Parameters
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
conv_type (str) – The type of the embedding conv layer. Default: “Conv2d”.
kernel_size (int) – The kernel_size of embedding conv. Default: 16.
stride (int) – The slide stride of embedding conv. If None, it is set to kernel_size. Default: 16.
padding (int | tuple | string) – The padding length of embedding conv. When it is a string, it means the mode of adaptive padding, support “same” and “corner” now. Default: “corner”.
dilation (int) – The dilation rate of embedding conv. Default: 1.
bias (bool) – Bias of embed conv. Default: True.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.
input_size (int | tuple | None) – The size of input, which will be used to calculate the out size. Only works when dynamic_size is False. Default: None.
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
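The relation between input size, kernel_size, stride and the resulting patch grid can be worked out with ordinary convolution arithmetic. The sketch below assumes that adaptive padding ("same" or "corner") pads each axis just enough for the conv windows to cover the whole input; `patch_grid` is a hypothetical helper, not part of mmpose.

```python
import math


def patch_grid(input_size, kernel_size=16, stride=16, dilation=1):
    """Number of patches per axis when the input is padded up adaptively."""
    grid = []
    for size in input_size:
        wanted = math.ceil(size / stride)  # windows needed to cover the axis
        span = (wanted - 1) * stride + (kernel_size - 1) * dilation + 1
        pad = max(span - size, 0)  # padding added by the adaptive mode
        # Standard conv output-size formula on the padded axis.
        grid.append((size + pad - dilation * (kernel_size - 1) - 1) // stride + 1)
    return tuple(grid)


grid_224 = patch_grid((224, 224))  # (14, 14), i.e. 196 tokens
grid_230 = patch_grid((230, 224))  # (15, 14), the first axis is padded to 240
```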
- class mmpose.models.utils.RTMCCBlock(num_token, in_token_dims, out_token_dims, expansion_factor=2, s=128, eps=1e-05, dropout_rate=0.0, drop_path=0.0, attn_type='self-attn', act_fn='SiLU', bias=False, use_rel_bias=True, pos_enc=False)[source]¶
Gated Attention Unit (GAU) in RTMBlock.
- Parameters
num_token (int) – The number of tokens.
in_token_dims (int) – The input token dimension.
out_token_dims (int) – The output token dimension.
expansion_factor (int, optional) – The expansion factor of the intermediate token dimension. Defaults to 2.
s (int, optional) – The self-attention feature dimension. Defaults to 128.
eps (float, optional) – The minimum value in clamp. Defaults to 1e-5.
dropout_rate (float, optional) – The dropout rate. Defaults to 0.0.
drop_path (float, optional) – The drop path rate. Defaults to 0.0.
attn_type (str, optional) –
Type of attention which should be one of the following options:
’self-attn’: Self-attention.
’cross-attn’: Cross-attention.
Defaults to ‘self-attn’.
act_fn (str, optional) –
The activation function which should be one of the following options:
’ReLU’: ReLU activation.
’SiLU’: SiLU activation.
Defaults to ‘SiLU’.
bias (bool, optional) – Whether to use bias in linear layers. Defaults to False.
use_rel_bias (bool, optional) – Whether to use relative bias. Defaults to True.
pos_enc (bool, optional) – Whether to use rotary position embedding. Defaults to False.
- Reference:
- mmpose.models.utils.nchw_to_nlc(x)[source]¶
Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor.
- Parameters
x (Tensor) – The input tensor of shape [N, C, H, W] before conversion.
- Returns
The output tensor of shape [N, L, C] after conversion.
- Return type
Tensor
- mmpose.models.utils.nlc_to_nchw(x, hw_shape)[source]¶
Convert [N, L, C] shape tensor to [N, C, H, W] shape tensor.
- Parameters
x (Tensor) – The input tensor of shape [N, L, C] before conversion.
hw_shape (Sequence[int]) – The height and width of output feature map.
- Returns
The output tensor of shape [N, C, H, W] after conversion.
- Return type
Tensor
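The two layout conversions are inverses of each other. The NumPy sketch below mirrors them on arrays instead of tensors, purely as an illustration; the real functions operate on torch tensors, and L is assumed to equal H * W.

```python
import numpy as np


def nchw_to_nlc(x):
    """Flatten [N, C, H, W] to [N, L, C] with L = H * W."""
    n, c, h, w = x.shape
    return x.reshape(n, c, h * w).transpose(0, 2, 1)


def nlc_to_nchw(x, hw_shape):
    """Restore [N, C, H, W] from [N, L, C] given the (H, W) of the feature map."""
    h, w = hw_shape
    n, length, c = x.shape
    assert length == h * w, 'sequence length must equal H * W'
    return x.transpose(0, 2, 1).reshape(n, c, h, w)


feat = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
tokens = nchw_to_nlc(feat)
restored = nlc_to_nchw(tokens, (4, 5))
```

Each token holds the C channel values of one spatial position, so `tokens[0, 0]` equals `feat[0, :, 0, 0]`.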
- mmpose.models.utils.rope(x, dim)[source]¶
Applies Rotary Position Embedding to input tensor.
- Parameters
x (torch.Tensor) – Input tensor.
dim (int | list[int]) – The spatial dimension(s) to apply rotary position embedding.
- Returns
The tensor after applying rotary position embedding.
- Return type
torch.Tensor
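For readers unfamiliar with rotary position embedding, the NumPy sketch below applies it along one sequence axis. It follows the common formulation (feature halves rotated by position-dependent angles, base 10000); both choices are assumptions, and mmpose.models.utils.rope may differ in detail, for example in how several spatial dimensions are handled. `rope_1d` is a hypothetical name.

```python
import numpy as np


def rope_1d(x, base=10000.0):
    """Rotary embedding along axis 0 of an [L, D] array, D even (a sketch)."""
    length, dims = x.shape
    half = dims // 2
    # One rotation frequency per feature pair, one angle per position.
    freqs = base ** (-np.arange(half) / half)
    angles = np.arange(length)[:, None] * freqs[None, :]
    sin, cos = np.sin(angles), np.cos(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


x = np.random.default_rng(0).random((6, 8))
y = rope_1d(x)
```

Since every feature pair is only rotated, the norm of each token is preserved and position 0 is left unchanged.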
mmpose.datasets¶
- class mmpose.datasets.CombinedDataset(metainfo: dict, datasets: list, pipeline: List[Union[dict, Callable]] = [], **kwargs)[source]¶
A wrapper of combined dataset.
- Parameters
metainfo (dict) – The meta information of combined dataset.
datasets (list) – The configs of datasets to be combined.
pipeline (list, optional) – Processing pipeline. Defaults to [].
- get_data_info(idx: int) dict [source]¶
Get annotation by index.
- Parameters
idx (int) – Global index of CombinedDataset.
- Returns
The idx-th annotation of the datasets.
- Return type
dict
- property metainfo¶
Get meta information of dataset.
- Returns
Meta information collected from BaseDataset.METAINFO, the annotation file and the metainfo argument during instantiation.
- Return type
dict
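A global index into a combined dataset has to be translated into a sub-dataset and a local index. The sketch below shows one way to do that with cumulative lengths; it assumes the sub-datasets are simply concatenated in order, and `locate` is a hypothetical helper rather than a CombinedDataset method.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(global_idx, dataset_lengths):
    """Map a global index to (sub-dataset index, local index)."""
    ends = list(accumulate(dataset_lengths))  # cumulative end offsets
    if not 0 <= global_idx < ends[-1]:
        raise IndexError(f'index {global_idx} out of range')
    subset = bisect_right(ends, global_idx)
    start = ends[subset - 1] if subset > 0 else 0
    return subset, global_idx - start


lengths = [100, 50, 25]  # hypothetical sizes of three sub-datasets
first_of_second = locate(100, lengths)  # (1, 0)
last_item = locate(174, lengths)        # (2, 24)
```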
- class mmpose.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, round_up: bool = True, seed: Optional[int] = None)[source]¶
Multi-Source Sampler. According to the sampling ratio, sample data from different datasets to form batches.
- Parameters
dataset (Sized) – The dataset
batch_size (int) – Size of mini-batch
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch
shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True
round_up (bool) – Whether to add extra samples to make the number of samples evenly divisible by the world size. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None
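How source_ratio divides a mini-batch between source datasets can be sketched as below. The rounding rule (truncate each share, then give the remainder to the first source) is an assumption chosen so the counts always add up to batch_size; `split_batch` is a hypothetical helper.

```python
def split_batch(batch_size, source_ratio):
    """Number of samples drawn from each source dataset per mini-batch."""
    total = sum(source_ratio)
    counts = [int(batch_size * ratio / total) for ratio in source_ratio]
    # Assumption: the first source absorbs whatever truncation left over.
    counts[0] = batch_size - sum(counts[1:])
    return counts


even_split = split_batch(32, [1, 3])       # [8, 24]
with_remainder = split_batch(10, [1, 1, 1])  # [4, 3, 3]
```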
- mmpose.datasets.build_dataset(cfg, default_args=None)[source]¶
Build a dataset from config dict.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments. Default: None.
- Returns
The constructed dataset.
- Return type
Dataset
datasets¶
- class mmpose.datasets.datasets.base.BaseCocoStyleDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
Base class for COCO-style datasets.
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
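As an illustration of how these parameters fit together, a config dict for a COCO-style validation set might look like the sketch below. Every path is a hypothetical placeholder, and the values only demonstrate the documented relationship between bbox_file, data_mode and test_mode.

```python
# All paths below are hypothetical placeholders.
val_dataset_cfg = dict(
    type='CocoDataset',
    data_root='data/coco/',
    ann_file='annotations/person_keypoints_val2017.json',
    # Detected boxes replace ground-truth boxes; ignored unless test_mode=True.
    bbox_file='data/coco/person_detection_results/detections.json',
    data_mode='topdown',  # one instance per data sample
    data_prefix=dict(img='val2017/'),
    test_mode=True,
    pipeline=[],
)
```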
- filter_data() List[dict] [source]
Filter annotations according to filter_cfg. By default, returns the full data_list.
If 'bbox_score_thr' is in filter_cfg, annotations with bbox_score below the threshold bbox_score_thr will be filtered out.
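The filtering rule can be sketched in a few lines of plain Python. This is an illustration of the documented behaviour, not the method itself; `filter_by_bbox_score` is a hypothetical name and each entry is assumed to carry a 'bbox_score' key.

```python
def filter_by_bbox_score(data_list, filter_cfg=None):
    """Drop annotations whose bbox_score is below filter_cfg['bbox_score_thr']."""
    if not filter_cfg or 'bbox_score_thr' not in filter_cfg:
        return data_list  # no filtering configured: keep everything
    thr = filter_cfg['bbox_score_thr']
    return [info for info in data_list if info['bbox_score'] >= thr]


data_list = [dict(id=0, bbox_score=0.9), dict(id=1, bbox_score=0.2)]
kept = filter_by_bbox_score(data_list, dict(bbox_score_thr=0.3))
```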
- get_data_info(idx: int) dict [source]
Get data info by index.
- Parameters
idx (int) – Index of data info.
- Returns
Data info.
- Return type
dict
- load_data_list() List[dict] [source]
Load data list from COCO annotation file or person detection result file.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw COCO annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict | None
- prepare_data(idx) Any [source]
Get data processed by self.pipeline.
BaseCocoStyleDataset overrides this method from mmengine.dataset.BaseDataset to add the metainfo into the data_info before it is passed to the pipeline.
- Parameters
idx (int) – The index of data_info.
- Returns
Depends on self.pipeline.
- Return type
Any
- class mmpose.datasets.datasets.body.AicDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
AIC dataset for pose estimation.
“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper
AIC keypoints:
0: "right_shoulder", 1: "right_elbow", 2: "right_wrist", 3: "left_shoulder", 4: "left_elbow", 5: "left_wrist", 6: "right_hip", 7: "right_knee", 8: "right_ankle", 9: "left_hip", 10: "left_knee", 11: "left_ankle", 12: "head_top", 13: "neck"
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- class mmpose.datasets.datasets.body.CocoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
COCO dataset for pose estimation.
“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.
COCO keypoints:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
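The COCO keypoint order listed for CocoDataset matters in practice, for instance when left and right joints must be swapped after a horizontal flip. The sketch below derives the left/right index pairs from the names alone; it is an illustration only, `left_right_pairs` is a hypothetical helper, and it says nothing about how mmpose stores this information in its dataset meta information.

```python
# Keypoint names in the index order given in the CocoDataset description.
COCO_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]


def left_right_pairs(names):
    """Return (left_index, right_index) for every mirrored keypoint pair."""
    index = {name: i for i, name in enumerate(names)}
    return [
        (i, index['right_' + name[len('left_'):]])
        for i, name in enumerate(names)
        if name.startswith('left_')
    ]


pairs = left_right_pairs(COCO_KEYPOINTS)
```

The nose has no mirrored counterpart, so 16 of the 17 keypoints fall into 8 pairs.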
- class mmpose.datasets.datasets.body.CrowdPoseDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
CrowdPose dataset for pose estimation.
“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.
CrowdPose keypoints:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'top_head', 13: 'neck'
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- class mmpose.datasets.datasets.body.JhmdbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
JHMDB dataset for pose estimation.
“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper
sub-JHMDB keypoints:
0: "neck", 1: "belly", 2: "head", 3: "right_shoulder", 4: "left_shoulder", 5: "right_hip", 6: "left_hip", 7: "right_elbow", 8: "left_elbow", 9: "right_knee", 10: "left_knee", 11: "right_wrist", 12: "left_wrist", 13: "right_ankle", 14: "left_ankle"
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw COCO annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.body.MhpDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
MHPv2.0 dataset for pose estimation.
“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper
MHP keypoints:
0: "right ankle", 1: "right knee", 2: "right hip", 3: "left hip", 4: "left knee", 5: "left ankle", 6: "pelvis", 7: "thorax", 8: "upper neck", 9: "head top", 10: "right wrist", 11: "right elbow", 12: "right shoulder", 13: "left shoulder", 14: "left elbow", 15: "left wrist"
- Parameters
ann_file (str) – Annotation file path. Default: ‘’.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filter data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data items in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- class mmpose.datasets.datasets.body.MpiiDataset(ann_file: str = '', bbox_file: Optional[str] = None, headbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
MPII Dataset for pose estimation.
“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.
MPII keypoints:
0: 'right_ankle', 1: 'right_knee', 2: 'right_hip', 3: 'left_hip', 4: 'left_knee', 5: 'left_ankle', 6: 'pelvis', 7: 'thorax', 8: 'upper_neck', 9: 'head_top', 10: 'right_wrist', 11: 'right_elbow', 12: 'right_shoulder', 13: 'left_shoulder', 14: 'left_elbow', 15: 'left_wrist'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
headbox_file (str, optional) – The path of mpii_gt_val.mat, which provides the head box information used for PCKh. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
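As a rough illustration of how the parameters above fit together, the sketch below writes an MpiiDataset entry in the dict-based config style and builds a name lookup from the keypoint order listed above. Every file path is a placeholder assumption rather than a path taken from this reference; only the keyword names come from the class signature. The flip-pair derivation is an illustrative helper, not part of the documented API.

```python
# Sketch of a config entry for MpiiDataset; all file paths are placeholders.
mpii_val_dataset = dict(
    type='MpiiDataset',
    data_root='data/mpii/',                                # assumed layout
    ann_file='annotations/mpii_val.json',                  # assumed file name
    headbox_file='data/mpii/annotations/mpii_gt_val.mat',  # head boxes for PCKh
    data_mode='topdown',
    data_prefix=dict(img='images/'),
    test_mode=True,
    pipeline=[],
)

# Keypoint order as listed in the class description above.
MPII_KEYPOINTS = [
    'right_ankle', 'right_knee', 'right_hip', 'left_hip', 'left_knee',
    'left_ankle', 'pelvis', 'thorax', 'upper_neck', 'head_top',
    'right_wrist', 'right_elbow', 'right_shoulder', 'left_shoulder',
    'left_elbow', 'left_wrist',
]
name_to_index = {name: i for i, name in enumerate(MPII_KEYPOINTS)}

# Illustrative only: (left, right) index pairs derived from the names,
# the kind of table a horizontal-flip augmentation would need.
flip_pairs = sorted(
    (name_to_index[name], name_to_index['right_' + name[len('left_'):]])
    for name in MPII_KEYPOINTS
    if name.startswith('left_')
)
print(flip_pairs)
```

Deriving the pairs from the names keeps the table in sync with the keypoint order instead of hard-coding indices.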
- class mmpose.datasets.datasets.body.MpiiTrbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
MPII-TRB dataset for pose estimation.
“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.
MPII-TRB keypoints:
0: 'left_shoulder', 1: 'right_shoulder', 2: 'left_elbow', 3: 'right_elbow', 4: 'left_wrist', 5: 'right_wrist', 6: 'left_hip', 7: 'right_hip', 8: 'left_knee', 9: 'right_knee', 10: 'left_ankle', 11: 'right_ankle', 12: 'head', 13: 'neck', 14: 'right_neck', 15: 'left_neck', 16: 'medial_right_shoulder', 17: 'lateral_right_shoulder', 18: 'medial_right_bow', 19: 'lateral_right_bow', 20: 'medial_right_wrist', 21: 'lateral_right_wrist', 22: 'medial_left_shoulder', 23: 'lateral_left_shoulder', 24: 'medial_left_bow', 25: 'lateral_left_bow', 26: 'medial_left_wrist', 27: 'lateral_left_wrist', 28: 'medial_right_hip', 29: 'lateral_right_hip', 30: 'medial_right_knee', 31: 'lateral_right_knee', 32: 'medial_right_ankle', 33: 'lateral_right_ankle', 34: 'medial_left_hip', 35: 'lateral_left_hip', 36: 'medial_left_knee', 37: 'lateral_left_knee', 38: 'medial_left_ankle', 39: 'lateral_left_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
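The difference between the two data_mode values can be sketched with plain Python. The annotation records and the build_samples helper below are hypothetical and only mirror the description above (one sample per instance versus one sample per image); they are not the library's loading code.

```python
from collections import defaultdict

# Hypothetical flat annotation records: one entry per annotated person.
annotations = [
    {'img_id': 1, 'instance_id': 'a'},
    {'img_id': 1, 'instance_id': 'b'},
    {'img_id': 2, 'instance_id': 'c'},
]


def build_samples(annotations, data_mode='topdown'):
    """Group annotation records into data samples for the given mode."""
    if data_mode == 'topdown':
        # One data sample per instance.
        return [[ann] for ann in annotations]
    if data_mode == 'bottomup':
        # One data sample per image, holding every instance in that image.
        per_image = defaultdict(list)
        for ann in annotations:
            per_image[ann['img_id']].append(ann)
        return [per_image[img_id] for img_id in sorted(per_image)]
    raise ValueError(f'unsupported data_mode: {data_mode!r}')


topdown_samples = build_samples(annotations, 'topdown')
bottomup_samples = build_samples(annotations, 'bottomup')
print(len(topdown_samples), len(bottomup_samples))
```

With three instances spread over two images, the top-down view yields three samples and the bottom-up view yields two.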
- class mmpose.datasets.datasets.body.OCHumanDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
OCHuman dataset for pose estimation.
“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.
The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images. It is designed for validation and testing. To evaluate on OCHuman, train the model on the COCO training set, then use OCHuman to test the model's robustness to occlusion.
OCHuman keypoints (same as COCO):
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
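The train-on-COCO, test-on-OCHuman protocol described for this class could be expressed as a pair of dataset config dicts along the following lines. This is a sketch under assumptions: the directory layout and annotation file names are placeholders, and CocoDataset is assumed to accept the same keyword arguments as the classes documented here.

```python
# Sketch only: every path and file name below is a placeholder.
train_dataset = dict(
    type='CocoDataset',
    data_root='data/coco/',
    ann_file='annotations/person_keypoints_train2017.json',
    data_prefix=dict(img='train2017/'),
    data_mode='topdown',
    pipeline=[],
)

# OCHuman is used for validation/testing only, so test_mode is enabled.
val_dataset = dict(
    type='OCHumanDataset',
    data_root='data/ochuman/',
    ann_file='annotations/ochuman_val.json',
    data_prefix=dict(img='images/'),
    data_mode='topdown',
    test_mode=True,
    pipeline=[],
)

# The two datasets share the 17 COCO keypoints, so a model trained on the
# first can be evaluated on the second without remapping keypoint indices.
COCO_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]
print(train_dataset['type'], '->', val_dataset['type'], len(COCO_KEYPOINTS))
```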
- class mmpose.datasets.datasets.body.PoseTrack18Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
PoseTrack18 dataset for pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
PoseTrack2018 keypoints:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- class mmpose.datasets.datasets.body.PoseTrack18VideoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', frame_weights: List[Union[int, float]] = [0.0, 1.0], frame_sampler_mode: str = 'random', frame_range: Optional[Union[int, List[int]]] = None, num_sampled_frame: Optional[int] = None, frame_indices: Optional[Sequence[int]] = None, ph_fill_len: int = 6, metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
PoseTrack18 dataset for video pose estimation.
“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.
PoseTrack2018 keypoints:
0: 'nose', 1: 'head_bottom', 2: 'head_top', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
frame_weights (List[Union[int, float]]) – The weight of each frame for aggregation. The first weight is for the center frame, followed by the remaining frames in ascending order of frame index. Note that the length of frame_weights should be consistent with the number of sampled frames. Default: [0.0, 1.0].
frame_sampler_mode (str) – Specifies the mode of the frame sampler: 'fixed' or 'random'. In 'fixed' mode, each frame index relative to the center frame is fixed and specified by frame_indices, while in 'random' mode, each frame index relative to the center frame is sampled from frame_range with certain randomness. Default: 'random'.
frame_range (int | List[int], optional) – The sampling range of supporting frames in the same video for the center frame. Only valid when frame_sampler_mode is 'random'. Default: None.
num_sampled_frame (int, optional) – The number of sampled frames, excluding the center frame. Only valid when frame_sampler_mode is 'random'. Default: 1.
frame_indices (Sequence[int], optional) – The sampled frame indices, including the center frame indicated by 0. Only valid when frame_sampler_mode is 'fixed'. Default: None.
ph_fill_len (int) – The length of the placeholder to fill in the image filenames. Default: 6.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
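The frame-sampling parameters of PoseTrack18VideoDataset can be illustrated with a small stand-alone sketch. The helpers sample_frame_offsets and frame_filename are hypothetical and follow only the parameter descriptions above. Three points are assumptions rather than documented behaviour: an integer frame_range r is taken to mean offsets in [-r, r], supporting frames are drawn without replacement, and frame files are assumed to be named by a zero-padded frame number with a .jpg extension.

```python
import random


def sample_frame_offsets(frame_sampler_mode, frame_indices=None,
                         frame_range=None, num_sampled_frame=1, rng=None):
    """Return frame offsets: the center frame (0) first, then the
    supporting frames in ascending order."""
    if frame_sampler_mode == 'fixed':
        # Offsets are given explicitly and must include the center frame.
        if frame_indices is None or 0 not in frame_indices:
            raise ValueError("'fixed' mode needs frame_indices containing 0")
        return [0] + sorted(i for i in frame_indices if i != 0)
    if frame_sampler_mode == 'random':
        if frame_range is None:
            raise ValueError("'random' mode needs frame_range")
        # Assumption: an int range r means offsets in [-r, r].
        if isinstance(frame_range, int):
            low, high = -frame_range, frame_range
        else:
            low, high = frame_range
        candidates = [i for i in range(low, high + 1) if i != 0]
        rng = rng or random.Random()
        return [0] + sorted(rng.sample(candidates, num_sampled_frame))
    raise ValueError(f'unsupported mode: {frame_sampler_mode!r}')


def frame_filename(frame_id, ph_fill_len=6, ext='.jpg'):
    """Assumed naming scheme: frame number zero-padded to ph_fill_len."""
    return str(frame_id).zfill(ph_fill_len) + ext


fixed_offsets = sample_frame_offsets('fixed', frame_indices=[-2, -1, 0, 1, 2])
random_offsets = sample_frame_offsets(
    'random', frame_range=2, num_sampled_frame=1, rng=random.Random(0))

# One weight per sampled frame, center frame first (the documented default).
frame_weights = [0.0, 1.0]
assert len(frame_weights) == len(random_offsets)
print(fixed_offsets, frame_filename(42))
```

Passing a seeded random.Random keeps the sampling reproducible in tests while leaving the default behaviour random.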
- class mmpose.datasets.datasets.face.AFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
AFLW dataset for face keypoint localization.
“Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization”. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
The landmark annotations follow the 19 points mark-up. The definition can be found in https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw Face AFLW annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
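The shape of the raw_data_info argument can be sketched as follows. Only the two top-level keys, 'raw_ann_info' and 'raw_img_info', come from the description above. The inner field names, the flat (x, y, visibility) keypoint layout, and the parse_data_info_sketch function are hypothetical stand-ins for illustration, not the library's parser.

```python
from typing import Optional


def parse_data_info_sketch(raw_data_info: dict) -> Optional[dict]:
    """Toy stand-in for parse_data_info; inner field names are hypothetical."""
    ann = raw_data_info['raw_ann_info']
    img = raw_data_info['raw_img_info']
    # The Optional[dict] return type suggests that unusable instances may
    # yield None; here an instance without keypoints is treated that way.
    if not ann.get('keypoints'):
        return None
    return {
        'img_id': img['id'],
        'img_path': img['file_name'],
        # Assumed flat layout: x, y, visibility per keypoint.
        'num_keypoints': len(ann['keypoints']) // 3,
    }


raw = {
    'raw_ann_info': {'keypoints': [10.0, 20.0, 1, 30.0, 40.0, 1]},
    'raw_img_info': {'id': 7, 'file_name': 'face_0007.jpg'},
}
parsed = parse_data_info_sketch(raw)
skipped = parse_data_info_sketch(
    {'raw_ann_info': {'keypoints': []}, 'raw_img_info': raw['raw_img_info']})
print(parsed, skipped)
```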
- class mmpose.datasets.datasets.face.COFWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
COFW dataset for face keypoint localization.
“Robust face landmark estimation under occlusion”, ICCV’2013.
The landmark annotations follow the 29 points mark-up. The definition can be found in http://www.vision.caltech.edu/xpburgos/ICCV13/.
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
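The effect of the indices parameter can be pictured with a short stand-alone helper. select_subset is a hypothetical function that mirrors the description above (an int keeps the first few samples, a sequence picks specific ones, None keeps everything); it is not the library's implementation and deliberately rejects negative counts, whose meaning this reference does not specify.

```python
from typing import Optional, Sequence, Union


def select_subset(data_list: list,
                  indices: Optional[Union[int, Sequence[int]]] = None) -> list:
    """Return the part of data_list selected by indices."""
    if indices is None:
        # Default: use every sample.
        return list(data_list)
    if isinstance(indices, int):
        if indices < 0:
            raise ValueError('this sketch only handles non-negative counts')
        # An int keeps the first `indices` samples.
        return list(data_list[:indices])
    # A sequence picks the listed positions, in the given order.
    return [data_list[i] for i in indices]


samples = ['a', 'b', 'c', 'd', 'e']
first_two = select_subset(samples, 2)
picked = select_subset(samples, [0, 3])
everything = select_subset(samples)
print(first_two, picked, everything)
```

A small indices value is a convenient way to smoke-test a pipeline before running on the full annotation file.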
- class mmpose.datasets.datasets.face.CocoWholeBodyFaceDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
CocoWholeBodyDataset for face keypoint localization.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
The face landmark annotations follow the 68 points mark-up.
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw CocoWholeBody Face annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.face.Face300WDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
300W dataset for face keypoint localization.
“300 faces In-the-wild challenge: Database and results”, Image and Vision Computing (IMAVIS) 2016.
The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw Face300W annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.face.WFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
WFLW dataset for face keypoint localization.
“Look at Boundary: A Boundary-Aware Face Alignment Algorithm”, CVPR’2018.
The landmark annotations follow the 98 points mark-up. The definition can be found in https://wywu.github.io/projects/LAB/WFLW.html.
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to try to get a valid image if BaseDataset.prepare_data returns a None image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw Face WFLW annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.hand.CocoWholeBodyHandDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
CocoWholeBodyDataset for hand pose estimation.
“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.
COCO-WholeBody Hand keypoints:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
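To illustrate how the parameters above fit together, here is a minimal sketch of a dataset config written as a plain Python dict, in the style commonly used by MMPose config files. The data_root, ann_file and data_prefix values are hypothetical placeholders, not paths shipped with the library, and the final check only compares the dict keys against the parameter names documented above.

```python
# Minimal sketch of a dataset config dict. All paths are hypothetical
# placeholders; replace them with the layout of your own data.
train_dataset = dict(
    type='CocoWholeBodyHandDataset',
    data_root='data/coco/',                          # hypothetical root directory
    ann_file='annotations/train_annotations.json',   # hypothetical, relative to data_root
    data_prefix=dict(img='train2017/'),              # hypothetical image sub-directory
    data_mode='topdown',                             # one instance per data sample
    pipeline=[],                                     # no transforms in this sketch
    test_mode=False,
)

# Parameter names taken from the signature documented above.
documented_params = {
    'ann_file', 'bbox_file', 'data_mode', 'metainfo', 'data_root',
    'data_prefix', 'filter_cfg', 'indices', 'serialize_data', 'pipeline',
    'test_mode', 'lazy_init', 'max_refetch',
}

# Any key that is neither 'type' nor a documented parameter is suspicious.
unknown_keys = set(train_dataset) - documented_params - {'type'}
print(unknown_keys)
```

Parameters left out of the dict fall back to the defaults listed above.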
- class mmpose.datasets.datasets.hand.FreiHandDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
FreiHand dataset for hand pose estimation.
“FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images”, ICCV’2019. More details can be found in the paper.
FreiHand keypoints:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw COCO annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
- 'raw_ann_info': Raw annotation of an instance
- 'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
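The input expected by parse_data_info can be pictured with a small sketch. Only the two top-level keys, 'raw_ann_info' and 'raw_img_info', come from the description above; the nested fields follow the common COCO annotation layout and are assumptions for illustration, as is the helper has_required_keys.

```python
# Hypothetical example of the dict passed to parse_data_info. Only the two
# top-level keys are documented; the nested fields mimic the usual COCO
# layout and are assumptions made for this sketch.
raw_data_info = {
    'raw_ann_info': {
        'id': 1,
        'image_id': 10,
        'bbox': [50.0, 60.0, 120.0, 140.0],  # COCO-style [x, y, w, h]
        'keypoints': [0.0] * (21 * 3),        # 21 keypoints as (x, y, visibility)
    },
    'raw_img_info': {
        'id': 10,
        'file_name': 'example.jpg',           # hypothetical file name
        'width': 224,
        'height': 224,
    },
}


def has_required_keys(info: dict) -> bool:
    """Check that both documented top-level keys are present."""
    return {'raw_ann_info', 'raw_img_info'} <= set(info)


print(has_required_keys(raw_data_info))
```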
- class mmpose.datasets.datasets.hand.OneHand10KDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
OneHand10K dataset for hand pose estimation.
“Mask-pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Images”, TCSVT’2019. More details can be found in the paper.
OneHand10K keypoints:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
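The difference between the two data_mode values can be sketched in plain Python. This is a toy illustration of the grouping described above, not the library's implementation; the instance records and the build_samples helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat list of annotated instances.
instances = [
    {'img_id': 0, 'instance_id': 'a'},
    {'img_id': 0, 'instance_id': 'b'},
    {'img_id': 1, 'instance_id': 'c'},
]


def build_samples(instances, data_mode='topdown'):
    """Group instances into data samples according to data_mode."""
    if data_mode == 'topdown':
        # One data sample per instance.
        return [[inst] for inst in instances]
    if data_mode == 'bottomup':
        # One data sample per image, holding all of its instances.
        by_image = defaultdict(list)
        for inst in instances:
            by_image[inst['img_id']].append(inst)
        return [by_image[img_id] for img_id in sorted(by_image)]
    raise ValueError(f'Unknown data_mode: {data_mode!r}')


topdown_samples = build_samples(instances, 'topdown')
bottomup_samples = build_samples(instances, 'bottomup')
print(len(topdown_samples), len(bottomup_samples))
```

With three instances spread over two images, the sketch yields three samples in 'topdown' mode and two in 'bottomup' mode.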
- class mmpose.datasets.datasets.hand.PanopticHand2DDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
Panoptic 2D dataset for hand pose estimation.
“Hand Keypoint Detection in Single Images using Multiview Bootstrapping”, CVPR’2017. More details can be found in the paper.
Panoptic keypoints:
0: 'wrist', 1: 'thumb1', 2: 'thumb2', 3: 'thumb3', 4: 'thumb4', 5: 'forefinger1', 6: 'forefinger2', 7: 'forefinger3', 8: 'forefinger4', 9: 'middle_finger1', 10: 'middle_finger2', 11: 'middle_finger3', 12: 'middle_finger4', 13: 'ring_finger1', 14: 'ring_finger2', 15: 'ring_finger3', 16: 'ring_finger4', 17: 'pinky_finger1', 18: 'pinky_finger2', 19: 'pinky_finger3', 20: 'pinky_finger4'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw COCO annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
- 'raw_ann_info': Raw annotation of an instance
- 'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.hand.Rhd2DDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
Rendered Handpose Dataset for hand pose estimation.
“Learning to Estimate 3D Hand Pose from Single RGB Images”, ICCV’2017. More details can be found in the paper.
Rhd keypoints:
0: 'wrist', 1: 'thumb4', 2: 'thumb3', 3: 'thumb2', 4: 'thumb1', 5: 'forefinger4', 6: 'forefinger3', 7: 'forefinger2', 8: 'forefinger1', 9: 'middle_finger4', 10: 'middle_finger3', 11: 'middle_finger2', 12: 'middle_finger1', 13: 'ring_finger4', 14: 'ring_finger3', 15: 'ring_finger2', 16: 'ring_finger1', 17: 'pinky_finger4', 18: 'pinky_finger3', 19: 'pinky_finger2', 20: 'pinky_finger1'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
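The Rhd keypoint list above numbers the joints of each finger from 4 down to 1, whereas the other hand datasets in this module list them from 1 up to 4. The sketch below derives an index permutation between the two orders purely from the keypoint names. Whether such a reordering is needed depends on your use case; the variable names are illustrative and not part of the library.

```python
fingers = ['thumb', 'forefinger', 'middle_finger', 'ring_finger', 'pinky_finger']

# Keypoint names in the Rhd order listed above (joints run 4 -> 1 per finger).
rhd_names = ['wrist'] + [f'{f}{j}' for f in fingers for j in (4, 3, 2, 1)]
# Keypoint names in the order used by the other hand datasets (joints 1 -> 4).
common_names = ['wrist'] + [f'{f}{j}' for f in fingers for j in (1, 2, 3, 4)]

# rhd_to_common[i] is the Rhd index of the keypoint stored at common index i.
rhd_index = {name: idx for idx, name in enumerate(rhd_names)}
rhd_to_common = [rhd_index[name] for name in common_names]
print(rhd_to_common)
```

Indexing an Rhd keypoint array along its keypoint axis with rhd_to_common would then reorder it to the common naming order.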
- class mmpose.datasets.datasets.animal.AP10KDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
AP-10K dataset for animal pose estimation.
“AP-10K: A Benchmark for Animal Pose Estimation in the Wild”, NeurIPS Dataset Track’2021. More details can be found in the paper.
AP-10K keypoints:
0: 'L_Eye', 1: 'R_Eye', 2: 'Nose', 3: 'Neck', 4: 'root of tail', 5: 'L_Shoulder', 6: 'L_Elbow', 7: 'L_F_Paw', 8: 'R_Shoulder', 9: 'R_Elbow', 10: 'R_F_Paw', 11: 'L_Hip', 12: 'L_Knee', 13: 'L_B_Paw', 14: 'R_Hip', 15: 'R_Knee', 16: 'R_B_Paw'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
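Left/right keypoint pairs are often needed when images are flipped horizontally. As a rough sketch, such pairs can be derived from the AP-10K keypoint names listed above by matching the L_ and R_ prefixes. This relies only on the naming convention and is an illustrative assumption; it does not read the dataset's actual metainfo.

```python
# AP-10K keypoint names, in index order, as listed above.
ap10k_names = [
    'L_Eye', 'R_Eye', 'Nose', 'Neck', 'root of tail',
    'L_Shoulder', 'L_Elbow', 'L_F_Paw',
    'R_Shoulder', 'R_Elbow', 'R_F_Paw',
    'L_Hip', 'L_Knee', 'L_B_Paw',
    'R_Hip', 'R_Knee', 'R_B_Paw',
]

index = {name: i for i, name in enumerate(ap10k_names)}

# Pair every 'L_<part>' keypoint with its 'R_<part>' counterpart.
flip_pairs = [
    (index[name], index['R_' + name[2:]])
    for name in ap10k_names
    if name.startswith('L_')
]
print(flip_pairs)
```

Keypoints without a prefix ('Nose', 'Neck', 'root of tail') lie on the body midline and have no counterpart, so they do not appear in any pair.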
- class mmpose.datasets.datasets.animal.ATRWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
ATRW dataset for animal pose estimation.
“ATRW: A Benchmark for Amur Tiger Re-identification in the Wild”, ACM MM’2020. More details can be found in the paper.
ATRW keypoints:
0: "left_ear", 1: "right_ear", 2: "nose", 3: "right_shoulder", 4: "right_front_paw", 5: "left_shoulder", 6: "left_front_paw", 7: "right_hip", 8: "right_knee", 9: "right_back_paw", 10: "left_hip", 11: "left_knee", 12: "left_back_paw", 13: "tail", 14: "center"
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
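The meaning of the indices parameter can be illustrated with a toy helper. This is a sketch of the documented behaviour only, assuming a non-negative int; select_subset is a hypothetical function and not the library's implementation.

```python
from typing import List, Optional, Sequence, Union


def select_subset(data_list: List[dict],
                  indices: Optional[Union[int, Sequence[int]]]) -> List[dict]:
    """Toy illustration of the documented meaning of ``indices``."""
    if indices is None:
        return list(data_list)              # use all data_infos
    if isinstance(indices, int):
        # Assumes a non-negative int: keep the first ``indices`` items.
        return list(data_list[:indices])
    return [data_list[i] for i in indices]  # explicitly chosen items


data_list = [{'id': i} for i in range(10)]
print(len(select_subset(data_list, None)))
print([d['id'] for d in select_subset(data_list, 3)])
print([d['id'] for d in select_subset(data_list, [0, 5, 9])])
```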
- class mmpose.datasets.datasets.animal.AnimalPoseDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
Animal-Pose dataset for animal pose estimation.
“Cross-domain Adaptation For Animal Pose Estimation”, ICCV’2019. More details can be found in the paper.
Animal-Pose keypoints:
0: 'L_Eye', 1: 'R_Eye', 2: 'L_EarBase', 3: 'R_EarBase', 4: 'Nose', 5: 'Throat', 6: 'TailBase', 7: 'Withers', 8: 'L_F_Elbow', 9: 'R_F_Elbow', 10: 'L_B_Elbow', 11: 'R_B_Elbow', 12: 'L_F_Knee', 13: 'R_F_Knee', 14: 'L_B_Knee', 15: 'R_B_Knee', 16: 'L_F_Paw', 17: 'R_F_Paw', 18: 'L_B_Paw', 19: 'R_B_Paw'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
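The idea behind lazy_init can be shown with a toy class. ToyDataset below is a hypothetical stand-in written for this sketch, not the library class: with lazy_init=True it skips loading at construction time and loads only when full_init is called.

```python
class ToyDataset:
    """Toy stand-in (not the library class) showing deferred loading."""

    def __init__(self, annotations, lazy_init=False):
        self._source = annotations  # stands in for the annotation file
        self.data_list = None       # filled only once annotations are loaded
        if not lazy_init:
            self.full_init()

    def full_init(self):
        """Load annotations if they have not been loaded yet."""
        if self.data_list is None:
            self.data_list = list(self._source)


eager = ToyDataset([{'id': 0}, {'id': 1}])
lazy = ToyDataset([{'id': 0}, {'id': 1}], lazy_init=True)
print(eager.data_list is not None, lazy.data_list is None)

lazy.full_init()  # load on demand
print(len(lazy.data_list))
```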
- class mmpose.datasets.datasets.animal.FlyDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
FlyDataset for animal pose estimation.
“Fast animal pose estimation using deep neural networks”, Nature Methods’2019. More details can be found in the paper.
Vinegar Fly keypoints:
0: "head", 1: "eyeL", 2: "eyeR", 3: "neck", 4: "thorax", 5: "abdomen", 6: "forelegR1", 7: "forelegR2", 8: "forelegR3", 9: "forelegR4", 10: "midlegR1", 11: "midlegR2", 12: "midlegR3", 13: "midlegR4", 14: "hindlegR1", 15: "hindlegR2", 16: "hindlegR3", 17: "hindlegR4", 18: "forelegL1", 19: "forelegL2", 20: "forelegL3", 21: "forelegL4", 22: "midlegL1", 23: "midlegL2", 24: "midlegL3", 25: "midlegL4", 26: "hindlegL1", 27: "hindlegL2", 28: "hindlegL3", 29: "hindlegL4", 30: "wingL", 31: "wingR"
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- class mmpose.datasets.datasets.animal.Horse10Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
Horse10Dataset for animal pose estimation.
“Pretraining boosts out-of-domain robustness for pose estimation”, WACV’2021. More details can be found in the paper.
Horse-10 keypoints:
0: 'Nose', 1: 'Eye', 2: 'Nearknee', 3: 'Nearfrontfetlock', 4: 'Nearfrontfoot', 5: 'Offknee', 6: 'Offfrontfetlock', 7: 'Offfrontfoot', 8: 'Shoulder', 9: 'Midshoulder', 10: 'Elbow', 11: 'Girth', 12: 'Wither', 13: 'Nearhindhock', 14: 'Nearhindfetlock', 15: 'Nearhindfoot', 16: 'Hip', 17: 'Stifle', 18: 'Offhindhock', 19: 'Offhindfetlock', 20: 'Offhindfoot', 21: 'Ischium'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
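The role of max_refetch can be sketched as a bounded retry loop: when preparing a sample returns None, another sample is tried, up to max_refetch extra times. The function fetch_with_retries and the toy loader below are hypothetical and written only to illustrate the documented behaviour; choosing the replacement index at random is an assumption of this sketch.

```python
import random


def fetch_with_retries(prepare_data, idx, num_samples, max_refetch=1000, seed=0):
    """Toy retry loop: try idx first, then up to max_refetch other indices."""
    rng = random.Random(seed)
    for _ in range(max_refetch + 1):
        data = prepare_data(idx)
        if data is not None:
            return data
        idx = rng.randrange(num_samples)  # pick another sample and retry
    raise RuntimeError(f'No valid sample after {max_refetch} extra attempts')


# Hypothetical loader in which only even indices yield a valid sample.
def toy_prepare_data(idx):
    return {'idx': idx} if idx % 2 == 0 else None


sample = fetch_with_retries(toy_prepare_data, idx=1, num_samples=10)
print(sample['idx'] % 2 == 0)
```

Bounding the number of retries ensures that a dataset with no valid samples fails with an error instead of looping forever.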
- class mmpose.datasets.datasets.animal.LocustDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
LocustDataset for animal pose estimation.
“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, eLife’2019. More details can be found in the paper.
Desert Locust keypoints:
0: "head", 1: "neck", 2: "thorax", 3: "abdomen1", 4: "abdomen2", 5: "anttipL", 6: "antbaseL", 7: "eyeL", 8: "forelegL1", 9: "forelegL2", 10: "forelegL3", 11: "forelegL4", 12: "midlegL1", 13: "midlegL2", 14: "midlegL3", 15: "midlegL4", 16: "hindlegL1", 17: "hindlegL2", 18: "hindlegL3", 19: "hindlegL4", 20: "anttipR", 21: "antbaseR", 22: "eyeR", 23: "forelegR1", 24: "forelegR2", 25: "forelegR3", 26: "forelegR4", 27: "midlegR1", 28: "midlegR2", 29: "midlegR3", 30: "midlegR4", 31: "hindlegR1", 32: "hindlegR2", 33: "hindlegR3", 34: "hindlegR4"
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few data samples in the annotation file, to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw Locust annotation of an instance.
- Parameters
raw_data_info (dict) – Raw data information loaded from ann_file. It should have the following contents:
- 'raw_ann_info': Raw annotation of an instance
- 'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.animal.MacaqueDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
MacaquePose dataset for animal pose estimation.
“MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture”, bioRxiv’2020. More details can be found in the paper.
Macaque keypoints:
0: 'nose', 1: 'left_eye', 2: 'right_eye', 3: 'left_ear', 4: 'right_ear', 5: 'left_shoulder', 6: 'right_shoulder', 7: 'left_elbow', 8: 'right_elbow', 9: 'left_wrist', 10: 'right_wrist', 11: 'left_hip', 12: 'right_hip', 13: 'left_knee', 14: 'right_knee', 15: 'left_ankle', 16: 'right_ankle'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations until they are needed. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None img, the maximum extra number of cycles to get a valid image. Default: 1000.
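A dataset of this kind is usually declared as a config dict. The fragment below is a minimal sketch of how the parameters above might be combined; the data_root, ann_file and data_prefix values are hypothetical placeholders rather than paths shipped with MMPose, and the empty pipeline is a stand-in.

```python
# Hypothetical config fragment for MacaqueDataset; every path is a placeholder.
macaque_train_cfg = dict(
    type='MacaqueDataset',
    data_root='data/macaque/',                  # assumed dataset location
    data_mode='topdown',                        # one instance per data sample
    ann_file='annotations/macaque_train.json',  # assumed annotation file name
    data_prefix=dict(img='images/'),            # assumed image sub-directory
    test_mode=False,
    pipeline=[],                                # to be filled with transforms
)
```

Because data_root is the root directory for data_prefix and ann_file, the two relative paths in this sketch would be resolved against it.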
- class mmpose.datasets.datasets.animal.ZebraDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
ZebraDataset for animal pose estimation.
“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning” Elife’2019. More details can be found in the paper.
Zebra keypoints:
0: "snout", 1: "head", 2: "neck", 3: "forelegL1", 4: "forelegR1", 5: "hindlegL1", 6: "hindlegR1", 7: "tailbase", 8: "tailtip"
- Parameters
ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations until they are needed. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None img, the maximum extra number of cycles to get a valid image. Default: 1000.
- parse_data_info(raw_data_info: dict) Optional[dict] [source]
Parse raw Zebra annotation of an instance.
- Parameters
raw_data_info (dict) –
Raw data information loaded from ann_file. It should have the following contents:
- 'raw_ann_info': Raw annotation of an instance
- 'raw_img_info': Raw information of the image that contains the instance
- Returns
Parsed instance annotation
- Return type
dict
- class mmpose.datasets.datasets.fashion.DeepFashionDataset(ann_file: str = '', subset: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]
DeepFashion dataset (full-body clothes) for fashion landmark detection.
“DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations”, CVPR’2016. “Fashion Landmark Detection in the Wild”, ECCV’2016.
The dataset contains 3 categories for full-body, upper-body and lower-body.
Fashion landmark indexes for upper-body clothes:
0: 'left collar', 1: 'right collar', 2: 'left sleeve', 3: 'right sleeve', 4: 'left hem', 5: 'right hem'
Fashion landmark indexes for lower-body clothes:
0: 'left waistline', 1: 'right waistline', 2: 'left hem', 3: 'right hem'
Fashion landmark indexes for full-body clothes:
0: 'left collar', 1: 'right collar', 2: 'left sleeve', 3: 'right sleeve', 4: 'left waistline', 5: 'right waistline', 6: 'left hem', 7: 'right hem'
- Parameters
ann_file (str) – Annotation file path. Default: ''.
subset (str) – Specifies the subset of body: 'full', 'upper' or 'lower'. Default: '', which means 'full'.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few data samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means in test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations until they are needed. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – If Basedataset.prepare_data gets a None img, the maximum extra number of cycles to get a valid image. Default: 1000.
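The relation between subset and the landmark indexes listed for DeepFashionDataset can be sketched as a simple lookup. The helper name landmark_names and the dictionary FASHION_LANDMARKS are hypothetical and only illustrate the documented behaviour that an empty subset means 'full'; they are not part of MMPose.

```python
# Landmark names per subset, copied from the DeepFashionDataset lists.
FASHION_LANDMARKS = {
    'upper': ['left collar', 'right collar', 'left sleeve',
              'right sleeve', 'left hem', 'right hem'],
    'lower': ['left waistline', 'right waistline', 'left hem', 'right hem'],
    'full': ['left collar', 'right collar', 'left sleeve', 'right sleeve',
             'left waistline', 'right waistline', 'left hem', 'right hem'],
}


def landmark_names(subset: str = '') -> list:
    """Return the landmark names of a subset; '' falls back to 'full'."""
    return FASHION_LANDMARKS[subset or 'full']


# Number of landmarks per subset.
num_landmarks = {name: len(landmark_names(name))
                 for name in ('upper', 'lower', 'full')}
```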
transforms¶
- class mmpose.datasets.transforms.loading.LoadImage(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]¶
Load an image from file or from the np.ndarray in results['img'].
Required Keys:
img_path
img (optional)
Modified Keys:
img
img_shape
ori_shape
img_path (optional)
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to 'color'.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to 'cv2'.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
ignore_empty (bool) – Whether to allow loading an empty image or a nonexistent file path. Defaults to False.
- class mmpose.datasets.transforms.common_transforms.Albumentation(transforms: List[dict], keymap: Optional[dict] = None)[source]¶
Albumentation augmentation (pixel-level transforms only).
Adds custom pixel-level transformations from Albumentations library. Please visit https://albumentations.ai/docs/ to get more information.
Note: we only support pixel-level transforms. Please visit https://github.com/albumentations-team/albumentations#pixel-level-transforms to get more information about pixel-level transforms.
Required Keys:
img
Modified Keys:
img
- Parameters
transforms (List[dict]) –
A list of Albumentation transforms. An example of transforms is as follows:

    [
        dict(
            type='RandomBrightnessContrast',
            brightness_limit=[0.1, 0.3],
            contrast_limit=[0.1, 0.3],
            p=0.2),
        dict(type='ChannelShuffle', p=0.1),
        dict(
            type='OneOf',
            transforms=[
                dict(type='Blur', blur_limit=3, p=1.0),
                dict(type='MedianBlur', blur_limit=3, p=1.0)
            ],
            p=0.1),
    ]

keymap (dict | None) – Key mapping from input key to albumentation-style key. Defaults to None, which will use {'img': 'image'}.
- albu_builder(cfg: dict) None [source]¶
Import a module from albumentations.
It resembles some of the build_from_cfg() logic.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
- Returns
The constructed transform object
- Return type
albumentations.BasicTransform
- transform(results: dict) dict [source]¶
The transform function of Albumentation to apply albumentations transforms.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
updated result dict.
- Return type
dict
- class mmpose.datasets.transforms.common_transforms.GenerateTarget(encoder: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]], target_type: Optional[str] = None, multilevel: bool = False, use_dataset_keypoint_weights: bool = False)[source]¶
Encode keypoints into Target.
The generated target is usually the supervision signal of the model learning, e.g. heatmaps or regression labels.
Required Keys:
keypoints
keypoints_visible
dataset_keypoint_weights
Added Keys:
The keys of the encoded items from the codec will be updated into the results, e.g. 'heatmaps' or 'keypoint_weights'. See the specific codec for more details.
- Parameters
encoder (dict | list[dict]) – The codec config for keypoint encoding. Both single encoder and multiple encoders (given as a list) are supported
multilevel (bool) – Determines how multiple encoders are handled. If multilevel==True, generate multilevel targets from a group of encoders of the same type (e.g. multiple MSRAHeatmap encoders with different sigma values); if multilevel==False, generate combined targets from a group of different encoders. This argument has no effect when a single encoder is given. Defaults to False
use_dataset_keypoint_weights (bool) – Whether to use the keypoint weights from the dataset meta information. Defaults to False
target_type (str, deprecated) – This argument is deprecated and has no effect. Defaults to None
- transform(results: Dict) Optional[dict] [source]¶
The transform function of GenerateTarget.
See the transform() method of BaseTransform for details.
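The single-encoder and multilevel cases of GenerateTarget can be sketched as config fragments. The sizes and sigma values are illustrative assumptions, and the MSRAHeatmap arguments are assumed to follow the same input_size / heatmap_size / sigma convention as the codecs documented under mmpose.codecs.

```python
# Single encoder: the multilevel flag has no effect here.
single_target = dict(
    type='GenerateTarget',
    encoder=dict(type='MSRAHeatmap', input_size=(192, 256),
                 heatmap_size=(48, 64), sigma=2.0),
)

# Multilevel: several encoders of the same type with different sigma values.
multilevel_target = dict(
    type='GenerateTarget',
    multilevel=True,
    encoder=[
        dict(type='MSRAHeatmap', input_size=(192, 256),
             heatmap_size=(48, 64), sigma=sigma)
        for sigma in (2.0, 3.0)  # illustrative sigma values
    ],
)
```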
- class mmpose.datasets.transforms.common_transforms.GetBBoxCenterScale(padding: float = 1.25)[source]¶
Convert bboxes from [x, y, w, h] to center and scale.
The center is the coordinates of the bbox center, and the scale is the bbox width and height normalized by a scale factor.
Required Keys:
bbox
Added Keys:
bbox_center
bbox_scale
- Parameters
padding (float) – The bbox padding scale that will be multiplied to bbox_scale. Defaults to 1.25
- transform(results: Dict) Optional[dict] [source]¶
The transform function of GetBBoxCenterScale.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – The result dict
- Returns
The result dict.
- Return type
dict
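The conversion described for GetBBoxCenterScale can be sketched in a few lines of NumPy. This is an illustration of the arithmetic stated above (bbox in [x, y, w, h], scale equal to width and height times the padding), not the library implementation; the function name is hypothetical.

```python
import numpy as np


def bbox_xywh_to_center_scale(bbox, padding=1.25):
    """Convert one [x, y, w, h] bbox to (center, scale).

    Hypothetical helper: the center is the bbox midpoint and the scale is
    the bbox width and height multiplied by the padding factor.
    """
    x, y, w, h = np.asarray(bbox, dtype=np.float32)
    center = np.array([x + 0.5 * w, y + 0.5 * h], dtype=np.float32)
    scale = np.array([w, h], dtype=np.float32) * padding
    return center, scale


center, scale = bbox_xywh_to_center_scale([10, 20, 100, 200])
```

For this box the center is the midpoint (60, 120), and the default padding of 1.25 enlarges the 100 x 200 extent to 125 x 250.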
- class mmpose.datasets.transforms.common_transforms.PhotometricDistortion(brightness_delta: int = 32, contrast_range: Sequence[Union[int, float]] = (0.5, 1.5), saturation_range: Sequence[Union[int, float]] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially. Every transformation is applied with a probability of 0.5. Random contrast is applied either second (mode 0) or second to last (mode 1).
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
Required Keys:
img
Modified Keys:
img
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- transform(results: dict) dict [source]¶
The transform function of PhotometricDistortion to perform photometric distortion on images.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
Result dict with images distorted.
- Return type
dict
- class mmpose.datasets.transforms.common_transforms.RandomBBoxTransform(shift_factor: float = 0.16, shift_prob: float = 0.3, scale_factor: Tuple[float, float] = (0.5, 1.5), scale_prob: float = 1.0, rotate_factor: float = 80.0, rotate_prob: float = 0.6)[source]¶
Randomly shift, resize and rotate the bounding boxes.
Required Keys:
bbox_center
bbox_scale
Modified Keys:
bbox_center
bbox_scale
Added Keys:
bbox_rotation
- Parameters
shift_factor (float) – Randomly shift the bbox in range \([-dx, dx]\) and \([-dy, dy]\) in X and Y directions, where \(dx(y) = \text{x(y)\_scale} \cdot \text{shift\_factor}\) in pixels. Defaults to 0.16
shift_prob (float) – Probability of applying random shift. Defaults to 0.3
scale_factor (Tuple[float, float]) – Randomly resize the bbox in range \([\text{scale\_factor}[0], \text{scale\_factor}[1]]\). Defaults to (0.5, 1.5)
scale_prob (float) – Probability of applying random resizing. Defaults to 1.0
rotate_factor (float) – Randomly rotate the bbox in \([-\text{rotate\_factor}, \text{rotate\_factor}]\) in degrees. Defaults to 80.0
rotate_prob (float) – Probability of applying random rotation. Defaults to 0.6
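The sampling ranges listed for RandomBBoxTransform can be sketched as follows. This is only an illustration of the documented ranges: it draws from uniform distributions, which is an assumption (the library may use a different distribution), and the function name is hypothetical.

```python
import numpy as np


def random_bbox_transform(center, scale, rng, shift_factor=0.16,
                          shift_prob=0.3, scale_factor=(0.5, 1.5),
                          scale_prob=1.0, rotate_factor=80.0,
                          rotate_prob=0.6):
    """Sketch: randomly shift, resize and rotate one bbox (uniform draws)."""
    center = np.asarray(center, dtype=np.float32).copy()
    scale = np.asarray(scale, dtype=np.float32).copy()
    rotation = 0.0
    if rng.random() < shift_prob:
        # dx(y) = x(y)_scale * shift_factor, offset drawn from [-dx, dx]
        center = center + rng.uniform(-1.0, 1.0, size=2) * scale * shift_factor
    if rng.random() < scale_prob:
        scale = scale * rng.uniform(scale_factor[0], scale_factor[1])
    if rng.random() < rotate_prob:
        rotation = float(rng.uniform(-rotate_factor, rotate_factor))
    return center, scale, rotation


rng = np.random.default_rng(0)
new_center, new_scale, rotation = random_bbox_transform(
    [100.0, 100.0], [50.0, 80.0], rng)
```

The returned rotation corresponds to the bbox_rotation key that the transform adds; it stays at 0.0 whenever the rotation branch is not taken.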
- class mmpose.datasets.transforms.common_transforms.RandomFlip(prob: Union[float, List[float]] = 0.5, direction: Union[str, List[str]] = 'horizontal')[source]¶
Randomly flip the image, bbox and keypoints.
Required Keys:
img
img_shape
flip_indices
input_size (optional)
bbox (optional)
bbox_center (optional)
keypoints (optional)
keypoints_visible (optional)
img_mask (optional)
Modified Keys:
img
bbox (optional)
bbox_center (optional)
keypoints (optional)
keypoints_visible (optional)
img_mask (optional)
Added Keys:
flip
flip_direction
- Parameters
prob (float | list[float]) – The flipping probability. If a list is given, the argument direction should be a list with the same length, and each element in prob indicates the flipping probability of the corresponding element in direction. Defaults to 0.5
direction (str | list[str]) – The flipping direction. Options are 'horizontal', 'vertical' and 'diagonal'. If a list is given, each data sample’s flipping direction will be sampled from a distribution determined by the argument prob. Defaults to 'horizontal'.
- transform(results: dict) dict [source]¶
The transform function of RandomFlip.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – The result dict
- Returns
The result dict.
- Return type
dict
- class mmpose.datasets.transforms.common_transforms.RandomHalfBody(min_total_keypoints: int = 9, min_upper_keypoints: int = 2, min_lower_keypoints: int = 3, padding: float = 1.5, prob: float = 0.3, upper_prioritized_prob: float = 0.7)[source]¶
Data augmentation with half-body transform that keeps only the upper or lower body at random.
Required Keys:
keypoints
keypoints_visible
upper_body_ids
lower_body_ids
Modified Keys:
bbox
bbox_center
bbox_scale
- Parameters
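min_total_keypoints (int) – The minimum required number of total valid keypoints of a person to apply half-body transform. Defaults to 9
min_upper_keypoints (int) – The minimum required number of valid upper-body keypoints of a person to apply half-body transform. Defaults to 2
min_lower_keypoints (int) – The minimum required number of valid lower-body keypoints of a person to apply half-body transform. Defaults to 3
padding (float) – The bbox padding scale that will be multiplied to bbox_scale. Defaults to 1.5
prob (float) – The probability to apply half-body transform when the keypoint number meets the requirement. Defaults to 0.3
upper_prioritized_prob (float) – The probability of applying the upper-body half-body transform when both the upper-body and lower-body keypoints meet the requirement. Defaults to 0.7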
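The keypoint-count checks of RandomHalfBody can be sketched as below. This is a sketch under stated assumptions, not the library code: the function name is hypothetical, the minimum-count arguments follow the class signature, and upper_prioritized_prob is assumed to be the chance of choosing the upper body when both halves qualify. The prob argument (whether the transform is applied at all) is left out.

```python
import numpy as np


def select_half_body(keypoints_visible, upper_body_ids, lower_body_ids, rng,
                     min_total_keypoints=9, min_upper_keypoints=2,
                     min_lower_keypoints=3, upper_prioritized_prob=0.7):
    """Sketch: return the keypoint ids of the chosen half body, or None."""
    visible = np.asarray(keypoints_visible) > 0
    if visible.sum() < min_total_keypoints:
        return None  # too few valid keypoints overall
    upper = [i for i in upper_body_ids if visible[i]]
    lower = [i for i in lower_body_ids if visible[i]]
    upper_ok = len(upper) >= min_upper_keypoints
    lower_ok = len(lower) >= min_lower_keypoints
    if upper_ok and lower_ok:
        # Assumed meaning of upper_prioritized_prob.
        return upper if rng.random() < upper_prioritized_prob else lower
    if upper_ok:
        return upper
    if lower_ok:
        return lower
    return None


rng = np.random.default_rng(0)
upper_ids = list(range(0, 11))   # e.g. head, shoulders, arms
lower_ids = list(range(11, 17))  # e.g. hips, knees, ankles
picked = select_half_body(np.ones(17), upper_ids, lower_ids, rng)
too_few = select_half_body(np.r_[np.ones(5), np.zeros(12)],
                           upper_ids, lower_ids, rng)
```

The selected ids would then be used to recompute bbox, bbox_center and bbox_scale from the remaining keypoints, with bbox_scale multiplied by padding.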
- class mmpose.datasets.transforms.topdown_transforms.TopdownAffine(input_size: Tuple[int, int], use_udp: bool = False)[source]¶
Get the bbox image as the model input by affine transform.
Required Keys:
img
bbox_center
bbox_scale
bbox_rotation (optional)
keypoints (optional)
Modified Keys:
img
bbox_scale
Added Keys:
input_size
transformed_keypoints
- Parameters
input_size (Tuple[int, int]) – The input image size of the model in [w, h]. The bbox region will be cropped and resized to input_size
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
- transform(results: Dict) Optional[dict] [source]¶
The transform function of TopdownAffine.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – The result dict
- Returns
The result dict.
- Return type
dict
- class mmpose.datasets.transforms.bottomup_transforms.BottomupGetHeatmapMask[source]¶
Generate the mask of valid regions from the segmentation annotation.
Required Keys:
img_shape
invalid_segs (optional)
warp_mat (optional)
flip (optional)
flip_direction (optional)
heatmaps (optional)
Added Keys:
heatmap_mask
- transform(results: Dict) Optional[dict] [source]¶
The transform function of BottomupGetHeatmapMask to generate the mask of valid regions.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
Result dict with heatmap_mask added.
- Return type
dict
- class mmpose.datasets.transforms.bottomup_transforms.BottomupRandomAffine(input_size: Tuple[int, int], shift_factor: float = 0.2, shift_prob: float = 1.0, scale_factor: Tuple[float, float] = (0.75, 1.5), scale_prob: float = 1.0, scale_type: str = 'short', rotate_factor: float = 30.0, rotate_prob: float = 1, use_udp: bool = False)[source]¶
Randomly shift, resize and rotate the image.
Required Keys:
img
img_shape
keypoints (optional)
Modified Keys:
img
keypoints (optional)
Added Keys:
input_size
warp_mat
- Parameters
input_size (Tuple[int, int]) – The input image size of the model in [w, h]
shift_factor (float) – Randomly shift the image in range \([-dx, dx]\) and \([-dy, dy]\) in X and Y directions, where \(dx(y) = \text{img\_w(h)} \cdot \text{shift\_factor}\) in pixels. Defaults to 0.2
shift_prob (float) – Probability of applying random shift. Defaults to 1.0
scale_factor (Tuple[float, float]) – Randomly resize the image in range \([\text{scale\_factor}[0], \text{scale\_factor}[1]]\). Defaults to (0.75, 1.5)
scale_prob (float) – Probability of applying random resizing. Defaults to 1.0
scale_type (str) – Whether the scale is computed with respect to the long or short side of the image. Defaults to short
rotate_factor (float) – Randomly rotate the image in \([-\text{rotate\_factor}, \text{rotate\_factor}]\) in degrees. Defaults to 30.0
rotate_prob (float) – Probability of applying random rotation. Defaults to 1
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
- transform(results: Dict) Optional[dict] [source]¶
The transform function of BottomupRandomAffine to randomly shift, resize and rotate the image.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
Result dict with the image (and keypoints, if present) transformed.
- Return type
dict
- class mmpose.datasets.transforms.bottomup_transforms.BottomupResize(input_size: Tuple[int, int], aug_scales: Optional[List[float]] = None, size_factor: int = 32, resize_mode: str = 'fit', use_udp: bool = False)[source]¶
Resize the image to the input size of the model. Optionally, the image can be resized to multiple sizes to build an image pyramid for multi-scale inference.
Required Keys:
img
ori_shape
Modified Keys:
img
img_shape
Added Keys:
input_size
warp_mat
aug_scale
- Parameters
input_size (Tuple[int, int]) – The input size of the model in [w, h]. Note that the actual size of the resized image is affected by resize_mode and size_factor, and thus may not exactly equal input_size
aug_scales (List[float], optional) – The extra input scales for multi-scale testing. If given, the input image will be resized to different scales to build an image pyramid, and heatmaps from all scales will be aggregated to make the final prediction. Defaults to None
size_factor (int) – The actual input size will be ceiled to a multiple of the size_factor value at both sides. Defaults to 32
resize_mode (str) –
The method to resize the image to the input size. Options are:
- 'fit': The image will be resized according to the relatively longer side with the aspect ratio kept. The resized image will entirely fit into the range of the input size
- 'expand': The image will be resized according to the relatively shorter side with the aspect ratio kept. The resized image will exceed the given input size at the longer side
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
- transform(results: Dict) Optional[dict] [source]¶
The transform function of BottomupResize to resize the image to the input size of the model.
See the transform() method of BaseTransform for details.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
Result dict with the image resized.
- Return type
dict
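How resize_mode and size_factor interact in BottomupResize can be illustrated with a small size calculation. The helper names are hypothetical and the code only follows the parameter descriptions above (ceil the input size to a multiple of size_factor, then keep the aspect ratio); it is not the library implementation.

```python
import math


def ceil_to_multiple(size, base):
    """Round each side up to the nearest multiple of base."""
    return tuple(int(math.ceil(s / base) * base) for s in size)


def resized_size(img_size, input_size, size_factor=32, resize_mode='fit'):
    """Sketch: size (w, h) of the resized image, aspect ratio kept."""
    img_w, img_h = img_size
    in_w, in_h = ceil_to_multiple(input_size, size_factor)
    ratio = img_w / img_h
    if resize_mode == 'fit':
        # Governed by the longer side; the result fits inside the input size.
        return min(in_w, in_h * ratio), min(in_h, in_w / ratio)
    if resize_mode == 'expand':
        # Governed by the shorter side; the longer side may exceed the input.
        return max(in_w, in_h * ratio), max(in_h, in_w / ratio)
    raise ValueError(f'Unknown resize_mode: {resize_mode}')


fit_size = resized_size((640, 480), (512, 512), resize_mode='fit')
expand_size = resized_size((640, 480), (512, 512), resize_mode='expand')
```

For a 640 x 480 image and a 512 x 512 input, 'fit' yields roughly 512 x 384 while 'expand' yields roughly 683 x 512.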
- class mmpose.datasets.transforms.formatting.PackPoseInputs(meta_keys=('id', 'img_id', 'img_path', 'category_id', 'crowd_index', 'ori_shape', 'img_shape', 'input_size', 'input_center', 'input_scale', 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info'), pack_transformed=False)[source]¶
Pack the inputs data for pose estimation.
The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default it includes:
- id: id of the data sample
- img_id: id of the image
- category_id: the id of the instance category
- img_path: path to the image file
- crowd_index (optional): measure of the crowding level of an image, defined in the CrowdPose dataset
- ori_shape: original shape of the image as a tuple (h, w, c)
- img_shape: shape of the image input to the network as a tuple (h, w). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
- input_size: the input size to the network
- flip: a boolean indicating if image flip transform was used
- flip_direction: the flipping direction
- flip_indices: the indices of each keypoint’s symmetric keypoint
- raw_ann_info (optional): raw annotation of the instance(s)
- Parameters
meta_keys (Sequence[str], optional) – Meta keys which will be stored in PoseDataSample as meta info. Defaults to ('id', 'img_id', 'img_path', 'category_id', 'crowd_index', 'ori_shape', 'img_shape', 'input_size', 'input_center', 'input_scale', 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info')
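The transforms documented in this section are typically chained in a pipeline that ends with PackPoseInputs. The list below is a sketch of one plausible top-down training pipeline; the ordering, the input and heatmap sizes and the MSRAHeatmap sigma are illustrative assumptions rather than a recommended configuration.

```python
input_size = (192, 256)    # [w, h], illustrative
heatmap_size = (48, 64)    # [W, H], illustrative

train_pipeline = [
    dict(type='LoadImage'),                           # adds img, img_shape, ori_shape
    dict(type='GetBBoxCenterScale', padding=1.25),    # adds bbox_center, bbox_scale
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(type='RandomBBoxTransform'),                 # adds bbox_rotation
    dict(type='TopdownAffine', input_size=input_size),
    dict(type='GenerateTarget',
         encoder=dict(type='MSRAHeatmap', input_size=input_size,
                      heatmap_size=heatmap_size, sigma=2.0)),
    dict(type='PackPoseInputs'),                      # packs inputs and meta info
]
```

Each step only requires keys that an earlier step has added, which is why GetBBoxCenterScale precedes the bbox augmentations and TopdownAffine.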
- mmpose.datasets.transforms.formatting.image_to_tensor(img: Union[numpy.ndarray, Sequence[numpy.ndarray]]) torch.Tensor [source]¶
Translate image or sequence of images to tensor. Multiple image tensors will be stacked.
- Parameters
img (np.ndarray | Sequence[np.ndarray]) – The original image or image sequence
- Returns
The output tensor.
- Return type
torch.Tensor
mmpose.structures¶
- class mmpose.structures.MultilevelPixelData(*, metainfo: Optional[dict] = None, **kwargs)[source]¶
Data structure for multi-level pixel-wise annotations or predictions.
All data items in data_fields of MultilevelPixelData are lists of np.ndarray or torch.Tensor, and should meet the following requirements:
- Have the same length, which is the number of levels
- At each level, the data should have 3 dimensions in order of channel, height and width
- At each level, the data should have the same height and width
Examples
>>> import numpy as np
>>> import torch
>>> from mmpose.structures import MultilevelPixelData
>>> metainfo = dict(num_keypoints=17)
>>> sizes = [(64, 48), (128, 96), (256, 192)]
>>> heatmaps = [np.random.rand(17, h, w) for h, w in sizes]
>>> masks = [torch.rand(1, h, w) for h, w in sizes]
>>> data = MultilevelPixelData(metainfo=metainfo,
...                            heatmaps=heatmaps,
...                            masks=masks)

>>> # get data item
>>> heatmaps = data.heatmaps  # A list of 3 numpy.ndarrays
>>> masks = data.masks  # A list of 3 torch.Tensors

>>> # get level
>>> data_l0 = data[0]  # PixelData with fields 'heatmaps' and 'masks'
>>> data.nlevel
3

>>> # get shape
>>> data.shape
((64, 48), (128, 96), (256, 192))

>>> # set
>>> offset_maps = [torch.rand(2, h, w) for h, w in sizes]
>>> data.offset_maps = offset_maps
- cpu() mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Convert all tensors to CPU in data.
- cuda() mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Convert all tensors to GPU in data.
- detach() mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Detach all tensors in data.
- property nlevel¶
Return the level number.
- Returns
The level number, or None if the data has not been assigned.
- Return type
Optional[int]
- numpy() mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Convert all tensors to np.ndarray in data.
- set_data(data: dict) None [source]¶
Set or change key-value pairs in data_field by parameter data.
- Parameters
data (dict) – A dict contains annotations of image or model predictions.
- set_field(value: Any, name: str, dtype: Optional[Union[Type, Tuple[Type, ...]]] = None, field_type: str = 'data') None [source]¶
Special method for setting a union field, used as property.setter functions.
- property shape: Optional[Tuple[Tuple]]¶
Get the shape of multi-level pixel data.
- Returns
A tuple of data shape at each level, or None if the data has not been assigned.
- Return type
Optional[tuple]
- to(*args, **kwargs) mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Apply same name function to all tensors in data_fields.
- to_tensor() mmpose.structures.multilevel_pixel_data.MultilevelPixelData [source]¶
Convert all np.ndarray to tensor in data.
- class mmpose.structures.PoseDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]¶
The base data structure of MMPose that is used as the interface between modules.
The attributes of PoseDataSample include:
- gt_instances (InstanceData): Ground truth of instances with keypoint annotations
- pred_instances (InstanceData): Instances with keypoint predictions
- gt_fields (PixelData): Ground truth of spatial distribution annotations like keypoint heatmaps and part affine fields (PAF)
- pred_fields (PixelData): Predictions of spatial distributions
Examples
>>> import torch
>>> from mmengine.structures import InstanceData, PixelData
>>> from mmpose.structures import PoseDataSample

>>> pose_meta = dict(img_shape=(800, 1216),
...                  crop_size=(256, 192),
...                  heatmap_size=(64, 48))
>>> gt_instances = InstanceData()
>>> gt_instances.bboxes = torch.rand((1, 4))
>>> gt_instances.keypoints = torch.rand((1, 17, 2))
>>> gt_instances.keypoints_visible = torch.rand((1, 17, 1))
>>> gt_fields = PixelData()
>>> gt_fields.heatmaps = torch.rand((17, 64, 48))

>>> data_sample = PoseDataSample(gt_instances=gt_instances,
...                              gt_fields=gt_fields,
...                              metainfo=pose_meta)
>>> assert 'img_shape' in data_sample
>>> len(data_sample.gt_instances)
1
- mmpose.structures.bbox_cs2xywh(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) numpy.ndarray [source]¶
Transform the bbox format from (center, scale) to (x,y,w,h).
- Parameters
center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
BBox (x, y, w, h) in shape (4, ) or (n, 4)
- Return type
ndarray[float32]
- mmpose.structures.bbox_cs2xyxy(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) numpy.ndarray [source]¶
Transform the bbox format from (center, scale) to (x1,y1,x2,y2).
- Parameters
center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
BBox (x1, y1, x2, y2) in shape (4, ) or (n, 4)
- Return type
ndarray[float32]
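The geometry behind the (center, scale) to corner conversion can be sketched with NumPy. This sketch assumes padding = 1.0, so the padding handling of the real function is deliberately left out; the function name is hypothetical.

```python
import numpy as np


def center_scale_to_xyxy(center, scale):
    """Sketch: (center, scale) -> (x1, y1, x2, y2), assuming padding = 1.0.

    Accepts center/scale in shape (2,) or (n, 2).
    """
    center = np.asarray(center, dtype=np.float32)
    scale = np.asarray(scale, dtype=np.float32)
    top_left = center - 0.5 * scale
    bottom_right = center + 0.5 * scale
    return np.concatenate([top_left, bottom_right], axis=-1)


box = center_scale_to_xyxy([60.0, 120.0], [100.0, 200.0])
boxes = center_scale_to_xyxy([[60.0, 120.0], [10.0, 10.0]],
                             [[100.0, 200.0], [4.0, 6.0]])
```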
- mmpose.structures.bbox_xywh2cs(bbox: numpy.ndarray, padding: float = 1.0) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Transform the bbox format from (x,y,w,h) into (center, scale)
- Parameters
bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (x, y, w, h)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)
- Return type
tuple
- mmpose.structures.bbox_xywh2xyxy(bbox_xywh: numpy.ndarray) numpy.ndarray [source]¶
Transform the bbox format from xywh to x1y1x2y2.
- Parameters
bbox_xywh (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])
- Returns
Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])
- Return type
np.ndarray
- mmpose.structures.bbox_xyxy2cs(bbox: numpy.ndarray, padding: float = 1.0) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Transform the bbox format from (x1,y1,x2,y2) into (center, scale)
- Parameters
bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (left, top, right, bottom)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)
(n, 2)
- Return type
tuple
- mmpose.structures.bbox_xyxy2xywh(bbox_xyxy: numpy.ndarray) numpy.ndarray [source]¶
Transform the bbox format from x1y1x2y2 to xywh.
- Parameters
bbox_xyxy (np.ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])
- Returns
Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])
- Return type
np.ndarray
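The xywh and x1y1x2y2 formats differ only in the last two box columns, and an optional score column passes through unchanged. The following NumPy sketch illustrates the round trip; the function names are hypothetical and the code assumes continuous coordinates (right = left + width), which may differ from the library in pixel-offset conventions.

```python
import numpy as np


def xywh_to_xyxy(bbox_xywh):
    """Sketch: (left, top, width, height, [score]) -> (left, top, right, bottom, [score])."""
    out = np.array(bbox_xywh, dtype=np.float32)  # np.array copies the input
    out[:, 2] = out[:, 0] + out[:, 2]
    out[:, 3] = out[:, 1] + out[:, 3]
    return out


def xyxy_to_xywh(bbox_xyxy):
    """Sketch: inverse of xywh_to_xyxy."""
    out = np.array(bbox_xyxy, dtype=np.float32)
    out[:, 2] = out[:, 2] - out[:, 0]
    out[:, 3] = out[:, 3] - out[:, 1]
    return out


boxes_xywh = np.array([[10, 20, 100, 200, 0.9]], dtype=np.float32)
boxes_xyxy = xywh_to_xyxy(boxes_xywh)
round_trip = xyxy_to_xywh(boxes_xyxy)
```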
- mmpose.structures.flip_bbox(bbox: numpy.ndarray, image_size: Tuple[int, int], bbox_format: str = 'xywh', direction: str = 'horizontal') numpy.ndarray [source]¶
Flip the bbox in the given direction.
- Parameters
bbox (np.ndarray) – The bounding boxes. The shape should be (…, 4) if bbox_format is 'xyxy' or 'xywh', and (…, 2) if bbox_format is 'center'
image_size (tuple) – The image shape in [w, h]
bbox_format (str) – The bbox format. Options are 'xywh', 'xyxy' and 'center'.
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'
- Returns
The flipped bounding boxes.
- Return type
np.ndarray
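The three flip directions can be illustrated for the 'center' format, where a box is just an (x, y) point. The sketch below is not the library implementation: `flip_center` is a hypothetical helper, and it assumes a pixel-index convention in which column x maps to w - 1 - x (the exact offset used by the library may differ).

```python
def flip_center(center, image_size, direction="horizontal"):
    """Flip one (x, y) box center inside an image of size [w, h]."""
    if direction not in ("horizontal", "vertical", "diagonal"):
        raise ValueError(f"unsupported direction: {direction!r}")
    x, y = center
    w, h = image_size
    if direction in ("horizontal", "diagonal"):
        x = w - 1 - x  # assumption: mirror around the pixel-index axis
    if direction in ("vertical", "diagonal"):
        y = h - 1 - y
    return (x, y)


print(flip_center((10, 20), (100, 50)))              # horizontal
print(flip_center((10, 20), (100, 50), "diagonal"))  # both axes
```

A diagonal flip is simply the horizontal and vertical flips applied together.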
- mmpose.structures.flip_keypoints(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray], image_size: Tuple[int, int], flip_indices: List[int], direction: str = 'horizontal') Tuple[numpy.ndarray, Optional[numpy.ndarray]] [source]¶
Flip keypoints in the given direction.
Note
keypoint number: K
keypoint dimension: D
- Parameters
keypoints (np.ndarray) – Keypoints in shape (…, K, D)
keypoints_visible (np.ndarray, optional) – The visibility of keypoints in shape (…, K, 1). Set None if the keypoint visibility is unavailable
image_size (tuple) – The image shape in [w, h]
flip_indices (List[int]) – The indices of each keypoint’s symmetric keypoint
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'
- Returns
- keypoints_flipped (np.ndarray): Flipped keypoints in shape (…, K, D)
- keypoints_visible_flipped (np.ndarray, optional): Flipped keypoints’ visibility in shape (…, K, 1). Return None if the input keypoints_visible is None
- Return type
tuple
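The role of flip_indices is easiest to see in a small example: after mirroring, the keypoint that was the left eye sits where a right eye is expected, so each slot must take the values of its symmetric partner. The sketch below is a plain-Python illustration for the horizontal case only. The helper name and the three-keypoint layout (nose, left eye, right eye) are hypothetical, and x -> w - 1 - x is an assumed convention.

```python
def flip_keypoints_horizontal(keypoints, image_width, flip_indices):
    """Mirror (x, y) keypoints horizontally and swap symmetric pairs."""
    # Slot k receives the keypoint of its symmetric partner flip_indices[k].
    swapped = [keypoints[j] for j in flip_indices]
    # Assumption: pixel-index mirroring, x -> w - 1 - x.
    return [(image_width - 1 - x, y) for x, y in swapped]


# Hypothetical layout: 0 = nose, 1 = left eye, 2 = right eye.
kpts = [(50, 10), (45, 8), (60, 8)]
flip_indices = [0, 2, 1]  # the nose maps to itself, the eyes swap
flipped = flip_keypoints_horizontal(kpts, 101, flip_indices)
print(flipped)
```

The visibility array, when present, would be reordered with the same flip_indices.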
- mmpose.structures.get_udp_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int]) numpy.ndarray [source]¶
Calculate the affine transformation matrix under the unbiased constraint. See `UDP (CVPR 2020)`_ for details.
Note
The bbox number: N
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (tuple) – Size ([w, h]) of the output image
- Returns
A 2x3 transformation matrix
- Return type
np.ndarray
- mmpose.structures.get_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int], shift: Tuple[float, float] = (0.0, 0.0), inv: bool = False) numpy.ndarray [source]¶
Calculate the affine transformation matrix that can warp the bbox area in the input image to the output size.
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
- Returns
A 2x3 transformation matrix
- Return type
np.ndarray
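For intuition, the 2x3 matrix can be written out by hand in the special case rot = 0, shift = (0, 0), inv = False. The sketch below is a generic bbox-to-output affine, not the library implementation: the helper names are hypothetical, and scaling each axis independently by output_size / scale is an assumption that only matters when the bbox and output aspect ratios differ.

```python
def simple_warp_matrix(center, scale, output_size):
    """2x3 affine mapping the bbox (center, scale) onto [0, w] x [0, h], no rotation."""
    cx, cy = center
    sw, sh = scale
    ow, oh = output_size
    ax, ay = ow / sw, oh / sh  # assumption: independent per-axis scaling
    return [
        [ax, 0.0, ow * 0.5 - ax * cx],
        [0.0, ay, oh * 0.5 - ay * cy],
    ]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


m = simple_warp_matrix(center=(100.0, 80.0), scale=(64.0, 96.0), output_size=(32, 48))
print(apply_affine(m, (100.0, 80.0)))  # the bbox center lands on the output center
print(apply_affine(m, (68.0, 32.0)))   # the bbox top-left corner lands on the origin
```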
- mmpose.structures.merge_data_samples(data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) mmpose.structures.pose_data_sample.PoseDataSample [source]¶
Merge the given data samples into a single data sample.
This function can be used to merge the top-down predictions with bboxes from the same image. The merged data sample will contain all instances from the input data samples, and the identical metainfo with the first input data sample.
- Parameters
data_samples (List[PoseDataSample]) – The data samples to merge
- Returns
The merged data sample.
- Return type
PoseDataSample
- mmpose.structures.revert_heatmap(heatmap, bbox_center, bbox_scale, img_shape)[source]¶
Revert predicted heatmap on the original image.
- Parameters
heatmap (np.ndarray or torch.tensor) – predicted heatmap.
bbox_center (np.ndarray) – bounding box center coordinate.
bbox_scale (np.ndarray) – bounding box scale.
img_shape (tuple or list) – size of original image.
- mmpose.structures.split_instances(instances: mmengine.structures.instance_data.InstanceData) List[mmengine.structures.instance_data.InstanceData] [source]¶
Convert instances into a list where each element is a dict that contains information about one instance.
bbox¶
- mmpose.structures.bbox.bbox_cs2xywh(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) numpy.ndarray [source]¶
Transform the bbox format from (center, scale) to (x,y,w,h).
- Parameters
center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
BBox (x, y, w, h) in shape (4, ) or (n, 4)
- Return type
ndarray[float32]
- mmpose.structures.bbox.bbox_cs2xyxy(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) numpy.ndarray [source]¶
Transform the bbox format from (center, scale) to (x1,y1,x2,y2).
- Parameters
center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
BBox (x1, y1, x2, y2) in shape (4, ) or (n, 4)
- Return type
ndarray[float32]
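Going back from (center, scale) to box corners can be sketched as below, assuming the padding factor is undone by dividing the scale by it. Both helper names are hypothetical and operate on plain tuples rather than arrays.

```python
def center_scale_to_xywh(center, scale, padding=1.0):
    """Return (x, y, w, h) for one box given its center and padded scale."""
    # Assumption: the unpadded size is scale / padding.
    w, h = scale[0] / padding, scale[1] / padding
    return (center[0] - 0.5 * w, center[1] - 0.5 * h, w, h)


def center_scale_to_xyxy(center, scale, padding=1.0):
    """Return (x1, y1, x2, y2) for one box given its center and padded scale."""
    x, y, w, h = center_scale_to_xywh(center, scale, padding)
    return (x, y, x + w, y + h)


print(center_scale_to_xywh((30.0, 60.0), (50.0, 100.0), padding=1.25))
print(center_scale_to_xyxy((30.0, 60.0), (50.0, 100.0), padding=1.25))
```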
- mmpose.structures.bbox.bbox_xywh2cs(bbox: numpy.ndarray, padding: float = 1.0) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Transform the bbox format from (x,y,w,h) into (center, scale)
- Parameters
bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (x, y, w, h)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)
- Return type
tuple
- mmpose.structures.bbox.bbox_xywh2xyxy(bbox_xywh: numpy.ndarray) numpy.ndarray [source]¶
Transform the bbox format from xywh to x1y1x2y2.
- Parameters
bbox_xywh (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])
- Returns
Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])
- Return type
np.ndarray
- mmpose.structures.bbox.bbox_xyxy2cs(bbox: numpy.ndarray, padding: float = 1.0) Tuple[numpy.ndarray, numpy.ndarray] [source]¶
Transform the bbox format from (x1,y1,x2,y2) into (center, scale)
- Parameters
bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (left, top, right, bottom)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0
- Returns
A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)
- Return type
tuple
- mmpose.structures.bbox.bbox_xyxy2xywh(bbox_xyxy: numpy.ndarray) numpy.ndarray [source]¶
Transform the bbox format from x1y1x2y2 to xywh.
- Parameters
bbox_xyxy (np.ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])
- Returns
Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])
- Return type
np.ndarray
- mmpose.structures.bbox.flip_bbox(bbox: numpy.ndarray, image_size: Tuple[int, int], bbox_format: str = 'xywh', direction: str = 'horizontal') numpy.ndarray [source]¶
Flip the bbox in the given direction.
- Parameters
bbox (np.ndarray) – The bounding boxes. The shape should be (…, 4) if bbox_format is 'xyxy' or 'xywh', and (…, 2) if bbox_format is 'center'
image_size (tuple) – The image shape in [w, h]
bbox_format (str) – The bbox format. Options are 'xywh', 'xyxy' and 'center'.
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'
- Returns
The flipped bounding boxes.
- Return type
np.ndarray
- mmpose.structures.bbox.get_udp_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int]) numpy.ndarray [source]¶
Calculate the affine transformation matrix under the unbiased constraint. See `UDP (CVPR 2020)`_ for details.
Note
The bbox number: N
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (tuple) – Size ([w, h]) of the output image
- Returns
A 2x3 transformation matrix
- Return type
np.ndarray
- mmpose.structures.bbox.get_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int], shift: Tuple[float, float] = (0.0, 0.0), inv: bool = False) numpy.ndarray [source]¶
Calculate the affine transformation matrix that can warp the bbox area in the input image to the output size.
- Parameters
center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
- Returns
A 2x3 transformation matrix
- Return type
np.ndarray
keypoint¶
- mmpose.structures.keypoint.flip_keypoints(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray], image_size: Tuple[int, int], flip_indices: List[int], direction: str = 'horizontal') Tuple[numpy.ndarray, Optional[numpy.ndarray]] [source]¶
Flip keypoints in the given direction.
Note
keypoint number: K
keypoint dimension: D
- Parameters
keypoints (np.ndarray) – Keypoints in shape (…, K, D)
keypoints_visible (np.ndarray, optional) – The visibility of keypoints in shape (…, K, 1). Set None if the keypoint visibility is unavailable
image_size (tuple) – The image shape in [w, h]
flip_indices (List[int]) – The indices of each keypoint’s symmetric keypoint
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'
- Returns
- keypoints_flipped (np.ndarray): Flipped keypoints in shape (…, K, D)
- keypoints_visible_flipped (np.ndarray, optional): Flipped keypoints’ visibility in shape (…, K, 1). Return None if the input keypoints_visible is None
- Return type
tuple
mmpose.registry¶
MMPose provides the following registry nodes to support using modules across projects.
Each node is a child of the root registry in MMEngine. More details can be found at https://mmengine.readthedocs.io/en/latest/tutorials/registry.html.
mmpose.evaluation¶
metrics¶
- class mmpose.evaluation.metrics.AUC(norm_factor: float = 30, num_thrs: int = 20, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
AUC evaluation metric.
Calculate the Area Under Curve (AUC) of keypoint PCK accuracy.
By altering the threshold percentage in the calculation of PCK accuracy, AUC can be generated to further evaluate the pose estimation algorithms.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
norm_factor (float) – AUC normalization factor, Default: 30 (pixels).
num_thrs (int) – number of thresholds to calculate auc. Default: 20.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
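How num_thrs and norm_factor interact can be sketched in plain Python. This is an illustration under stated assumptions, not the library code: the helper names `pck` and `auc` are hypothetical, thresholds are assumed to be evenly spaced in [0, 1), a keypoint counts as correct when its normalized distance is strictly below the threshold, and all keypoints are treated as visible.

```python
import math


def pck(pred, gt, thr, norm_factor):
    """Fraction of keypoints whose normalized distance is below thr."""
    dists = [math.dist(p, g) / norm_factor for p, g in zip(pred, gt)]
    return sum(d < thr for d in dists) / len(dists)


def auc(pred, gt, norm_factor=30.0, num_thrs=20):
    """Average PCK over num_thrs evenly spaced thresholds (assumed in [0, 1))."""
    thresholds = [i / num_thrs for i in range(num_thrs)]
    return sum(pck(pred, gt, t, norm_factor) for t in thresholds) / num_thrs


pred = [(0.0, 0.0), (3.0, 4.0)]  # second keypoint is 5 pixels off
gt = [(0.0, 0.0), (0.0, 0.0)]
print(auc(pred, gt))
```

A larger norm_factor shrinks every normalized distance, so the same predictions pass more thresholds and the AUC rises.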
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) None [source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmpose.evaluation.metrics.CocoMetric(ann_file: Optional[str] = None, use_area: bool = True, iou_type: str = 'keypoints', score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
COCO pose estimation task evaluation metric.
Evaluate AR, AP, and mAP for keypoint detection tasks. Support COCO dataset and other datasets in COCO format. Please refer to COCO keypoint evaluation for more details.
- Parameters
ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None
use_area (bool) – Whether to use the 'area' field in the annotations. If the ground truth annotations (e.g. CrowdPose, AIC) do not have the field 'area', please set use_area=False. Defaults to True
iou_type (str) – The same parameter as iouType in xtcocotools.COCOeval, which can be 'keypoints', or 'keypoints_crowd' (used in CrowdPose dataset). Defaults to 'keypoints'
score_mode (str) –
The mode to score the prediction results which should be one of the following options:
'bbox': Take the score of bbox as the score of the prediction results.
'bbox_keypoint': Use keypoint score to rescore the prediction results.
'bbox_rle': Use rle_score to rescore the prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. The keypoints with score lower than it will not be included to rescore the prediction results. Valid only when score_mode is 'bbox_keypoint'. Defaults to 0.2
nms_mode (str) –
The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
'oks_nms': Use Object Keypoint Similarity (OKS) to perform NMS.
'soft_oks_nms': Use Object Keypoint Similarity (OKS) to perform soft NMS.
'none': Do not perform NMS. Typically for bottom-up mode output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Will retain the prediction results with OKS lower than nms_thr. Defaults to 0.9
format_only (bool) – Whether to only format the output results without doing quantitative evaluation. This is designed for test submissions when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None
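One plausible reading of the 'bbox_keypoint' score mode together with keypoint_score_thr is sketched below. It is an assumption, not a statement of the library's exact formula: keypoint scores at or below the threshold are dropped, the rest are averaged, and the average is multiplied by the bbox score. The function name `rescore_bbox_keypoint` is hypothetical.

```python
def rescore_bbox_keypoint(bbox_score, keypoint_scores, keypoint_score_thr=0.2):
    """Combine a bbox score with the mean of the confident keypoint scores."""
    kept = [s for s in keypoint_scores if s > keypoint_score_thr]
    # Assumption: an instance with no confident keypoints gets a score of 0.
    mean_kpt_score = sum(kept) / len(kept) if kept else 0.0
    return bbox_score * mean_kpt_score


score = rescore_bbox_keypoint(0.8, [0.9, 0.1, 0.7, 0.5])
print(round(score, 3))
```

With these inputs the 0.1 keypoint is ignored, so a single unreliable joint does not drag the instance score down.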
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) str [source]¶
Convert ground truth to coco format json file.
- Parameters
gt_dicts (Sequence[dict]) –
Ground truth of the dataset. Each dict contains the ground truth information about the data sample. Required keys of each gt_dict in gt_dicts:
img_id: image id of the data sample
width: original image width
height: original image height
raw_ann_info: the raw annotation information
Optional keys:
crowd_index: measure the crowding level of an image, defined in CrowdPose dataset
It is worth mentioning that, in order to compute CocoMetric, there are some required keys in the raw_ann_info:
id: the id to distinguish different annotations
image_id: the image id of this annotation
category_id: the category of the instance.
bbox: the object bounding box
keypoints: the keypoint coordinates along with their visibilities. Note that they need to be aligned with the official COCO format, e.g., a list with length N * 3, in which N is the number of keypoints. Each triplet represents the [x, y, visible] of the keypoint.
iscrowd: indicating whether the annotation is a crowd. It is useful when matching the detection results to the ground truth.
There are some optional keys as well:
area: it is necessary when self.use_area is True
num_keypoints: it is necessary when self.iou_type is set as keypoints_crowd.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.
- Returns
The filename of the json file.
- Return type
str
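The flat keypoints layout described above (a list of length N * 3 made of [x, y, visible] triplets) can be built and unpacked as in this sketch. The helper names are hypothetical and the sample coordinates are made up for illustration.

```python
def to_coco_keypoints(points, visibility):
    """Flatten N (x, y) points and N visibility flags into a list of length N * 3."""
    flat = []
    for (x, y), v in zip(points, visibility):
        flat.extend([x, y, v])
    return flat


def from_coco_keypoints(flat):
    """Split a flat COCO keypoint list back into (x, y, visible) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


flat = to_coco_keypoints([(12.0, 34.0), (56.0, 78.0)], [2, 0])
print(flat)
print(from_coco_keypoints(flat))
```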
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) None [source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) –
A batch of outputs from the model, each of which has the following keys:
’id’: The id of the sample
’img_id’: The image_id of the sample
’pred_instances’: The prediction results of instance(s)
- results2json(keypoints: Dict[int, list], outfile_prefix: str) str [source]¶
Dump the keypoint detection results to a COCO style json file.
- Parameters
keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.
- Returns
The json file name of keypoint results.
- Return type
str
- class mmpose.evaluation.metrics.CocoWholeBodyMetric(ann_file: Optional[str] = None, use_area: bool = True, iou_type: str = 'keypoints', score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
COCO-WholeBody evaluation metric.
Evaluate AR, AP, and mAP for COCO-WholeBody keypoint detection tasks. Support COCO-WholeBody dataset. Please refer to COCO keypoint evaluation for more details.
- Parameters
ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None
use_area (bool) – Whether to use the 'area' field in the annotations. If the ground truth annotations (e.g. CrowdPose, AIC) do not have the field 'area', please set use_area=False. Defaults to True
iou_type (str) – The same parameter as iouType in xtcocotools.COCOeval, which can be 'keypoints', or 'keypoints_crowd' (used in CrowdPose dataset). Defaults to 'keypoints'
score_mode (str) –
The mode to score the prediction results which should be one of the following options:
'bbox': Take the score of bbox as the score of the prediction results.
'bbox_keypoint': Use keypoint score to rescore the prediction results.
'bbox_rle': Use rle_score to rescore the prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. The keypoints with score lower than it will not be included to rescore the prediction results. Valid only when score_mode is 'bbox_keypoint'. Defaults to 0.2
nms_mode (str) –
The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
'oks_nms': Use Object Keypoint Similarity (OKS) to perform NMS.
'soft_oks_nms': Use Object Keypoint Similarity (OKS) to perform soft NMS.
'none': Do not perform NMS. Typically for bottom-up mode output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Will retain the prediction results with OKS lower than nms_thr. Defaults to 0.9
format_only (bool) – Whether to only format the output results without doing quantitative evaluation. This is designed for test submissions when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
**kwargs – Keyword parameters passed to mmeval.BaseMetric
- gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) str [source]¶
Convert ground truth to coco format json file.
- Parameters
gt_dicts (Sequence[dict]) –
Ground truth of the dataset. Each dict contains the ground truth information about the data sample. Required keys of each gt_dict in gt_dicts:
img_id: image id of the data sample
width: original image width
height: original image height
raw_ann_info: the raw annotation information
Optional keys:
crowd_index: measure the crowding level of an image, defined in CrowdPose dataset
It is worth mentioning that, in order to compute CocoMetric, there are some required keys in the raw_ann_info:
id: the id to distinguish different annotations
image_id: the image id of this annotation
category_id: the category of the instance.
bbox: the object bounding box
keypoints: the keypoint coordinates along with their visibilities. Note that they need to be aligned with the official COCO format, e.g., a list with length N * 3, in which N is the number of keypoints. Each triplet represents the [x, y, visible] of the keypoint.
iscrowd: indicating whether the annotation is a crowd. It is useful when matching the detection results to the ground truth.
There are some optional keys as well:
area: it is necessary when self.use_area is True
num_keypoints: it is necessary when self.iou_type is set as keypoints_crowd.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.
- Returns
The filename of the json file.
- Return type
str
- results2json(keypoints: Dict[int, list], outfile_prefix: str) str [source]¶
Dump the keypoint detection results to a COCO style json file.
- Parameters
keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.
- Returns
The json file name of keypoint results.
- Return type
str
- class mmpose.evaluation.metrics.EPE(collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
EPE evaluation metric.
Calculate the end-point error (EPE) of keypoints.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
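End-point error is the mean Euclidean distance between predicted and ground-truth keypoints. The sketch below is a plain-Python illustration, with the hypothetical helper `end_point_error` and the assumption that invisible keypoints are excluded from the average.

```python
import math


def end_point_error(pred, gt, visible):
    """Mean Euclidean distance over the keypoints marked visible."""
    # Assumption: keypoints with visible == False do not contribute.
    dists = [math.dist(p, g) for p, g, v in zip(pred, gt, visible) if v]
    return sum(dists) / len(dists) if dists else 0.0


pred = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
visible = [True, True, False]
print(end_point_error(pred, gt, visible))
```

Unlike PCK, the result is in pixels and is not normalized by any body or bbox size.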
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) None [source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmpose.evaluation.metrics.JhmdbPCKAccuracy(thr: float = 0.05, norm_item: Union[str, Sequence[str]] = 'bbox', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
PCK accuracy evaluation metric for Jhmdb dataset.
Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
thr (float) – Threshold of PCK calculation. Default: 0.05.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'bbox'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
Examples
>>> from mmpose.evaluation.metrics import JhmdbPCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 15
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.bboxes = np.random.random((1, 4)) * 20
>>> gt_instances.head_size = np.random.random((1, 1)) * 10
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> jhmdb_pck_metric = JhmdbPCKAccuracy(thr=0.2, norm_item=['bbox', 'torso'])
... UserWarning: The prefix is not set in metric class JhmdbPCKAccuracy.
>>> jhmdb_pck_metric.process(data_batch, data_samples)
>>> jhmdb_pck_metric.evaluate(1)
10/26 17:48:09 - mmengine - INFO - Evaluating JhmdbPCKAccuracy (normalized by ``"bbox_size"``)...  # noqa
10/26 17:48:09 - mmengine - INFO - Evaluating JhmdbPCKAccuracy (normalized by ``"torso_size"``)...  # noqa
{'Head PCK': 1.0, 'Sho PCK': 1.0, 'Elb PCK': 1.0, 'Wri PCK': 1.0,
 'Hip PCK': 1.0, 'Knee PCK': 1.0, 'Ank PCK': 1.0, 'PCK': 1.0,
 'Head tPCK': 1.0, 'Sho tPCK': 1.0, 'Elb tPCK': 1.0, 'Wri tPCK': 1.0,
 'Hip tPCK': 1.0, 'Knee tPCK': 1.0, 'Ank tPCK': 1.0, 'tPCK': 1.0}
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results. If ‘bbox’ in self.norm_item, the returned results are the pck accuracy normalized by bbox_size, which have the following keys:
’Head PCK’: The PCK of head
’Sho PCK’: The PCK of shoulder
’Elb PCK’: The PCK of elbow
’Wri PCK’: The PCK of wrist
’Hip PCK’: The PCK of hip
’Knee PCK’: The PCK of knee
’Ank PCK’: The PCK of ankle
’PCK’: The mean PCK over all keypoints
If ‘torso’ in self.norm_item, the returned results are the pck accuracy normalized by torso_size, which have the following keys:
’Head tPCK’: The PCK of head
’Sho tPCK’: The PCK of shoulder
’Elb tPCK’: The PCK of elbow
’Wri tPCK’: The PCK of wrist
’Hip tPCK’: The PCK of hip
’Knee tPCK’: The PCK of knee
’Ank tPCK’: The PCK of ankle
’tPCK’: The mean PCK over all keypoints
- Return type
Dict[str, float]
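The PCK computation itself can be sketched in a few lines of plain Python. This is an illustration, not the library code: `pck_bbox` is a hypothetical helper, normalizing by the longer side of the bbox is an assumption about what "bounding box size" means, and all keypoints are treated as visible.

```python
import math


def pck_bbox(pred, gt, bbox_xyxy, thr=0.05):
    """Fraction of keypoints within thr of the ground truth, relative to bbox size."""
    x1, y1, x2, y2 = bbox_xyxy
    norm = max(x2 - x1, y2 - y1)  # assumption: normalize by the longer bbox side
    correct = [math.dist(p, g) / norm < thr for p, g in zip(pred, gt)]
    return sum(correct) / len(correct)


pred = [(10.0, 10.0), (50.0, 50.0), (90.0, 120.0)]
gt = [(10.0, 16.0), (50.0, 62.0), (90.0, 120.0)]
# With a 100 x 200 box and thr = 0.05, a keypoint may be at most 10 pixels off.
print(pck_bbox(pred, gt, (0.0, 0.0, 100.0, 200.0)))
```

The per-joint keys listed above would come from applying the same test to subsets of keypoints (head, shoulders, and so on) instead of to all of them.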
- class mmpose.evaluation.metrics.KeypointPartitionMetric(metric: dict, partitions: dict)[source]¶
Wrapper metric for evaluating pose metric on user-defined body parts.
Sometimes one may be interested in the performance of a pose model on certain body parts rather than on all the keypoints. For example, CocoWholeBodyMetric evaluates coco metric on body, foot, face, lefthand and righthand. However, CocoWholeBodyMetric cannot be applied to arbitrary custom datasets. This wrapper metric solves this problem.
- Supported metrics:
CocoMetric
Note 1: all keypoint ground truth should be stored in keypoints, not other data fields.
Note 2: ann_file is not supported, it will be ignored.
Note 3: score_mode other than ‘bbox’ may produce results different from the CocoWholeBodyMetric.
Note 4: nms_mode other than ‘none’ may produce results different from the CocoWholeBodyMetric.
PCKAccuracy
Note 1: data fields required by PCKAccuracy should be provided, such as bbox, head_size, etc.
Note 2: In terms of ‘torso’, since it is specifically designed for JhmdbDataset, it is not recommended to use it for other datasets.
AUC: supported without limitations.
EPE: supported without limitations.
NME: only norm_mode = ‘use_norm_item’ is supported, ‘keypoint_distance’ is incompatible with KeypointPartitionMetric.
- Incompatible metrics:
The following metrics are dataset specific metrics:
CocoWholeBodyMetric
MpiiPCKAccuracy
JhmdbPCKAccuracy
PoseTrack18Metric
Keypoint partitioning is included in these metrics.
- Parameters
metric (dict) – arguments to instantiate a metric, please refer to the arguments required by the metric of your choice.
partitions (dict) –
definition of body partitions. For example, if we have 10 keypoints in total, the first 7 keypoints belong to body and the last 3 keypoints belong to foot, this field can be like this:
dict(
    body=[0, 1, 2, 3, 4, 5, 6],
    foot=[7, 8, 9],
    all=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
)
where the numbers are the indices of keypoints and they can be discontinuous.
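The idea behind partitions can be shown without the wrapper: select the keypoints of each partition by index and evaluate a metric on that subset. The sketch below uses a mean Euclidean error as a stand-in metric and a hypothetical five-keypoint layout. It does not reproduce how the wrapper names or reports its results.

```python
import math

# Hypothetical layout: keypoints 0-2 belong to the body, 3-4 to the foot.
partitions = dict(body=[0, 1, 2], foot=[3, 4], all=[0, 1, 2, 3, 4])


def mean_error(pred, gt, indices):
    """Mean Euclidean error over the keypoints selected by indices."""
    return sum(math.dist(pred[i], gt[i]) for i in indices) / len(indices)


pred = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (0.0, 0.0)]
gt = [(0.0, 0.0)] * 5
results = {name: mean_error(pred, gt, idx) for name, idx in partitions.items()}
print(results)
```

Because the indices are arbitrary lists, partitions may overlap or skip keypoints, as the parameter description notes.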
- compute_metrics(results: list) dict [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
dict
- property dataset_meta: Optional[dict]¶
Meta info of the dataset.
- Type
Optional[dict]
- class mmpose.evaluation.metrics.MpiiPCKAccuracy(thr: float = 0.5, norm_item: Union[str, Sequence[str]] = 'head', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
PCKh accuracy evaluation metric for MPII dataset.
Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
thr (float) – Threshold of PCK calculation. Default: 0.5.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'head'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
Examples
>>> from mmpose.evaluation.metrics import MpiiPCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 16
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints + 1.0
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.head_size = np.random.random((1, 1)) * 10
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> mpii_pck_metric = MpiiPCKAccuracy(thr=0.3, norm_item='head')
... UserWarning: The prefix is not set in metric class MpiiPCKAccuracy.
>>> mpii_pck_metric.process(data_batch, data_samples)
>>> mpii_pck_metric.evaluate(1)
10/26 17:43:39 - mmengine - INFO - Evaluating MpiiPCKAccuracy (normalized by ``"head_size"``)...  # noqa
{'Head PCK': 100.0, 'Shoulder PCK': 100.0, 'Elbow PCK': 100.0,
'Wrist PCK': 100.0, 'Hip PCK': 100.0, 'Knee PCK': 100.0,
'Ankle PCK': 100.0, 'PCK': 100.0, 'PCK@0.1': 100.0}
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results. If ‘head’ in self.norm_item, the returned results are the pck accuracy normalized by head_size, which have the following keys:
’Head PCK’: The PCK of head
’Shoulder PCK’: The PCK of shoulder
’Elbow PCK’: The PCK of elbow
’Wrist PCK’: The PCK of wrist
’Hip PCK’: The PCK of hip
’Knee PCK’: The PCK of knee
’Ankle PCK’: The PCK of ankle
’PCK’: The mean PCK over all keypoints
’PCK@0.1’: The mean PCK at threshold 0.1
- Return type
Dict[str, float]
- class mmpose.evaluation.metrics.NME(norm_mode: str, norm_item: Optional[str] = None, keypoint_indices: Optional[Sequence[int]] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
NME evaluation metric.
Calculate the normalized mean error (NME) of keypoints.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
norm_mode (str) – The normalization mode. There are two valid modes: ‘use_norm_item’ and ‘keypoint_distance’. When set to ‘use_norm_item’, the argument norm_item should be specified; it names the item in the datainfo that will be used as the normalization factor. When set to ‘keypoint_distance’, the argument keypoint_indices should be specified; these keypoints are used to calculate the keypoint distance that serves as the normalization factor.
norm_item (str, optional) – The item used as the normalization factor. For example, ‘bbox_size’ in ‘AFLWDataset’. Only valid when norm_mode is use_norm_item. Default: None.
keypoint_indices (Sequence[int], optional) – The keypoint indices used to calculate the keypoint distance as the normalization factor. Only valid when norm_mode is keypoint_distance. If set to None, the default keypoint_indices in DEFAULT_KEYPOINT_INDICES for specific datasets will be used; otherwise the given keypoint_indices of the dataset will be used. Default: None.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) None [source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
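The two normalization modes of the NME metric could be selected from a config roughly as sketched below. This assumes the dict-based evaluator config style used with MMEngine; the variable names and the keypoint index pair are hypothetical placeholders, not values taken from this reference.

```python
# Mode 1: normalize by an item stored with each sample (e.g. 'bbox_size').
nme_by_item = dict(
    type='NME',
    norm_mode='use_norm_item',
    norm_item='bbox_size',
)

# Mode 2: normalize by the distance between two keypoints.
# The index pair below is a hypothetical placeholder.
nme_by_distance = dict(
    type='NME',
    norm_mode='keypoint_distance',
    keypoint_indices=[36, 45],
)
```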
- class mmpose.evaluation.metrics.PCKAccuracy(thr: float = 0.05, norm_item: Union[str, Sequence[str]] = 'bbox', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
PCK accuracy evaluation metric.
Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
Note
length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)
- Parameters
thr (float) – Threshold of PCK calculation. Default: 0.05.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'bbox'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
Examples
>>> from mmpose.evaluation.metrics import PCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 15
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.bboxes = np.random.random((1, 4)) * 20
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> pck_metric = PCKAccuracy(thr=0.5, norm_item='bbox')
...: UserWarning: The prefix is not set in metric class PCKAccuracy.
>>> pck_metric.process(data_batch, data_samples)
>>> pck_metric.evaluate(1)
10/26 15:37:57 - mmengine - INFO - Evaluating PCKAccuracy (normalized by ``"bbox_size"``)...  # noqa
{'PCK': 1.0}
- compute_metrics(results: list) Dict[str, float] [source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results. The returned result dict may have the following keys:
’PCK’: The pck accuracy normalized by bbox_size.
’PCKh’: The pck accuracy normalized by head_size.
’tPCK’: The pck accuracy normalized by torso_size.
- Return type
Dict[str, float]
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) None [source]¶
Process one batch of data samples and predictions.
The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmpose.evaluation.metrics.PoseTrack18Metric(ann_file: Optional[str] = None, score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
PoseTrack18 evaluation metric.
Evaluate AP and mAP for keypoint detection tasks. Supports the PoseTrack18 (video) dataset. Please refer to https://github.com/leonid-pishchulin/poseval for more details.
- Parameters
ann_file (str, optional) – Path to the COCO-format annotation file. If not specified, ground truth annotations from the dataset will be converted to COCO format. Defaults to None
score_mode (str) – The mode to score the prediction results, which should be one of the following options:
'bbox': Take the score of the bbox as the score of the prediction results.
'bbox_keypoint': Use the keypoint score to rescore the prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. Keypoints with a score lower than it will not be included when rescoring the prediction results. Valid only when score_mode is 'bbox_keypoint'. Defaults to 0.2
nms_mode (str) – The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
'oks_nms': Use Object Keypoint Similarity (OKS) to perform NMS.
'soft_oks_nms': Use Object Keypoint Similarity (OKS) to perform soft NMS.
'none': Do not perform NMS. Typically used for bottom-up mode output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Prediction results with OKS lower than nms_thr will be retained. Defaults to 0.9
format_only (bool) – Whether to only format the output results without doing quantitative evaluation. This is designed for test submission when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
**kwargs – Keyword parameters passed to mmeval.BaseMetric
- results2json(keypoints: Dict[int, list], outfile_prefix: str) str [source]¶
Dump the keypoint detection results into a json file.
- Parameters
keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.
- Returns
The json file name of keypoint results.
- Return type
str
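The constructor parameters of PoseTrack18Metric listed above might be set from a config along the following lines. This is a minimal sketch assuming the dict-based evaluator config style used with MMEngine; the annotation path is a hypothetical placeholder, and the remaining values simply repeat the documented defaults.

```python
# Hypothetical evaluator config; the annotation path is a placeholder.
val_evaluator = dict(
    type='PoseTrack18Metric',
    ann_file='path/to/posetrack18_val.json',
    score_mode='bbox_keypoint',   # rescore predictions with keypoint scores
    keypoint_score_thr=0.2,       # keypoints scoring below this are ignored
    nms_mode='oks_nms',
    nms_thr=0.9,
)

# Following the naming rule described for results2json, a prefix of
# "somepath/xxx" leads to this output file name.
outfile_prefix = 'somepath/xxx'
json_name = f'{outfile_prefix}.keypoints.json'
```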
functional¶
- mmpose.evaluation.functional.keypoint_auc(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, norm_factor: numpy.ndarray, num_thrs: int = 20) float [source]¶
Calculate the Area under curve (AUC) of keypoint PCK accuracy.
Note
instance number: N
keypoint number: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
norm_factor (float) – Normalization factor.
num_thrs (int) – number of thresholds to calculate auc.
- Returns
Area under curve (AUC) of keypoint PCK accuracy.
- Return type
float
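One way to picture the AUC computation is to evaluate PCK at num_thrs evenly spaced thresholds in [0, 1) and average the results. The sketch below follows that idea under simplifying assumptions: a scalar norm_factor, and a PCK pooled over all visible keypoints rather than averaged per keypoint. The helper names are hypothetical and this is not the library's implementation.

```python
import numpy as np


def pck_at(pred, gt, mask, thr, norm_factor):
    """Fraction of visible keypoints whose normalized error is below thr."""
    dist = np.linalg.norm((pred - gt) / norm_factor, axis=-1)  # (N, K)
    return float((dist[mask] < thr).mean())


def auc_sketch(pred, gt, mask, norm_factor, num_thrs=20):
    """Rectangle-rule area under the PCK curve for thresholds i / num_thrs."""
    thrs = [i / num_thrs for i in range(num_thrs)]
    return sum(pck_at(pred, gt, mask, t, norm_factor) for t in thrs) / num_thrs


gt = np.zeros((1, 2, 2))
pred = np.array([[[0.0, 0.0], [5.2, 0.0]]])  # errors of 0 and 5.2 pixels
mask = np.ones((1, 2), dtype=bool)
auc = auc_sketch(pred, gt, mask, norm_factor=10.0, num_thrs=20)
```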
- mmpose.evaluation.functional.keypoint_epe(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray) float [source]¶
Calculate the end-point error.
Note
instance number: N
keypoint number: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
- Returns
Average end-point error.
- Return type
float
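As an illustration, the end-point error can be thought of as the mean Euclidean distance between predicted and ground-truth keypoints over the visible joints. The following NumPy sketch assumes exactly that definition; the function name is hypothetical.

```python
import numpy as np


def epe_sketch(pred, gt, mask):
    """Mean Euclidean distance over keypoints marked visible in mask."""
    dist = np.linalg.norm(pred - gt, axis=-1)  # shape (N, K)
    return float(dist[mask].mean())


gt = np.zeros((1, 3, 2))
pred = np.array([[[3.0, 4.0], [0.0, 0.0], [100.0, 100.0]]])
mask = np.array([[True, True, False]])  # the third joint is ignored
epe = epe_sketch(pred, gt, mask)
```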
- mmpose.evaluation.functional.keypoint_nme(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, normalize_factor: numpy.ndarray) float [source]¶
Calculate the normalized mean error (NME).
Note
instance number: N
keypoint number: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
normalize_factor (np.ndarray[N, 2]) – Normalization factor.
- Returns
normalized mean error
- Return type
float
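The normalized mean error can be sketched as the end-point error with each instance's coordinate differences divided by its normalization factor first. The example below assumes that reading and uses the distance between two ground-truth keypoints as the factor; all names are hypothetical.

```python
import numpy as np


def nme_sketch(pred, gt, mask, normalize_factor):
    """Mean normalized Euclidean error over visible keypoints.

    normalize_factor has shape (N, 2): one factor per instance and axis.
    """
    diff = (pred - gt) / normalize_factor[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (N, K)
    return float(dist[mask].mean())


gt = np.array([[[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]])
pred = gt + np.array([[[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]])
mask = np.ones((1, 3), dtype=bool)

# Use the distance between keypoints 0 and 1 as the factor for both axes.
ref = np.linalg.norm(gt[:, 0] - gt[:, 1], axis=-1)   # shape (N,)
normalize_factor = np.tile(ref[:, None], (1, 2))      # shape (N, 2)
nme = nme_sketch(pred, gt, mask, normalize_factor)
```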
- mmpose.evaluation.functional.keypoint_pck_accuracy(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, thr: numpy.ndarray, norm_factor: numpy.ndarray) tuple [source]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
instance number: N
keypoint number: K
- Parameters
pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation.
norm_factor (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
acc (np.ndarray[K]): Accuracy of each keypoint.
avg_acc (float): Averaged accuracy across all keypoints.
cnt (int): Number of valid keypoints.
- Return type
tuple
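To make the returned tuple concrete, here is a minimal NumPy sketch of a PCK computation with the same inputs and outputs. It assumes that a keypoint with no visible instance is reported as -1 and excluded from the average; that convention and the function name are assumptions of this sketch rather than confirmed details of the library.

```python
import numpy as np


def pck_sketch(pred, gt, mask, thr, norm_factor):
    """Return (acc per keypoint, average acc, number of valid keypoints)."""
    # Normalized Euclidean distance per instance and keypoint, shape (N, K).
    dist = np.linalg.norm((pred - gt) / norm_factor[:, None, :], axis=-1)
    hits = (dist < thr) & mask
    visible = mask.sum(axis=0)          # visible instances per keypoint
    has_visible = visible > 0
    acc = np.full(pred.shape[1], -1.0)  # -1 marks keypoints never visible
    acc[has_visible] = hits.sum(axis=0)[has_visible] / visible[has_visible]
    cnt = int(has_visible.sum())
    avg_acc = float(acc[has_visible].mean()) if cnt > 0 else 0.0
    return acc, avg_acc, cnt


gt = np.zeros((2, 3, 2))
pred = np.array([[[0.0, 0.0], [1.0, 0.0], [9.0, 0.0]],
                 [[0.0, 0.0], [4.0, 0.0], [0.0, 0.0]]])
mask = np.array([[True, True, False], [True, True, False]])
norm_factor = np.full((2, 2), 10.0)
acc, avg_acc, cnt = pck_sketch(pred, gt, mask, thr=0.2, norm_factor=norm_factor)
```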
- mmpose.evaluation.functional.multilabel_classification_accuracy(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, thr: float = 0.5) float [source]¶
Get multi-label classification accuracy.
Note
batch size: N
label number: L
- Parameters
pred (np.ndarray[N, L, 2]) – model predicted labels.
gt (np.ndarray[N, L, 2]) – ground-truth labels.
mask (np.ndarray[N, 1] or np.ndarray[N, L]) – reliability of ground-truth labels.
thr (float) – Threshold for calculating accuracy.
- Returns
multi-label classification accuracy.
- Return type
float
- mmpose.evaluation.functional.nms(dets: numpy.ndarray, thr: float) List[int] [source]¶
Greedily select boxes with high confidence and overlap <= thr.
- Parameters
dets (np.ndarray) – [[x1, y1, x2, y2, score]].
thr (float) – Retain overlap < thr.
- Returns
Indexes to keep.
- Return type
list
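Greedy box NMS of this kind is commonly written as the loop below: take the highest-scoring box, drop every remaining box whose IoU with it exceeds thr, and repeat. The sketch assumes the inclusive pixel convention (width = x2 - x1 + 1); that convention and the function name are assumptions, not confirmed details of this function.

```python
import numpy as np


def nms_sketch(dets, thr):
    """Greedy NMS over rows of [x1, y1, x2, y2, score]; returns kept indices."""
    if len(dets) == 0:
        return []
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of box i with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thr]    # retain boxes with overlap <= thr
    return keep


dets = np.array([[0, 0, 10, 10, 0.9],
                 [1, 1, 11, 11, 0.8],
                 [50, 50, 60, 60, 0.7]], dtype=float)
kept = nms_sketch(dets, thr=0.5)
```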
- mmpose.evaluation.functional.oks_nms(kpts_db: List[dict], thr: float, sigmas: Optional[numpy.ndarray] = None, vis_thr: Optional[float] = None, score_per_joint: bool = False)[source]¶
OKS NMS implementations.
- Parameters
kpts_db (List[dict]) – The keypoints results of the same image.
thr (float) – The threshold of NMS. Will retain oks overlap < thr.
sigmas (np.ndarray, optional) – Keypoint labelling uncertainty. Please refer to COCO keypoint evaluation for more details. If not given, use the sigmas on COCO dataset. Defaults to None
vis_thr (float, optional) – Threshold of the keypoint visibility. If specified, OKS will be calculated based on those keypoints whose visibility is higher than vis_thr. If not given, OKS will be calculated based on all keypoints. Defaults to None
score_per_joint (bool) – Whether the input scores (in kpts_db) are per-joint scores. Defaults to False
- Returns
indexes to keep.
- Return type
np.ndarray
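For intuition, the sketch below computes a COCO-style OKS between two poses and then applies a greedy suppression with it, keeping a candidate only if its OKS with every already-kept pose is below thr. The plain-array inputs, the averaging of the two areas, and the helper names are simplifying assumptions; the real function takes a list of dicts and may differ in details.

```python
import numpy as np


def oks_sketch(kpts_a, kpts_b, area, sigmas):
    """COCO-style OKS: mean over keypoints of exp(-d^2 / (2 * area * k^2))."""
    variances = (2 * sigmas) ** 2                 # k^2 with k = 2 * sigma
    d2 = ((kpts_a - kpts_b) ** 2).sum(axis=-1)    # squared distances, (K,)
    return float(np.exp(-d2 / (2 * variances * area)).mean())


def oks_nms_sketch(poses, scores, areas, thr, sigmas):
    """Greedy NMS that uses OKS instead of box IoU as the overlap measure."""
    order = [int(i) for i in np.argsort(scores)[::-1]]
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order
                 if oks_sketch(poses[i], poses[j],
                               (areas[i] + areas[j]) / 2, sigmas) < thr]
    return keep


base = np.array([[0.0, 0.0], [10.0, 10.0]])
poses = np.stack([base, base + 0.1, base + 100.0])  # (3 poses, K=2, 2)
scores = np.array([0.9, 0.8, 0.7])
areas = np.array([100.0, 100.0, 100.0])
sigmas = np.array([0.1, 0.1])                       # hypothetical sigmas
kept = oks_nms_sketch(poses, scores, areas, thr=0.9, sigmas=sigmas)
```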
- mmpose.evaluation.functional.pose_pck_accuracy(output: numpy.ndarray, target: numpy.ndarray, mask: numpy.ndarray, thr: float = 0.05, normalize: Optional[numpy.ndarray] = None) tuple [source]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W
- Parameters
output (np.ndarray[N, K, H, W]) – Model output heatmaps.
target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
np.ndarray[K]: Accuracy of each keypoint.
float: Averaged accuracy across all keypoints.
int: Number of valid keypoints.
- Return type
tuple
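Before a PCK can be computed from heatmaps, each heatmap has to be reduced to a coordinate. A common way to do this is to take the location of the maximum response, as in the sketch below. The helper name is hypothetical, and the library's own decoding may handle additional cases.

```python
import numpy as np


def heatmap_argmax(heatmaps):
    """Return (x, y) peak locations for heatmaps of shape (N, K, H, W)."""
    n, k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(n, k, -1).argmax(axis=-1)  # (N, K)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=-1).astype(float)        # (N, K, 2)


heatmaps = np.zeros((1, 2, 4, 6))
heatmaps[0, 0, 1, 2] = 1.0  # peak at x=2, y=1
heatmaps[0, 1, 3, 5] = 1.0  # peak at x=5, y=3
coords = heatmap_argmax(heatmaps)
```

The coordinates decoded from the output and target heatmaps can then be compared with a normalized distance threshold, as in the PCK description above.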
- mmpose.evaluation.functional.simcc_pck_accuracy(output: Tuple[numpy.ndarray, numpy.ndarray], target: Tuple[numpy.ndarray, numpy.ndarray], simcc_split_ratio: float, mask: numpy.ndarray, thr: float = 0.05, normalize: Optional[numpy.ndarray] = None) tuple [source]¶
Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from SimCC.
Note
PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
instance number: N
keypoint number: K
- Parameters
output (Tuple[np.ndarray, np.ndarray]) – Model predicted SimCC.
target (Tuple[np.ndarray, np.ndarray]) – Groundtruth SimCC.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.
- Returns
A tuple containing keypoint accuracy.
np.ndarray[K]: Accuracy of each keypoint.
float: Averaged accuracy across all keypoints.
int: Number of valid keypoints.
- Return type
tuple
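A SimCC prediction consists of one 1-D classification vector per axis. A simple way to turn it into coordinates, sketched below, is to take the argmax along each axis and divide by simcc_split_ratio to return to pixel units. The helper name and array shapes are assumptions of this sketch.

```python
import numpy as np


def simcc_argmax(simcc_x, simcc_y, simcc_split_ratio):
    """Decode SimCC vectors of shape (N, K, Wx) and (N, K, Wy) to (N, K, 2)."""
    xs = simcc_x.argmax(axis=-1)
    ys = simcc_y.argmax(axis=-1)
    return np.stack([xs, ys], axis=-1) / simcc_split_ratio


simcc_x = np.zeros((1, 1, 8))
simcc_y = np.zeros((1, 1, 4))
simcc_x[0, 0, 6] = 1.0  # bin 6 along x
simcc_y[0, 0, 2] = 1.0  # bin 2 along y
coords = simcc_argmax(simcc_x, simcc_y, simcc_split_ratio=2.0)
```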
- mmpose.evaluation.functional.soft_oks_nms(kpts_db: List[dict], thr: float, max_dets: int = 20, sigmas: Optional[numpy.ndarray] = None, vis_thr: Optional[float] = None, score_per_joint: bool = False)[source]¶
Soft OKS NMS implementations.
- Parameters
kpts_db (List[dict]) – The keypoints results of the same image.
thr (float) – The threshold of NMS. Will retain oks overlap < thr.
max_dets (int) – Maximum number of detections to keep. Defaults to 20
sigmas (np.ndarray, optional) – Keypoint labelling uncertainty. Please refer to COCO keypoint evaluation for more details. If not given, use the sigmas on COCO dataset. Defaults to None
vis_thr (float, optional) – Threshold of the keypoint visibility. If specified, OKS will be calculated based on those keypoints whose visibility is higher than vis_thr. If not given, OKS will be calculated based on all keypoints. Defaults to None
score_per_joint (bool) – Whether the input scores (in kpts_db) are per-joint scores. Defaults to False
- Returns
indexes to keep.
- Return type
np.ndarray
mmpose.visualization¶
- class mmpose.visualization.PoseLocalVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[Dict] = None, save_dir: Optional[str] = None, bbox_color: Optional[Union[str, Tuple[int]]] = 'green', kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = 'red', link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), skeleton: Optional[Union[List, Tuple]] = None, line_width: Union[int, float] = 1, radius: Union[int, float] = 3, show_keypoint_weight: bool = False, alpha: float = 0.8)[source]¶
MMPose Local Visualizer.
- Parameters
name (str) – Name of the instance. Defaults to ‘visualizer’.
image (np.ndarray, optional) – The original image to draw. The format should be RGB. Defaults to None
vis_backends (list, optional) – Visual backend config list. Defaults to None
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data. Defaults to None
bbox_color (str, tuple(int), optional) – Color of bbox lines. The tuple of color should be in BGR order. Defaults to 'green'
kpt_color (str, tuple(tuple(int)), optional) – Color of keypoints. The tuple of color should be in BGR order. Defaults to 'red'
link_color (str, tuple(tuple(int)), optional) – Color of skeleton. The tuple of color should be in BGR order. Defaults to None
line_width (int, float) – The width of lines. Defaults to 1
radius (int, float) – The radius of keypoints. Defaults to 3
show_keypoint_weight (bool) – Whether to adjust the transparency of keypoints according to their score. Defaults to False
alpha (int, float) – The transparency of bboxes. Defaults to 0.8
Examples
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> from mmpose.structures import PoseDataSample
>>> from mmpose.visualization import PoseLocalVisualizer

>>> pose_local_visualizer = PoseLocalVisualizer(radius=1)
>>> image = np.random.randint(0, 256,
...     size=(10, 12, 3)).astype('uint8')
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = np.array([[[1, 1], [2, 2], [4, 4],
...     [8, 8]]])
>>> gt_pose_data_sample = PoseDataSample()
>>> gt_pose_data_sample.gt_instances = gt_instances
>>> dataset_meta = {'skeleton_links': [[0, 1], [1, 2], [2, 3]]}
>>> pose_local_visualizer.set_dataset_meta(dataset_meta)
>>> pose_local_visualizer.add_datasample('image', image,
...     gt_pose_data_sample)
>>> pose_local_visualizer.add_datasample(
...     'image', image, gt_pose_data_sample,
...     out_file='out_file.jpg')
>>> pose_local_visualizer.add_datasample(
...     'image', image, gt_pose_data_sample,
...     show=True)
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = np.array([[[1, 1], [2, 2], [4, 4],
...     [8, 8]]])
>>> pred_instances.score = np.array([0.8, 1, 0.9, 1])
>>> pred_pose_data_sample = PoseDataSample()
>>> pred_pose_data_sample.pred_instances = pred_instances
>>> pose_local_visualizer.add_datasample('image', image,
...     gt_pose_data_sample,
...     pred_pose_data_sample)
- add_datasample(name: str, image: numpy.ndarray, data_sample: mmpose.structures.pose_data_sample.PoseDataSample, draw_gt: bool = True, draw_pred: bool = True, draw_heatmap: bool = False, draw_bbox: bool = False, show_kpt_idx: bool = False, show: bool = False, wait_time: float = 0, out_file: Optional[str] = None, kpt_score_thr: float = 0.3, step: int = 0) None [source]¶
Draw datasample and save to all backends.
- If GT and prediction are plotted at the same time, they are displayed in a stitched image where the left image is the ground truth and the right image is the prediction.
- If show is True, all storage backends are ignored, and the images will be displayed in a local window.
- If out_file is specified, the drawn image will be saved to out_file. It is usually used when the display is not available.
- Parameters
name (str) – The image identifier
image (np.ndarray) – The image to draw
data_sample (PoseDataSample, optional) – The data sample to visualize
draw_gt (bool) – Whether to draw GT PoseDataSample. Defaults to True
draw_pred (bool) – Whether to draw Prediction PoseDataSample. Defaults to True
draw_bbox (bool) – Whether to draw bounding boxes. Defaults to False
draw_heatmap (bool) – Whether to draw heatmaps. Defaults to False
show (bool) – Whether to display the drawn image. Defaults to False
wait_time (float) – The interval of show (s). Defaults to 0
out_file (str) – Path to output file. Defaults to None
kpt_score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3
step (int) – Global step value to record. Defaults to 0
mmpose.engine¶
hooks¶
- class mmpose.engine.hooks.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[source]¶
Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLOX.
Ported from the implementation of MMDetection (https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/layers/ema.py).
- Parameters
model (nn.Module) – The model to be averaged.
momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula: averaged_param = (1-momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.
gamma (int) – Use a larger momentum early in training and gradually anneal it to a smaller value to update the EMA model smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.
interval (int) – Interval between two updates. Defaults to 1.
device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.
update_buffers (bool) – If True, it will compute running averages for both the parameters and the buffers of the model. Defaults to False.
- avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int) None [source]¶
Compute the moving average of the parameters using the exponential momentum strategy.
- Parameters
averaged_param (Tensor) – The averaged parameters.
source_param (Tensor) – The source parameters.
steps (int) – The number of times the parameters have been updated.
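The two formulas given for momentum and gamma can be combined into a small numeric sketch. It uses plain Python floats in place of tensors and returns the new average instead of updating it in place, so it only illustrates the arithmetic; the function names are hypothetical.

```python
import math


def exp_momentum(steps, momentum=0.0002, gamma=2000):
    """Effective momentum: close to 1 early on, annealing toward `momentum`."""
    return (1 - momentum) * math.exp(-(1 + steps) / gamma) + momentum


def ema_update(averaged_param, source_param, steps, momentum=0.0002, gamma=2000):
    """One EMA step using the effective momentum for the given step count."""
    m = exp_momentum(steps, momentum, gamma)
    return (1 - m) * averaged_param + m * source_param


early = exp_momentum(0)       # almost 1: the average tracks the source closely
late = exp_momentum(10 ** 6)  # practically equal to the base momentum
updated = ema_update(averaged_param=1.0, source_param=0.0, steps=0)
```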
- class mmpose.engine.hooks.PoseVisualizationHook(enable: bool = False, interval: int = 50, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[source]¶
Pose Estimation Visualization Hook. Used to visualize validation and testing process prediction results.
In the testing phase:
- If show is True, only the prediction results are visualized without storing data, so vis_backends needs to be excluded.
- If out_dir is specified, the prediction results are saved to out_dir. To avoid vis_backends also storing data, vis_backends needs to be excluded.
- vis_backends takes effect if the user does not specify show and out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction result in Wandb or Tensorboard.
- Parameters
enable (bool) – Whether to draw prediction results. If it is False, no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 50.
score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.
show (bool) – Whether to display the drawn image. Defaults to False.
wait_time (float) – The interval of show (s). Defaults to 0.
out_dir (str, optional) – Directory where painted images will be saved in the testing process.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
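A hook like this is typically enabled from a config. The fragment below is a sketch that assumes the MMEngine convention of a default_hooks dict with a visualization entry; the output directory is a hypothetical placeholder and the other values repeat the documented defaults, apart from enable.

```python
# Hypothetical config fragment in the MMEngine dict style.
default_hooks = dict(
    visualization=dict(
        type='PoseVisualizationHook',
        enable=True,            # drawing is off unless this is set
        interval=50,
        score_thr=0.3,
        show=False,
        out_dir='vis_results',  # hypothetical output directory
    ),
)
```

With out_dir set and show left as False, the testing-phase rules above imply that results are written to out_dir rather than to vis_backends.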
- after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmpose.structures.pose_data_sample.PoseDataSample]) None [source]¶
Run after every testing iteration.
- Parameters
runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the test loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[PoseDataSample]) – Outputs from model.
- after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmpose.structures.pose_data_sample.PoseDataSample]) None [source]¶
Run after every self.interval validation iterations.
- Parameters
runner (Runner) – The runner of the validation process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[PoseDataSample]) – Outputs from model.
mmpose.apis.webcam¶
MMPose Webcam API: Tools to build simple interactive webcam applications and demos