Advanced Training¶
Resume Training¶
Resuming training means continuing from the state saved by a previous training run. This state includes the model weights, the optimizer state, and the state of the optimizer parameter adjustment strategy.
Automatically resume training¶
Users can add --resume to the end of the training command to resume training. The program will then look for the latest checkpoint in work_dirs. If one exists (e.g. the previous training was interrupted), training resumes from that checkpoint. Otherwise (e.g. the previous training did not save a checkpoint in time, or a new training task was started), training starts from scratch.
Here is an example of resuming training:
python tools/train.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py --resume
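The lookup described above can be pictured with a small sketch. This is not the actual MMPose or MMEngine implementation: the function name pick_resume_checkpoint and the rule of choosing the most recently modified .pth file are assumptions made purely for illustration.

```python
from pathlib import Path
from typing import Optional


def pick_resume_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the most recently modified .pth file in work_dir, or None.

    Hypothetical helper: the real framework may track the latest
    checkpoint differently.
    """
    candidates = sorted(
        Path(work_dir).glob('*.pth'),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    return candidates[-1] if candidates else None
```

A return value of None corresponds to the "otherwise" case above, where no checkpoint is available and training has to start from scratch.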
Specify the checkpoint to resume training¶
You can also specify the checkpoint path for --resume. MMPose will automatically read the checkpoint and resume training from it. The command is as follows:
python tools/train.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py \
--resume work_dirs/td-hm_res50_8xb64-210e_coco-256x192/latest.pth
To specify the checkpoint path in the config file instead, set resume=True and also set load_from to the checkpoint path.
Note that if only load_from is set, without resume=True, only the weights in the checkpoint are loaded and training starts from scratch instead of continuing from the previous state.
The following example is equivalent to the example above that specifies the --resume parameter:
resume = True
load_from = 'work_dirs/td-hm_res50_8xb64-210e_coco-256x192/latest.pth'
# model settings
model = dict(
    ## omitted ##
)
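For contrast, a config that sets only load_from might look like the sketch below. It reuses the checkpoint path from the example above. Setting resume = False explicitly is an assumption made here for clarity; the text above only says that resume=True is not set.

```python
# Load only the model weights from the checkpoint.
# The optimizer state and training progress are not restored,
# so training starts again from scratch.
resume = False
load_from = 'work_dirs/td-hm_res50_8xb64-210e_coco-256x192/latest.pth'
```

This form is typically what you want when fine-tuning from existing weights rather than continuing an interrupted run.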
Automatic Mixed Precision (AMP) Training¶
Mixed precision training can reduce training time and memory usage without changing the model and, in most cases, without reducing training accuracy. This makes it possible to use larger batch sizes, larger models, and larger input sizes.
To enable Automatic Mixed Precision (AMP) training, add --amp to the end of the training command:
python tools/train.py ${CONFIG_FILE} --amp
A specific example is as follows:
python tools/train.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_res50_8xb64-210e_coco-256x192.py --amp
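AMP can presumably also be enabled in the config file rather than on the command line. The fragment below is a sketch that assumes the config uses MMEngine's optimizer wrapper, and that switching its type to AmpOptimWrapper with dynamic loss scaling corresponds to what --amp does. The optimizer settings are placeholders, not values taken from any particular MMPose config.

```python
optim_wrapper = dict(
    type='AmpOptimWrapper',  # assumed config-file counterpart of --amp
    loss_scale='dynamic',    # let the loss scale adapt during training
    optimizer=dict(type='Adam', lr=5e-4),  # placeholder optimizer settings
)
```

Keeping the setting in the config makes the choice reproducible across runs, while the command-line flag is convenient for one-off experiments.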
Set the random seed¶
If you want to specify the random seed during training, you can use the following command:
# ${CONFIG} is the config file; randomness.seed=2023 sets the random seed to 2023.
# The last two options are optional: diff_rank_seed sets different seeds according
# to rank, and deterministic sets the cuDNN backend deterministic option to True.
python ./tools/train.py \
    ${CONFIG} \
    --cfg-options randomness.seed=2023 \
    randomness.diff_rank_seed=True \
    randomness.deterministic=True
randomness has three parameters that can be set, with the following meanings:
randomness.seed=2023: sets the random seed to 2023.
randomness.diff_rank_seed=True: sets different seeds according to the global rank. Defaults to False.
randomness.deterministic=True: sets the deterministic option for the cuDNN backend, i.e., sets torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Defaults to False. See PyTorch Randomness for more details.
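Since --cfg-options overrides entries in the config, the same settings can presumably be written directly in the config file. A minimal sketch, assuming the randomness entry is a plain dict with the three keys described above:

```python
randomness = dict(
    seed=2023,             # fixed random seed
    diff_rank_seed=False,  # same seed on every rank (the default)
    deterministic=False,   # leave the cuDNN settings unchanged (the default)
)
```

Enabling deterministic may slow training down, so it is usually worth turning on only when exact reproducibility matters.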
Use Tensorboard to Visualize Training¶
Install TensorBoard:
pip install tensorboard
Enable TensorBoard in the config file:
visualizer = dict(
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ])
After training, you can use the following command to visualize the training process.
tensorboard --logdir work_dirs/${CONFIG}/${TIMESTAMP}/vis_data