By adding the GradientClip callback, the gradient norm (of order norm_type, default 2) is clipped to at most max_norm (default 1) using torch::nn_utils_clip_grad_norm_(), which can help avoid loss divergence.
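As a quick illustration, the sketch below shows how such a callback might be attached when fitting a luz model. The constructor name luz_callback_gradient_clip(), the model net, and the dataloader train_dl are assumptions for the example, based on the usual luz setup() %>% fit() pipeline, not something this page specifies.

```r
# A minimal sketch, assuming the callback is exposed as
# luz_callback_gradient_clip() and a standard luz pipeline.
library(torch)
library(luz)

fitted <- net %>%                 # `net` is a hypothetical nn_module
  setup(
    loss = nn_mse_loss(),
    optimizer = optim_adam
  ) %>%
  fit(
    train_dl,                     # hypothetical training dataloader
    epochs = 10,
    callbacks = list(
      # clip the 2-norm of all gradients to at most 1 before each step
      luz_callback_gradient_clip(max_norm = 1, norm_type = 2)
    )
  )
```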
Arguments
- max_norm (float or int): maximum norm of the gradients.
- norm_type (float or int): order of the p-norm used. Can be Inf for the infinity norm.
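For reference, the same clipping can be applied by hand in a custom training loop via torch::nn_utils_clip_grad_norm_(), the function this callback wraps. A minimal sketch, assuming a plain model/optimizer step (the model and data here are illustrative only):

```r
# Manual equivalent: clip gradients before calling optimizer$step().
library(torch)

model <- nn_linear(10, 1)
opt   <- optim_sgd(model$parameters, lr = 0.1)

x <- torch_randn(32, 10)
y <- torch_randn(32, 1)

opt$zero_grad()
loss <- nnf_mse_loss(model(x), y)
loss$backward()

# Rescale gradients so their overall 2-norm is at most 1
nn_utils_clip_grad_norm_(model$parameters, max_norm = 1, norm_type = 2)
opt$step()
```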
References
See the fastai documentation for the GradientClip callback.