By adding the GradientClip callback, the gradient norm (of order norm_type, default 2) is clipped to at most max_norm (default 1) using torch::nn_utils_clip_grad_norm_(), which can help avoid loss divergence.
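As an illustration, the sketch below shows the manual equivalent of what the callback applies after the backward pass: rescaling the gradients so their overall norm does not exceed max_norm. The model, data, and optimizer here are placeholders chosen for the example; only the call to torch::nn_utils_clip_grad_norm_() reflects the function named above.

```r
library(torch)

# Placeholder model, optimizer, and data for illustration only
model <- nn_linear(10, 1)
optimizer <- optim_sgd(model$parameters, lr = 0.1)
x <- torch_randn(32, 10)
y <- torch_randn(32, 1)

# One training step with gradient clipping applied between backward() and step()
optimizer$zero_grad()
loss <- nnf_mse_loss(model(x), y)
loss$backward()

# Rescale gradients so their norm_type (2) norm is at most max_norm (1)
nn_utils_clip_grad_norm_(model$parameters, max_norm = 1, norm_type = 2)

optimizer$step()
```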
References
See the FastAI documentation for the GradientClip callback.