- on_train_batch_start
- optimizer_step (a plain-PyTorch sketch of this AMP sequence follows the list)
  - training_step
  - on_before_zero_grad
  - scaler.scale(loss): scales the loss up by the current scale factor
  - on_before_backward
  - model.backward: backpropagates gradients from the scaled loss
  - on_after_backward
  - scaler.unscale_(optimizer): restores the gradients to their true values
  - on_before_optimizer_step
  - _clip_gradients: gradient clipping, applied to the unscaled gradients
  - scaler.step(optimizer): skips the optimizer step if inf/NaN gradients are found
  - scaler.update(): updates the scale factor
- on_train_batch_end
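
For reference, the hook-free version of the same sequence: a minimal plain-PyTorch AMP sketch. The model, optimizer, and loss here are illustrative assumptions, not from the original notes; only the scaler calls and their order mirror the list above.

```python
import torch
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler

# Illustrative model/optimizer, not from the original notes.
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

def train_batch(x, y):
    optimizer.zero_grad()
    with autocast():                      # forward pass in mixed precision
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()         # scale the loss up, then backprop
    scaler.unscale_(optimizer)            # restore grads to their true values
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip the unscaled grads
    scaler.step(optimizer)                # skipped internally if grads contain inf/NaN
    scaler.update()                       # adjust the scale factor for the next step
```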
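
On scaler.step / scaler.update: when inf/NaN gradients are found, the step is skipped and update shrinks the scale factor, so a drop in get_scale() can be used to detect a skipped step. A small sketch of that idiom, continuing the names above:

```python
scale_before = scaler.get_scale()
scaler.step(optimizer)   # silently skipped if grads contain inf/NaN
scaler.update()          # shrinks the scale after a skipped step, grows it periodically otherwise
if scaler.get_scale() < scale_before:
    print("optimizer step was skipped due to inf/NaN gradients")
```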
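
And a minimal LightningModule sketch that overrides these hooks only to print the call order. Hook signatures here follow Lightning 2.x and the print statements are just for observation; with a mixed-precision Trainer the GradScaler path above runs under the hood.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class HookOrderDemo(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        print("training_step")
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def on_train_batch_start(self, batch, batch_idx):
        print("on_train_batch_start")

    def on_before_zero_grad(self, optimizer):
        print("on_before_zero_grad")

    def on_before_backward(self, loss):
        print("on_before_backward")

    def on_after_backward(self):
        print("on_after_backward")

    def on_before_optimizer_step(self, optimizer):
        print("on_before_optimizer_step")

    def on_train_batch_end(self, outputs, batch, batch_idx):
        print("on_train_batch_end")
```

Running this with `pl.Trainer(max_epochs=1, precision="16-mixed")` should print the hook names in the order the list above describes.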