Day11: When the gradient is small…

How do we know whether a critical point is a local minimum or a saddle point?
Using math: examine the Hessian, the matrix of second derivatives of the loss.
Example using the Hessian: around a critical point $\theta'$ (where the gradient is zero), the loss is approximated by the second-order Taylor expansion

$$L(\theta) \approx L(\theta') + \frac{1}{2}(\theta - \theta')^{T} H (\theta - \theta')$$

where $H$ is the Hessian evaluated at $\theta'$. If all eigenvalues of $H$ are positive, $\theta'$ is a local minimum; if all are negative, a local maximum; if some are positive and some negative, a saddle point.
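As a quick concrete check, here is a minimal sketch (the toy loss $L(w_1, w_2) = w_1^2 - w_2^2$ and its analytic Hessian are illustrative assumptions, not from the course) that classifies a critical point by the signs of the Hessian's eigenvalues:

```python
# Minimal sketch (NumPy assumed): classify a critical point of the toy loss
# L(w1, w2) = w1^2 - w2^2 by the eigenvalues of its Hessian.
import numpy as np

def hessian(w):
    # Hessian of L(w1, w2) = w1^2 - w2^2, computed analytically:
    # d2L/dw1^2 = 2, d2L/dw2^2 = -2, cross terms = 0 (constant for a quadratic).
    return np.array([[2.0, 0.0],
                     [0.0, -2.0]])

critical_point = np.array([0.0, 0.0])   # the gradient is zero here
eigvals = np.linalg.eigvalsh(hessian(critical_point))

if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
else:
    print("saddle point")   # mixed signs -> saddle, as at (0, 0) here
```

For a real network one would get the Hessian via autodiff (e.g., `torch.autograd.functional.hessian`) rather than by hand, but the eigenvalue test is the same.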



Don't be afraid of saddle points!
Local minima vs. saddle points: both have zero gradient, but a saddle point is not a dead end. The eigenvector of the Hessian with a negative eigenvalue points in a direction along which the loss still decreases, so training can escape (see the sketch below). In high-dimensional loss landscapes, critical points where every eigenvalue is positive are rare in practice, so most small-gradient points turn out to be saddle points.
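A minimal sketch of the escape direction, again on the illustrative toy loss $L(w_1, w_2) = w_1^2 - w_2^2$ (my example, not the course's): stepping along the eigenvector whose eigenvalue is negative lowers the loss even though the gradient at the saddle is zero.

```python
# Minimal sketch (NumPy assumed): escape a saddle point by moving along the
# Hessian eigenvector whose eigenvalue is negative.
import numpy as np

def loss(w):
    return w[0]**2 - w[1]**2          # saddle point at the origin

H = np.array([[2.0, 0.0],             # Hessian of the loss above
              [0.0, -2.0]])
eigvals, eigvecs = np.linalg.eigh(H)

# Pick the eigenvector with the negative eigenvalue and step along it.
u = eigvecs[:, np.argmin(eigvals)]
w = np.array([0.0, 0.0]) + 0.1 * u

print(loss(w))                        # negative: the loss decreased
```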


Day12: Tips for training: Batch and Momentum
Why do we use batches?
This was touched on earlier; quick recap: instead of computing the gradient on the whole training set, we split the data into batches and make one parameter update per batch.


Shuffle: typically, after each epoch the data are re-shuffled and divided into new batches, so the batches differ from epoch to epoch.
Small batch vs. large batch:

Whether small or large batches are faster depends on whether parallel computation (GPU parallelism) is taken into account; the table below compares both cases.



| Aspect | Small batch (e.g., 100 samples) | Large batch (e.g., 10,000 samples) |
|---|---|---|
| Speed for one update (no parallelism) | Faster | Slower |
| Speed for one update (with parallelism) | Same | Same (unless very large) |
| Time for one epoch | Slower | Faster |
| Gradient | Noisy | Stable |
| Optimization | Better | Worse |
| Generalization | Better | Worse |
Batch size is a hyperparameter: it trades per-epoch speed (favoring large batches) against the noisy gradients that help optimization and generalization (favoring small batches). A data-loading sketch follows below.
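A minimal sketch, assuming PyTorch (the toy dataset and sizes are placeholders): `batch_size` is just a constructor argument of `DataLoader`, and `shuffle=True` re-splits the data into new batches at the start of every epoch, matching the shuffle note above.

```python
# Minimal sketch (PyTorch assumed): batch size as a hyperparameter, with
# re-shuffling into new batches every epoch.
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 20)             # 1000 toy samples, 20 features
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)

# batch_size is the hyperparameter discussed above.
loader = DataLoader(dataset, batch_size=100, shuffle=True)

for epoch in range(2):                # batches differ between the two epochs
    for xb, yb in loader:
        pass                          # one gradient update per batch goes here
```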

Momentum
Intuition: inertia, like a ball rolling downhill. Each update moves in a direction that combines the current gradient with the previous movement: $m^{t} = \lambda m^{t-1} - \eta g^{t-1}$ and $\theta^{t} = \theta^{t-1} + m^{t}$, where $\lambda$ is the momentum coefficient and $\eta$ the learning rate. Because of this inertia, the update can roll past saddle points and shallow local minima instead of stopping there. A sketch follows below.
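A minimal sketch (NumPy, with a 1-D toy loss $L(\theta) = \theta^2$ of my own choosing) of gradient descent with the momentum update above:

```python
# Minimal sketch (NumPy assumed): gradient descent with momentum on a 1-D loss.
# Each step combines the current gradient with the previous movement, so the
# update keeps some "inertia" from earlier steps.
import numpy as np

def grad(theta):
    return 2 * theta                  # gradient of L(theta) = theta^2

theta = 5.0
m = 0.0                               # accumulated movement
eta, lam = 0.1, 0.9                   # learning rate and momentum coefficient

for step in range(200):
    m = lam * m - eta * grad(theta)   # movement = momentum term - gradient term
    theta = theta + m                 # update follows the combined direction

print(theta)                          # very close to the minimum at 0
```

Setting $\lambda = 0$ recovers vanilla gradient descent; a larger $\lambda$ keeps more of the previous movement.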

Conclusion:
- A small gradient does not necessarily mean a local minimum; the critical point may be a saddle point, and the eigenvalues of the Hessian tell them apart.
- Saddle points are nothing to fear: there is always a direction along which the loss keeps decreasing.
- Batch size is a hyperparameter: small batches give noisy gradients but tend to optimize and generalize better; large batches finish an epoch faster on parallel hardware.
- Momentum adds inertia to each update, which helps training roll past saddle points and shallow local minima.