[论文笔记][CoRR-2015]Distilling the Knowledge in a Neural Network

Advantage of soft target

When the soft targets have high entropy, they provide much more information per training case than hard targets and much less variance in the gradient between training cases, so the small model can often be trained on much less data than the original cumbersome model and using a much higher learning rate.

如何理解soft target这一做法?

paper slides