Month: January 2020
[Paper Notes][EMNLP-2016] Sequence-Level Knowledge Distillation
[Paper Notes][ICLR-2016] Unifying distillation and privileged information
[Paper Notes][CoRR-2019] Making Neural Machine Reading Comprehension Faster
[Paper Notes][CoRR-2015] Distilling the Knowledge in a Neural Network
Advantages of soft targets:
When the soft targets have high entropy, they provide much more information per training case than hard targets and much less variance in the gradient between training cases, so the small model can often be trained on much less data than the original cumbersome model and using a much higher learning rate.
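To make this concrete, here is a minimal NumPy sketch of temperature-scaled softmax and the distillation loss described in the paper. The names (`teacher_logits`, `student_logits`) and the hyperparameter values `T=4.0` and `alpha=0.9` are illustrative choices, not values from the paper; the `T**2` scaling of the soft term follows the paper's note that soft-target gradients scale as 1/T².

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: a higher T yields softer, higher-entropy targets."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.9):
    """Weighted sum of soft-target cross-entropy (both distributions at
    temperature T) and ordinary hard-label cross-entropy (T=1).
    T and alpha here are illustrative hyperparameters."""
    soft_targets = softmax(teacher_logits, T)
    soft_preds = softmax(student_logits, T)
    soft_loss = -np.sum(soft_targets * np.log(soft_preds + 1e-12))
    hard_preds = softmax(student_logits)
    hard_loss = -np.log(hard_preds[hard_label] + 1e-12)
    # Scale the soft term by T^2 so its gradient magnitude matches the hard term.
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss
```

Printing the teacher's output at two temperatures shows why the soft targets carry more information per example: at T=1 nearly all the mass sits on one class, while at T=4 the relative probabilities of the "wrong" classes become visible to the student.

```python
teacher_logits = np.array([5.0, 2.0, 1.0])
print(softmax(teacher_logits, T=1.0))  # sharp, near one-hot
print(softmax(teacher_logits, T=4.0))  # soft, exposes class similarities
```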