1. Architecture
2. Detail
3. Loss Function
Wp1, Wp2 and Wp3 are peephole connections (the forget gate and the input gate take C_{t-1} into account, while the output gate takes C_t into account).
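The peephole wiring described above can be sketched as a single time step of a toy, single-unit LSTM. This is only an illustration under stated assumptions: every parameter name (`w_*`, `u_*`, `wp_*`, `b_*`) is hypothetical, the notes do not say which of Wp1, Wp2, Wp3 belongs to which gate, and a real model would use vectors and matrices rather than scalars.

```python
import math

# Hypothetical parameter names for a one-unit cell (not taken from the notes).
PARAM_NAMES = (
    "w_g", "u_g", "b_g",          # candidate input
    "w_i", "u_i", "wp_i", "b_i",  # input gate  (wp_i: peephole weight)
    "w_f", "u_f", "wp_f", "b_f",  # forget gate (wp_f: peephole weight)
    "w_o", "u_o", "wp_o", "b_o",  # output gate (wp_o: peephole weight)
)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def peephole_lstm_step(x, y_prev, c_prev, p):
    """Run one time step; returns (output y_t, cell state c_t)."""
    g = math.tanh(p["w_g"] * x + p["u_g"] * y_prev + p["b_g"])
    # The input and forget gates look at the previous cell state C_{t-1}.
    i = sigmoid(p["w_i"] * x + p["u_i"] * y_prev + p["wp_i"] * c_prev + p["b_i"])
    f = sigmoid(p["w_f"] * x + p["u_f"] * y_prev + p["wp_f"] * c_prev + p["b_f"])
    c = f * c_prev + i * g
    # The output gate looks at the freshly updated cell state C_t.
    o = sigmoid(p["w_o"] * x + p["u_o"] * y_prev + p["wp_o"] * c + p["b_o"])
    y = o * math.tanh(c)
    return y, c
```

Setting the three `wp_*` values to zero reduces this sketch to an LSTM step without peepholes, which makes the effect of the connections easy to isolate.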
The word hashing here differs somewhat from the one used in DSSM.
letter-trigram layer: 30k*3=90k. After word hashing, each word has 30k dimensions; the vectors of a word 3-gram are then concatenated into 90k dimensions.
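The 30k*3=90k arithmetic can be illustrated with sparse vectors. The sketch below assumes each word is already a sparse letter-trigram vector stored as `{index: count}`, and that a 3-gram means three consecutive words; the function names and the absence of padding at sentence boundaries are assumptions, not details from these notes.

```python
VOCAB = 30_000  # letter-trigram vocabulary size per word (from the notes)
WINDOW = 3      # number of words concatenated per window


def concat_window(word_vecs, vocab=VOCAB):
    """Concatenate sparse word vectors into one sparse vector.

    The k-th word in the window has its indices shifted by k * vocab,
    so the result lives in len(word_vecs) * vocab dimensions.
    """
    out = {}
    for k, vec in enumerate(word_vecs):
        for idx, count in vec.items():
            if not 0 <= idx < vocab:
                raise ValueError(f"index {idx} outside vocabulary of size {vocab}")
            out[k * vocab + idx] = count
    return out


def sliding_windows(sentence_vecs, window=WINDOW, vocab=VOCAB):
    """Return one concatenated vector per window of consecutive words."""
    return [
        concat_window(sentence_vecs[i:i + window], vocab)
        for i in range(len(sentence_vecs) - window + 1)
    ]
```

With three words per window, the largest possible index is 3 * 30,000 - 1 = 89,999, which matches the 90k-dimensional layer.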
Given a word (e.g. good), we first add word starting and ending marks to the word (e.g. #good#). Then, we break the word into letter n-grams (e.g. letter trigrams: #go, goo, ood, od#). Finally, the word is represented using a vector of letter n-grams.
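The word-hashing steps above can be sketched in a few lines of Python. This is a minimal illustration: the function names are hypothetical, and representing the final vector as a `Counter` of n-gram counts (rather than an indexed 30k-dimensional array) is an assumption made to keep the example short.

```python
from collections import Counter


def letter_ngrams(word, n=3, mark="#"):
    """Add boundary marks to the word, then slide a window of n letters over it."""
    marked = f"{mark}{word}{mark}"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def hash_word(word, n=3):
    """Represent a word as a sparse bag of letter n-gram counts."""
    return Counter(letter_ngrams(word, n))


print(letter_ngrams("good"))  # ['#go', 'goo', 'ood', 'od#']
```

To obtain a fixed-length vector, each distinct n-gram would additionally be mapped to an index in a trigram vocabulary; that mapping is omitted here.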