- transpose
- permute
- squeeze
- unsqueeze
- view
- reshape
- cat
- stack
- repeat
- expand
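Assuming these refer to the PyTorch tensor operations of the same names, here is a minimal sketch contrasting them; the tensors and shapes are arbitrary examples, not from any particular note.

```python
import torch

x = torch.arange(6).reshape(2, 3)            # shape (2, 3)

# transpose swaps two dims; permute reorders any number of dims.
x_t = x.transpose(0, 1)                      # (3, 2)
p = torch.zeros(2, 3, 4).permute(2, 0, 1)    # (4, 2, 3)

# squeeze drops size-1 dims; unsqueeze inserts one.
z = torch.zeros(1, 3).squeeze(0)             # (3,)
z = z.unsqueeze(-1)                          # (3, 1)

# view requires a contiguous tensor; reshape copies when it has to.
flat = x.view(-1)                            # (6,) -- x is contiguous
flat_t = x_t.reshape(-1)                     # works although x_t is non-contiguous

# cat joins along an existing dim; stack creates a new dim.
a, b = torch.zeros(2, 3), torch.ones(2, 3)
c = torch.cat([a, b], dim=0)                 # (4, 3)
s = torch.stack([a, b], dim=0)               # (2, 2, 3)

# repeat copies data; expand only broadcasts size-1 dims (no copy).
r = torch.tensor([[1], [2]]).repeat(1, 3)    # (2, 3), real copy
e = torch.tensor([[1], [2]]).expand(2, 3)    # (2, 3), a view
```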
Month: September 2019
[Paper Notes][ACL-2019] Deep Unknown Intent Detection with Margin Loss
I. Background
- Kim and Kim (INTERSPEECH 2018), Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates, still needs out-of-domain samples.
- Yu et al. (IJCAI 2017), Open Category Classification by Adversarial Sample Generation, tries to generate positive and negative examples from known classes via adversarial learning to augment the training data, but this does not work well in a discrete data space like text.
- Brychcín and Král (EACL 2017), Unsupervised Dialogue Act Induction using Gaussian Mixtures, tries to model intents through clustering. Still, it does not make good use of the prior knowledge provided by known intents, and the clustering results are usually unsatisfactory.
- LMCL: Wang et al. (CVPR 2018b), CosFace: Large Margin Cosine Loss for Deep Face Recognition.
II. Contributions
- We propose a two-stage method for unknown intent detection with BiLSTM;
- We introduce margin loss on BiLSTM to learn discriminative deep features, which is suitable for the detection task;
- Experiments conducted on two benchmark dialogue datasets show the effectiveness of the proposed method.
III. Method
1. BiLSTM
Concatenate the outputs of the last time step of the forward and backward directions.
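A minimal PyTorch sketch of this step, assuming a single-layer BiLSTM with `batch_first=True`; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
batch, seq_len, emb_dim, hidden = 4, 10, 50, 128
lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, emb_dim)
output, (h_n, c_n) = lstm(x)          # h_n: (num_layers * 2, batch, hidden)

# Last hidden state of the forward direction and of the backward direction,
# concatenated into one sentence representation of size 2 * hidden.
sent_repr = torch.cat([h_n[-2], h_n[-1]], dim=-1)   # (batch, 2 * hidden)
```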
2. Large Margin Cosine Loss (LMCL)
Transforms the softmax loss into a cosine loss by applying L2 normalization to both the features and the weight vectors, and additionally enforces a cosine margin between classes so that the learned features are more discriminative.
(ICLR 2019) Do Deep Generative Models Know What They Don't Know?
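A hedged PyTorch sketch of LMCL following the CosFace formulation (scale `s`, margin `m` subtracted from the target-class cosine); the hyper-parameter values are common defaults, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LargeMarginCosineLoss(nn.Module):
    """Minimal LMCL sketch: L2-normalized features and class weights,
    margin m applied to the target-class cosine, scaled by s."""

    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # Cosine similarity between normalized features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        # Subtract the margin m only from the target-class cosine.
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * (cosine - self.m * one_hot)
        return F.cross_entropy(logits, labels)

# Example: 128-dim BiLSTM features, 5 known intent classes.
criterion = LargeMarginCosineLoss(feat_dim=128, num_classes=5)
loss = criterion(torch.randn(8, 128), torch.randint(0, 5, (8,)))
```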
3. Local Outlier Factor (LOF)
Uses k-nearest neighbors to find outliers.
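The detector is applied to the learned deep features; a minimal sketch using scikit-learn's `LocalOutlierFactor` in novelty mode, with random placeholder features and illustrative parameter values.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# train_feats / test_feats stand in for deep features from the BiLSTM;
# random placeholders here just to keep the snippet runnable.
train_feats = np.random.randn(500, 128)
test_feats = np.random.randn(50, 128)

# novelty=True lets the fitted model score unseen samples with predict().
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05, novelty=True)
lof.fit(train_feats)

pred = lof.predict(test_feats)   # +1 = known (inlier), -1 = unknown (outlier)
```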
IV. Experiments
1. Baselines
- Maximum Softmax Probability (MSP)
- DOC: the SOTA method in the field of open-world classification (EMNLP 2017)
- DOC (Softmax)
- LOF (Softmax)
2. Results
V. Refs
[Paper Notes][ACL-2019] A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling
I. Background
- Pioneers of joint models:
- Shortcomings of other models:
  - Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling (Liu and Lane 2016, INTERSPEECH 2016) just applied a joint loss function to link the two tasks implicitly.
  - Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM (Hakkani-Tür et al. 2016, INTERSPEECH 2016) did not establish explicit relationships between the slots and the intent.
  - Slot-Gated Modeling for Joint Slot Filling and Intent Prediction (Goo et al. 2018, NAACL-HLT 2018): slot information is not used in the intent detection task (the slot gate uses intent information to boost only the slot filling task), and bi-directional direct connections are still not established.
II. Contributions
- A novel ID subnet is proposed to apply the slot information to the intent detection task (the main contribution);
- A novel iteration mechanism is introduced inside the SF-ID network to enhance the connections between the intent and the slots;
- The experiments on two benchmark datasets show the effectiveness and superiority of the proposed model.
III. Approaches
- 1. SF-First Mode: we apply the intent context vector Cinte and the slot context vector Cslot in the SF subnet and generate the slot reinforce vector Rslot. Then the newly formed vector Rslot is fed to the ID subnet to bring in the slot information (i.e., SF information is carried into ID; see the structural sketch after this list).
  - SF subnet: the per-position Cslot and the global Cinte are computed as in Slot-Gated, i.e., an attention over each hidden state and an attention over all hidden states, respectively.
  - ID subnet
  - Iteration Mechanism
    - SF subnet
- 2. ID-First Mode
- 3. CRF layer: applied on top of the SF subnet outputs.
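A rough structural sketch (in PyTorch) of the SF-first data flow with the iteration mechanism. The fusion functions below are placeholders chosen only to illustrate how Rslot and the reinforced intent vector are passed back and forth between the two subnets; they are not the paper's actual equations.

```python
import torch

def sf_first_iteration(c_slot, c_inte, num_iter=3):
    """Placeholder data flow of the SF-first mode (not the paper's exact equations).

    c_slot: slot context vectors, shape (seq_len, hidden)
    c_inte: intent context vector, shape (hidden,)
    """
    r_inte = c_inte
    for _ in range(num_iter):
        # SF subnet: fuse the slot context with the (reinforced) intent vector
        # and produce the slot reinforce vectors Rslot.
        r_slot = torch.tanh(c_slot + r_inte) * c_slot
        # ID subnet: summarize the slot information back into an intent
        # reinforce vector, which re-enters the SF subnet in the next iteration.
        r_inte = c_inte + r_slot.mean(dim=0)
    return r_slot, r_inte

# Example with arbitrary sizes; in the model these come from BiLSTM + attention.
r_slot, r_inte = sf_first_iteration(torch.randn(12, 64), torch.randn(64))
```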
IV. Experiments
V. Refs
ConfigParser/configparser
- ConfigParser: the package name in Python 2
- configparser: the package name in Python 3
Reads and writes configuration files by section, option (key), and value.
One pitfall: option names are all converted to lowercase by default, so you have to override the class to preserve case (see the example below).
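A small example of the case-sensitivity pitfall and the usual fix of overriding `optionxform`:

```python
import configparser

# By default ConfigParser lowercases option names via optionxform();
# overriding it (or assigning optionxform = str) preserves the original case.
class CaseSensitiveConfigParser(configparser.ConfigParser):
    def optionxform(self, optionstr):
        return optionstr

config = CaseSensitiveConfigParser()
config["Model"] = {"HiddenSize": "128", "LearningRate": "0.001"}

with open("config.ini", "w") as f:
    config.write(f)

config2 = CaseSensitiveConfigParser()
config2.read("config.ini")
print(config2.get("Model", "HiddenSize"))   # option names keep their original case
```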
[Paper Notes][AAAI-2019] Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
I. Contributions
- Applies ELMo for unsupervised knowledge transfer from raw ASR text and demonstrates improvements in SLU accuracy.
- Proposes ELMo-Light (ELMoL), which is well suited to commercial settings and reduces latency and memory requirements compared with ELMo.
- Combines unsupervised transfer (UT) with supervised transfer (ST) and shows the effect of the combination.
- Evaluates the algorithm on a benchmark SLU dataset and on datasets from Alexa under a variety of resource conditions.
II. Methods for Unsupervised Transfer
- Embeddings from Language Model (ELMo)
First, ELMo (CNN-BIG-LSTM) is trained on unlabeled data: the CNN produces context-independent word representations and the BiLSTM produces context-dependent ones.
Two BiLSTM layers are used; the BiLSTM parameters are kept frozen, and the mixing parameters γ and s_i are then optimized on the joint ET and IC tasks (a sketch of this scalar mixing is given after the pros and cons below).
- Advantage: with this unsupervised pre-training the CNN-BIG-LSTM weights do not suffer catastrophic forgetting, so the SLU architecture can be trained without losing the knowledge gained from the unlabeled data.
- Disadvantage: computing ELMo embeddings at runtime introduces many additional parameters (e.g., the weights of the large CNN-BIG-LSTM).
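A minimal sketch of the task-specific scalar mixing mentioned above: the weights s_i are softmax-normalized over the frozen LM layers and scaled by γ. The layer tensors are random placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScalarMix(nn.Module):
    """Task-specific ELMo mixing: gamma * sum_i softmax(s)_i * layer_i."""

    def __init__(self, num_layers):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers))   # layer weights s_i
        self.gamma = nn.Parameter(torch.ones(1))         # global scale γ

    def forward(self, layers):
        # layers: list of tensors, each (batch, seq_len, dim), frozen LM outputs.
        w = F.softmax(self.s, dim=0)
        mixed = sum(w_i * layer for w_i, layer in zip(w, layers))
        return self.gamma * mixed

# Example with three frozen layers (e.g., CNN output + two BiLSTM layers).
layers = [torch.randn(2, 7, 256) for _ in range(3)]
elmo_repr = ScalarMix(num_layers=3)(layers)   # (2, 7, 256)
```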
- ELMo-Light (ELMoL) for SLU tasks
The CNN is removed and a single-layer BiLSTM is trained on the unlabeled data to obtain word representations; since the BiLSTM has only one layer, only three ELMo parameters (γ, s0, s1) need to be trained. Finally, the BiLSTM parameters are fine-tuned on the labeled data.
Some tricks for training ELMoL are proposed (because fine-tuning the BiLSTM parameters on labeled data causes catastrophic forgetting of the information learned from the unlabeled data):
- gradual unfreezing (guf): update only the top layer for a few epochs while keeping the lower layers frozen, then progressively unfreeze and update the lower layers.
- discriminative fine-tuning (discr): use different learning rates across network layers.
- slanted triangular learning rates (stlr): learning rates that start small, which discourages drastic updates in the early stages of learning, then increase rapidly to allow more exploration of the parameter space, and finally decrease slowly so that the learned parameters can stabilize.
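A hedged PyTorch sketch of discr and stlr (with a note on guf); the model, learning rates, and step counts are illustrative only, not taken from the paper.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Hypothetical model: a "lower" pre-trained encoder and an "upper" task head.
encoder = torch.nn.LSTM(128, 128, batch_first=True)
head = torch.nn.Linear(128, 10)

# Discriminative fine-tuning (discr): smaller learning rate for lower layers.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-3},
])

# Slanted triangular learning rate (stlr): short warm-up, long linear decay.
total_steps, warmup = 1000, 100

def stlr(step):
    if step < warmup:
        return step / warmup                                        # rapid increase
    return max(0.0, (total_steps - step) / (total_steps - warmup))  # slow decrease

scheduler = LambdaLR(optimizer, lr_lambda=stlr)

# Gradual unfreezing (guf) can be emulated by setting requires_grad=False on the
# encoder for the first few epochs and re-enabling it afterwards.
```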
- Combining Supervised Transfer (ST) with UT
- ELMo+ST: 1. train the LM on unlabeled data to obtain ELMo embeddings; 2. train the IC/ET tasks on labeled data; 3. fine-tune. The key point is that the LM parameters (mainly the BiLSTM parameters) remain fixed in steps 2 and 3.
- ELMoL+ST: 1. train the LM on unlabeled data to obtain ELMoL embeddings; 2. train the IC/ET tasks on labeled data, first training the upper layers while keeping the lower-layer parameters fixed; once the upper layers are stable, train the whole network, with the lower layers using a smaller learning rate than the upper ones; 3. fine-tune the whole network using the three tricks above (guf, discr, stlr).
III. Experiment Results
IV. Future Work
- Apply the transfer techniques across different languages.
- Experiment with alternative architectures such as transformers and adversarial networks.