【ACL-2019】Deep Unknown Intent Detection with Margin Loss

一、Background

二、Contributions

    • We propose a two-stage method for unknown intent detection with BiLSTM;
    • We introduce margin loss on BiLSTM to learn discriminative deep features, which is suitable for the detection task;
    • Experiments conducted on two benchmark dialogue datasets show the effectiveness of the proposed method.

三、Method

1. BiLSTM

Concatenate the final time-step outputs of the forward and backward directions to form the sentence representation.
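A minimal sketch of this step, assuming PyTorch; all sizes are illustrative, not the paper's settings:

```python
import torch
import torch.nn as nn

# Single-layer BiLSTM over a batch of embedded utterances
lstm = nn.LSTM(input_size=300, hidden_size=128, bidirectional=True, batch_first=True)
x = torch.randn(4, 20, 300)      # (batch, seq_len, embedding_dim)
out, (h_n, c_n) = lstm(x)        # h_n: (num_directions, batch, hidden_size)

# Sentence representation: last forward state concatenated with last backward state
sent_repr = torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden_size)
```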

2. Large Margin Cosine Loss (LMCL)

Transforms the softmax loss into a cosine loss by applying L2 normalization to both the features and the weight vectors, then subtracts a cosine margin m from the target-class cosine (as introduced in CosFace), which makes the learned deep features more discriminative and thus better suited to the detection task.
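A minimal PyTorch sketch of LMCL; the scale s and margin m defaults below follow CosFace and are illustrative, not necessarily this paper's values:

```python
import torch
import torch.nn.functional as F

def lmcl_loss(features, class_weights, labels, s=30.0, m=0.35):
    # L2-normalize features and class weight vectors so the logits become cosines
    cos = F.normalize(features) @ F.normalize(class_weights).t()
    # Subtract the margin m from the target-class cosine only
    cos = cos - m * F.one_hot(labels, num_classes=cos.size(1))
    # Scale by s and apply the usual cross-entropy (softmax) loss
    return F.cross_entropy(s * cos, labels)
```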

Related reading: 【ICLR-2019】Do Deep Generative Models Know What They Don't Know?

3. Local Outlier Factor (LOF)

Uses k-nearest neighbors to find outliers.

See also: 基于k-近邻的多元时间序列局部异常检测 (k-NN-based local anomaly detection for multivariate time series).
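A minimal sketch of the detection stage with scikit-learn's LocalOutlierFactor; the feature arrays and hyperparameters below are placeholders, not the paper's settings:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 128))  # deep features of known-intent training utterances
test_feats = rng.normal(size=(100, 128))   # deep features of test utterances

# novelty=True allows scoring unseen samples; n_neighbors is the k of the k-NN search
lof = LocalOutlierFactor(n_neighbors=20, novelty=True, contamination=0.05)
lof.fit(train_feats)

pred = lof.predict(test_feats)             # +1 = inlier, -1 = outlier
print(f"{(pred == -1).sum()} of {len(test_feats)} utterances flagged as unknown intent")
```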

四、Experiments

1. Baselines

    • Maximum Softmax Probability (MSP): threshold the maximum softmax score and reject low-confidence predictions as unknown (see the sketch after this list)
    • DOC: the SOTA method in open-world classification (EMNLP 2017)
    • DOC (Softmax)
    • LOF (Softmax)
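A minimal sketch of the MSP baseline; the 0.5 threshold is illustrative and would be tuned on held-out data in practice:

```python
import torch

UNKNOWN = -1                     # id reserved for the unknown-intent class
logits = torch.randn(8, 10)      # classifier outputs: 8 utterances, 10 known intents

probs = torch.softmax(logits, dim=-1)
conf, pred = probs.max(dim=-1)
pred[conf < 0.5] = UNKNOWN       # reject low-confidence predictions as unknown
```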

2. Results

五、Refs

paper/基于k-近邻的多元时间序列局部异常检测 

【ACL-2019】A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling

一、Background

二、Contributions

    1. A novel ID subnet is proposed to apply the slot information to the intent detection task (the biggest contribution);
    2. A novel iteration mechanism inside the SF-ID network enhances the connections between the intent and the slots;
    3. The experiments on two benchmark datasets show the effectiveness and superiority of the proposed model.

三、Approaches

    • 1. SF-First Mode: the intent context vector Cinte and the slot context vector Cslot are applied in the SF subnet to generate the slot reinforce vector Rslot. The newly formed vector Rslot is then fed to the ID subnet to bring in the slot information (i.e., SF information is carried into ID; a schematic sketch follows this list).
      • SF subnet: Cslot and Cinte are computed as in Slot-Gated, i.e., attention over each hidden state and attention over the whole hidden sequence, respectively.
      • ID subnet
      • Iteration Mechanism
    • 2. ID-First Mode
    • 3. CRF layer: applied on top of the SF subnet outputs
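A schematic sketch of the SF-First flow. This is a deliberate simplification: the fusion layer and sizes below are hypothetical placeholders, not the paper's exact equations:

```python
import torch
import torch.nn as nn

B, T, H = 4, 20, 128                  # batch, sequence length, LSTM hidden size
c_slot = torch.randn(B, T, 2 * H)     # per-step slot context (slot attention over BiLSTM states)
c_inte = torch.randn(B, 2 * H)        # sentence-level intent context

# SF subnet: fuse slot and intent context into the slot reinforce vector Rslot
fuse = nn.Linear(4 * H, 2 * H)        # placeholder fusion, not the paper's form
r_slot = torch.tanh(fuse(torch.cat(
    [c_slot, c_inte.unsqueeze(1).expand(B, T, 2 * H)], dim=-1)))

# ID subnet: Rslot carries the slot information into intent detection
intent_logits = nn.Linear(2 * H, 10)(r_slot.mean(dim=1))   # 10 = hypothetical intent count

# The iteration mechanism feeds an intent reinforce vector back into the SF subnet
# and repeats; slot tags are finally decoded through the CRF layer.
```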

四、Experiments

五、Refs

paper

【AAAI-2019】Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

一、Contributions

  • Apply ELMo for unsupervised knowledge transfer from raw ASR text, demonstrating improved SLU accuracy
  • Propose ELMo-Light (ELMoL), well-suited to commercial settings, with lower latency and memory requirements than ELMo
  • Combine unsupervised transfer (UT) with supervised transfer (ST) and demonstrate the effect of the combination
  • Evaluate the algorithm under various resource conditions on a benchmark SLU dataset and on datasets from Alexa

二、Methods for Unsupervised Transfer

  • Embeddings from Language Model (ELMo)

ELMo (CNN-BIG-LSTM) is first trained on unlabeled data: the CNN produces context-independent word representations, and the BiLSTM produces context-dependent ones. Two BiLSTM layers are used, and their parameters are kept frozen during training; the parameters γ and s_i are then optimized on the joint ET and IC tasks (a minimal mixing sketch follows the pros and cons below).

    • Advantage: this unsupervised pre-training means the CNN-BIG-LSTM weights do not suffer catastrophic forgetting, so the SLU architecture can be trained without losing the knowledge gained from unlabeled data
    • Disadvantage: computing ELMo embeddings at runtime introduces many additional parameters (e.g., the weights of the large CNN-BIG-LSTM)
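A minimal sketch of the ELMo scalar mixing referenced above; shapes are illustrative, and γ and s_j are the trainable task-specific parameters:

```python
import torch

L_layers, T, D = 3, 20, 512               # LM layers (CNN + 2 BiLSTM), tokens, dim
layer_reps = torch.randn(L_layers, T, D)  # frozen LM layer outputs for one utterance

s = torch.softmax(torch.randn(L_layers), dim=0)  # trainable per-layer weights s_j
gamma = torch.tensor(1.0)                        # trainable global scale γ

elmo = gamma * (s[:, None, None] * layer_reps).sum(dim=0)   # (T, D) task embedding
```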
  • ELMo-Light (ELMoL) for SLU tasks

The CNN is removed, and a single-layer BiLSTM is trained on unlabeled data to produce the word representations; only three ELMo parameters (γ, s_0, s_1) then need training, since the BiLSTM has a single layer. Finally, the BiLSTM parameters are fine-tuned on the labeled data.

Some tricks for training ELMoL are proposed (fine-tuning the BiLSTM parameters on labeled data would otherwise cause catastrophic forgetting of what was learned on unlabeled data):

    1. gradual unfreezing (guf): update only the top layer for a few epochs while keeping the lower layers frozen, then progressively unfreeze and update the bottom layers
    2. discriminative fine-tuning (discr): use different learning rates across network layers
    3. slanted triangular learning rates (tlr): learning rates start small, discouraging drastic updates in the early stages of learning, then increase rapidly to allow more exploration of the parameter space, and finally decrease slowly so the learned parameters can stabilize (a schedule sketch follows this list)
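A sketch of the slanted triangular schedule as defined in ULMFiT (Howard & Ruder, 2018), which the tlr trick refers to; hyperparameter names and defaults follow that paper:

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t of T total training steps."""
    cut = int(T * cut_frac)          # step at which the rate peaks
    if t < cut:
        p = t / cut                  # short, steep warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # long, slow decay
    return lr_max * (1 + p * (ratio - 1)) / ratio
```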
  • Combining Supervised Transfer (ST) with UT

    • ELMo+ST: 1. train the LM on unlabeled data to obtain the ELMo embeddings; 2. train the IC/ET tasks on labeled data; 3. fine-tune. The key point is that the LM parameters (mainly the BiLSTM parameters) stay fixed in steps 2 and 3.
    • ELMoL+ST: 1. train the LM on unlabeled data to obtain the ELMoL embeddings; 2. train the IC/ET tasks on labeled data, keeping the lower-layer parameters fixed while the upper layers train; once the upper layers are stable, train the whole network, with the lower layers using a smaller learning rate than the upper ones (see the sketch after this list); 3. fine-tune the whole network with the three tricks above (guf, discr, tlr).
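A minimal sketch of the layer-wise learning rates in step 2 of ELMoL+ST, using PyTorch parameter groups; module sizes and rate values are illustrative:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(300, 128, bidirectional=True, batch_first=True)  # pretrained lower layer
task_head = nn.Linear(256, 10)                                    # new upper IC head

# Smaller learning rate for the pretrained BiLSTM than for the new task head
optimizer = torch.optim.Adam([
    {"params": bilstm.parameters(), "lr": 1e-4},
    {"params": task_head.parameters(), "lr": 1e-3},
])
```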

三、Experiment Results

四、Future Work

  • apply the transfer techniques across different languages
  • experiment with alternative architectures such as Transformers and adversarial networks

五、Refs

paper/reading1