[Paper Notes][WMT-2019] Low-Resource Corpus Filtering using Multilingual Sentence Embeddings

LASER Multilingual Representations

1. Main idea

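First, train an encoder that produces a multilingual, fixed-size sentence representation; then compute a distance between two sentences in the learned embedding space. A source/target pair whose embeddings lie close together is more likely to be a genuine translation, which is what the filtering relies on.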
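The second step can be illustrated with a minimal sketch. It assumes the encoder has already mapped each sentence to a fixed-size vector; the three vectors below are made-up toy values, not real LASER outputs, and plain cosine similarity is used as a stand-in for the distance (the margin-based variant listed in section 3 is not shown here). The function name `cosine_similarity` is chosen for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for encoder outputs (assumption: real embeddings are
# much higher-dimensional and come from the trained multilingual encoder).
src_vec = [0.9, 0.1, 0.3]        # source sentence
good_tgt_vec = [0.8, 0.2, 0.3]   # plausible translation, points the same way
bad_tgt_vec = [-0.2, 0.9, -0.4]  # unrelated sentence, points elsewhere

good_score = cosine_similarity(src_vec, good_tgt_vec)
bad_score = cosine_similarity(src_vec, bad_tgt_vec)

# A higher score means the pair is closer in the embedding space,
# so the pair is more likely to be kept by a filter.
print(good_score > bad_score)
```

In a filtering setup, each candidate sentence pair would receive such a score, and pairs would be ranked or thresholded on it.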

2. Encoder

3. Margin

4. Neighborhood

paper
