[Paper Notes][2019] SpanBERT: Improving Pre-training by Representing and Predicting Spans

Architecture

Span Masking and its implementation

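import numpy as np

# Candidate span lengths, from the lower to the upper bound (inclusive).
cand_lens = list(range(FLAGS.lower_span_length, FLAGS.upper_span_length + 1))
# Geometric probabilities over the candidate lengths, renormalized because the distribution is clipped.
len_distrib = [FLAGS.geometric_p * (1 - FLAGS.geometric_p) ** (i - FLAGS.lower_span_length) for i in cand_lens]
len_distrib = [x / sum(len_distrib) for x in len_distrib]
span_len = np.random.choice(cand_lens, p=len_distrib)  # span length: clipped geometric distribution
cur_idx = np.random.choice(len(start_indices))  # index into start_indices: uniform distribution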

Link: https://github.com/gongel/private_bert/blob/master/create_pretraining_span_data.py
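The excerpt above depends on FLAGS and start_indices from the linked script, so it does not run on its own. Below is a minimal self-contained sketch of the same sampling idea. The function names span_length_distribution and sample_span are hypothetical. The defaults (p = 0.2, lengths clipped to 1..10) are the values reported in the SpanBERT paper. It works at the token level and, as a simplifying assumption, ignores the whole-word constraint on span boundaries.

```python
import numpy as np

def span_length_distribution(lower=1, upper=10, p=0.2):
    """Clipped, renormalized geometric distribution over span lengths."""
    lens = np.arange(lower, upper + 1)
    probs = p * (1 - p) ** (lens - lower)
    return lens, probs / probs.sum()

def sample_span(seq_len, rng, lower=1, upper=10, p=0.2):
    """Sample one span as a half-open token range [start, end)."""
    lens, probs = span_length_distribution(lower, upper, p)
    span_len = int(rng.choice(lens, p=probs))   # length: geometric
    span_len = min(span_len, seq_len)           # never exceed the sequence
    start = int(rng.integers(0, seq_len - span_len + 1))  # start: uniform
    return start, start + span_len

rng = np.random.default_rng(0)
lens, probs = span_length_distribution()
mean_len = float((lens * probs).sum())  # expected span length under these defaults
start, end = sample_span(seq_len=128, rng=rng)
```

With these defaults the expected span length comes out close to the mean of 3.8 quoted in the paper, which is a quick sanity check on the renormalization step.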

Loss function

Fine-tuning

    • GLUE: Add a linear classifier on top of the [CLS] token
    • QA (SQuAD 1.1/2.0)
      • Add two linear classifiers independently on top of the encoder output to predict the answer span boundary (start and end);
      • Wrap the passage-question pair with [CLS] and [SEP]: [CLS] passage [SEP] question [SEP].
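The two boundary classifiers in the QA setup can be sketched as follows. This is an illustration only, not the authors' code: random vectors stand in for the encoder's per-token output, bias terms are omitted, and predict_answer_span and max_answer_len are hypothetical names. Picking the best pair with start <= end is a common decoding choice that these notes do not specify.

```python
import numpy as np

def predict_answer_span(hidden, w_start, w_end, max_answer_len=30):
    """Score every token as a start and as an end, then return the
    highest-scoring (start, end) pair with start <= end."""
    start_logits = hidden @ w_start   # shape: (seq_len,)
    end_logits = hidden @ w_end       # shape: (seq_len,)
    best, best_score = (0, 0), -np.inf
    for s in range(len(start_logits)):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Stand-in for encoder output: one hidden vector per input token.
rng = np.random.default_rng(0)
seq_len, hidden_size = 16, 8
hidden = rng.normal(size=(seq_len, hidden_size))
w_start = rng.normal(size=hidden_size)  # weights of the start classifier
w_end = rng.normal(size=hidden_size)    # weights of the end classifier
start, end = predict_answer_span(hidden, w_start, w_end)
```

The two weight vectors are independent, which mirrors the "independently" in the bullet: each boundary gets its own linear layer over the same token representations.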

paper
