Overview
Generator
After the tokens at a set of selected positions are masked out, the generator learns to predict the original identities of those masked-out tokens.
Discriminator
The discriminator is trained to distinguish tokens in the data from tokens that have been replaced by generator samples.
Loss
- x^masked: The tokens in the selected positions are replaced with a [MASK] token.
- x^corrupt: The masked-out tokens are replaced with generator samples, and the discriminator is trained to predict which tokens in x^corrupt match the original input x.
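
The construction of x^masked and x^corrupt described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generator is replaced by a hypothetical stand-in (`toy_generator_sample`) that samples uniformly from a small vocabulary, the helper names and the `mask_prob` parameter are made up for this example, and the discriminator objective is assumed to be a per-token binary cross-entropy, which the text does not spell out.

```python
import math
import random

MASK = "[MASK]"


def make_masked(tokens, mask_prob, rng):
    """Select positions at random and replace them with [MASK] (x^masked)."""
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    if not positions:
        # Assumption for this sketch: always mask at least one position.
        positions = [rng.randrange(len(tokens))]
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions


def toy_generator_sample(masked, position, vocab, rng):
    """Hypothetical stand-in for the generator: ignores the context and
    samples uniformly from vocab. A real generator would condition on masked."""
    return rng.choice(vocab)


def make_corrupt(tokens, masked, positions, vocab, rng):
    """Fill the masked positions with generator samples (x^corrupt) and label
    each token 1 if it matches the original input x, else 0."""
    corrupt = list(tokens)
    for i in positions:
        corrupt[i] = toy_generator_sample(masked, i, vocab, rng)
    labels = [int(c == t) for c, t in zip(corrupt, tokens)]
    return corrupt, labels


def discriminator_loss(probs_original, labels, eps=1e-12):
    """Assumed objective: mean binary cross-entropy over all positions, where
    probs_original[t] is the predicted probability that token t is original."""
    total = 0.0
    for p, y in zip(probs_original, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    x = ["the", "chef", "cooked", "the", "meal"]
    vocab = ["the", "chef", "cooked", "ate", "meal"]
    x_masked, positions = make_masked(x, mask_prob=0.3, rng=rng)
    x_corrupt, labels = make_corrupt(x, x_masked, positions, vocab, rng)
    print(x_masked, x_corrupt, labels)
```

Note that the labels are computed by comparing x^corrupt with x, so a sampled token that happens to equal the original is labeled as matching, in line with the description of predicting which tokens in x^corrupt match the original input.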