BEiT: BERT Pre-Training of Image Transformers - OpenReview. BEiT relies on a pre-trained tokenizer that transforms image patches into discrete visual tokens, which are then masked and predicted (see the sketch below). Extensive experiments show that this self-supervised pre-training improves state-of-the-art results on various downstream tasks such as image classification and semantic segmentation.
BEIT: BERT PRE-TRAINING OF IMAGE TRANSFORMERS - OpenReview. We pretrain BEIT and conduct extensive fine-tuning experiments on downstream tasks, such as image classification and semantic segmentation. We also show that the self-attention mechanism of self-supervised BEIT learns to distinguish semantic regions and object boundaries, even without using any human annotation.
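The masked image modeling recipe described in these snippets can be illustrated with a minimal sketch: a frozen, pre-trained tokenizer (a discrete VAE in BEiT, stubbed out here) assigns each image patch a discrete visual-token id, some patch embeddings are replaced by a learnable mask embedding, and a ViT-style encoder is trained to predict the visual tokens at the masked positions. This is a hedged, simplified illustration, not the authors' implementation; the class name, dimensions, and random masking below are hypothetical, and block-wise masking plus the real dVAE tokenizer are omitted.

```python
import torch
import torch.nn as nn


class MaskedImageModeling(nn.Module):
    """Minimal BEiT-style masked image modeling sketch (hypothetical names/sizes)."""

    def __init__(self, num_patches=196, embed_dim=768, vocab_size=8192):
        super().__init__()
        # Flattened 16x16 RGB patches -> patch embeddings.
        self.patch_embed = nn.Linear(16 * 16 * 3, embed_dim)
        # Learnable [MASK] embedding and position embeddings.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        # Classification head over the visual-token vocabulary.
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, patches, visual_tokens, mask):
        # patches:       (B, N, 16*16*3) flattened image patches
        # visual_tokens: (B, N) discrete ids from the pre-trained tokenizer
        # mask:          (B, N) bool, True where a patch is masked
        x = self.patch_embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = self.encoder(x + self.pos_embed)
        logits = self.head(x)  # (B, N, vocab_size)
        # Cross-entropy only on the masked positions.
        return nn.functional.cross_entropy(logits[mask], visual_tokens[mask])


# Toy usage with random data standing in for real images and tokenizer outputs.
model = MaskedImageModeling()
patches = torch.randn(2, 196, 16 * 16 * 3)
tokens = torch.randint(0, 8192, (2, 196))
mask = torch.rand(2, 196) < 0.4  # random mask; BEiT uses block-wise masking
loss = model(patches, tokens, mask)
loss.backward()
```

After pre-training with this objective, the encoder (without the token-prediction head) is fine-tuned on downstream tasks such as image classification and semantic segmentation, as the snippets above describe.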