I was recently asked to summarize a machine learning paper from the Google Brain team's latest research output: "Big Self-Supervised Models are Strong Semi-Supervised Learners." I guess that's what an "Abstract" is for, but my goal is to make this summary a real tl;dr version of the paper, with technical content that is understandable to interested machine learning novices and, at the same time, credible and intriguing to well-versed machine learning researchers. I'll let you, my reader, decide how many of those goals I achieve, so here goes...
Strong semi-supervised learners could solve a well-known problem: building image classification models that learn from a large dataset of unlabeled images while needing only a few labeled examples.
Google researchers led by Ting Chen have developed SimCLRv2, a bigger version of SimCLR, the self-supervised framework that learns image representations through contrastive learning, and have used it as the basis of a semi-supervised learning framework. To improve classification quality, the new framework uses bigger and deeper networks and changes how they are fine-tuned.
SimCLRv2 improves substantially on the first version in three ways: a bigger and deeper convolutional encoder, a deeper non-linear projection head that is partly kept for fine-tuning instead of being thrown away, and a memory mechanism borrowed from MoCo that supplies additional negative examples for contrastive learning.
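To make the "deeper non-linear network" idea concrete, here is a minimal NumPy sketch of a three-layer MLP projection head and of the two points fine-tuning can start from. The class name `ProjectionHead`, the layer sizes, and the random weights are hypothetical stand-ins chosen for illustration, and details such as normalization layers are left out; this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


class ProjectionHead:
    """Hypothetical 3-layer MLP projection head g(h) on top of encoder features h."""

    def __init__(self, dims=(64, 64, 64, 16)):
        # Four sizes -> three fully connected layers, dims[i] -> dims[i + 1]
        self.weights = [rng.normal(0.0, 0.1, size=(d_in, d_out))
                        for d_in, d_out in zip(dims[:-1], dims[1:])]

    def forward(self, h, num_layers=None):
        """Apply the first `num_layers` layers (all three if None).

        ReLU follows every layer except the head's final output layer.
        """
        n = len(self.weights) if num_layers is None else num_layers
        x = h
        for i, w in enumerate(self.weights[:n]):
            x = x @ w
            if i < len(self.weights) - 1:
                x = relu(x)
        return x


head = ProjectionHead()
h = rng.normal(size=(4, 64))       # stand-in for encoder features of 4 images

z = head.forward(h)                # pre-training: z feeds the contrastive loss
v1_features = head.forward(h, 0)   # SimCLR-style fine-tuning: head discarded, use h
v2_features = head.forward(h, 1)   # SimCLRv2-style: keep the head's first layer
print(z.shape, v1_features.shape, v2_features.shape)  # (4, 16) (4, 64) (4, 64)
```

With `num_layers=0` the head is effectively discarded, as in SimCLR; with `num_layers=1` a classifier trained on the labeled examples would sit on top of the head's first layer, which is how SimCLRv2 keeps part of the head.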
SimCLRv2's first step is task-agnostic pre-training: contrastive learning on unlabeled images with SimCLRv2. The few labeled examples are then used to fine-tune this task-agnostic model into a classifier. Finally, the classifier is improved further by knowledge distillation, which makes use of the unlabeled images a second time.
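Contrastive pre-training boils down to one loss: two augmented views of the same image should get similar projections, while views of different images should not. Below is a minimal NumPy sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR, assuming the projections for a batch are already computed. The function name, the toy data, and the temperature value are my own choices, and SimCLRv2's extra negatives from the MoCo-style memory are left out.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.

    z1[k] and z2[k] are projections of two augmented views of image k;
    every other view in the batch acts as a negative example.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> dot product = cosine similarity
    logits = z @ z.T / temperature                     # (2N, 2N) pairwise similarities
    np.fill_diagonal(logits, -np.inf)                  # exclude each view's similarity with itself

    # The positive for view i is the other view of the same image
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Numerically stable row-wise log-softmax
    row_max = logits.max(axis=1, keepdims=True)
    log_probs = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), positives].mean()


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
close_views = anchors + 0.01 * rng.normal(size=(8, 16))   # second views that agree
random_views = rng.normal(size=(8, 16))                    # second views that don't

low = nt_xent_loss(anchors, close_views, temperature=0.1)    # small: positives match
high = nt_xent_loss(anchors, random_views, temperature=0.1)  # large: positives look like negatives
print(low, high)
```

Training the encoder and head to push this loss down is what pulls matching views together and pushes other images apart.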
- Pre-training uses convolutional neural networks with the ResNet architecture. SimCLRv2's largest model is a ResNet-152 with 3x wider channels and selective kernels (SK); it has roughly three times as many layers as the ResNet-50 used by its predecessor.
- The encoder's output passes through a projection head, a non-linear multi-layer perceptron (MLP), and the contrastive loss is computed on the head's output. SimCLRv2's head has three layers, one more than SimCLR's. Unlike SimCLR, which discards the head after pre-training, SimCLRv2 keeps part of it: fine-tuning starts from the middle layer of the head rather than from the encoder's output.
- Finally, the fine-tuned network's knowledge is distilled using the “teacher-student” paradigm. The fine-tuned network acts as the teacher: it labels the unlabeled images with its predicted class probabilities, and a newly created student network, the same size or smaller, is trained to match those predictions.
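The distillation step can likewise be written as a single loss. Here is a minimal NumPy sketch, assuming the teacher's and student's logits for a batch of unlabeled images are already computed: the loss is the cross-entropy between the teacher's temperature-softened class probabilities and the student's, with the teacher held fixed. Function names, the toy logits, and the temperature default are illustrative assumptions.

```python
import numpy as np


def log_softmax(logits, temperature):
    """Row-wise log-softmax of logits / temperature, computed stably."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    return scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between the frozen teacher's soft labels and the student's predictions.

    Needs no ground-truth labels, so it can be computed on unlabeled images.
    """
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))   # fixed soft labels
    log_p_student = log_softmax(student_logits, temperature)
    return -(p_teacher * log_p_student).sum(axis=1).mean()


rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(5, 10))   # stand-in: teacher outputs, 5 unlabeled images, 10 classes
student_logits = rng.normal(size=(5, 10))   # stand-in: an untrained student's outputs

perfect = distillation_loss(teacher_logits, teacher_logits)   # student copies the teacher
untrained = distillation_loss(teacher_logits, student_logits)
print(perfect, untrained)
```

Because cross-entropy against fixed teacher probabilities is smallest exactly when the student reproduces them, `perfect` is the lowest value the loss can reach (the entropy of the teacher's predictions), and training moves the student toward it.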
The main gain of SimCLRv2 over SimCLR, which was itself the state of the art among contrastive learning frameworks, comes from scaling up the model from ResNet-50 to ResNet-152 (3x wider, with SK). This gives a 29% relative improvement in top-1 accuracy when fine-tuned on 1% of the labeled ImageNet examples.
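Since "relative improvement" is easy to misread as percentage points, here is the arithmetic spelled out. The accuracy symbols are my own notation, and the 50% baseline in the example below is a made-up number chosen only to keep the calculation simple, not a figure from the paper.

$$
\Delta_{\text{rel}} = \frac{\mathrm{acc}_{\text{ResNet-152}} - \mathrm{acc}_{\text{ResNet-50}}}{\mathrm{acc}_{\text{ResNet-50}}}
$$

For example, with a hypothetical ResNet-50 baseline of 50.0% top-1 accuracy, a 29% relative improvement would mean 50.0% × 1.29 = 64.5%, an absolute gain of 14.5 percentage points.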
Frameworks such as SimCLRv2 will lead to the proliferation of the “unsupervised pre-train, supervised fine-tune” paradigm in computer vision research, much like in natural language processing, where it's extensively used.