Strong Semi-Supervised Learners

I was recently asked to summarize a machine learning paper from the Google Brain team: Big Self-Supervised Models are Strong Semi-Supervised Learners. That is arguably what an abstract is for, but the goal here is a real tl;dr version of the paper: understandable to interested machine learning novices and, at the same time, credible and intriguing to well-versed researchers. I'll let you, my reader, decide how many of those goals I achieve, so here goes...

Why

Strong semi-supervised learners address a well-known problem: building image classification models that learn mostly from a large dataset of unlabeled images while still benefiting from only a few labeled examples.

What

Google researchers led by Ting Chen have developed SimCLRv2, a bigger version of SimCLR, a self-supervised framework driven by contrastive learning. To turn it into a strong semi-supervised classifier, the framework has been extended with deeper networks, a revised fine-tuning procedure, and a distillation step.

SimCLRv2 shows substantial improvement over the first version. The gains come from a deeper and wider convolutional encoder, a deeper non-linear projection head that is partly kept during fine-tuning, and a memory mechanism borrowed from MoCo.

How

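SimCLRv2 works in three steps. First, a network is pre-trained on unlabeled images with SimCLR-style contrastive learning: two augmented views of the same image are pulled together in embedding space, while views of different images are pushed apart. Second, the labeled examples are used to fine-tune this task-agnostic model into a classification-specific one. Third, the classifier is further improved by knowledge distillation.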
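
To make the contrastive step concrete, here is a minimal sketch of the kind of loss that SimCLR-style pre-training minimizes (usually called NT-Xent). It is an illustration in plain Python, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the temperature of 0.1 are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.1):
    """Average contrastive loss over a batch.

    views_a[i] and views_b[i] are embeddings of two augmented views of
    image i (a positive pair). Every other embedding in the batch acts
    as a negative. The temperature value is an illustrative assumption.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        pos_logit = cosine_similarity(embeddings[i], embeddings[positive]) / temperature
        # The denominator covers every embedding except the anchor itself.
        denominator = sum(
            math.exp(cosine_similarity(embeddings[i], embeddings[k]) / temperature)
            for k in range(2 * n)
            if k != i
        )
        total += math.log(denominator) - pos_logit
    return total / (2 * n)


# Toy batch of two images, two views each (hypothetical values).
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.1], [0.1, 1.0]]

aligned_loss = nt_xent_loss(views_a, views_b)
shuffled_loss = nt_xent_loss(views_a, views_b[::-1])
```

In this toy batch the correctly paired views give a much smaller loss than the shuffled ones, and that difference is the signal the encoder learns from without ever seeing a label.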

The pre-training is performed using convolutional neural networks with the ResNet architecture. The largest SimCLRv2 model is a ResNet-152 with 3x wider channels and selective kernels, which has roughly three times as many layers as the ResNet-50 that’s used in its predecessor.
The fine-tuning involves the non-linear projection head, a fully-connected multi-layer perceptron that sits on top of the ResNet encoder. This head has three layers, which is one more than in SimCLR. Whereas SimCLR discards the head entirely after pre-training, SimCLRv2 keeps part of it and fine-tunes from a middle layer of the head.
The knowledge acquired in the previous steps is distilled using the “teacher-student” paradigm. The fine-tuned network is the teacher: it labels the unlabeled images, and a newly created student network, of the same size or smaller, is trained to match its predictions.
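
The distillation step can be sketched in a few lines as well. The snippet below is a simplified illustration under my own assumptions (plain Python, hypothetical function names, made-up logits, a default temperature of 1.0), not the paper's code: the teacher's softened class probabilities serve as targets, and the student is penalized by the cross-entropy between the two distributions.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw class scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean cross-entropy between teacher and student class distributions.

    Each argument is a list with one list of class logits per unlabeled
    image. No ground-truth labels are needed: the teacher's output plays
    the role of the label.
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        teacher_probs = softmax(t_logits, temperature)
        student_probs = softmax(s_logits, temperature)
        total -= sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))
    return total / len(teacher_logits)


# Hypothetical logits for one unlabeled image and three classes.
teacher = [[2.0, 0.0, -1.0]]
agreeing_student = [[2.0, 0.0, -1.0]]
disagreeing_student = [[-1.0, 0.0, 2.0]]

loss_when_agreeing = distillation_loss(teacher, agreeing_student)
loss_when_disagreeing = distillation_loss(teacher, disagreeing_student)
```

The loss is lowest when the student reproduces the teacher's distribution, so minimizing it over many unlabeled images pushes the student toward the teacher's behavior.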

The main benefit of SimCLRv2 over SimCLR, previously the state of the art among contrastive learning frameworks, comes from scaling up the model: going from ResNet-50 to ResNet-152 gives a 29% relative improvement in top-1 accuracy when fine-tuned on 1% of the labeled examples.
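
A quick note on what "relative improvement" means, since it is easy to confuse with a gain in percentage points. The helper below is a hypothetical illustration, and the two accuracy values are made-up numbers chosen only to produce a 29% relative gain; they are not figures from the paper.

```python
def relative_improvement(baseline_accuracy, new_accuracy):
    """Gain expressed as a fraction of the baseline accuracy."""
    return (new_accuracy - baseline_accuracy) / baseline_accuracy


# Made-up accuracies (in percent), NOT results from the paper.
baseline = 50.0
scaled_up = 64.5

absolute_gain = scaled_up - baseline                      # gain in percentage points
relative_gain = relative_improvement(baseline, scaled_up)  # gain as a fraction of the baseline
```

With these made-up numbers the absolute gain is 14.5 percentage points, while the relative gain is 29% of the baseline.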

What next

Frameworks such as SimCLRv2 are likely to popularize the “unsupervised pre-train, supervised fine-tune” paradigm in computer vision research, much as has already happened in natural language processing, where it's used extensively.
