This paper introduces SimCLR (Simple Framework for Contrastive Learning of Visual Representations), a self-supervised method for learning visual representations without labeled data.
Key Concepts
Self-supervised learning: The model learns useful representations by solving a pretext task created from the data itself, without requiring manual labels. In this case, the task is to identify which augmented views come from the same image.
Contrastive learning: The core idea is to learn representations by pulling together different augmented views of the same image (positive pairs) while pushing apart views from different images (negative pairs).
The SimCLR Framework
The framework has four main components (minimal code sketches of the augmentation module and of the encoder, projection head, and loss follow the list):
- Data Augmentation Module: Takes an image and creates two different augmented versions (views) using:
  - Random cropping and resizing
  - Color distortions (brightness, contrast, saturation, hue adjustments)
  - Gaussian blur
- Base Encoder f(·): A neural network (ResNet-50 in their experiments) that extracts feature representations from the augmented images.
- Projection Head g(·): A small MLP that maps the representations into the space where the contrastive loss is applied. Interestingly, they find that the representations taken from before this projection head are better for downstream tasks.
- Contrastive Loss (NT-Xent): A normalized temperature-scaled cross-entropy loss that encourages the model to identify which augmented views came from the same original image.
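As a concrete illustration of the augmentation module, here is a minimal sketch using torchvision. The specific strengths and probabilities (color-jitter magnitude, grayscale and blur probabilities, blur kernel size) are assumptions based on commonly reported defaults for the paper's ImageNet setup, not the authors' exact configuration.

```python
# Sketch of the two-view augmentation module; parameter values are assumptions.
import torchvision.transforms as T

def simclr_augmentation(image_size=224, s=1.0):
    """Stochastic pipeline applied independently to produce each view:
    random crop + resize, flip, color distortion, grayscale, Gaussian blur."""
    color_jitter = T.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)
    return T.Compose([
        T.RandomResizedCrop(image_size),                               # random cropping and resizing
        T.RandomHorizontalFlip(),
        T.RandomApply([color_jitter], p=0.8),                          # color distortion
        T.RandomGrayscale(p=0.2),
        T.RandomApply([T.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),  # Gaussian blur
        T.ToTensor(),
    ])

class TwoViews:
    """Wrap a transform so each image yields a positive pair of augmented views."""
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, image):
        return self.transform(image), self.transform(image)
```

Passing TwoViews(simclr_augmentation()) as a dataset's transform makes every sample return the two correlated views that form a positive pair.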
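The sketch below, also illustrative rather than the authors' implementation, wires the base encoder f(·) and projection head g(·) to the NT-Xent loss in PyTorch. For a batch of N images there are 2N views; each view's positive is its counterpart from the same image and the remaining 2N − 2 views act as negatives. The hidden width of the projection head, the 128-dimensional output, and the temperature value are assumptions guided by the paper.

```python
# Sketch of the encoder, projection head, and NT-Xent loss (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SimCLRModel(nn.Module):
    """Base encoder f(.) = ResNet-50 trunk; projection head g(.) = small MLP."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = resnet50(weights=None)
        feat_dim = backbone.fc.in_features      # 2048 for ResNet-50
        backbone.fc = nn.Identity()             # drop the classifier, keep features
        self.f = backbone
        self.g = nn.Sequential(                 # one hidden layer, ReLU, 128-d output
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.f(x)   # representation used for downstream tasks
        z = self.g(h)   # projection used only by the contrastive loss
        return h, z

def nt_xent_loss(z1, z2, temperature=0.5):
    """Normalized temperature-scaled cross-entropy: each view must pick out its
    positive counterpart among all other 2N - 1 views by cosine similarity."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities / tau
    sim.fill_diagonal_(float("-inf"))                    # a view is not its own positive
    idx = torch.arange(n, device=z.device)
    targets = torch.cat([idx + n, idx])                  # positive of i is i+N, and vice versa
    return F.cross_entropy(sim, targets)
```

During pre-training, the two views x1 and x2 of a batch are encoded as h1, z1 = model(x1) and h2, z2 = model(x2); only z1 and z2 enter nt_xent_loss, while the h's are what get reused downstream.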
Key Findings
- Data augmentation is crucial: The combination of random cropping and strong color distortion is particularly important. The paper shows that contrastive learning benefits from stronger data augmentation than supervised learning does.
- Projection head helps: Adding a nonlinear projection head significantly improves the quality of the learned representations, even though the representations used downstream are taken from before this head (a linear-evaluation sketch follows this list).
- Bigger is better: Larger batch sizes (up to 8192) and longer training improve performance. Contrastive learning also benefits more from larger models than supervised learning does.
- Simplicity works: Despite using simpler components than many previous methods, SimCLR achieves state-of-the-art results.
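Because the downstream features are taken before the projection head, the standard linear evaluation protocol behind the ImageNet numbers below freezes the encoder and trains only a linear classifier on h. A hypothetical sketch, reusing the SimCLRModel from the earlier snippet:

```python
# Sketch of one linear-evaluation training step (illustrative names).
import torch
import torch.nn.functional as F

def linear_eval_step(encoder, classifier, optimizer, images, labels):
    """The pretrained encoder f is frozen; only a linear classifier on the
    pre-projection features h is updated (e.g. torch.nn.Linear(2048, 1000))."""
    encoder.eval()
    with torch.no_grad():           # no gradients through the frozen encoder
        h, _ = encoder(images)      # use h, not the projection z
    logits = classifier(h)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```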
Results
- Linear evaluation on ImageNet: 76.5% top-1 accuracy, a 7% relative improvement over the previous state of the art
- Semi-supervised learning: Fine-tuning with only 1% of the labels achieves 85.8% top-5 accuracy
- Transfer learning: Competitive or better performance than supervised pre-training on most downstream tasks
Why It Matters