learn2learn.vision

Datasets, models, and other utilities related to computer vision.

learn2learn.vision.models

Description

A set of commonly used models for meta-learning vision tasks. For simplicity, the forward method of every model conforms to the following API:

def forward(self, x):
    x = self.features(x)
    x = self.classifier(x)
    return x
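
This shared split between features and classifier makes it easy to reuse a backbone with a new head. Below is a minimal sketch using CNN4 (documented further down), assuming only that its classifier attribute is a single torch.nn.Linear layer:

import torch
import learn2learn as l2l

model = l2l.vision.models.CNN4(output_size=5)
backbone = model.features                # the feature extractor on its own
model.classifier = torch.nn.Linear(      # swap in a fresh 20-way head
    model.classifier.in_features, 20)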

OmniglotFC

OmniglotFC(input_size, output_size, sizes=None)

[Source]

Description

The fully-connected network used for Omniglot experiments, as described in Santoro et al, 2016.

References

  1. Santoro et al. 2016. “Meta-Learning with Memory-Augmented Neural Networks.” ICML.

Arguments

  • input_size (int) - The dimensionality of the input.
  • output_size (int) - The dimensionality of the output.
  • sizes (list, optional, default=None) - A list of hidden layer sizes.

Example

net = OmniglotFC(input_size=28**2,
                 output_size=10,
                 sizes=[64, 64, 64])

OmniglotCNN

OmniglotCNN(output_size=5, hidden_size=64, layers=4)

[Source]

Description

The convolutional network commonly used for Omniglot, as described by Finn et al, 2017.

This network assumes inputs of shape (1, 28, 28).

References

  1. Finn et al. 2017. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” ICML.

Arguments

  • output_size (int) - The dimensionality of the network's output.
  • hidden_size (int, optional, default=64) - The dimensionality of the hidden representation.
  • layers (int, optional, default=4) - The number of convolutional layers.

Example

model = OmniglotCNN(output_size=20, hidden_size=128, layers=3)

CNN4

CNN4(output_size,
     hidden_size=64,
     layers=4,
     channels=3,
     max_pool=True,
     embedding_size=None)

[Source]

Description

The convolutional network commonly used for MiniImagenet, as described by Ravi and Larochelle, 2017.

This network assumes inputs of shape (3, 84, 84).

Instantiate CNN4Backbone if you only need the feature extractor.

References

  1. Ravi and Larochelle. 2017. “Optimization as a Model for Few-Shot Learning.” ICLR.

Arguments

  • output_size (int) - The dimensionality of the network's output.
  • hidden_size (int, optional, default=64) - The dimensionality of the hidden representation.
  • layers (int, optional, default=4) - The number of convolutional layers.

Example

model = CNN4(output_size=20, hidden_size=128, layers=3)
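
If you only need the feature extractor (see the Description above), you can instantiate the backbone directly. A minimal sketch, assuming CNN4Backbone accepts the same hidden_size, layers, and channels arguments as CNN4:

import torch
import learn2learn as l2l

backbone = l2l.vision.models.CNN4Backbone(hidden_size=64, layers=4, channels=3)
images = torch.randn(8, 3, 84, 84)   # a batch of MiniImagenet-sized inputs
features = backbone(images)          # feature batch; shape depends on pooling and input size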

ResNet12

ResNet12(output_size,
         hidden_size=640,
         avg_pool=True,
         wider=True,
         embedding_dropout=0.0,
         dropblock_dropout=0.1,
         dropblock_size=5,
         channels=3)

[Source]

Description

The 12-layer residual network from Mishra et al, 2017.

The code is adapted from Lee et al, 2019, who share it under the Apache 2 license.

Instantiate ResNet12Backbone if you only need the feature extractor.

List of changes:

  • Rename ResNet to ResNet12.
  • Small API modifications.
  • Fix code style to be compatible with PEP8.
  • Support multiple devices in DropBlock.

References

  1. Mishra et al. 2017. “A Simple Neural Attentive Meta-Learner.” ICLR 18.
  2. Lee et al. 2019. “Meta-Learning with Differentiable Convex Optimization.” CVPR 19.
  3. Lee et al's code: https://github.com/kjunelee/MetaOptNet/
  4. Oreshkin et al. 2018. “TADAM: Task Dependent Adaptive Metric for Improved Few-Shot Learning.” NeurIPS 18.

Arguments

  • output_size (int) - The dimensionality of the output (e.g., the number of classes).
  • hidden_size (int, optional, default=640) - Size of the embedding once features are extracted; used for the classifier layer. (640 corresponds to mini-ImageNet.)
  • avg_pool (bool, optional, default=True) - Set to False for the 16k-dim embeddings of Lee et al, 2019.
  • wider (bool, optional, default=True) - True uses (64, 160, 320, 640) filters akin to Lee et al, 2019. False uses (64, 128, 256, 512) filters, akin to Oreshkin et al, 2018.
  • embedding_dropout (float, optional, default=0.0) - Dropout rate on the flattened embedding layer.
  • dropblock_dropout (float, optional, default=0.1) - Dropout rate for the residual layers.
  • dropblock_size (int, optional, default=5) - Size of drop blocks.

Example

model = ResNet12(output_size=ways, hidden_size=1600, avg_pool=False)

WRN28

WRN28(output_size, hidden_size=640, dropout=0.0)

[Source]

Description

The 28-layer wide residual network (widening factor of 10) from Dhillon et al, 2020.

The code is adapted from Ye et al, 2020, who share it under the MIT license.

Instantiate WRN28Backbone if you only need the feature extractor.

References

  1. Dhillon et al. 2020. “A Baseline for Few-Shot Image Classification.” ICLR 20.
  2. Ye et al. 2020. “Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions.” CVPR 20.
  3. Ye et al's code: https://github.com/Sha-Lab/FEAT

Arguments

  • output_size (int) - The dimensionality of the output.
  • hidden_size (int, optional, default=640) - Size of the embedding once features are extracted; used for the classifier layer. (640 corresponds to mini-ImageNet.)
  • dropout (float, optional, default=0.0) - Dropout rate.

Example

model = WRN28(output_size=ways)

get_pretrained_backbone

get_pretrained_backbone(model,
                        dataset,
                        spec='default',
                        root='~/data',
                        download=False)

[Source]

Description

Returns a pretrained backbone for a benchmark dataset.

The returned object is a torch.nn.Module instance.

Arguments

  • model (str) - The name of the model (cnn4, resnet12, or wrn28).
  • dataset (str) - The name of the benchmark dataset (mini-imagenet or tiered-imagenet).
  • spec (str, optional, default='default') - Which weight specification to load.
  • root (str, optional, default='~/data') - Location of the pretrained weights.
  • download (bool, optional, default=False) - Whether to download the pretrained weights if they are not available locally.

Example

backbone = l2l.vision.models.get_pretrained_backbone(
    model='resnet12',
    dataset='mini-imagenet',
    root='~/.data',
    download=True,
)
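
Since the returned backbone is a standard torch.nn.Module, it can be paired with a fresh linear head, following the features/classifier convention described at the top of this page. A minimal sketch; the 640-dimensional embedding size is an assumption taken from the ResNet12 documentation above:

import torch
import learn2learn as l2l

backbone = l2l.vision.models.get_pretrained_backbone(
    model='resnet12',
    dataset='mini-imagenet',
    download=True,
)
head = torch.nn.Linear(640, 5)       # assumed 640-dim features, 5-way head
images = torch.randn(8, 3, 84, 84)   # mini-ImageNet-sized inputs
logits = head(backbone(images))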

learn2learn.vision.datasets

Description

Some datasets commonly used in meta-learning vision tasks.

FullOmniglot

FullOmniglot(*args, **kwds)

[Source]

Description

This class provides an interface to the Omniglot dataset.

The Omniglot dataset was introduced by Lake et al., 2015. Omniglot consists of 1,623 character classes from 50 different alphabets, each class containing 20 samples. While the original dataset is separated into background and evaluation sets, this class concatenates both sets and leaves the choice of class splits to the user, as was done in Ravi and Larochelle, 2017. The background and evaluation splits are available in the torchvision package.

References

  1. Lake et al. 2015. “Human-Level Concept Learning through Probabilistic Program Induction.” Science.
  2. Ravi and Larochelle. 2017. “Optimization as a Model for Few-Shot Learning.” ICLR.

Arguments

  • root (str) - Path to download the data.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

import learn2learn as l2l
from PIL.Image import LANCZOS
from torchvision import transforms

omniglot = l2l.vision.datasets.FullOmniglot(root='./data',
                                            transform=transforms.Compose([
                                                transforms.Resize(28, interpolation=LANCZOS),
                                                transforms.ToTensor(),
                                                lambda x: 1.0 - x,
                                            ]),
                                            download=True)
omniglot = l2l.data.MetaDataset(omniglot)
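
Continuing the example above, few-shot tasks can then be sampled from the MetaDataset with the generic l2l.data task transforms. A minimal 5-way, 1-shot sketch; the NWays, KShots, and LoadData names are assumed from l2l.data.transforms:

task_transforms = [
    l2l.data.transforms.NWays(omniglot, n=5),
    l2l.data.transforms.KShots(omniglot, k=1),
    l2l.data.transforms.LoadData(omniglot),
]
tasks = l2l.data.TaskDataset(omniglot,
                             task_transforms=task_transforms,
                             num_tasks=1000)
X, y = tasks.sample()  # one 5-way, 1-shot task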

MiniImagenet

MiniImagenet(*args, **kwds)

[Source]

Description

The mini-ImageNet dataset was originally introduced by Vinyals et al., 2016.

It consists of 60,000 colour images of size 84x84 pixels. The dataset is divided into 3 splits of 64 training, 16 validation, and 20 testing classes, each containing 600 examples. The classes are sampled from the ImageNet dataset, and we use the splits from Ravi and Larochelle, 2017.

References

  1. Vinyals et al. 2016. “Matching Networks for One Shot Learning.” NeurIPS.
  2. Ravi and Larochelle. 2017. “Optimization as a Model for Few-Shot Learning.” ICLR.

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.

Example

train_dataset = l2l.vision.datasets.MiniImagenet(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

TieredImagenet

TieredImagenet(*args, **kwds)

[Source]

Description

The tiered-ImageNet dataset was originally introduced by Ren et al, 2018, and we download the data directly from the link provided in their repository.

Like mini-ImageNet, tiered-ImageNet builds on top of ILSVRC-12, but consists of 608 classes (779,165 images) instead of 100. The train-validation-test split is made such that classes from similar categories are in the same splits. There are 34 categories, each containing between 10 and 30 classes. Of these categories, 20 (351 classes; 448,695 images) are used for training, 6 (97 classes; 124,261 images) for validation, and 8 (160 classes; 206,209 images) for testing.

References

  1. Ren et al, 2018. "Meta-Learning for Semi-Supervised Few-Shot Classification." ICLR '18.
  2. Ren Mengye. 2018. "few-shot-ssl-public". https://github.com/renmengye/few-shot-ssl-public

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.TieredImagenet(root='./data', mode='train', download=True)
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

FC100

FC100(*args, **kwds)

[Source]

Description

The FC100 dataset was originally introduced by Oreshkin et al., 2018.

It is based on CIFAR-100 but, unlike CIFAR-FS, the training, validation, and testing classes are split so as to minimize the information overlap between splits. The 100 classes are grouped into 20 superclasses, of which 12 (60 classes) are used for training, 4 (20 classes) for validation, and 4 (20 classes) for testing. Each class contains 600 images. The specific splits are provided in the Supplementary Material of the paper. Our data is downloaded from the link provided by [2].

References

  1. Oreshkin et al. 2018. "TADAM: Task Dependent Adaptive Metric for Improved Few-Shot Learning." NeurIPS.
  2. Kwoonjoon Lee. 2019. "MetaOptNet." https://github.com/kjunelee/MetaOptNet

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.

Example

train_dataset = l2l.vision.datasets.FC100(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

CIFARFS

CIFARFS(*args, **kwds)

[Source]

Description

The CIFAR Few-Shot (CIFAR-FS) dataset was originally introduced by Bertinetto et al., 2019.

It consists of 60,000 colour images of size 32x32 pixels. The dataset is divided into 3 splits of 64 training, 16 validation, and 20 testing classes, each containing 600 examples. The classes are sampled from the CIFAR-100 dataset, and we use the splits from Bertinetto et al., 2019.

References

  1. Bertinetto et al. 2019. "Meta-learning with differentiable closed-form solvers". ICLR.

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.

Example

train_dataset = l2l.vision.datasets.CIFARFS(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

VGGFlower102

VGGFlower102(*args, **kwds)

[Source]

Description

The VGG Flowers dataset was originally introduced by Nilsback and Zisserman, 2006 and then re-purposed for few-shot learning in Triantafillou et al., 2020.

The dataset consists of 102 classes of flowers, with each class consisting of 40 to 258 images. We provide the raw (unprocessed) images, and follow the train-validation-test splits of Triantafillou et al.

References

  1. Nilsback, M. and A. Zisserman. 2006. "A Visual Vocabulary for Flower Classification." CVPR '06.
  2. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.
  3. https://www.robots.ox.ac.uk/~vgg/data/flowers/

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.VGGFlower102(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

FGVCAircraft

FGVCAircraft(*args, **kwds)

[Source]

Description

The FGVC Aircraft dataset was originally introduced by Maji et al., 2013 and then re-purposed for few-shot learning in Triantafillou et al., 2020.

The dataset consists of 10,200 images of aircraft (102 classes, each with 100 images). We provide the raw (unprocessed) images and follow the train-validation-test splits of Triantafillou et al. Note that Triantafillou et al. recommend cropping the images using the bounding box information, to remove copyright information and ensure that only one plane is visible in the image; this cropping is not applied here.

References

  1. Maji et al. 2013. "Fine-Grained Visual Classification of Aircraft." arXiv [cs.CV].
  2. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.
  3. http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.FGVCAircraft(root='./data', mode='train', download=True)
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

FGVCFungi

FGVCFungi(*args, **kwds)

[Source]

Description

The FGVC Fungi dataset was originally introduced in the 5th Workshop on Fine-Grained Visual Categorization (FGVC) and then re-purposed for few-shot learning in Triantafillou et al., 2020.

The dataset consists of 1,394 classes and 89,760 images of fungi. We provide the raw (unprocessed) images, and follow the train-validation-test splits of Triantafillou et al.

Important: You must agree to the original Terms of Use to use this dataset. More information here: https://github.com/visipedia/fgvcx_fungi_comp

References

  1. https://sites.google.com/view/fgvc5/home
  2. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.
  3. https://github.com/visipedia/fgvcx_fungi_comp

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.FGVCFungi(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

DescribableTextures

DescribableTextures(*args, **kwds)

[Source]

Description

The VGG Describable Textures dataset was originally introduced by Cimpoi et al., 2014 and then re-purposed for few-shot learning in Triantafillou et al., 2020.

The dataset consists of 5,640 images organized into 47 texture classes. Each class consists of 120 images between 300x300 and 640x640 pixels, and the texture covers at least 90% of each image. We follow the train-validation-test splits of Triantafillou et al., 2020 (33 classes for train, 7 for validation, and 7 for test).

References

  1. Cimpoi et al. 2014. "Describing Textures in the Wild." CVPR'14.
  2. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.
  3. https://www.robots.ox.ac.uk/~vgg/data/dtd/

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.DescribableTextures(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

CUBirds200

CUBirds200(*args, **kwds)

[Source]

Description

The Caltech-UCSD Birds dataset was originally introduced by Welinder et al., 2010 and then re-purposed for few-shot learning in Triantafillou et al., 2020.

The dataset consists of 6,033 bird images classified into 200 bird species. The train set consists of 140 classes, while the validation and test sets each contain 30 classes. We provide the raw (unprocessed) images, and follow the train-validation-test splits of Triantafillou et al.

This dataset includes 43 images that overlap with the ILSVRC-2012 (ImageNet) dataset. They are omitted by default, but can be included by setting the include_imagenet_duplicates flag to True.

References

  1. Welinder et al. 2010. "Caltech-UCSD Birds 200." Caltech Technical Report.
  2. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.
  3. http://www.vision.caltech.edu/visipedia/CUB-200.html

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.
  • include_imagenet_duplicates (bool, optional, default=False) - Whether to include images that are also present in the ImageNet 2012 dataset.

Example

train_dataset = l2l.vision.datasets.CUBirds200(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

Quickdraw

Quickdraw(*args, **kwds)

[Source]

Description

The Quickdraw dataset was originally introduced by Google Creative Lab in 2017 and then re-purposed for few-shot learning in Triantafillou et al., 2020. See Ha and Eck, 2017 for more information.

The dataset consists of roughly 50M drawings of 345 objects. Each image was hand-drawn by a human annotator and is represented as a black-and-white 28x28 pixel array. We follow the train-validation-test splits of Triantafillou et al., 2020 (241 classes for train, 52 for validation, and 52 for test).

References

  1. https://github.com/googlecreativelab/quickdraw-dataset
  2. Ha, David, and Douglas Eck. 2017. "A Neural Representation of Sketch Drawings." ArXiv '17.
  3. Triantafillou et al. 2020. "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples." ICLR '20.

Arguments

  • root (str) - Path to download the data.
  • mode (str, optional, default='train') - Which split to use. Must be 'train', 'validation', or 'test'.
  • transform (Transform, optional, default=None) - Input pre-processing.
  • target_transform (Transform, optional, default=None) - Target pre-processing.
  • download (bool, optional, default=False) - Whether to download the dataset.

Example

train_dataset = l2l.vision.datasets.Quickdraw(root='./data', mode='train')
train_dataset = l2l.data.MetaDataset(train_dataset)
train_generator = l2l.data.TaskDataset(dataset=train_dataset, num_tasks=1000)

learn2learn.vision.transforms

Description

A set of transformations commonly used in meta-learning vision tasks.

RandomClassRotation

RandomClassRotation(dataset, degrees)

[Source]

Description

Samples a rotation uniformly at random from a given list and applies it to all images of a given class.

Arguments

  • degrees (list) - The rotations to be sampled.

Example

transform = RandomClassRotation([0, 90, 180, 270])
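
In practice, the transform receives the dataset at construction and is composed with the generic task transforms when building Omniglot tasks. A minimal sketch; the NWays, KShots, and LoadData names are assumed from l2l.data.transforms:

import learn2learn as l2l
from torchvision import transforms

omniglot = l2l.vision.datasets.FullOmniglot(root='./data',
                                            transform=transforms.ToTensor(),
                                            download=True)
omniglot = l2l.data.MetaDataset(omniglot)
task_transforms = [
    l2l.data.transforms.NWays(omniglot, n=5),
    l2l.data.transforms.KShots(omniglot, k=1),
    l2l.data.transforms.LoadData(omniglot),
    l2l.vision.transforms.RandomClassRotation(omniglot, [0.0, 90.0, 180.0, 270.0]),
]
tasks = l2l.data.TaskDataset(omniglot, task_transforms=task_transforms)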

learn2learn.vision.benchmarks

The benchmark module provides a convenient interface to standardized benchmarks from the literature. It provides train/validation/test TaskDatasets and TaskTransforms for pre-defined datasets.

This utility is useful for researchers to compare new algorithms against existing benchmarks. For finer-grained control over tasks and data, we recommend directly using l2l.data.TaskDataset and l2l.data.TaskTransforms.

list_tasksets

list_tasksets()

[Source]

Description

Returns a list of the names of all available benchmarks.

Example

for name in l2l.vision.benchmarks.list_tasksets():
    print(name)
    tasksets = l2l.vision.benchmarks.get_tasksets(name)

get_tasksets

get_tasksets(name,
             train_ways=5,
             train_samples=10,
             test_ways=5,
             test_samples=10,
             num_tasks=-1,
             root='~/data',
             device=None,
             **kwargs)

[Source]

Description

Returns the tasksets for a particular benchmark, using the standard data and task transformations from the literature.

The returned object is a namedtuple with attributes train, validation, and test, which correspond to their respective TaskDatasets. See examples/vision/maml_miniimagenet.py for an example, and the sketch after the examples below for how a sampled task is typically split.

Arguments

  • name (str) - The name of the benchmark. Full list in list_tasksets().
  • train_ways (int, optional, default=5) - The number of classes per training task.
  • train_samples (int, optional, default=10) - The number of samples per training task.
  • test_ways (int, optional, default=5) - The number of classes per test task. Also used for validation tasks.
  • test_samples (int, optional, default=10) - The number of samples per test task. Also used for validation tasks.
  • num_tasks (int, optional, default=-1) - The number of tasks in each TaskDataset.
  • device (torch.device, optional, default=None) - If not None, tasksets are loaded as Tensors on device.
  • root (str, optional, default='~/data') - Where the data is stored.

Example

train_tasks, validation_tasks, test_tasks = l2l.vision.benchmarks.get_tasksets('omniglot')
batch = train_tasks.sample()

or:

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot')
batch = tasksets.train.sample()
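
Each sampled task contains train_samples examples for each of the train_ways classes; meta-learning code typically splits these into adaptation and evaluation sets. A minimal sketch of one such split, mirroring the pattern used in examples/vision/maml_miniimagenet.py (even indices adapt, odd indices evaluate; other splits are equally valid):

import torch

data, labels = batch
adaptation_idx = torch.arange(0, data.size(0), 2)   # every other example for adaptation
evaluation_idx = torch.arange(1, data.size(0), 2)   # the remaining examples for evaluation
adaptation_data, adaptation_labels = data[adaptation_idx], labels[adaptation_idx]
evaluation_data, evaluation_labels = data[evaluation_idx], labels[evaluation_idx]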