learn2learn.algorithms

High-Level Interfaces

MAML (BaseLearner)

[Source]

Description

High-level implementation of Model-Agnostic Meta-Learning.

This class wraps an arbitrary nn.Module and augments it with clone() and adapt() methods.

For the first-order version of MAML (i.e. FOMAML), set the first_order flag to True upon initialization.

Arguments

  • model (Module) - Module to be wrapped.
  • lr (float) - Fast adaptation learning rate.
  • first_order (bool, optional, default=False) - Whether to use the first-order approximation of MAML. (FOMAML)
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to allow_nograd.
  • allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks."

Example

linear = l2l.algorithms.MAML(nn.Linear(20, 10), lr=0.01)
clone = linear.clone()
error = loss(clone(X), y)
clone.adapt(error)
error = loss(clone(X), y)
error.backward()
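
For context, the snippet below sketches the full meta-training loop around the clone/adapt pattern above. It is a minimal sketch, not part of the library's documentation: the data (X, y), the MSE loss, and the Adam meta-optimizer are placeholder choices.

import torch
import torch.nn as nn
import learn2learn as l2l

X, y = torch.randn(32, 20), torch.randn(32, 10)   # placeholder task data
loss = nn.MSELoss()

maml = l2l.algorithms.MAML(nn.Linear(20, 10), lr=0.01)
opt = torch.optim.Adam(maml.parameters(), lr=0.001)   # meta-optimizer

for iteration in range(10):
    opt.zero_grad()
    learner = maml.clone()                   # task-specific copy
    adaptation_error = loss(learner(X), y)
    learner.adapt(adaptation_error)          # fast-adaptation step
    evaluation_error = loss(learner(X), y)
    evaluation_error.backward()              # meta-gradients reach maml.parameters()
    opt.step()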

adapt(self, loss, first_order=None, allow_unused=None, allow_nograd=None)

Description

Takes a gradient step on the loss and updates the cloned parameters in place.

Arguments

  • loss (Tensor) - Loss to minimize upon update.
  • first_order (bool, optional, default=None) - Whether to use first- or second-order updates. Defaults to self.first_order.
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
  • allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
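
The per-call flags override the values chosen at initialization for a single step. For example, a short sketch (reusing linear, loss, X, and y from the example above) that runs several adaptation steps and switches the last one to the cheaper first-order approximation:

clone = linear.clone()
for step in range(3):
    error = loss(clone(X), y)
    clone.adapt(error, first_order=(step == 2))   # last step is first-order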

clone(self, first_order=None, allow_unused=None, allow_nograd=None)

Description

Returns a MAML-wrapped copy of the module whose parameters and buffers are torch.cloned from the original module.

This implies that back-propagating losses computed through the cloned module will populate the gradient buffers (.grad) of the original module's parameters. For more information, refer to learn2learn.clone_module().

Arguments

  • first_order (bool, optional, default=None) - Whether the clone uses first- or second-order updates. Defaults to self.first_order.
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
  • allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
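
Because the clone shares the original module's computation graph, calling backward() on a loss computed through the clone accumulates gradients on the wrapped module's parameters. A quick check, reusing linear, loss, X, and y from the example above:

clone = linear.clone()
clone.adapt(loss(clone(X), y))
loss(clone(X), y).backward()
assert all(p.grad is not None for p in linear.parameters())   # gradients landed on the original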

MetaSGD (BaseLearner)

[Source]

Description

High-level implementation of Meta-SGD.

This class wraps an arbitrary nn.Module and augments it with clone() and adapt() methods. It behaves similarly to MAML, but in addition a set of per-parameter learning rates is learned for fast-adaptation.

Arguments

  • model (Module) - Module to be wrapped.
  • lr (float) - Initialization value of the per-parameter fast adaptation learning rates.
  • first_order (bool, optional, default=False) - Whether to use the first-order version.
  • lrs (list of Parameters, optional, default=None) - If not None, overrides lr, and uses the list as learning rates for fast-adaptation.

References

  1. Li et al. 2017. “Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.”

Example

linear = l2l.algorithms.MetaSGD(nn.Linear(20, 10), lr=0.01)
clone = linear.clone()
error = loss(clone(X), y)
clone.adapt(error)
error = loss(clone(X), y)
error.backward()
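
Since the fast-adaptation learning rates are ordinary Parameters of the wrapper, they are included in parameters() and are meta-learned by the outer optimizer together with the model weights. A minimal sketch, reusing the placeholder loss, X, and y from above:

meta_sgd = l2l.algorithms.MetaSGD(nn.Linear(20, 10), lr=0.01)
opt = torch.optim.Adam(meta_sgd.parameters(), lr=0.001)   # updates weights and per-parameter lrs

opt.zero_grad()
learner = meta_sgd.clone()
learner.adapt(loss(learner(X), y))     # adaptation uses the learnable lrs
loss(learner(X), y).backward()         # meta-gradients also reach the lrs
opt.step()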

adapt(self, loss, first_order=None)

Description

Akin to MAML.adapt() but for MetaSGD: it updates the model with the learnable per-parameter learning rates.

clone(self)

Description

Akin to MAML.clone() but for MetaSGD: it includes a set of learnable fast-adaptation learning rates.

GBML (Module)

[Source]

Description

General wrapper for gradient-based meta-learning implementations.

A variety of algorithms can simply be implemented by changing the kind of transform used during fast-adaptation. For example, if the transform is Scale we recover Meta-SGD [2] with adapt_transform=False and Alpha MAML [4] with adapt_transform=True. If the transform is a Kronecker-factored module (e.g. neural network, or linear), we recover KFO from [5].

Arguments

  • module (Module) - Module to be wrapped.
  • transform (Module) - Transform used to update the module.
  • lr (float) - Fast adaptation learning rate.
  • adapt_transform (bool, optional, default=False) - Whether to update the transform's parameters during fast-adaptation.
  • first_order (bool, optional, default=False) - Whether to use the first-order approximation.
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to allow_nograd.
  • allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False.

References

  1. Finn et al. 2017. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.”
  2. Li et al. 2017. “Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.”
  3. Park & Oliva. 2019. “Meta-Curvature.”
  4. Behl et al. 2019. “Alpha MAML: Adaptive Model-Agnostic Meta-Learning.”
  5. Arnold et al. 2019. “When MAML Can Adapt Fast and How to Assist When It Cannot.”

Example

model = SmallCNN()
transform = l2l.optim.ModuleTransform(torch.nn.Linear)
gbml = l2l.algorithms.GBML(
    module=model,
    transform=transform,
    lr=0.01,
    adapt_transform=True,
)
gbml.to(device)
opt = torch.optim.SGD(gbml.parameters(), lr=0.001)

# Training with 1 adaptation step
for iteration in range(10):
    opt.zero_grad()
    task_model = gbml.clone()
    loss = compute_loss(task_model)
    task_model.adapt(loss)            # fast-adapt the clone (and, with adapt_transform=True, the transform)
    loss = compute_loss(task_model)   # re-evaluate after adaptation
    loss.backward()                   # meta-gradients reach gbml.parameters()
    opt.step()
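
As described above, swapping the transform recovers other algorithms. The sketch below shows a KFO-style setup with a Kronecker-factored transform; it assumes l2l.optim.KroneckerTransform and l2l.nn.KroneckerLinear are available in your installed version, so treat these names as illustrative rather than guaranteed.

import torch
import learn2learn as l2l

model = torch.nn.Linear(20, 10)   # stand-in for the SmallCNN above
transform = l2l.optim.KroneckerTransform(l2l.nn.KroneckerLinear)   # assumed available
kfo = l2l.algorithms.GBML(module=model, transform=transform, lr=0.1)
task_model = kfo.clone()          # then adapt / backward exactly as in the loop above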

adapt(self, loss, first_order=None, allow_nograd=None, allow_unused=None)

Description

Takes a gradient step on the loss and updates the cloned parameters in place.

The parameters of the transform are only adapted if self.adapt_transform is True.

Arguments

  • loss (Tensor) - Loss to minimize upon update.
  • first_order (bool, optional, default=None) - Whether to use first- or second-order updates. Defaults to self.first_order.
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
  • allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.

clone(self, first_order=None, allow_unused=None, allow_nograd=None, adapt_transform=None)

Description

Similar to MAML.clone().

Arguments

  • first_order (bool, optional, default=None) - Whether the clone uses first- or second-order updates. Defaults to self.first_order.
  • allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
  • allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.

PyTorch Lightning

LightningMAML (LightningEpisodicModule)

[Source]

Description

A PyTorch Lightning module for MAML.

Arguments

  • model (Module) - A PyTorch nn.Module.
  • loss (Function, optional, default=CrossEntropyLoss) - Loss function which maps predictions and targets to a scalar cost.
  • ways (int, optional, default=5) - Number of classes in a task.
  • shots (int, optional, default=1) - Number of samples for adaptation.
  • adaptation_steps (int, optional, default=1) - Number of steps for adapting to a new task.
  • lr (float, optional, default=0.001) - Learning rate of meta training.
  • adaptation_lr (float, optional, default=0.1) - Learning rate for fast adaptation.
  • scheduler_step (int, optional, default=20) - Decay interval for lr.
  • scheduler_decay (float, optional, default=1.0) - Decay rate for lr.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks."

Example

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot')
model = l2l.vision.models.OmniglotFC(28**2, args.ways)
maml = LightningMAML(model, adaptation_lr=0.1, **dict_args)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer.from_argparse_args(args)
trainer.fit(maml, episodic_data)
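
The example assumes args and dict_args already exist. One common way to build them is sketched below with argparse and the older Trainer helpers; the flag names are illustrative, and Trainer.add_argparse_args/from_argparse_args are only available in pre-2.0 pytorch-lightning releases.

from argparse import ArgumentParser
import pytorch_lightning as pl

parser = ArgumentParser()
parser.add_argument('--ways', type=int, default=5)
parser.add_argument('--shots', type=int, default=1)
parser = pl.Trainer.add_argparse_args(parser)   # pre-2.0 API
args = parser.parse_args()
dict_args = vars(args)                          # forwarded as **dict_args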

LightningANIL (LightningEpisodicModule)

[Source]

Description

A PyTorch Lightning module for ANIL.

Arguments

  • features (Module) - A nn.Module to extract features, which will not be adapted during fast-adaptation.
  • classifier (Module) - A nn.Module that maps the extracted features to class predictions.
  • loss (Function, optional, default=CrossEntropyLoss) - Loss function which maps predictions and targets to a scalar cost.
  • ways (int, optional, default=5) - Number of classes in a task.
  • shots (int, optional, default=1) - Number of samples for adaptation.
  • adaptation_steps (int, optional, default=1) - Number of steps for adapting to a new task.
  • lr (float, optional, default=0.001) - Learning rate of meta training.
  • adaptation_lr (float, optional, default=0.1) - Learning rate for fast adaptation.
  • scheduler_step (int, optional, default=20) - Decay interval for lr.
  • scheduler_decay (float, optional, default=1.0) - Decay rate for lr.

References

  1. Raghu et al. 2020. "Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML"

Example

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot')
model = l2l.vision.models.OmniglotFC(28**2, args.ways)
anil = LightningANIL(model.features, model.classifier, adaptation_lr=0.1, **dict_args)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer.from_argparse_args(args)
trainer.fit(anil, episodic_data)
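
ANIL adapts only the classification head, so any backbone can be split into a feature extractor and a small classifier before wrapping. A generic sketch with plain torch modules (the architecture below is illustrative, not part of learn2learn):

import torch

features = torch.nn.Sequential(
    torch.nn.Conv2d(1, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
)
classifier = torch.nn.Linear(64, 5)   # ways=5
anil = LightningANIL(features, classifier, adaptation_lr=0.1)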

LightningPrototypicalNetworks (LightningEpisodicModule)

[Source]

Description

A PyTorch Lightning module for Prototypical Networks.

Arguments

  • features (Module) - Feature extractor which embeds the input samples.
  • loss (Function, optional, default=CrossEntropyLoss) - Loss function which maps predictions and targets to a scalar cost.
  • distance_metric (str, optional, default='euclidean') - Distance metric between samples. ['euclidean', 'cosine']
  • train_ways (int, optional, default=5) - Number of classes in train tasks.
  • train_shots (int, optional, default=1) - Number of support samples for train tasks.
  • train_queries (int, optional, default=1) - Number of query samples for train tasks.
  • test_ways (int, optional, default=5) - Number of classes in test tasks.
  • test_shots (int, optional, default=1) - Number of support samples for test tasks.
  • test_queries (int, optional, default=1) - Number of query samples for test tasks.
  • lr (float, optional, default=0.001) - Learning rate of meta training.
  • scheduler_step (int, optional, default=20) - Decay interval for lr.
  • scheduler_decay (float, optional, default=1.0) - Decay rate for lr.

References

  1. Snell et al. 2017. "Prototypical Networks for Few-shot Learning"

Example

tasksets = l2l.vision.benchmarks.get_tasksets('mini-imagenet')
features = Convnet()  # init model
protonet = LightningPrototypicalNetworks(features, **dict_args)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer.from_argparse_args(args)
trainer.fit(protonet, episodic_data)
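
For reference, the core step from Snell et al. [1]: each class prototype is the mean embedding of its support samples, and query samples are scored by negative distance to the prototypes. A self-contained sketch of that computation, independent of the Lightning wrapper:

import torch

def prototype_logits(support, support_labels, query, ways):
    # support: (ways * shots, d) embeddings; query: (n_query, d) embeddings
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(ways)
    ])                                          # (ways, d) class means
    distances = torch.cdist(query, prototypes)  # Euclidean distances
    return -distances                           # higher logit = closer prototype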

LightningMetaOptNet (LightningPrototypicalNetworks)

[Source]

Description

A PyTorch Lightning module for MetaOptNet.

Arguments

  • features (Module) - Feature extractor which embeds the input samples.
  • svm_C_reg (float, optional, default=0.1) - Regularization weight for SVM.
  • svm_max_iters (int, optional, default=15) - Maximum number of iterations for SVM convergence.
  • loss (Function, optional, default=CrossEntropyLoss) - Loss function which maps predictions and targets to a scalar cost.
  • train_ways (int, optional, default=5) - Number of classes in train tasks.
  • train_shots (int, optional, default=1) - Number of support samples for train tasks.
  • train_queries (int, optional, default=1) - Number of query samples for train tasks.
  • test_ways (int, optional, default=5) - Number of classes in test tasks.
  • test_shots (int, optional, default=1) - Number of support samples for test tasks.
  • test_queries (int, optional, default=1) - Number of query samples for test tasks.
  • lr (float, optional, default=0.001) - Learning rate of meta training.
  • scheduler_step (int, optional, default=20) - Decay interval for lr.
  • scheduler_decay (float, optional, default=1.0) - Decay rate for lr.

References

  1. Lee et al. 2019. "Meta-Learning with Differentiable Convex Optimization"

Example

tasksets = l2l.vision.benchmarks.get_tasksets('mini-imagenet')
features = Convnet()  # init model
metaoptnet = LightningMetaOptNet(features, **dict_args)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer.from_argparse_args(args)
trainer.fit(metaoptnet, episodic_data)
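
After fitting, the held-out test tasks can be evaluated with the standard Lightning call, assuming the EpisodicBatcher datamodule above exposes the test split:

trainer.test(metaoptnet, datamodule=episodic_data)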