learn2learn.algorithms
High-Level Interfaces
MAML (BaseLearner)
Description
High-level implementation of Model-Agnostic Meta-Learning.
This class wraps an arbitrary nn.Module and augments it with clone() and adapt() methods.
For the first-order version of MAML (i.e. FOMAML), set the first_order flag to True upon initialization.
Arguments
- model (Module) - Module to be wrapped.
- lr (float) - Fast adaptation learning rate.
- first_order (bool, optional, default=False) - Whether to use the first-order approximation of MAML. (FOMAML)
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to allow_nograd.
- allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False.
References
- Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks."
Example
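The original snippet was lost in extraction; the following is a minimal sketch of the documented clone/adapt workflow, using a stand-in linear model, loss, and random task data:

```python
import torch
import learn2learn as l2l

maml = l2l.algorithms.MAML(torch.nn.Linear(20, 10), lr=0.01)
loss_fn = torch.nn.MSELoss()
X, y = torch.randn(32, 20), torch.randn(32, 10)  # stand-in task data

clone = maml.clone()          # per-task copy; gradients flow back to maml
error = loss_fn(clone(X), y)
clone.adapt(error)            # differentiable fast-adaptation step
error = loss_fn(clone(X), y)
error.backward()              # populates the gradients of maml's parameters
```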
adapt(self, loss, first_order=None, allow_unused=None, allow_nograd=None)
Description
Takes a gradient step on the loss and updates the cloned parameters in place.
Arguments
- loss (Tensor) - Loss to minimize upon update.
- first_order (bool, optional, default=None) - Whether to use first- or second-order updates. Defaults to self.first_order.
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
- allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
clone(self, first_order=None, allow_unused=None, allow_nograd=None)
Description
Returns a MAML-wrapped copy of the module whose parameters and buffers are torch.cloned from the original module.
This implies that back-propagating losses on the cloned module will populate the buffers of the original module. For more information, refer to learn2learn.clone_module().
Arguments
- first_order (bool, optional, default=None) - Whether the clone uses first- or second-order updates (see the sketch after this list). Defaults to self.first_order.
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
- allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
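For example, a learner meta-trained with second-order updates can spawn a cheaper first-order clone by overriding the flag at clone time. A minimal sketch with stand-in data:

```python
import torch
import learn2learn as l2l

maml = l2l.algorithms.MAML(torch.nn.Linear(20, 10), lr=0.1, first_order=False)
clone = maml.clone(first_order=True)             # this clone takes first-order steps
error = clone(torch.randn(8, 20)).pow(2).mean()  # stand-in loss
clone.adapt(error)                               # no second-order terms enter the graph
```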
MetaSGD (BaseLearner)
Description
High-level implementation of Meta-SGD.
This class wraps an arbitrary nn.Module and augments it with clone() and adapt() methods.
It behaves similarly to MAML, but in addition a set of per-parameter learning rates is learned for fast-adaptation.
Arguments
- model (Module) - Module to be wrapped.
- lr (float) - Initialization value of the per-parameter fast adaptation learning rates.
- first_order (bool, optional, default=False) - Whether to use the first-order version.
- lrs (list of Parameters, optional, default=None) - If not None, overrides lr, and uses the list as learning rates for fast-adaptation.
References
- Li et al. 2017. "Meta-SGD: Learning to Learn Quickly for Few-Shot Learning."
Example
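As above, the original snippet did not survive extraction; here is a minimal sketch of the analogous workflow, where adaptation additionally uses the learned per-parameter learning rates:

```python
import torch
import learn2learn as l2l

meta_sgd = l2l.algorithms.MetaSGD(torch.nn.Linear(20, 10), lr=0.1)
loss_fn = torch.nn.MSELoss()
X, y = torch.randn(32, 20), torch.randn(32, 10)  # stand-in task data

clone = meta_sgd.clone()
error = loss_fn(clone(X), y)
clone.adapt(error)            # fast-adaptation with the learned per-parameter lrs
error = loss_fn(clone(X), y)
error.backward()              # also populates gradients of the learning rates
```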
GBML (Module)
Description
General wrapper for gradient-based meta-learning implementations.
A variety of algorithms can simply be implemented by changing the kind of transform used during fast-adaptation.
For example, if the transform is Scale, we recover Meta-SGD [2] with adapt_transform=False, and Alpha MAML [4] with adapt_transform=True.
If the transform is a Kronecker-factored module (e.g. neural network, or linear), we recover KFO from [5].
Arguments
- module (Module) - Module to be wrapped.
- transform (Module) - Transform used to update the module.
- lr (float) - Fast adaptation learning rate.
- adapt_transform (bool, optional, default=False) - Whether to update the transform's parameters during fast-adaptation.
- first_order (bool, optional, default=False) - Whether to use the first-order approximation.
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to allow_nograd.
- allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False.
References
- Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks."
- Li et al. 2017. "Meta-SGD: Learning to Learn Quickly for Few-Shot Learning."
- Park & Oliva. 2019. "Meta-Curvature."
- Behl et al. 2019. "Alpha MAML: Adaptive Model-Agnostic Meta-Learning."
- Arnold et al. 2019. "When MAML Can Adapt Fast and How to Assist When It Cannot."
Example
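The original example was lost in extraction; the sketch below follows the documented pattern, using l2l.optim.ModuleTransform(torch.nn.Linear) as the transform and stand-in model, loss, and data:

```python
import torch
import learn2learn as l2l

gbml = l2l.algorithms.GBML(
    module=torch.nn.Linear(20, 10),                        # stand-in task model
    transform=l2l.optim.ModuleTransform(torch.nn.Linear),
    lr=0.01,
    adapt_transform=True,
)
opt = torch.optim.SGD(gbml.parameters(), lr=0.001)
loss_fn = torch.nn.MSELoss()
X, y = torch.randn(32, 20), torch.randn(32, 10)            # stand-in task data

for iteration in range(10):   # meta-training with one adaptation step per task
    opt.zero_grad()
    task_model = gbml.clone()
    error = loss_fn(task_model(X), y)
    task_model.adapt(error)   # adapts both module and transform parameters
    error = loss_fn(task_model(X), y)
    error.backward()
    opt.step()
```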
adapt(self, loss, first_order=None, allow_nograd=None, allow_unused=None)
Description
Takes a gradient step on the loss and updates the cloned parameters in place.
The parameters of the transform are only adapted if self.adapt_transform is True.
Arguments
- loss (Tensor) - Loss to minimize upon update.
- first_order (bool, optional, default=None) - Whether to use first- or second-order updates. Defaults to self.first_order.
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
- allow_nograd (bool, optional, default=None) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
clone(self, first_order=None, allow_unused=None, allow_nograd=None, adapt_transform=None)
Description
Similar to MAML.clone().
Arguments
- first_order (bool, optional, default=None) - Whether the clone uses first- or second-order updates. Defaults to self.first_order.
- allow_unused (bool, optional, default=None) - Whether to allow differentiation of unused parameters. Defaults to self.allow_unused.
- allow_nograd (bool, optional, default=False) - Whether to allow adaptation with parameters that have requires_grad = False. Defaults to self.allow_nograd.
- adapt_transform (bool, optional, default=None) - Whether the clone updates the transform's parameters during fast-adaptation. Defaults to self.adapt_transform.
PyTorch Lightning
LightningMAML (LightningEpisodicModule)
Description
A PyTorch Lightning module for MAML.
Arguments
- model (Module) - A PyTorch nn.Module.
- loss (Function, optional, default=CrossEntropyLoss) - Loss function used to compute the error on a task's predictions.
- ways (int, optional, default=5) - Number of classes in a task.
- shots (int, optional, default=1) - Number of samples for adaptation.
- adaptation_steps (int, optional, default=1) - Number of steps for adapting to a new task.
- lr (float, optional, default=0.001) - Learning rate for meta-training.
- adaptation_lr (float, optional, default=0.1) - Learning rate for fast adaptation.
- scheduler_step (int, optional, default=20) - Decay interval for lr.
- scheduler_decay (float, optional, default=1.0) - Decay rate for lr.
References
- Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks."
Example
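The original example was lost in extraction; the following sketch assumes the Omniglot benchmark from learn2learn.vision and the EpisodicBatcher helper from learn2learn.utils.lightning:

```python
import pytorch_lightning as pl
import learn2learn as l2l
from learn2learn.algorithms import LightningMAML
from learn2learn.utils.lightning import EpisodicBatcher

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot')
model = l2l.vision.models.OmniglotFC(28 ** 2, 5)
maml = LightningMAML(model, ways=5, shots=1, adaptation_lr=0.1)
# EpisodicBatcher exposes the tasksets as a LightningDataModule
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(maml, episodic_data)
```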
LightningANIL (LightningEpisodicModule)
Description
A PyTorch Lightning module for ANIL.
Arguments
- features (Module) - An nn.Module that extracts features; it is not adapted during fast-adaptation.
- classifier (Module) - An nn.Module that maps features to class predictions; only this head is adapted.
- loss (Function, optional, default=CrossEntropyLoss) - Loss function used to compute the error on a task's predictions.
- ways (int, optional, default=5) - Number of classes in a task.
- shots (int, optional, default=1) - Number of samples for adaptation.
- adaptation_steps (int, optional, default=1) - Number of steps for adapting to a new task.
- lr (float, optional, default=0.001) - Learning rate for meta-training.
- adaptation_lr (float, optional, default=0.1) - Learning rate for fast adaptation.
- scheduler_step (int, optional, default=20) - Decay interval for lr.
- scheduler_decay (float, optional, default=1.0) - Decay rate for lr.
References
- Raghu et al. 2020. "Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML."
Example
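Again a hedged sketch (the original code was lost): a plain torch feature extractor and linear head stand in for a real backbone:

```python
import pytorch_lightning as pl
import torch
import learn2learn as l2l
from learn2learn.algorithms import LightningANIL
from learn2learn.utils.lightning import EpisodicBatcher

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot')
features = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 64), torch.nn.ReLU())
classifier = torch.nn.Linear(64, 5)  # only this head is adapted
anil = LightningANIL(features, classifier, adaptation_lr=0.1)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(anil, episodic_data)
```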
LightningPrototypicalNetworks (LightningEpisodicModule)
Description
A PyTorch Lightning module for Prototypical Networks.
Arguments
- features (Module) - Feature extractor used to embed the samples of a task.
- loss (Function, optional, default=CrossEntropyLoss) - Loss function used to compute the error on a task's predictions.
- distance_metric (str, optional, default='euclidean') - Distance metric between samples. ['euclidean', 'cosine']
- train_ways (int, optional, default=5) - Number of classes in train tasks.
- train_shots (int, optional, default=1) - Number of support samples for train tasks.
- train_queries (int, optional, default=1) - Number of query samples for train tasks.
- test_ways (int, optional, default=5) - Number of classes in test tasks.
- test_shots (int, optional, default=1) - Number of support samples for test tasks.
- test_queries (int, optional, default=1) - Number of query samples for test tasks.
- lr (float, optional, default=0.001) - Learning rate for meta-training.
- scheduler_step (int, optional, default=20) - Decay interval for lr.
- scheduler_decay (float, optional, default=1.0) - Decay rate for lr.
References
- Snell et al. 2017. "Prototypical Networks for Few-shot Learning."
Example
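A hedged sketch replacing the lost example; train_samples=2 assumes one support and one query sample per class:

```python
import pytorch_lightning as pl
import torch
import learn2learn as l2l
from learn2learn.algorithms import LightningPrototypicalNetworks
from learn2learn.utils.lightning import EpisodicBatcher

# 2 samples per class: 1 support (shot) + 1 query
tasksets = l2l.vision.benchmarks.get_tasksets('omniglot', train_samples=2, test_samples=2)
features = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 64), torch.nn.ReLU())
protonets = LightningPrototypicalNetworks(features, train_ways=5, train_shots=1, train_queries=1)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(protonets, episodic_data)
```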
LightningMetaOptNet (LightningPrototypicalNetworks)
Description
A PyTorch Lightning module for MetaOptNet.
Arguments
- features (Module) - Feature extractor used to embed the samples of a task.
- svm_C_reg (float, optional, default=0.1) - Regularization weight for SVM.
- svm_max_iters (int, optional, default=15) - Maximum number of iterations for SVM convergence.
- loss (Function, optional, default=CrossEntropyLoss) - Loss function used to compute the error on a task's predictions.
- train_ways (int, optional, default=5) - Number of classes in train tasks.
- train_shots (int, optional, default=1) - Number of support samples for train tasks.
- train_queries (int, optional, default=1) - Number of query samples for train tasks.
- test_ways (int, optional, default=5) - Number of classes in test tasks.
- test_shots (int, optional, default=1) - Number of support samples for test tasks.
- test_queries (int, optional, default=1) - Number of query samples for test tasks.
- lr (float, optional, default=0.001) - Learning rate for meta-training.
- scheduler_step (int, optional, default=20) - Decay interval for lr.
- scheduler_decay (float, optional, default=1.0) - Decay rate for lr.
References
- Lee et al. 2019. "Meta-Learning with Differentiable Convex Optimization."
Example
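A hedged sketch replacing the lost example, mirroring the Prototypical Networks setup with the SVM head's parameters made explicit:

```python
import pytorch_lightning as pl
import torch
import learn2learn as l2l
from learn2learn.algorithms import LightningMetaOptNet
from learn2learn.utils.lightning import EpisodicBatcher

tasksets = l2l.vision.benchmarks.get_tasksets('omniglot', train_samples=2, test_samples=2)
features = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 64), torch.nn.ReLU())
metaoptnet = LightningMetaOptNet(features, svm_C_reg=0.1, svm_max_iters=15, train_ways=5)
episodic_data = EpisodicBatcher(tasksets.train, tasksets.validation, tasksets.test)
trainer = pl.Trainer(max_epochs=10)
trainer.fit(metaoptnet, episodic_data)
```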