learn2learn.gym

Environments, models, and other utilities related to reinforcement learning and OpenAI Gym.

MetaEnv

MetaEnv(task=None)

[Source]

Description

Interface for l2l environments. Each environment has a number of task-specific parameters that uniquely identify it, and a task is a dictionary mapping the names of these parameters to their values. Environments must implement methods to get, set, and sample tasks. The flow is then:

env = EnvClass()
tasks = env.sample_tasks(num_tasks)
for task in tasks:
    env.set_task(task)
    # training code for the current task goes here
    ...

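For illustration, here is a minimal sketch of a custom MetaEnv whose tasks pick a movement direction on a line. The class name, the 'direction' task key, and the toy dynamics are invented for this example; only the sample_tasks / set_task / get_task methods mirror the interface described above.

import numpy as np
from gym import spaces
from learn2learn.gym import MetaEnv

class LineDirectionEnv(MetaEnv):
    """Hypothetical 1D example: each task says whether to move left or right."""

    def __init__(self, task=None):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.position = np.zeros(1, dtype=np.float32)
        super(LineDirectionEnv, self).__init__(task)

    def sample_tasks(self, num_tasks):
        # One dictionary per task, keyed by the task-specific parameters.
        directions = np.random.choice([-1.0, 1.0], size=num_tasks)
        return [{'direction': d} for d in directions]

    def set_task(self, task):
        self._task = task
        self.direction = task['direction']

    def get_task(self):
        return self._task

    def reset(self):
        self.position = np.zeros(1, dtype=np.float32)
        return self.position

    def step(self, action):
        displacement = float(np.clip(action, -1.0, 1.0)[0])
        self.position = self.position + displacement
        reward = self.direction * displacement  # reward moving the way the task asks
        return self.position, reward, False, {}
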
Credit

Adapted from Tristan Deleu and Jonas Rothfuss' implementations.

AsyncVectorEnv

AsyncVectorEnv(env_fns, env=None)

[Source]

Description

Asynchronous vectorized environment for working with l2l MetaEnvs. Allows multiple environments to be run as separate processes.

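A rough usage sketch, assuming the standard Gym vectorized-environment semantics (a batched reset and a batch of actions per step); the environment id and worker count below are placeholders:

import gym
import numpy as np
import learn2learn as l2l

def make_env():
    # Any callable returning an environment instance can be used as an env_fn.
    # Assumes learn2learn registers 'Particles2D-v1' with Gym on import.
    return gym.make('Particles2D-v1')

n_workers = 4
env = l2l.gym.AsyncVectorEnv([make_env for _ in range(n_workers)])

obs = env.reset()  # one observation per worker
probe = make_env()  # used only to sample actions of the right shape
actions = np.stack([probe.action_space.sample() for _ in range(n_workers)])
obs, rewards, dones, infos = env.step(actions)
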
Credit

Adapted from OpenAI and Tristan Deleu's implementations.

learn2learn.gym.envs.mujoco

HalfCheetahForwardBackwardEnv

HalfCheetahForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the half-cheetah to learn to run forward or backward. At each time step the half-cheetah receives a signal composed of a control cost and a reward equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the half-cheetah should move backward and +1 indicates it should move forward. The velocity is calculated as the displacement (in the target direction) of the half-cheetah's torso before and after taking the specified action, divided by a small time step dt.

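The meta-training flow from the MetaEnv section applies directly; the rollout below just uses random actions as a stand-in for the policy being adapted, and the horizon of 100 steps is arbitrary:

import learn2learn as l2l

env = l2l.gym.HalfCheetahForwardBackwardEnv()
for task in env.sample_tasks(10):  # each task encodes a direction: -1 (backward) or +1 (forward)
    env.set_task(task)
    obs = env.reset()
    for _ in range(100):
        action = env.action_space.sample()  # stand-in for the adapted policy
        obs, reward, done, info = env.step(action)
        if done:
            break
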
Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

AntForwardBackwardEnv

AntForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the ant to learn to run forward or backward. At each time step the ant receives a signal composed of a control cost and a reward equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the ant should move backward and +1 indicates it should move forward. The velocity is calculated as the displacement (in the target direction) of the ant's torso before and after taking the specified action, divided by a small time step dt. As noted in [1], a small positive bonus is added to the reward to prevent the ant from prematurely ending the episode.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

AntDirectionEnv

AntDirectionEnv(task=None)

[Source]

Description

This environment requires the ant to learn to run in a random direction in the XY plane. At each time step the ant receives a signal composed of a control cost and a reward equal to its average velocity in the target direction. The tasks are 2D arrays sampled uniformly along the unit circle; the target direction is the vector from the origin to the sampled point. The velocity is calculated as the displacement (in the target direction) of the ant's torso before and after taking the specified action, divided by a small time step dt. As noted in [1], a small positive bonus is added to the reward to prevent the ant from prematurely ending the episode.

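The task structure and the velocity computation can be made concrete with a small NumPy sketch; the variable names and the dt value are illustrative only, not the environment's internals:

import numpy as np

# A direction task: a point sampled uniformly on the unit circle.
angle = np.random.uniform(0.0, 2.0 * np.pi)
direction = np.array([np.cos(angle), np.sin(angle)])  # 2D task array

# Velocity in the target direction: the torso's XY displacement over one
# step, projected onto the direction and divided by the time step dt.
dt = 0.05  # illustrative value
torso_xy_before = np.array([0.00, 0.00])
torso_xy_after = np.array([0.03, 0.01])
velocity = np.dot(torso_xy_after - torso_xy_before, direction) / dt

# The per-step signal combines this velocity reward with a control cost
# and the small survival bonus mentioned above.
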
Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

HumanoidForwardBackwardEnv

HumanoidForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the humanoid to learn to run forward or backward. At each time step the humanoid receives a signal composed of a control cost and a reward equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the humanoid should move backward and +1 indicates it should move forward. The velocity is calculated as the displacement (in the target direction) of the humanoid's torso before and after taking the specified action, divided by a small time step dt.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

HumanoidDirectionEnv

HumanoidDirectionEnv(task=None)

[Source]

Description

This environment requires the humanoid to learn to run in a random direction in the XY plane. At each time step the humanoid receives a signal composed of a control cost and a reward equal to its average velocity in the target direction. The tasks are 2D arrays sampled uniformly along the unit circle; the target direction is the vector from the origin to the sampled point. The velocity is calculated as the displacement (in the target direction) of the humanoid's torso before and after taking the specified action, divided by a small time step dt. A small positive bonus is added to the reward to prevent the humanoid from prematurely ending the episode.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

learn2learn.gym.envs.particles

Particles2DEnv

Particles2DEnv(task=None)

[Source]

Description

Each task is defined by the location of the goal. A point mass receives a directional force and moves accordingly (clipped to [-0.1, 0.1]). The reward is equal to the negative distance from the goal.

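A one-step sketch of the dynamics and reward described above, written in plain NumPy; the concrete numbers are arbitrary and the clipping is applied as stated, not taken from the environment's source:

import numpy as np

position = np.zeros(2)
goal = np.array([0.4, -0.2])                      # a task: the goal location
force = np.array([0.25, -0.05])                   # directional force from the policy
position = position + np.clip(force, -0.1, 0.1)   # movement, clipped to [-0.1, 0.1]
reward = -np.linalg.norm(position - goal)         # negative distance from the goal
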
Credit

Adapted from Jonas Rothfuss' implementation.

learn2learn.gym.envs.metaworld

MetaWorldML1

MetaWorldML1(task_name, env_type='train', n_goals=50, sample_all=False)

[Source]

Description

The ML1 benchmark of Meta-World focuses on solving a single task across different object / goal configurations. The task can be one of 'reach', 'push', or 'pick-and-place'. Meta-training is performed on a set of 50 initial object and goal positions, randomly chosen once, and meta-testing on a held-out set of 10 new configurations. The starting state of the robot arm is always fixed. The goal positions are not provided in the observation space, forcing the Sawyer robot arm to explore and adapt to the new goal through trial and error. This is considered a relatively easy problem for a meta-learning algorithm and acts as a sanity check for a working implementation. For more information regarding this benchmark, please consult [1].

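A construction sketch under the signature above. It assumes metaworld and its MuJoCo dependencies are installed, that the class is exported at l2l.gym, and that it exposes the same sample_tasks / set_task interface as the other MetaEnvs; env_type='test' for the held-out configurations is also an assumption:

import learn2learn as l2l

# Meta-training environment for the single 'reach' task family.
train_env = l2l.gym.MetaWorldML1('reach', env_type='train', n_goals=50)

for task in train_env.sample_tasks(10):  # different goal configurations
    train_env.set_task(task)
    obs = train_env.reset()
    # ... adaptation / rollout code here ...

# Held-out goal configurations for meta-testing.
test_env = l2l.gym.MetaWorldML1('reach', env_type='test')
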
Credit

Original implementation found at https://github.com/rlworkgroup/metaworld.

References

  1. Yu, Tianhe, et al. "Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning." arXiv preprint arXiv:1910.10897 (2019).

MetaWorldML10

MetaWorldML10(env_type='train', sample_all=False, task_name=None)

[Source]

Description

The ML10 benchmark of Meta-World consists of 10 different tasks for meta-training and 5 new tasks for meta-testing. For each task there is only one goal, randomly chosen once. The starting state and object position are random. The meta-training tasks were intentionally selected for their structural similarity to the test tasks. No task ID is provided in the observation space, so the meta-learning algorithm must identify each task from experience. This is a much harder problem than ML1 and will likely require more samples to train. For more information regarding this benchmark, please consult [1].

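Constructing the two halves of the benchmark, under the same assumptions as the ML1 sketch above:

import learn2learn as l2l

train_env = l2l.gym.MetaWorldML10(env_type='train')  # 10 meta-training tasks
test_env = l2l.gym.MetaWorldML10(env_type='test')    # 5 held-out meta-testing tasks
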
Credit

Original implementation found at https://github.com/rlworkgroup/metaworld.

References

  1. Yu, Tianhe, et al. "Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning." arXiv preprint arXiv:1910.10897 (2019).

MetaWorldML45

MetaWorldML45(env_type='train', sample_all=False, task_name=None)

[Source]

Description

Similarly to ML10, this benchmark consists of 45 different tasks for meta-training and 5 new tasks for meta-testing. For each task there is only one goal, randomly chosen once. The starting state and object position are random. No task ID is provided in the observation space, so the meta-learning algorithm must identify each task from experience. This benchmark is significantly more difficult to solve due to the diversity across tasks. For more information regarding this benchmark, please consult [1].

Credit

Original implementation found at https://github.com/rlworkgroup/metaworld.

References

  1. Yu, Tianhe, et al. "Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning." arXiv preprint arXiv:1910.10897 (2019).