learn2learn.gym

MetaEnv

MetaEnv(task=None)

[Source]

Description

Interface for l2l environments. Each environment exposes a set of task-specific parameters that uniquely identify it; a task is a dictionary mapping the names of these parameters to their values. Environments must implement methods to get, set, and sample tasks. The typical flow is:

env = EnvClass()
tasks = env.sample_tasks(num_tasks)
for task in tasks:
    env.set_task(task)
    # training code here
    ...
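
Below is a minimal sketch of this interface. The GoalEnv class, its 2D goal tasks, and the 'goal' key are illustrative only (not part of the library), and reset/step are omitted; it assumes the MetaEnv constructor samples a task when none is given.

import numpy as np
from learn2learn.gym import MetaEnv

class GoalEnv(MetaEnv):
    # Hypothetical environment whose tasks are 2D goal locations.

    def sample_tasks(self, num_tasks):
        # One task per goal; each task maps parameter names to values.
        goals = np.random.uniform(-0.5, 0.5, size=(num_tasks, 2))
        return [{'goal': goal} for goal in goals]

    def set_task(self, task):
        self._task = task
        self.goal = task['goal']

    def get_task(self):
        return self._task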

Credit

Adapted from Tristan Deleu and Jonas Rothfuss' implementations.

AsyncVectorEnv

AsyncVectorEnv(env_fns, env=None)

[Source]

Description

Asynchronous vectorized environment for working with l2l MetaEnvs. Each environment runs in its own process, so multiple environments can be stepped in parallel.
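
A usage sketch, assuming a batched gym-style reset/step interface; the import path for Particles2DEnv is taken from the module listing below.

import numpy as np
from learn2learn.gym import AsyncVectorEnv
from learn2learn.gym.envs.particles import Particles2DEnv

n_workers = 4
# Each callable builds a fresh environment inside its own worker process.
env_fns = [lambda: Particles2DEnv() for _ in range(n_workers)]
vec_env = AsyncVectorEnv(env_fns)

observations = vec_env.reset()              # one observation per worker
actions = np.zeros((n_workers, 2))          # one 2D action per worker
observations, rewards, dones, infos = vec_env.step(actions)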

Credit

Adapted from OpenAI and Tristan Deleu's implementations.

learn2learn.gym.envs.mujoco

HalfCheetahForwardBackwardEnv

HalfCheetahForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the half-cheetah to learn to run forward or backward. At each time step the half-cheetah receives a reward composed of a control cost and a term equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the half-cheetah should move backward and +1 indicates it should move forward. The velocity is computed as the displacement of the half-cheetah's torso along the target direction, before and after taking the specified action, divided by a small time increment dt.
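
A short usage sketch, assuming the import path from the module heading above and the classic gym step interface; MuJoCo must be installed.

from learn2learn.gym.envs.mujoco import HalfCheetahForwardBackwardEnv

env = HalfCheetahForwardBackwardEnv()
for task in env.sample_tasks(5):       # each task encodes a direction in {-1, +1}
    env.set_task(task)
    observation = env.reset()
    for _ in range(100):
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        # reward: average velocity along the task direction minus the control cost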

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

AntForwardBackwardEnv

AntForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the ant to learn to run forward or backward. At each time step the ant receives a reward composed of a control cost and a term equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the ant should move backward and +1 indicates it should move forward. The velocity is computed as the displacement of the ant's torso along the target direction, before and after taking the specified action, divided by a small time increment dt. As noted in [1], a small positive bonus is added to the reward to keep the ant from prematurely ending the episode.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

AntDirectionEnv

AntDirectionEnv(task=None)

[Source]

Description

This environment requires the ant to learn to run in a random direction in the XY plane. At each time step the ant receives a reward composed of a control cost and a term equal to its average velocity in the target direction. The tasks are 2D vectors sampled uniformly on the unit circle; the target direction is the vector from the origin to the sampled point. The velocity is computed as the displacement of the ant's torso along the target direction, before and after taking the specified action, divided by a small time increment dt. As noted in [1], a small positive bonus is added to the reward to keep the ant from prematurely ending the episode.
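
A similar sketch for direction tasks, with the same assumptions about the import path and the gym step interface.

from learn2learn.gym.envs.mujoco import AntDirectionEnv

env = AntDirectionEnv()
tasks = env.sample_tasks(3)            # each task holds a 2D point on the unit circle
env.set_task(tasks[0])
observation = env.reset()
observation, reward, done, info = env.step(env.action_space.sample())
# reward: velocity projected onto the task direction, minus the control cost, plus the survival bonus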

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

HumanoidForwardBackwardEnv

HumanoidForwardBackwardEnv(task=None)

[Source]

Description

This environment requires the humanoid to learn to run forward or backward. At each time step the humanoid receives a reward composed of a control cost and a term equal to its average velocity in the target direction. The tasks are Bernoulli samples on {-1, 1} with probability 0.5, where -1 indicates the humanoid should move backward and +1 indicates it should move forward. The velocity is computed as the displacement of the humanoid's torso along the target direction, before and after taking the specified action, divided by a small time increment dt.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

HumanoidDirectionEnv

HumanoidDirectionEnv(task=None)

[Source]

Description

This environment requires the humanoid to learn to run in a random direction in the XY plane. At each time step the humanoid receives a reward composed of a control cost and a term equal to its average velocity in the target direction. The tasks are 2D vectors sampled uniformly on the unit circle; the target direction is the vector from the origin to the sampled point. The velocity is computed as the displacement of the humanoid's torso along the target direction, before and after taking the specified action, divided by a small time increment dt. A small positive bonus is added to the reward to keep the humanoid from prematurely ending the episode.

Credit

Adapted from Jonas Rothfuss' implementation.

References

  1. Finn et al. 2017. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks." arXiv [cs.LG].
  2. Rothfuss et al. 2018. "ProMP: Proximal Meta-Policy Search." arXiv [cs.LG].

learn2learn.gym.envs.particles

Particles2DEnv

Particles2DEnv(task=None)

[Source]

Description

Each task is defined by the location of a goal. A point mass receives a 2D force (each component clipped to [-0.1, 0.1]) and moves accordingly. The reward is the negative Euclidean distance between the point mass and the goal.
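
A short usage sketch, assuming the import path from the module heading above and the classic gym step interface.

import numpy as np
from learn2learn.gym.envs.particles import Particles2DEnv

env = Particles2DEnv()
task = env.sample_tasks(1)[0]            # the task is the goal location
env.set_task(task)
observation = env.reset()
action = np.array([0.05, -0.02])         # a 2D force; each component is clipped to [-0.1, 0.1]
observation, reward, done, info = env.step(action)
# reward: negative distance between the point mass and the goal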

Credit

Adapted from Jonas Rothfuss' implementation.