Meta-RL results are particularly finicky to compare. Different papers use different environment implementations, which in turn produce different convergence and rewards. The plots below only serve to indicate what kind of performance you can expect with learn2learn.
The above results are obtained by running maml_trpo.py on
AntForwardBackwardEnv for 300 updates.
The figures show the expected sum of rewards over all tasks.
The line and shadow are the mean and standard deviation computed over 3 random seeds.
Those results were obtained in August 2019, and might be outdated.