Difference between revisions of "Talk:Timeline of OpenAI"

(Removed Rows)
 
|-
 
| 2017 || {{dts|November 2}} || || Publication || "Interpretable and Pedagogical Examples", a paper on language, is first submitted to the {{w|ArXiv}}. It shows that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies.<ref>{{cite web |last1=Milli |first1=Smitha |last2=Abbeel |first2=Pieter |last3=Mordatch |first3=Igor |title=Interpretable and Pedagogical Examples |url=https://arxiv.org/abs/1711.00694 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2017 || {{dts|December 4}} || || Publication || "Learning Sparse Neural Networks through ''L<sub>0</sub>'' Regularization", a paper on neural network sparsification, is submitted to the {{w|ArXiv}}. It describes a method which allows for straightforward and efficient learning of model structures with stochastic gradient descent.<ref>{{cite web |last1=Louizos |first1=Christos |last2=Welling |first2=Max |last3=Kingma |first3=Diederik P. |title=Learning Sparse Neural Networks through L0 Regularization |url=https://arxiv.org/abs/1712.01312 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|February 3}} || || Publication || "DeepType: Multilingual Entity Linking by Neural Type System Evolution", a paper on entity linking, is submitted to the {{w|ArXiv}}.<ref>{{cite web |last1=Raiman |first1=Jonathan |last2=Raiman |first2=Olivier |title=DeepType: Multilingual Entity Linking by Neural Type System Evolution |url=https://arxiv.org/abs/1802.01021 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|February 13}} || || Publication || "Evolved Policy Gradients", a {{w|reinforcement learning}} paper, is first submitted to the {{w|ArXiv}}. It proposes a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms.<ref>{{cite web |last1=Houthooft |first1=Rein |last2=Chen |first2=Richard Y. |last3=Isola |first3=Phillip |last4=Stadie |first4=Bradly C. |last5=Wolski |first5=Filip |last6=Ho |first6=Jonathan |last7=Abbeel |first7=Pieter |title=Evolved Policy Gradients |url=https://arxiv.org/abs/1802.04821 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|February 26}} || || Publication || "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research" is first submitted to the {{w|ArXiv}}. The paper introduces a suite of challenging continuous control tasks based on currently existing robotics hardware, and presents a set of concrete research ideas for improving {{w|reinforcement learning}} algorithms.<ref>{{cite web |title=Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research |url=https://arxiv.org/abs/1802.09464 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|March 3}} || || Publication || "Some Considerations on Learning to Explore via Meta-Reinforcement Learning", a paper on {{w|reinforcement learning}}, is first submitted to the {{w|ArXiv}}. It considers the problem of exploration in meta-reinforcement learning.<ref>{{cite web |last1=Stadie |first1=Bradly C. |last2=Yang |first2=Ge |last3=Houthooft |first3=Rein |last4=Chen |first4=Xi |last5=Duan |first5=Yan |last6=Wu |first6=Yuhuai |last7=Abbeel |first7=Pieter |last8=Sutskever |first8=Ilya |title=Some Considerations on Learning to Explore via Meta-Reinforcement Learning |url=https://arxiv.org/abs/1803.01118 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|March 8}} || || Publication || "On First-Order Meta-Learning Algorithms", a paper on {{w|reinforcement learning}}, is submitted to {{w|ArXiv}}. It analyzes meta-learning problems, where there is a distribution of tasks.<ref>{{cite web |last1=Nichol |first1=Alex |last2=Achiam |first2=Joshua |last3=Schulman |first3=John |title=On First-Order Meta-Learning Algorithms |url=https://arxiv.org/abs/1803.02999 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|March 15}} || || Publication || "Improving GANs Using Optimal Transport", a paper on generative models, is first submitted to the {{w|ArXiv}}. It presents Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution.<ref>{{cite web |last1=Salimans |first1=Tim |last2=Zhang |first2=Han |last3=Radford |first3=Alec |last4=Metaxas |first4=Dimitris |title=Improving GANs Using Optimal Transport |url=https://arxiv.org/abs/1803.05573 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|March 20}} || || Publication || "Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines", a paper on {{w|reinforcement learning}}, is submitted to the {{w|ArXiv}}. The paper shows that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.<ref>{{cite web |last1=Wu |first1=Cathy |last2=Rajeswaran |first2=Aravind |last3=Duan |first3=Yan |last4=Kumar |first4=Vikash |last5=Bayen |first5=Alexandre M. |last6=Kakade |first6=Sham |last7=Mordatch |first7=Igor |last8=Abbeel |first8=Pieter |title=Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines |url=https://arxiv.org/abs/1803.07246 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|April 10}} || || Publication || "Gotta Learn Fast: A New Benchmark for Generalization in RL", a paper on {{w|reinforcement learning}}, is first submitted to the {{w|ArXiv}}. The report presents a new {{w|reinforcement learning}} benchmark intended to measure the performance of transfer learning and few-shot learning algorithms in the reinforcement learning domain.<ref>{{cite web |last1=Nichol |first1=Alex |last2=Pfau |first2=Vicki |last3=Hesse |first3=Christopher |last4=Klimov |first4=Oleg |last5=Schulman |first5=John |title=Gotta Learn Fast: A New Benchmark for Generalization in RL |url=https://arxiv.org/abs/1804.03720 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|May 2}} || Safety || Publication || The paper "AI safety via debate" by Geoffrey Irving, Paul Christiano, and Dario Amodei is uploaded to the {{w|ArXiv}}. The paper proposes training agents via self-play on a zero-sum debate game, in order to address tasks that are too complicated for a human to directly judge.<ref>{{cite web |url=https://arxiv.org/abs/1805.00899 |title=[1805.00899] AI safety via debate |accessdate=May 5, 2018}}</ref><ref>{{cite web |url=https://blog.OpenAI.com/debate/ |publisher=OpenAI Blog |title=AI Safety via Debate |date=May 3, 2018 |first1=Geoffrey |last1=Irving |first2=Dario |last2=Amodei |accessdate=May 5, 2018}}</ref>
|-
| 2018 || {{dts|June 2}} || || Publication || OpenAI publishes "GamePad: A Learning Environment for Theorem Proving" on the {{w|ArXiv}}. The paper introduces a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.<ref>{{cite web |last1=Huang |first1=Daniel |last2=Dhariwal |first2=Prafulla |last3=Song |first3=Dawn |last4=Sutskever |first4=Ilya |title=GamePad: A Learning Environment for Theorem Proving |url=https://arxiv.org/abs/1806.00608 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|June 17}} || {{w|Reinforcement learning}} || Publication || OpenAI publishes a paper on learning policy representations in multiagent systems. The paper proposes a general learning framework for modeling agent behavior in any multiagent system using only a small amount of interaction data.<ref>{{cite web |title=Learning Policy Representations in Multiagent Systems |url=https://arxiv.org/abs/1806.06464 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|July 9}} || Generative models || Publication || "Glow: Generative Flow with Invertible 1x1 Convolutions" is first submitted to the {{w|ArXiv}}. The paper proposes Glow, a generative flow based on invertible 1x1 convolutions, and demonstrates significant improvements in log-likelihood on standard benchmarks.<ref>{{cite web |last1=Kingma |first1=Diederik P. |last2=Dhariwal |first2=Prafulla |title=Glow: Generative Flow with Invertible 1x1 Convolutions |url=https://arxiv.org/abs/1807.03039 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|July 26}} || {{w|Reinforcement learning}} || Publication || OpenAI publishes a paper on variational option discovery algorithms. The paper highlights a tight connection between variational option discovery methods and variational autoencoders, and introduces Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection.<ref>{{cite web |last1=Achiam |first1=Joshua |last2=Edwards |first2=Harrison |last3=Amodei |first3=Dario |last4=Abbeel |first4=Pieter |title=Variational Option Discovery Algorithms |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|August 1}} || {{w|Robotics}} || Publication || OpenAI publishes a paper describing the use of {{w|reinforcement learning}} to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand.<ref>{{cite web |title=Learning Dexterous In-Hand Manipulation |url=https://arxiv.org/abs/1808.00177 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|October 2}} || Generative models || Publication || OpenAI publishes a paper on FFJORD (free-form continuous dynamics for scalable reversible generative models), demonstrating the approach on high-dimensional density estimation, image generation, and variational inference.<ref>{{cite web |last1=Grathwohl |first1=Will |last2=Chen |first2=Ricky T. Q. |last3=Bettencourt |first3=Jesse |last4=Sutskever |first4=Ilya |last5=Duvenaud |first5=David |title=FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models |url=https://arxiv.org/abs/1810.01367 |website=arxiv.org |accessdate=26 March 2020}}</ref>
|-
| 2018 || {{dts|October 19}} || {{w|Reinforcement learning}} || Publication || OpenAI publishes a paper proposing Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems.<ref>{{cite web |last1=Christiano |first1=Paul |last2=Shlegeris |first2=Buck |last3=Amodei |first3=Dario |title=Supervising strong learners by amplifying weak experts |url=https://arxiv.org/abs/1810.08575 |website=arxiv.org |accessdate=26 March 2020}}</ref>
 
|}

Revision as of 20:45, 5 May 2020

Removed Rows

In case any of these events turns out to be relevant, please place it back on the timeline or let me know and I'll do it.

Year | Month and date | Domain | Event type | Details
2016 | May 25 | | Publication | "Adversarial Training Methods for Semi-Supervised Text Classification" is submitted to the arXiv. The paper proposes a method that achieves better results on multiple benchmark semi-supervised and purely supervised tasks.[1]
2016 | June 21 | | Publication | "Concrete Problems in AI Safety" is submitted to the arXiv. The paper explores practical problems in machine learning systems.[2]
2016 | October 11 | | Publication | "Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model", a paper on robotics, is submitted to the arXiv. It investigates settings where the sequence of states traversed in simulation remains reasonable for the real world.[3]
2016 | October 18 | | Publication | "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data", a paper on safety, is submitted to the arXiv. It shows an approach to providing strong privacy guarantees for training data: Private Aggregation of Teacher Ensembles (PATE).[4]
2016 | November 2 | | Publication | "Extensions and Limitations of the Neural GPU" is first submitted to the arXiv. The paper shows that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size.[5]
2016 | November 8 | | Publication | "Variational Lossy Autoencoder", a paper on generative models, is submitted to the arXiv. It presents a method to learn global representations by combining the Variational Autoencoder (VAE) with neural autoregressive models.[6]
2016 | November 9 | | Publication | "RL2: Fast Reinforcement Learning via Slow Reinforcement Learning", a paper on reinforcement learning, is first submitted to the arXiv. It seeks to bridge the gap between machine learning systems, which need a huge number of trials to learn a new task, and animals, which can learn in just a few trials by drawing on prior knowledge about the world.[7]
2016 | November 11 | | Publication | "A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models", a paper on generative models, is first submitted to the arXiv.[8]
2016 | November 14 | | Publication | "On the Quantitative Analysis of Decoder-Based Generative Models", a paper on generative models, is submitted to the arXiv. It introduces a technique to analyze the performance of decoder-based models.[9]
2016 | November 15 | | Publication | "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning", a paper on reinforcement learning, is first submitted to the arXiv.[10]
2017 | January 19 | | Publication | "PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications", a paper on generative models, is submitted to the arXiv.[11]
2017 | February 8 | | Publication | "Adversarial Attacks on Neural Network Policies" is submitted to the arXiv. The paper shows that adversarial attacks are effective when targeting neural network policies in reinforcement learning.[12]
2017 | March 6 | | Publication | "Third-Person Imitation Learning", a paper on robotics, is submitted to the arXiv. It presents a method for unsupervised third-person imitation learning.[13]
2017 | March 10 | | Publication | "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" is submitted to the arXiv. It explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms.[14]
2017 | March 12 | | Publication | "Prediction and Control with Temporal Segment Models", a paper on generative models, is first submitted to the arXiv. It introduces a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions.[15]
2017 | March 15 | | Publication | "Emergence of Grounded Compositional Language in Multi-Agent Populations" is first submitted to the arXiv. The paper proposes a multi-agent learning environment and learning methods that bring about the emergence of a basic compositional language.[16]
2017 | March 20 | | Publication | "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World", a paper on robotics, is submitted to the arXiv. It explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator.[17]
2017 | March 21 | | Publication | "One-Shot Imitation Learning", a paper on robotics, is first submitted to the arXiv. The paper proposes a meta-learning framework for optimizing imitation learning.[18]
2017 | June 7 | | Publication | "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" is submitted to the arXiv. The paper explores deep reinforcement learning methods for multi-agent domains.[19]
2017 | September 13 | Reinforcement learning | Publication | "Learning with Opponent-Learning Awareness" is first uploaded to the arXiv. The paper presents Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in an environment.[20][21]
2017 | October 17 | Robotics | Publication | "Domain Randomization and Generative Models for Robotic Grasping", a paper on robotics, is first submitted to the arXiv. It explores a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis.[22]
2017 | October 18 | | Publication | "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization", a paper on robotics, is first submitted to the arXiv. It addresses the problem that strategies successful in simulation may not transfer to their real-world counterparts due to modeling error.[23]
2017 | October 26 | | Publication | "Meta Learning Shared Hierarchies", a paper on reinforcement learning, is submitted to the arXiv. The paper describes the development of a meta-learning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives.[24]
2017 | October 31 | | Publication | "Backpropagation through the Void: Optimizing control variates for black-box gradient estimation", a paper on reinforcement learning, is first submitted to the arXiv. It introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables.[25]
2017 | November 2 | | Publication | "Interpretable and Pedagogical Examples", a paper on language, is first submitted to the arXiv. It shows that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies.[26]
2017 | December 4 | | Publication | "Learning Sparse Neural Networks through L0 Regularization", a paper on neural network sparsification, is submitted to the arXiv. It describes a method which allows for straightforward and efficient learning of model structures with stochastic gradient descent.[27]
2018 | February 3 | | Publication | "DeepType: Multilingual Entity Linking by Neural Type System Evolution", a paper on entity linking, is submitted to the arXiv.[28]
2018 | February 13 | | Publication | "Evolved Policy Gradients", a reinforcement learning paper, is first submitted to the arXiv. It proposes a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms.[29]
2018 | February 26 | | Publication | "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research" is first submitted to the arXiv. The paper introduces a suite of challenging continuous control tasks based on currently existing robotics hardware, and presents a set of concrete research ideas for improving reinforcement learning algorithms.[30]
2018 | March 3 | | Publication | "Some Considerations on Learning to Explore via Meta-Reinforcement Learning", a paper on reinforcement learning, is first submitted to the arXiv. It considers the problem of exploration in meta-reinforcement learning.[31]
2018 | March 8 | | Publication | "On First-Order Meta-Learning Algorithms", a paper on reinforcement learning, is submitted to the arXiv. It analyzes meta-learning problems where there is a distribution of tasks.[32]
2018 | March 15 | | Publication | "Improving GANs Using Optimal Transport", a paper on generative models, is first submitted to the arXiv. It presents Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution.[33]
2018 | March 20 | | Publication | "Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines", a paper on reinforcement learning, is submitted to the arXiv. The paper shows that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.[34]
2018 | April 10 | | Publication | "Gotta Learn Fast: A New Benchmark for Generalization in RL", a paper on reinforcement learning, is first submitted to the arXiv. The report presents a new reinforcement learning benchmark intended to measure the performance of transfer learning and few-shot learning algorithms in the reinforcement learning domain.[35]
2018 | May 2 | Safety | Publication | The paper "AI safety via debate" by Geoffrey Irving, Paul Christiano, and Dario Amodei is uploaded to the arXiv. The paper proposes training agents via self-play on a zero-sum debate game, in order to address tasks that are too complicated for a human to directly judge.[36][37]
2018 | June 2 | | Publication | OpenAI publishes "GamePad: A Learning Environment for Theorem Proving" on the arXiv. The paper introduces a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.[38]
2018 | June 17 | Reinforcement learning | Publication | OpenAI publishes a paper on learning policy representations in multiagent systems. The paper proposes a general learning framework for modeling agent behavior in any multiagent system using only a small amount of interaction data.[39]
2018 | July 9 | Generative models | Publication | "Glow: Generative Flow with Invertible 1x1 Convolutions" is first submitted to the arXiv. The paper proposes Glow, a generative flow based on invertible 1x1 convolutions, and demonstrates significant improvements in log-likelihood on standard benchmarks.[40]
2018 | July 26 | Reinforcement learning | Publication | OpenAI publishes a paper on variational option discovery algorithms. The paper highlights a tight connection between variational option discovery methods and variational autoencoders, and introduces Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection.[41]
2018 | August 1 | Robotics | Publication | OpenAI publishes a paper describing the use of reinforcement learning to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand.[42]
2018 | October 2 | Generative models | Publication | OpenAI publishes a paper on FFJORD (free-form continuous dynamics for scalable reversible generative models), demonstrating the approach on high-dimensional density estimation, image generation, and variational inference.[43]
2018 | October 19 | Reinforcement learning | Publication | OpenAI publishes a paper proposing Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems.[44]
  1. Miyato, Takeru; Dai, Andrew M.; Goodfellow, Ian. "Adversarial Training Methods for Semi-Supervised Text Classification". arxiv.org. Retrieved 28 March 2020. 
  2. "[1606.06565] Concrete Problems in AI Safety". June 21, 2016. Retrieved July 25, 2017. 
  3. Christiano, Paul; Shah, Zain; Mordatch, Igor; Schneider, Jonas; Blackwell, Trevor; Tobin, Joshua; Abbeel, Pieter; Zaremba, Wojciech. "Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model". arxiv.org. Retrieved 28 March 2020. 
  4. Papernot, Nicolas; Abadi, Martín; Erlingsson, Úlfar; Goodfellow, Ian; Talwar, Kunal. "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data". arxiv.org. Retrieved 28 March 2020. 
  5. Price, Eric; Zaremba, Wojciech; Sutskever, Ilya. "Extensions and Limitations of the Neural GPU". arxiv.org. Retrieved 28 March 2020. 
  6. Chen, Xi; Kingma, Diederik P.; Salimans, Tim; Duan, Yan; Dhariwal, Prafulla; Schulman, John; Sutskever, Ilya; Abbeel, Pieter. "Variational Lossy Autoencoder". arxiv.org. 
  7. Duan, Yan; Schulman, John; Chen, Xi; Bartlett, Peter L.; Sutskever, Ilya; Abbeel, Pieter. "RL2: Fast Reinforcement Learning via Slow Reinforcement Learning". arxiv.org. Retrieved 28 March 2020. 
  8. Finn, Chelsea; Christiano, Paul; Abbeel, Pieter; Levine, Sergey. "A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models". arxiv.org. Retrieved 28 March 2020. 
  9. Wu, Yuhuai; Burda, Yuri; Salakhutdinov, Ruslan; Grosse, Roger. "On the Quantitative Analysis of Decoder-Based Generative Models". arxiv.org. Retrieved 28 March 2020. 
  10. "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning". arxiv.org. Retrieved 28 March 2020. 
  11. Salimans, Tim; Karpathy, Andrej; Chen, Xi; Kingma, Diederik P. "PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications". arxiv.org. Retrieved 28 March 2020. 
  12. Huang, Sandy; Papernot, Nicolas; Goodfellow, Ian; Duan, Yan; Abbeel, Pieter. "Adversarial Attacks on Neural Network Policies". arxiv.org. Retrieved 28 March 2020. 
  13. Stadie, Bradly C.; Abbeel, Pieter; Sutskever, Ilya. "Third-Person Imitation Learning". arxiv.org. Retrieved 28 March 2020. 
  14. Salimans, Tim; Ho, Jonathan; Chen, Xi; Sidor, Szymon; Sutskever, Ilya. "Evolution Strategies as a Scalable Alternative to Reinforcement Learning". arxiv.org. Retrieved 28 March 2020. 
  15. Mishra, Nikhil; Abbeel, Pieter; Mordatch, Igor. "Prediction and Control with Temporal Segment Models". arxiv.org. Retrieved 28 March 2020. 
  16. Mordatch, Igor; Abbeel, Pieter. "Emergence of Grounded Compositional Language in Multi-Agent Populations". arxiv.org. Retrieved 26 March 2020. 
  17. Tobin, Josh; Fong, Rachel; Ray, Alex; Schneider, Jonas; Zaremba, Wojciech; Abbeel, Pieter. "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World". arxiv.org. Retrieved 28 March 2020. 
  18. "One-Shot Imitation Learning". arxiv.org. Retrieved 28 March 2020. 
  19. "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments". arxiv.org. 
  20. "[1709.04326] Learning with Opponent-Learning Awareness". Retrieved March 2, 2018. 
  21. gwern (August 16, 2017). "September 2017 news - Gwern.net". Retrieved March 2, 2018. 
  22. "Domain Randomization and Generative Models for Robotic Grasping". arxiv.org. Retrieved 27 March 2020. 
  23. Bin Peng, Xue; Andrychowicz, Marcin; Zaremba, Wojciech; Abbeel, Pieter. "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization". arxiv.org. Retrieved 26 March 2020. 
  24. Frans, Kevin; Ho, Jonathan; Chen, Xi; Abbeel, Pieter; Schulman, John. "Meta Learning Shared Hierarchies". arxiv.org. Retrieved 26 March 2020. 
  25. Grathwohl, Will; Choi, Dami; Wu, Yuhuai; Roeder, Geoffrey; Duvenaud, David. "Backpropagation through the Void: Optimizing control variates for black-box gradient estimation". arxiv.org. Retrieved 26 March 2020. 
  26. Milli, Smitha; Abbeel, Pieter; Mordatch, Igor. "Interpretable and Pedagogical Examples". arxiv.org. Retrieved 26 March 2020. 
  27. Louizos, Christos; Welling, Max; Kingma, Diederik P. "Learning Sparse Neural Networks through L0 Regularization". arxiv.org. Retrieved 26 March 2020. 
  28. Raiman, Jonathan; Raiman, Olivier. "DeepType: Multilingual Entity Linking by Neural Type System Evolution". arxiv.org. Retrieved 26 March 2020. 
  29. Houthooft, Rein; Chen, Richard Y.; Isola, Phillip; Stadie, Bradly C.; Wolski, Filip; Ho, Jonathan; Abbeel, Pieter. "Evolved Policy Gradients". arxiv.org. Retrieved 26 March 2020. 
  30. "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research". arxiv.org. Retrieved 26 March 2020. 
  31. Stadie, Bradly C.; Yang, Ge; Houthooft, Rein; Chen, Xi; Duan, Yan; Wu, Yuhuai; Abbeel, Pieter; Sutskever, Ilya. "Some Considerations on Learning to Explore via Meta-Reinforcement Learning". arxiv.org. Retrieved 26 March 2020. 
  32. Nichol, Alex; Achiam, Joshua; Schulman, John. "On First-Order Meta-Learning Algorithms". arxiv.org. Retrieved 26 March 2020. 
  33. Salimans, Tim; Zhang, Han; Radford, Alec; Metaxas, Dimitris. "Improving GANs Using Optimal Transport". arxiv.org. Retrieved 26 March 2020. 
  34. Wu, Cathy; Rajeswaran, Aravind; Duan, Yan; Kumar, Vikash; Bayen, Alexandre M.; Kakade, Sham; Mordatch, Igor; Abbeel, Pieter. "Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines". arxiv.org. Retrieved 26 March 2020. 
  35. Nichol, Alex; Pfau, Vicki; Hesse, Christopher; Klimov, Oleg; Schulman, John. "Gotta Learn Fast: A New Benchmark for Generalization in RL". arxiv.org. Retrieved 26 March 2020. 
  36. "[1805.00899] AI safety via debate". Retrieved May 5, 2018. 
  37. Irving, Geoffrey; Amodei, Dario (May 3, 2018). "AI Safety via Debate". OpenAI Blog. Retrieved May 5, 2018. 
  38. Huang, Daniel; Dhariwal, Prafulla; Song, Dawn; Sutskever, Ilya. "GamePad: A Learning Environment for Theorem Proving". arxiv.org. Retrieved 26 March 2020. 
  39. "Learning Policy Representations in Multiagent Systems". arxiv.org. Retrieved 26 March 2020. 
  40. Kingma, Diederik P.; Dhariwal, Prafulla. "Glow: Generative Flow with Invertible 1x1 Convolutions". arxiv.org. Retrieved 26 March 2020. 
  41. Achiam, Joshua; Edwards, Harrison; Amodei, Dario; Abbeel, Pieter. "Variational Option Discovery Algorithms". arxiv.org. Retrieved 26 March 2020. 
  42. "Learning Dexterous In-Hand Manipulation". arxiv.org. Retrieved 26 March 2020. 
  43. Grathwohl, Will; Chen, Ricky T. Q.; Bettencourt, Jesse; Sutskever, Ilya; Duvenaud, David. "FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models". arxiv.org. Retrieved 26 March 2020. 
  44. Christiano, Paul; Shlegeris, Buck; Amodei, Dario. "Supervising strong learners by amplifying weak experts". arxiv.org. Retrieved 26 March 2020.