Talk:Timeline of OpenAI

Partial unsolicited review by Vipul on 2024-10-04
- The Scaling Hypothesis (https://gwern.net/scaling-hypothesis) by Gwern✔, as well as his LessWrong comments on the subject, in particular this comment: https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang?commentId=jbD8siv7GMWxRro43 ✔. Gwern's theory is that OpenAI believed in the scaling hypothesis while others such as DeepMind did not, and that the success of GPT-3 and its successors reflects OpenAI's big bet on scaling paying off (so far). Please take a look at these and consider how to integrate them into the timeline; a schematic form of the scaling-law claim is sketched below.✔
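For context when integrating this material: the quantitative claim underlying the scaling hypothesis is the one summarized in the "Scaling Laws for Neural Language Models" row in the removed-rows table below, namely that language-model cross-entropy loss falls as a smooth power law in model size, dataset size, and training compute. A schematic form of that claim (the exponents are the approximate empirical fits reported by Kaplan et al. 2020, quoted here only to illustrate the shape of the law, not sourced from this timeline) is:

```latex
% Approximate power-law form of the empirical scaling laws (Kaplan et al. 2020).
% N = non-embedding parameters, D = dataset size in tokens, C_min = optimally
% allocated training compute; the constants N_c, D_c, C_c and the exponents are
% empirical fits, shown here only as an illustration.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N},\quad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D},\quad
L(C_{\min}) \approx \left(\tfrac{C_c}{C_{\min}}\right)^{\alpha_C},
\qquad \alpha_N \approx 0.076,\ \ \alpha_D \approx 0.095,\ \ \alpha_C \approx 0.050
```

Gwern's bet-on-scaling argument is, roughly, that OpenAI acted as if these curves would keep holding as N, D, and C grow, and that GPT-3's success is evidence that so far they have.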
Partial unsolicited review by Vipul on 2024-09-30
- There is a row in the full timeline for Sam Altman being fired, but nothing indicating his reinstatement. Although I know this is still pending completion, I think just adding the row about firing paints an actively misleading picture. Even if adding other rows will take time, it will be good to at least mention in that row that Altman would be reinstated shortly. Also "Altman has been instrumental" should be "Altman had been instrumental"; btw, feel free to copy/reuse content from the timeline of AI safety.✔
- Elon Musk's departure: whereas the row for the departure gives the reason as Musk's potential conflict of interest with Tesla, a later row says: "he left OpenAI in 2018 due to concerns about its profit-driven direction." It would be good to integrate that later information into the original departure row, as a reason that would emerge later (with the conflict of interest being the reason proffered at the time).
Review by Vipul on 2023-08-01
Version reviewed: https://timelines.issarice.com/index.php?title=Timeline_of_OpenAI&oldid=75287
I did not go through the full review process for timelines since I had already done this on 2023-07-01. I just read through the timeline and verified that my previous suggestions had been implemented. I have two additional pieces of feedback:
- I see a lot of mixed tense -- using past tense instead of present tense. Please review the contents of the full timeline and fix the tense for greater consistency.✔
- It would be good to expand the row about Dario Amodei departing OpenAI. Specifically, in this change (https://timelines.issarice.com/index.php?title=Timeline_of_OpenAI&type=revision&diff=74312&oldid=74311) the row added for Dario Amodei's departure doesn't have all the material that was requested in the "What the timeline is still missing" entry that was deleted in the same diff.✔
- It would be good to mention the founding of Anthropic (as a followup row to the row about Dario Amodei) and of the Alignment Research Center (started by Paul Christiano, who also left around the same time). It may also be worth saying a bit more about competition dynamics between Anthropic and OpenAI, as well as OpenAI's collaboration with the Alignment Research Center on GPT-4 safety evaluation (e.g., https://arstechnica.com/information-technology/2023/03/openai-checked-to-see-whether-gpt-4-could-take-over-the-world/).✔
Review by Vipul on 2023-07-01
Version reviewed: https://timelines.issarice.com/index.php?title=Timeline_of_OpenAI&oldid=74736
Process used: Review process for timelines
General standalone evaluation comments
The timeline seems pretty good as a standalone timeline. A few general comments:
- The inclusion criteria should probably mention that a lot of OpenAI's research papers do not have separate rows on the timeline, and that some of these additional rows may be found on the talk page.✔
- This timeline could benefit from the use of Template:Focused coverage period, similar to how it's used for the timeline of AI safety. The subject matter of the timeline is changing rapidly, and the template would give readers a quick sense of the time period for which the information in the timeline was last collated and reviewed.✔
Line-by-line comments
- It would be worth adding a row on Sam Altman testifying before United States Congress (in May 2023) where he pushes for regulation of AI.✔
- For the row introducing ChatGPT Plus, it may help to mention that this is tied to ChatGPT 4.✔
External verification
Wikipedia
Vipul read the Wikipedia page about OpenAI and confirmed that most of the stuff on that page can also be found on the timeline. One small omission was that the Wikipedia page mentioned the names "Dactyl" and "Shadow Hand" when describing the efforts for a robot to solve the Rubik's Cube, but this timeline doesn't use those names. It may be nice to include those names; however, a quick look at the references used in the timeline doesn't show the names in the abstract of the referenced papers. So this can be left to Sebastian's discretion.
ChatGPT
Vipul asked ChatGPT to produce a timeline of OpenAI. The timeline produced by ChatGPT was clearly inferior, with incorrect dates, and the material in the timeline was a subset of the material in our timeline. So this test also passed.
Removed Rows
In case any of these events turns out to be relevant, please place it back on the timeline or let me know and I'll do it.
Year | Month and date | Domain | Event type | Details |
---|---|---|---|---|
2016 | May 25 | Publication | "Adversarial Training Methods for Semi-Supervised Text Classification" is submitted to the ArXiv. The paper proposes a method that achieves better results on multiple benchmark semi-supervised and purely supervised tasks.[1] | |
2016 | July 8 | Publication | "Adversarial Examples in the Physical World" is published. One of the authors is Ian Goodfellow, who is at OpenAI at the time.[2] | |
2016 | October 11 | Publication | "Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model", a paper on robotics, is submitted to the ArXiv. It investigates settings where the sequence of states traversed in simulation remains reasonable for the real world.[3] | |
2016 | October 18 | Publication | "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data", a paper on safety, is submitted to the ArXiv. It shows an approach to providing strong privacy guarantees for training data: Private Aggregation of Teacher Ensembles (PATE).[4] | |
2016 | November 2 | Publication | "Extensions and Limitations of the Neural GPU" is first submitted to the ArXiv. The paper shows that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size.[5] | |
2016 | November 8 | Generative models | Publication | "Variational Lossy Autoencoder", a paper on generative models, is submitted to the ArXiv. It presents a method to learn global representations by combining Variational Autoencoder (VAE) with neural autoregressive models.[6] |
2016 | November 9 | Reinforcement learning | Publication | "RL2: Fast Reinforcement Learning via Slow Reinforcement Learning", a paper on reinforcement learning, is first submitted to the ArXiv. It seeks to bridge the gap between machine learning, which requires a huge number of trials, and animals, which can learn new tasks in just a few trials by benefiting from their prior knowledge about the world.[7]
2016 | November 11 | Publication | "A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models", a paper on generative models, is first submitted to the ArXiv.[8] | |
2016 | November 14 | Publication | "On the Quantitative Analysis of Decoder-Based Generative Models", a paper on generative models, is submitted to the ArXiv. It introduces a technique to analyze the performance of decoder-based models.[9] | |
2016 | November 15 | Reinforcement learning | Publication | "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning", a paper on reinforcement learning, is first submitted to the ArXiv.[10] |
2016 | December 21 | Reinforcement learning | Publication | "Faulty Reward Functions in the Wild" is published. The post explores a failed reinforcement learning run in which a misspecified reward function leads the agent to learn unintended behavior.[11]
2017 | January 19 | Publication | "PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications", a paper on generative models, is submitted to the ArXiv.[12] | |
2017 | February 8 | Publication | "Adversarial Attacks on Neural Network Policies" is submitted to the ArXiv. The paper shows that adversarial attacks are effective when targeting neural network policies in reinforcement learning.[13] | |
2017 | March 6 | Publication | "Third-Person Imitation Learning", a paper on robotics, is submitted to the ArXiv. It presents a method for unsupervised third-person imitation learning.[14] | |
2017 | March 10 | Publication | "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" is submitted to the ArXiv. It explores the use of Evolution Strategies (ES), a class of black box optimization algorithms.[15] | |
2017 | March 12 | Publication | "Prediction and Control with Temporal Segment Models", a paper on generative models, is first submitted to the ArXiv. It introduces a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions.[16] | |
2017 | March 15 | Publication | "Emergence of Grounded Compositional Language in Multi-Agent Populations" is first submitted to ArXiv. The paper proposes a multi-agent learning environment and learning methods that bring about emergence of a basic compositional language.[17] | |
2017 | March 20 | Publication | "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World", a paper on robotics, is submitted to the ArXiv. It explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator.[18] | |
2017 | March 21 | Publication | "One-Shot Imitation Learning", a paper on robotics, is first submitted to the ArXiv. The paper proposes a meta-learning framework for optimizing imitation learning.[19] | |
2017 | May 16 | Robotics | Product release | OpenAI introduces a robotics system that can learn new tasks after observing them once. The system utilizes two neural networks: a vision network and an imitation network. The vision network processes simulated images to identify object positions, while the imitation network infers task intent and accomplishes the task from different starting configurations. The imitation network learns from training examples and generalizes the demonstrated behavior to new settings. By training on pairs of demonstrations for different tasks, the robot can predict the actions taken by the demonstrator. The system is successfully applied to block stacking, where it parses human demonstrations and stacks blocks into configurations not seen during training. To train a robust policy, a small amount of noise is injected into the outputs of a scripted policy to account for disturbances. OpenAI invites individuals to join their efforts in building this robot.[20]
2017 | June 7 | Publication | "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" is submitted to the ArXiv. The paper explores deep reinforcement learning methods for multi-agent domains.[21] | |
2017 | October 17 | Robotics | Publication | "Domain Randomization and Generative Models for Robotic Grasping", a paper on robotics, is first submitted to the ArXiv. It explores a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis.[22] |
2017 | October 18 | Publication | "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization", a paper on robotics, is first submitted to ArXiv. It describes a solution for strategies that are successful in simulation but may not transfer to their real world counterparts due to modeling error.[23] | |
2017 | October 26 | Publication | "Meta Learning Shared Hierarchies", a paper on reinforcement learning, is submitted to the ArXiv. The paper describes the development of a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives.[24] | |
2017 | October 31 | Publication | "Backpropagation through the Void: Optimizing control variates for black-box gradient estimation", a paper on reinforcement learning, is first submitted to the ArXiv. It introduces a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables.[25] | |
2017 | November 2 | Publication | "Interpretable and Pedagogical Examples", a paper on language, is first submitted to the ArXiv. It shows that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies.[26] | |
2017 | December 4 | Publication | "Learning Sparse Neural Networks through L0 Regularization", a paper on reinforcement learning, is submitted to the ArXiv. It describes a method which allows for straightforward and efficient learning of model structures with stochastic gradient descent.[27] | |
2017 | December | Publication | The 2017 AI Index is published. OpenAI contributes to the report.[28] | |
2018 | February 3 | Publication | "DeepType: Multilingual Entity Linking by Neural Type System Evolution", a paper on reinforcement learning, is submitted to the ArXiv.[29] | |
2018 | February 13 | Publication | "Evolved Policy Gradients", a reinforcement learning paper, is first submitted to the ArXiv. It proposes a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms.[30] | |
2018 | February 26 | Publication | "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research" is first submitted to the ArXiv. The paper introduces a suite of challenging continuous control tasks based on currently existing robotics hardware, and presents a set of concrete research ideas for improving reinforcement learning algorithms.[31] | |
2018 | March 3 | Publication | "Some Considerations on Learning to Explore via Meta-Reinforcement Learning", a paper on reinforcement learning, is first submitted to ArXiv. It considers the problem of exploration in meta reinforcement learning.[32] | |
2018 | March 8 | Publication | "On First-Order Meta-Learning Algorithms", a paper on reinforcement learning, is submitted to ArXiv. It analyzes meta-learning problems, where there is a distribution of tasks.[33] | |
2018 | March 15 | Publication | "Improving GANs Using Optimal Transport", a paper on generative models, is first submitted to the ArXiv. It presents Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution.[34] | |
2018 | March 20 | Publication | "Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines", a paper on reinforcement learning, is submitted to the ArXiv. The paper shows that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.[35] | |
2018 | April 10 | Publication | "Gotta Learn Fast: A New Benchmark for Generalization in RL", a paper on reinforcement learning, is first submitted to the ArXiv. The report presents a new reinforcement learning benchmark intended to measure the performance of transfer learning and few-shot learning algorithms in the reinforcement learning domain.[36] | |
2018 | June 2 | Publication | OpenAI publishes "GamePad: A Learning Environment for Theorem Proving" in arXiv. The paper introduces a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant.[37] | |
2018 | June 17 | Reinforcement learning | Publication | OpenAI publishes paper on learning policy representations in multiagent systems. The paper proposes a general learning framework for modeling agent behavior in any multiagent system using only a handful of interaction data.[38] |
2018 | July 9 | Generative models | Publication | "Glow: Generative Flow with Invertible 1x1 Convolutions" is first submitted to the ArXiv. The paper proposes a method for obtaining a significant improvement in log-likelihood on standard benchmarks.[39] |
2018 | July 26 | Reinforcement learning | Publication | OpenAI publishes paper on variational option discovery algorithms. The paper highlights a tight connection between variational option discovery methods and variational autoencoders, and introduces Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection.[40]
2018 | August 1 | Robotics | Publication | OpenAI publishes paper describing the use of reinforcement learning to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand.[41] |
2018 | October 2 | Generative models | Publication | OpenAI publishes paper on FFJORD (free-form continuous dynamics for scalable reversible generative models), aiming to demonstrate their approach on high-dimensional density estimation, image generation, and variational inference.[42] |
2018 | October 19 | Reinforcement learning | Publication | OpenAI publishes paper proposing Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems.[43] |
2018 | November 1 | Publication | OpenAI publishes research paper detailing AI able to defeat humans at the retro platformer Montezuma’s Revenge. The top-performing iteration found 22 of the 24 rooms in the first level, and occasionally discovered all 24.[44][45] | |
2018 | November 5 | Reinforcement learning | Publication | OpenAI publishes paper proposing a plan online and learn offline (POLO) framework for the setting where an agent, with an internal model, needs to continually act and learn in the world.[46] |
2018 | December 14 | Reinforcement learning | Publication | OpenAI publishes paper demonstrating that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets, reinforcement learning domains, and even generative model training.[47] |
2019 | February 4 | Publication | OpenAI publishes paper showing computational limitations in robust classification and win-win results.[48] | |
2019 | March 2 | Publication | OpenAI publishes paper presenting Neural MMO, an artificial intelligence research environment that aims to simulate the natural environment setting in microcosm.[49] | |
2019 | March 20 | Publication | OpenAI publishes paper presenting techniques to scale MCMC-based training of energy-based models parameterized by continuous neural networks (a toy sketch of the Langevin-dynamics sampler appears after the table below).[50] | |
2019 | May 3 | Publication | OpenAI publishes study on the transfer of adversarial robustness of deep neural networks between different perturbation types.[51] | |
2019 | May 28 | Publication | OpenAI publishes study on the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.[52] | |
2019 | July 10 | Publication | OpenAI publishes paper arguing that competitive pressures could incentivize AI companies to underinvest in ensuring their systems are safe, secure, and have a positive social impact.[53] | |
2020 | January 23 | Publication | OpenAI publishes study on empirical scaling laws for language model performance on the cross-entropy loss (the fitted power-law forms are written out after the table below).[54] | |
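The following is a minimal NumPy sketch of the invertible 1x1 convolution idea behind Glow (July 9, 2018 row). It is an illustration of the technique under stated assumptions, not OpenAI's implementation, and it omits the LU-decomposed parameterization the paper also discusses; the function names are placeholders.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch of Glow's invertible 1x1 convolution, not OpenAI's code.

def invertible_1x1_conv(x, W):
    """Apply a 1x1 convolution with channel-mixing matrix W (c x c) to a
    feature map x of shape (h, w, c). Returns the output and the
    log-determinant term it contributes to the flow's log-likelihood,
    which for a 1x1 convolution is h * w * log|det W|."""
    h, w, c = x.shape
    z = x @ W.T  # applies W to the channel vector at every spatial position
    log_det = h * w * np.log(np.abs(np.linalg.det(W)))
    return z, log_det

def invertible_1x1_conv_inverse(z, W):
    """Invert the 1x1 convolution by applying W^{-1} channel-wise."""
    return z @ np.linalg.inv(W).T

# Tiny usage example: a random orthogonal matrix is a convenient invertible
# initialization (|det W| = 1, so the initial log-det contribution is 0).
rng = np.random.default_rng(0)
c = 4
W, _ = np.linalg.qr(rng.normal(size=(c, c)))
x = rng.normal(size=(8, 8, c))
z, log_det = invertible_1x1_conv(x, W)
x_rec = invertible_1x1_conv_inverse(z, W)
assert np.allclose(x, x_rec)  # the transform is exactly invertible
</syntaxhighlight>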
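Below is a toy sketch of the amplification-and-distillation loop behind Iterated Amplification (October 19, 2018 row), using list summation as a stand-in task. The task, the decomposition scheme, and the memorization-based "training" step are illustrative assumptions for exposition, not the paper's method.

<syntaxhighlight lang="python">
# Toy task: compute the sum of a list of integers. The "fast model" starts out
# only able to answer length-1 questions; amplification answers harder
# questions by splitting them and combining subanswers from the fast model,
# and distillation then trains (here: memorizes) the fast model on those answers.
memory = {}

def fast_model(q):
    q = tuple(q)
    if len(q) == 1:
        return q[0]              # base case the model already knows
    return memory.get(q, 0)      # otherwise fall back to distilled answers

def decompose(q):
    mid = len(q) // 2
    return [q[:mid], q[mid:]]

def combine(sub_answers):
    return sum(sub_answers)

def amplify(q):
    """Answer a question by decomposing it and calling the fast model on the parts."""
    if len(q) == 1:
        return fast_model(q)
    return combine([fast_model(sq) for sq in decompose(q)])

def distill(questions):
    """'Train' the fast model by memorizing the amplified answers."""
    for q in questions:
        memory[tuple(q)] = amplify(q)

# Iterate amplification + distillation over progressively harder questions.
for length in (2, 4, 8):
    questions = [list(range(i, i + length)) for i in range(0, 8 - length + 1)]
    distill(questions)

print(fast_model(list(range(8))))  # 0 + 1 + ... + 7 = 28
</syntaxhighlight>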
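The gradient noise scale mentioned in the December 14, 2018 row has, in the paper's simplified (Hessian-free) form, the following definition, where <math>G</math> is the true gradient of the loss and <math>\Sigma</math> is the covariance matrix of per-example gradient estimates:

<math>\mathcal{B}_{\mathrm{simple}} = \frac{\operatorname{tr}(\Sigma)}{|G|^{2}}</math>

Batch sizes well below this value enjoy near-linear speedups from data parallelism, while batch sizes well above it give diminishing returns, which is why the statistic predicts the largest useful batch size.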
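The March 20, 2019 row refers to MCMC-based training of energy-based models, where the sampler involved is Langevin dynamics. Below is a toy NumPy sketch of such a sampler with a stand-in quadratic energy; the models in the paper use neural-network energies and additional training machinery (such as a replay buffer of past samples) not shown here.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch of Langevin-dynamics sampling, with a toy energy function.

def langevin_sample(grad_energy, x0, step=0.01, n_steps=500, rng=None):
    """Draw an approximate sample from p(x) proportional to exp(-E(x)) by
    running Langevin dynamics: gradient descent on the energy plus
    appropriately scaled Gaussian noise at each step."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

# Toy example: E(x) = ||x||^2 / 2, so samples should look roughly standard normal.
grad_E = lambda x: x
samples = np.array([langevin_sample(grad_E, np.zeros(2)) for _ in range(200)])
print(samples.mean(axis=0), samples.std(axis=0))
</syntaxhighlight>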
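The scaling-law study in the January 23, 2020 row fits power laws of the following form to language-model test loss, where <math>N</math> is the number of (non-embedding) model parameters, <math>D</math> the dataset size in tokens, and <math>C_{\min}</math> the compute budget, with each law holding when the other two quantities are not bottlenecks. The constants <math>N_c, D_c, C_c</math> and the exponents are fit empirically, with each exponent coming out below 0.1 in the paper's fits:

<math>L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}</math>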
References
- ↑ Miyato, Takeru; Dai, Andrew M.; Goodfellow, Ian. "Adversarial Training Methods for Semi-Supervised Text Classification". arxiv.org. Retrieved 28 March 2020.
- ↑ Metz, Cade (July 29, 2016). "How To Fool AI Into Seeing Something That Isn't There". WIRED. Retrieved March 3, 2018.
- ↑ Christiano, Paul; Shah, Zain; Mordatch, Igor; Schneider, Jonas; Blackwell, Trevor; Tobin, Joshua; Abbeel, Pieter; Zaremba, Wojciech. "Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model". arxiv.org. Retrieved 28 March 2020.
- ↑ Papernot, Nicolas; Abadi, Martín; Erlingsson, Úlfar; Goodfellow, Ian; Talwar, Kunal. "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data". arxiv.org. Retrieved 28 March 2020.
- ↑ Price, Eric; Zaremba, Wojciech; Sutskever, Ilya. "Extensions and Limitations of the Neural GPU". arxiv.org. Retrieved 28 March 2020.
- ↑ Chen, Xi; Kingma, Diederik P.; Salimans, Tim; Duan, Yan; Dhariwal, Prafulla; Schulman, John; Sutskever, Ilya; Abbeel, Pieter. "Variational Lossy Autoencoder". arxiv.org.
- ↑ Duan, Yan; Schulman, John; Chen, Xi; Bartlett, Peter L.; Sutskever, Ilya; Abbeel, Pieter. "RL2: Fast Reinforcement Learning via Slow Reinforcement Learning". arxiv.org. Retrieved 28 March 2020.
- ↑ Finn, Chelsea; Christiano, Paul; Abbeel, Pieter; Levine, Sergey. "A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models". arxiv.org. Retrieved 28 March 2020.
- ↑ Wu, Yuhuai; Burda, Yuri; Salakhutdinov, Ruslan; Grosse, Roger. "On the Quantitative Analysis of Decoder-Based Generative Models". arxiv.org. Retrieved 28 March 2020.
- ↑ "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning". arxiv.org. Retrieved 28 March 2020.
- ↑ "Faulty Reward Functions in the Wild". openai.com. Retrieved 5 April 2020.
- ↑ Salimans, Tim; Karpathy, Andrej; Chen, Xi; Kingma, Diederik P. "PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications". arxiv.org. Retrieved 28 March 2020.
- ↑ Huang, Sandy; Papernot, Nicolas; Goodfellow, Ian; Duan, Yan; Abbeel, Pieter. "Adversarial Attacks on Neural Network Policies". arxiv.org. Retrieved 28 March 2020.
- ↑ Stadie, Bradly C.; Abbeel, Pieter; Sutskever, Ilya. "arxiv.org". arxiv.org. Retrieved 28 March 2020.
- ↑ Salimans, Tim; Ho, Jonathan; Chen, Xi; Sidor, Szymon; Sutskever, Ilya. "Evolution Strategies as a Scalable Alternative to Reinforcement Learning". arxiv.org. Retrieved 28 March 2020.
- ↑ Mishra, Nikhil; Abbeel, Pieter; Mordatch, Igor. "Prediction and Control with Temporal Segment Models". arxiv.org. Retrieved 28 March 2020.
- ↑ Mordatch, Igor; Abbeel, Pieter. "Emergence of Grounded Compositional Language in Multi-Agent Populations". arxiv.org. Retrieved 26 March 2020.
- ↑ Tobin, Josh; Fong, Rachel; Ray, Alex; Schneider, Jonas; Zaremba, Wojciech; Abbeel, Pieter. "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World". arxiv.org. Retrieved 28 March 2020.
- ↑ "One-Shot Imitation Learning". arxiv.org. Retrieved 28 March 2020.
- ↑ "Robots that Learn". openai.com. Retrieved 5 April 2020.
- ↑ "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments". arxiv.org.
- ↑ "Domain Randomization and Generative Models for Robotic Grasping". arxiv.org. Retrieved 27 March 2020.
- ↑ Bin Peng, Xue; Andrychowicz, Marcin; Zaremba, Wojciech; Abbeel, Pieter. "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization". arxiv.org. Retrieved 26 March 2020.
- ↑ Frans, Kevin; Ho, Jonathan; Chen, Xi; Abbeel, Pieter; Schulman, John. "Meta Learning Shared Hierarchies". arxiv.org. Retrieved 26 March 2020.
- ↑ Grathwohl, Will; Choi, Dami; Wu, Yuhuai; Roeder, Geoffrey; Duvenaud, David. "Backpropagation through the Void: Optimizing control variates for black-box gradient estimation". arxiv.org. Retrieved 26 March 2020.
- ↑ Milli, Smitha; Abbeel, Pieter; Mordatch, Igor. "Interpretable and Pedagogical Examples". arxiv.org. Retrieved 26 March 2020.
- ↑ Louizos, Christos; Welling, Max; Kingma, Diederik P. "Learning Sparse Neural Networks through L0 Regularization". arxiv.org. Retrieved 26 March 2020.
- ↑ Vincent, James (December 1, 2017). "Artificial intelligence isn't as clever as we think, but that doesn't stop it being a threat". The Verge. Retrieved March 2, 2018.
- ↑ Raiman, Jonathan; Raiman, Olivier. "DeepType: Multilingual Entity Linking by Neural Type System Evolution". arxiv.org. Retrieved 26 March 2020.
- ↑ Houthooft, Rein; Chen, Richard Y.; Isola, Phillip; Stadie, Bradly C.; Wolski, Filip; Ho, Jonathan; Abbeel, Pieter. "Evolved Policy Gradients". arxiv.org. Retrieved 26 March 2020.
- ↑ "Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research". arxiv.org. Retrieved 26 March 2020.
- ↑ Stadie, Bradly C.; Yang, Ge; Houthooft, Rein; Chen, Xi; Duan, Yan; Wu, Yuhuai; Abbeel, Pieter; Sutskever, Ilya. "Some Considerations on Learning to Explore via Meta-Reinforcement Learning". arxiv.org. Retrieved 26 March 2020.
- ↑ Nichol, Alex; Achiam, Joshua; Schulman, John. "On First-Order Meta-Learning Algorithms". arxiv.org. Retrieved 26 March 2020.
- ↑ Salimans, Tim; Zhang, Han; Radford, Alec; Metaxas, Dimitris. "Improving GANs Using Optimal Transport". arxiv.org. Retrieved 26 March 2020.
- ↑ Wu, Cathy; Rajeswaran, Aravind; Duan, Yan; Kumar, Vikash; Bayen, Alexandre M; Kakade, Sham; Mordatch, Igor; Abbeel, Pieter. "Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines". arxiv.org. Retrieved 26 March 2020.
- ↑ Nichol, Alex; Pfau, Vicki; Hesse, Christopher; Klimov, Oleg; Schulman, John. "Gotta Learn Fast: A New Benchmark for Generalization in RL". arxiv.org. Retrieved 26 March 2020.
- ↑ Huang, Daniel; Dhariwal, Prafulla; Song, Dawn; Sutskever, Ilya. "GamePad: A Learning Environment for Theorem Proving". arxiv.org. Retrieved 26 March 2020.
- ↑ "Learning Policy Representations in Multiagent Systems". arxiv.org. Retrieved 26 March 2020.
- ↑ Kingma, Diederik P.; Dhariwal, Prafulla. "Glow: Generative Flow with Invertible 1x1 Convolutions". arxiv.org. Retrieved 26 March 2020.
- ↑ Achiam, Joshua; Edwards, Harrison; Amodei, Dario; Abbeel, Pieter. "Variational Option Discovery Algorithms". arxiv.org.
- ↑ "Learning Dexterous In-Hand Manipulation". arxiv.org. Retrieved 26 March 2020.
- ↑ Grathwohl, Will; Chen, Ricky T. Q.; Bettencourt, Jesse; Sutskever, Ilya; Duvenaud, David. "FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models". arxiv.org. Retrieved 26 March 2020.
- ↑ Christiano, Paul; Shlegeris, Buck; Amodei, Dario. "Supervising strong learners by amplifying weak experts". arxiv.org. Retrieved 26 March 2020.
- ↑ Wiggers, Kyle. "OpenAI made a system that's better at Montezuma's Revenge than humans". venturebeat.com. Retrieved 15 June 2019.
- ↑ Vincent, James. "New research from OpenAI uses curious AI to beat video games". theverge.com. Retrieved 15 June 2019.
- ↑ Lowrey, Kendall; Rajeswaran, Aravind; Kakade, Sham; Todorov, Emanuel; Mordatch, Igor. "Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control". arxiv.org.
- ↑ McCandlish, Sam; Kaplan, Jared; Amodei, Dario; OpenAI Dota Team. "An Empirical Model of Large-Batch Training". arxiv.org. Retrieved 25 March 2020.
- ↑ Degwekar, Akshay; Nakkiran, Preetum; Vaikuntanathan, Vinod. "Computational Limitations in Robust Classification and Win-Win Results". arxiv.org. Retrieved 25 March 2020.
- ↑ Suarez, Joseph; Du, Yilun; Isola, Phillip; Mordatch, Igor. "Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents". arxiv.org. Retrieved 25 March 2020.
- ↑ Du, Yilun; Mordatch, Igor. "Implicit Generation and Generalization in Energy-Based Models". arxiv.org. Retrieved 25 March 2020.
- ↑ Kang, Daniel; Sun, Yi; Brown, Tom; Hendrycks, Dan; Steinhardt, Jacob. "Transfer of Adversarial Robustness Between Perturbation Types". arxiv.org. Retrieved 25 March 2020.
- ↑ Nakkiran, Preetum; Kaplun, Gal; Kalimeris, Dimitris; Yang, Tristan; Edelman, Benjamin L.; Zhang, Fred; Barak, Boaz. "SGD on Neural Networks Learns Functions of Increasing Complexity". arxiv.org. Retrieved 25 March 2020.
- ↑ Askell, Amanda; Brundage, Miles; Hadfield, Gillian. "The Role of Cooperation in Responsible AI Development". arxiv.org. Retrieved 25 March 2020.
- ↑ Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario. "Scaling Laws for Neural Language Models". arxiv.org. Retrieved 25 March 2020.