Timeline of AI safety

From Timelines
Revision as of 00:10, 6 July 2024 by Vipul (talk | contribs)
The timeline currently offers focused coverage of the period until November 2, 2023. It is likely to miss important developments outside this period, particularly later ones, though it may include a few later events.

This is a timeline of AI safety. AI safety is the field focused on reducing risks from artificial intelligence (AI).[1][2]

Big picture

Overall summary

Time period Development summary More details
Until 1950 Fictional portrayals only Most discussion of AI safety is in the form of fictional portrayals. These portrayals warn of the risks of robots that, through either stupidity or lack of goal alignment, no longer remain under the control of humans.
1950 to 2000 Scientific speculation + fictional portrayals During this period, discussion of AI safety moves from merely being a topic of fiction to one that scientists who study technological trends start talking about. The era sees commentary by I. J. Good, Vernor Vinge, and Bill Joy.
2000 to 2012 Birth of AI safety organizations, close connections with transhumanism This period sees the creation of the Singularity Institute for Artificial Intelligence (SIAI) (which would later become the Machine Intelligence Research Institute (MIRI)) and the evolution of its mission from creating friendly AI to reducing the risk of unfriendly AI. The Future of Humanity Institute (FHI) and Global Catastrophic Risk Institute (GCRI) are also founded. AI safety work during this time is closely tied to transhumanism and has close connections with techno-utopianism. Peter Thiel and Jaan Tallinn are key funders of the early ecosystem.
2013 to 2022 Mainstreaming of AI safety, separation from transhumanism SIAI changes its name to MIRI, sells the "Singularity" brand to Singularity University, grows considerably in size, and receives substantial funding. Superintelligence, the book by Nick Bostrom, is released. New organizations founded include the Center for the Study of Existential Risk (CSER), Leverhulme Centre for the Future of Intelligence (CFI), Future of Life Institute (FLI), OpenAI, Center for Human-Compatible AI (CHAI), Berkeley Existential Risk Initiative (BERI), Ought, and the Center for Security and Emerging Technology (CSET). OpenAI in particular grows considerably and becomes quite famous and influential. Prominent individuals such as Elon Musk, Sam Altman, and Bill Gates talk about the importance of AI safety and the risks of unfriendly AI. Key funders of this ecosystem include Open Philanthropy and Elon Musk.
Late 2022 onward Explosion of AI in the public consciousness OpenAI releases ChatGPT (timeline) in late 2022. This triggers an AI "arms race" with players including OpenAI (ChatGPT), Anthropic (Claude), Microsoft (Bing Chat, Copilot (timeline)), and Google (Google Bard (timeline), Gemini). People's estimates of the probability of short AI timelines increase. Beginning with an open letter by the Future of Life Institute, there is a rise in calls to shut down AI development, pause AI development, and institute Responsible Scaling Policies.

Highlights by year (2013 onward)

Year Highlights
2013 Research and outreach focused on forecasting and timelines continue. Connections with the nascent effective altruism movement strengthen. The Center for the Study of Existential Risk and the Foundational Research Institute launch.
2014 Superintelligence: Paths, Dangers, Strategies by Nick Bostrom is published. The Future of Life Institute is founded and AI Impacts launches. AI safety gets more mainstream attention, including from Elon Musk, Stephen Hawking, and the fictional portrayal Ex Machina. While forecasting and timelines remain a focus of AI safety efforts, the effort shifts toward the technical AI safety agenda, with the launch of the Intelligent Agent Foundations Forum.
2015 AI safety continues to get more mainstream, with the founding of OpenAI (supported by Elon Musk and Sam Altman) and the Leverhulme Centre for the Future of Intelligence, the Open Letter on Artificial Intelligence, the Puerto Rico conference, and coverage on Wait But Why. This also appears to be the last year that Peter Thiel donates in the area.
2016 Open Philanthropy makes AI safety a focus area; it would ramp up giving in the area considerably starting around this time. The landmark paper "Concrete Problems in AI Safety" is published, and OpenAI's safety work picks up pace. The Center for Human-Compatible AI launches. The annual tradition of LessWrong posts providing an AI alignment literature review and charity comparison for the year begins. AI safety continues to get more mainstream, with the Partnership on AI and the Obama administration's efforts to understand the subject.
2017 This is a great year for cryptocurrency prices, causing a number of donations to MIRI from people who got rich through cryptocurrency. The AI safety funding and support landscape changes somewhat with the launch of the Berkeley Existential Risk Initiative (BERI) (and funding of its grants program by Jaan Tallinn) and the Effective Altruism Funds, specifically the Long-Term Future Fund. Open Philanthropy makes several grants in AI safety, including a $30 million grant to OpenAI and a $3.75 million grant to MIRI. AI safety attracts dismissive commentary from Mark Zuckerberg, while Elon Musk continues to highlight its importance. The year begins with the Asilomar Conference and the Asilomar AI Principles, and initiatives such as AI Watch and the AI Alignment Prize begin toward the end of the year.
2018 Activity in the field of AI safety becomes more steady, in terms of both ongoing discussion (with the launch of the AI Alignment Newsletter, AI Alignment Podcast, and Alignment Forum) and funding (with structural changes to the Long-Term Future Fund to make it grant more regularly, the introduction of the annual Open Philanthropy AI Fellowship grants, and more grantmaking by BERI). Near the end of the year, MIRI announces its nondisclosure-by-default policy. Ought, Median Group, and the Stanford Center for AI Safety launch during the year.
2019 The Center for Security and Emerging Technology (CSET), which focuses on AI safety and other security risks, launches with a 5-year $55 million grant from Open Philanthropy. The Stanford Institute for Human-Centered Artificial Intelligence (HAI) launches. Grantmaking from the Long-Term Future Fund picks up pace; BERI hands off its grantmaking of Jaan Tallinn's money to the Survival and Flourishing Fund (SFF). Open Philanthropy begins using the Committee for Effective Altruism Support to decide grant amounts for some of its AI safety grants, including grants to MIRI. OpenAI unveils its GPT-2 model but does not release the full model initially; this sparks discussion on disclosure norms.
2020 Andrew Critch and David Krueger release their ARCHES paper. OpenAI unveils GPT-3, leading to further discussion of AI safety implications. AI Safety Support launches. The funding ecosystem continues to mature: Open Philanthropy and the Survival and Flourishing Fund continue to make large grants to established organizations, while the Long-Term Future Fund increasingly shifts focus to donating to individuals.
2021 Several new AI safety organizations start, including: Anthropic (by Dario and Daniela Amodei, along with several others who left OpenAI in late 2020 or early 2021), Alignment Research Center (by Paul Christiano, who also left OpenAI around that time), and Redwood Research (which appears to have started in 2019, but taken off and scaled up in 2021). A number of conversations related to AI safety, many of them including Eliezer Yudkowsky and other researchers (MIRI and non-MIRI), are published in late 2021. Jaan Tallinn's grantmaking, mostly via the Survival and Flourishing Fund and much of it for AI safety, increases substantially in 2021, to $15-20 million, mostly related to the significant increase in Ethereum prices combined with his 2020 pledge.[3]
2022 This year sees the entry of the FTX Future Fund into the AI safety funding space. Anthropic continues to grow by raising a massive Series B, led by the team from FTX and Alameda Research. In November, the collapse of FTX results in the collapse of the FTX Future Fund and uncertainty about the fate of various projects started by the FTX Future Fund. The release by OpenAI of ChatGPT (based on GPT-3.5) at the end of the year raises the profile of AI (and also of AI safety), with the implications only beginning to unfold.
2023 The year sees an "arms race" of AI tools from leading companies such as Microsoft and Google, as well as AI-focused organizations such as OpenAI and Anthropic, with the successful release of ChatGPT in late 2022 a likely trigger. As a result, the profile of AI as well as its critics, coming from the AI safety perspective as well as other angles, rises quite a bit. Beginning with an open letter by the Future of Life Institute, many parties call for a pause to AI, with "Pause AI" a rallying term. There are also calls to shut down AI development for the time being, and there is movement towards AI capabilities organizations implementing Responsible Scaling Policies (RSPs). Governments get more seriously involved in AI safety as part of their broader effort to catch up with AI, with the UK government setting up a Foundation Model Taskforce, the US government issuing an executive order on AI, and an international AI Safety Summit in Bletchley Park in the United Kingdom leading to the Bletchley Declaration signed by 28 countries and the European Union, including the United States, United Kingdom, and China. People's estimates of the probability of shorter AI timelines increase. In late 2023, an attempt to remove Sam Altman from OpenAI fails, and the failure is considered by some to be a failure for AI safety, given Altman's desire to move fast with AI.

Full timeline

Inclusion criteria

Here is a partial list of thoughts on what rows we've included:

  • For AI safety organizations, we have tried to include a row for their launch where available. When an exact launch date is not available, we have tried to include an approximate launch date based on the history of the website or other public announcements.
  • Most grants from Open Philanthropy in AI safety are included. Some grants may be omitted because of small size. For recent grants, we may not have added them yet.
  • We have tried to include important fictional portrayals and scientific speculation related to AI before 2000, as comprehensively as possible. For the period from 2000 onward, we have been more selective, and focused more on relatively notable commentary or commentary that played a role in changing the AI safety conversation in specific communities.

Timeline

Year Month and date Event type Details
1630–1650 Fictional portrayal The publication of the story of the Golem of Chełm dates to around this period. Wikipedia: "Golems are not intelligent, and if commanded to perform a task, they will perform the instructions literally. In many depictions Golems are inherently perfectly obedient. In its earliest known modern form, the Golem of Chełm became enormous and uncooperative. In one version of this story, the rabbi had to resort to trickery to deactivate it, whereupon it crumbled upon its creator and crushed him."
1818 Fictional portrayal The novel Frankenstein is published. Frankenstein pioneers the archetype of the artificial intelligence that turns against its creator, and is sometimes discussed in the context of an AI takeoff.[4][5][6]
1863 June Publication In Darwin among the Machines, Samuel Butler raises the possibility that intelligent machines will eventually supplant humans as the dominant form of life.[7]
1920 Fictional portrayal The science fiction play R.U.R. is published. The play introduces the word "robot" to the English language and the plot contains a robot rebellion that leads to human extinction.
1942 March Fictional portrayal The Three Laws of Robotics are introduced by Isaac Asimov in his short story "Runaround".
1947 July Fictional portrayal With Folded Hands, a novelette by Jack Williamson, is published. The novelette describes how advanced robots (humanoids) take over large parts of the world to fulfill their Prime Directive, which is to make humans happy.
1948 Publication In The general and logical theory of automata, John von Neumann articulates the idea of self-improving AI. Notable quote: "There is, however, a certain minimum level where this degenerative characteristic ceases to be universal. At this point automata which can reproduce themselves, or even construct higher entities, become possible."[7] He would expand on this idea further in 1949, coming close to articulating what is now called an "intelligence explosion."[7]
1950 Publication Alan Turing publishes Computing Machinery and Intelligence in the philosophy journal Mind. The paper introduces the concept of the Turing test, the simple idea of which is that a machine would be said to have achieved human-level intelligence if it can convince a human that it is human. The Turing test would become a key part of discussions around benchmarking artificial intelligence, and also enter popular culture over the next few decades.[8][9]
1960 May 6 Publication Norbert Wiener's article Some Moral and Technical Consequences of Automation is published.[10] In 2013, Jonah Sinick would note the similarities between the points raised in this article and the thinking of AI safety leader Eliezer Yudkowsky.[11] Wiener's work would also be cited by Allan Dafoe and Stuart Russell in 2016: "Rather, the risk arises from the unpredictability and potential irreversibility of deploying an optimization process more intelligent than the humans who specified its objectives. This problem was stated clearly by Norbert Wiener in 1960, and we still have not solved it."[12]
1965, 1966 Publication I. J. Good originates the concept of intelligence explosion in "Speculations Concerning the First Ultraintelligent Machine" where he says that an ultraintelligent machine would be "the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control."[13] This is written in 1965 though formally published in 1966.[14]
1966 Fictional portrayal The science fiction novel Colossus by British author Dennis Feltham Jones is published. In the novel, both the United States and the USSR develop supercomputers, called Colossus and Guardian respectively, that they connect with each other to share knowledge. The supercomputers coordinate with each other to become increasingly powerful, and humans belatedly try to regain control.[7]
1966 Publication In Other Worlds than Ours, Cecil Maxwell argues that nations would invest in building powerful intelligent machines and surrender decisionmaking to them, and any nation that succeeded would grow into a major world power. He writes: It seems that, in the foreseeable future, the major nations of the world will have to face the alternative of surrendering national control to mechanical ministers, or being dominated by other nations which have already done this. Such a process will eventually lead to the domination of the whole Earth by a dictatorship of an unparalleled type — a single supreme central authority.[7]
1974 Publication In The Ignorance Explosion, Julius Lukasiewicz argues that it is very hard to predict the future after we have machine superintelligence.[7]
1977 Fictional portrayal The science fiction novel The Adolescence of P-1 by Thomas Joseph Ryan is published. It "tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" against humans."[7]
1979 Publication In the book Machines Who Think by Pamela McCorduck, Edward Fredkin describes the challenge of controlling smarter-than-human artificial general intelligences. He describes the high communication bandwidth between superintelligent machines, the goal-directed nature of intelligent systems, and the challenge of aligning the goals of the machines with those of humans.[15]
1984 October 26 Fictional portrayal The American science fiction film The Terminator is released. The film contains the first appearance of Skynet, a "neural net-based conscious group mind and artificial general intelligence" that "seeks to exterminate the human race in order to fulfill the mandates of its original coding".
1985 Fictional portrayal In Robots and Empire, Isaac Asimov introduces the Zeroth Law of Robotics, which states: "A robot may not injure humanity, or through inaction, allow humanity to come to harm." This is analogous to, and must take precedence over, the First Law: "A robot may not injure a human being, or through inaction, allow a human being to come to harm." The law was self-programmed by the robot R. Daneel Olivaw based on the ideas of R. Giskard Reventlov. In particular, the Zeroth Law allows a robot to harm or allow harm to individual humans for the greater good of humanity.
1987 Publication In an article A Question of Responsibility for AI Magazine, Mitchell Waldrop introduces the term machine ethics. He writes: However, one thing that is apparent from the above discussion is that intelligent machines will embody values, assumptions, and purposes, whether their programmers consciously intend them to or not. Thus, as computers and robots become more and more intelligent, it becomes imperative that we think carefully and explicitly about what those built-in values are. Perhaps what we need is, in fact, a theory and practice of machine ethics, in the spirit of Asimov’s three laws of robotics.[16]
1988 October 1 Publication The book Mind Children: The Future of Robot and Human Intelligence by Hans Moravec is published. The book says that machines with human-level intelligence are feasible within fifty years, and should be thought of as the "mind children" of humans. The book would be listed in a 2013 MIRI blog post in a list of past work on AI safety and the AI takeoff.[17]
1993 Publication Vernor Vinge's article "The Coming Technological Singularity: How to Survive in the Post-Human Era" is published. The article popularizes the idea of an intelligence explosion.[18]
1996 May 1 Publication The book Reflections on Artificial Intelligence: The Legal, Moral and Ethical Dimensions by Blay Whitby is published. A 2013 blog post by MIRI would list this as part of "earlier work on the topic" of AI safety.[17]
1998 December 3 Publication The book Robot: Mere Machine to Transcendent Mind by Hans Moravec is published. The book argues that human levels of intelligence are achievable by 2040, and has an optimistic take on this future: "Intelligent machines, which will grow from us, learn our skills, and share our goals and values, can be viewed as children of our minds." It continues the themes of Moravec's previous book Mind Children, and would be cited in a 2013 MIRI blog post in a list of past work related to AI safety and the AI takeoff.[17]
1999 January 1 Publication The book The Age of Spiritual Machines by Ray Kurzweil is published. The book describes Kurzweil's utopian vision of technological progress and the path to computers achieving superhuman intelligence.
2000 April Publication Bill Joy's article "Why The Future Doesn't Need Us" is published in Wired.
2000 July 27 Organization Machine Intelligence Research Institute (MIRI) is founded as the Singularity Institute for Artificial Intelligence (SIAI) by Brian Atkins, Sabine Atkins (then Sabine Stoeckel) and Eliezer Yudkowsky. The organization's mission ("organization's primary exempt purpose" on Form 990) at the time is "Create a Friendly, self-improving Artificial Intelligence"; this mission would be in use during 2000–2006 and would change in 2007.[19]:3[20]
2001 June 15 Publication Version 1.0 of "Creating Friendly AI" by Eliezer Yudkowsky is published.[21]
2002 March 8 AI box The first AI box experiment by Eliezer Yudkowsky, against Nathan Russell as gatekeeper, takes place. The AI is released.[22]
2002 July 4–5 AI box The second AI box experiment by Eliezer Yudkowsky, against David McFadzean as gatekeeper, takes place. The AI is released.[23]
2002 October 31 Publication Bill Hibbard's Super-Intelligent Machines is published.[24]
2003 Publication Nick Bostrom's paper "Ethical Issues in Advanced Artificial Intelligence" is published. The paper introduces the paperclip maximizer thought experiment.[25]
2004 November 11 Publication In his book Catastrophe: Risk and Response, legal scholar Richard Posner discusses risks from artificial general intelligence at some length. According to a 2013 blog post from the Machine Intelligence Research Institute: "His analysis is interesting in part because it appears to be intellectually independent from the Bostrom-Yudkowsky tradition that dominates the topic today. [...] Still, much of Posner’s analysis is consistent with the basic points of the Bostrom-Yudkowsky tradition [...] One major point of divergence seems to be that Posner worries about a scenario in which AGIs become self-aware, re-evaluate their goals, and decide not to be “bossed around by a dumber species” anymore."[17]
2005 Organization The Future of Humanity Institute (FHI) is founded.[26]
2005 Conference The field of machine ethics is delineated at the Fall 2005 Symposium on Machine Ethics held by the Association for the Advancement of Artificial Intelligence.[27]
2005 August 21 AI box The third AI box experiment by Eliezer Yudkowsky, against Carl Shulman as gatekeeper, takes place. The AI is released.[28]
2005 Publication The Singularity is Near by inventor and futurist Ray Kurzweil is published. The book builds upon Kurzweil's previous books The Age of Intelligent Machines (1990) and The Age of Spiritual Machines (1999), but unlike its predecessors, uses the term technological singularity introduced by Vinge in 1993. Unlike Bill Joy, Kurzweil takes a very positive view of the impact of smarter-than-human AI and the upcoming (in his view) technological singularity.
2006 November Robin Hanson starts Overcoming Bias.[29] Eliezer Yudkowsky's posts on Overcoming Bias would form seed material for LessWrong, which would grow to be an important community for discussion related to AI safety.
2008 Publication Steve Omohundro's paper "The Basic AI Drives" is published. The paper argues that certain drives, such as self-preservation and resource acquisition, will emerge in any sufficiently advanced AI. The idea would subsequently be defended by Nick Bostrom as part of his instrumental convergence thesis.[30]
2008 Publication Global Catastrophic Risks is published. The book includes Eliezer Yudkowsky's chapter "Artificial Intelligence as a Positive and Negative Factor in Global Risk".[31]
2008 October 13 Publication Moral Machines: Teaching Robots Right from Wrong by Wendell Wallach and Colin Allen is published by Oxford University Press. The book advertises itself as "the first book to examine the challenge of building artificial moral agents, probing deeply into the nature of human decision making and ethics."
2008 November–December Outside review The AI-Foom debate between Robin Hanson and Eliezer Yudkowsky takes place. The blog posts from the debate would later be turned into an ebook by MIRI.[32][33]
2009 February Project Eliezer Yudkowsky starts LessWrong using as seed material his posts on Overcoming Bias.[34] On the 2009 accomplishments page, MIRI describes LessWrong as being "important to the Singularity Institute's work towards a beneficial Singularity in providing an introduction to issues of cognitive biases and rationality relevant for careful thinking about optimal philanthropy and many of the problems that must be solved in advance of the creation of provably human-friendly powerful artificial intelligence". And: "Besides providing a home for an intellectual community dialoguing on rationality and decision theory, Less Wrong is also a key venue for SIAI recruitment. Many of the participants in SIAI's Visiting Fellows Program first discovered the organization through Less Wrong."[35]
2009 December 11 Publication The third edition of Artificial Intelligence: A Modern Approach by Stuart J. Russell and Peter Norvig is published. In this edition, for the first time, Friendly AI is mentioned and Eliezer Yudkowsky is cited.[36][37]
2010 Organization DeepMind is founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Legg had previously received the $10,000 Canadian Singularity Institute for Artificial Intelligence Prize.[38]
2010 Organization Vicarious is founded by Scott Phoenix and Dileep George. The company "has publicly expressed some concern about potential risks from future AI development" and the founders are signatories on the FLI open letter.[39]
2011 Publication Baum, Goertzel, and Goertzel's "How Long Until Human-Level AI? Results from an Expert Assessment" is published.[40]
2011 Organization The Global Catastrophic Risk Institute (GCRI) is founded by Seth Baum and Tony Barrett.[41]
2011 Organization Google Brain is started by Jeff Dean, Greg Corrado, and Andrew Ng.
2011–2013 Project Sometime during this period, the Back of the Envelope Guide to Philanthropy, a website created by Gordon Irlam, includes prevention of "hostile artificial intelligence" as a top 10 philanthropic opportunity by impact.[42][43]
2011 September Organization The Oxford Martin Programme on the Impacts of Future Technology (FutureTech) launches.[44]
2013 Publication Luke Muehlhauser's book Facing the Intelligence Explosion is published.[45]
2013 April 13 MIRI publishes an update on its strategy on its blog. In the blog post, MIRI executive director Luke Muehlhauser states that MIRI plans to put less effort into public outreach and shift its research to Friendly AI math research.[46]
2013 July Organization The Center for the Study of Existential Risk (CSER) launches.[47][48]
2013 July Organization The Foundational Research Institute (FRI) is founded. Some of FRI's work discusses risks from artificial intelligence.[49]
2013 July 8 Publication Luke Muehlhauser's blog post Four Focus Areas of Effective Altruism is published. The four focus areas listed are poverty reduction, meta effective altruism, the long term future, and animal suffering. AI safety concerns fall under the "long term future" focus area.[50] This identification of focus areas would persist for several years, and would also be incorporated into the design of the Effective Altruism Funds in 2017. In particular, the blog post encapsulates the central position of AI safety in the then-nascent effective altruist movement.
2013 October 1 Publication Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat is published. The book discusses risks from human-level or superhuman artificial intelligence.
2014 Publication Müller and Bostrom's "Future Progress in Artificial Intelligence: A Survey of Expert Opinion" is published.[51]
2014 January 26 Google announces that it has acquired DeepMind. At the same time, it sets up an AI ethics board. DeepMind co-founders Shane Legg and Demis Hassabis, as well as AI safety funders Peter Thiel and Jaan Tallinn, are believed to have been influential in the process.[52][53]
2014 March–May Organization Future of Life Institute (FLI) is founded.[54]
2014 July–September Publication Nick Bostrom's book Superintelligence: Paths, Dangers, Strategies is published. The book would have tremendous impact on defining and raising the status of AI safety concerns, due to "its clear explanation of why superintelligent AI may have arbitrarily negative consequences and why it’s important to begin addressing the issue well in advance."[55]
2014 August Project The AI Impacts website is launched by Paul Christiano and Katja Grace, with funding channeled through MIRI.[56] AI Impacts would grow into a leading source for understanding AI trends and scenarios, and would later receive funding from the Future of Life Institute and Open Philanthropy.
2014 Fall Project The One Hundred Year Study on Artificial Intelligence (AI100) launches.[57]
2014 October 22–24 Opinion During an interview at the AeroAstro Centennial Symposium, Elon Musk calls artificial intelligence humanity's "biggest existential threat".[58][59]
2014 November 4 Project The Intelligent Agent Foundations Forum, run by MIRI, is launched.[60]
2014 November 5 Publication The book Ethical Artificial Intelligence by Bill Hibbard is released on arXiv. The book brings together ideas from Hibbard's past publications related to technical AI risk.[61]
2014 November Publication Edge hosts a conversation on "The Myth of AI" with an introduction by John Brockman and a lead essay by Jaron Lanier, followed by responses from "Reality Club" participants George Church, Peter Diamandis, Lee Smolin, Rodney Brooks, Nathan Myhrvold, George Dyson, Pamela McCorduck, Sendhil Mullainathan, Steven Pinker, Neal Gershenfeld, D.A. Wallach, Michael Shermer, Stuart Kauffman, Kevin Kelly, Lawrence Krauss, Robert Provine, Stuart Russell, Kai Krause.[62] Russell's response is quoted and discussed by Rob Bensinger on LessWrong.[63]
2014 December 2 Opinion In an interview with BBC, Stephen Hawking states that advanced artificial intelligence could end the human race.[64]
2014 December 16 Fictional portrayal The movie Ex Machina is released. The movie highlights the paperclip maximizer idea: it shows how a robot programmed to optimize for its own escape can callously damage human lives in the process. It also covers the ideas of the Turing test (a robot convincing a human that it has human-like qualities, even though the human knows that the robot is not human) and the AI box experiment (a robot convincing a human to release it from its "box", similar to the experiment proposed and carried out by Eliezer Yudkowsky). It leads to more public discussion of AI safety.[65][66][67][68]
2015 Daniel Dewey joins Open Philanthropy.[69] He begins as, or would later become, Open Philanthropy's program officer for potential risks from advanced artificial intelligence.
2015 Publication The 2015 question on Edge (edge.org) is "What do you think about machines that think?" The question draws responses from a number of intellectuals, including AI researchers, ethicists, AI historians, and other social commentators.[70] Responses from Nick Bostrom, Steve Omohundro, Stuart J. Russell, and Eliezer Yudkowsky discuss AI safety challenges.
2015 Organization The Strategic Artificial Intelligence Research Centre launches around this time.[71][39]
2015 January Publication The Open Letter on Artificial Intelligence, titled "Research Priorities for Robust and Beneficial Artificial Intelligence: an Open Letter", is published. Signatories include Stephen Hawking, Elon Musk, Peter Norvig, and Stuart J. Russell.
2015 January 28 Opinion During an "ask me anything" (AMA) session on reddit, Bill Gates states his concern about artificial superintelligence.[72][73]
2015 January 2–5 Conference The Future of AI: Opportunities and Challenges, an AI safety conference, takes place in Puerto Rico. The conference is organized by the Future of Life Institute.[74] Nate Soares of the Machine Intelligence Research Institute would later call this the "turning point" of when top academics begin to focus on AI risk.[75]
2015 January 22–27 Publication Tim Urban publishes on Wait But Why a two-part series of blog posts about superhuman AI.[76][77]
2015 February 25 Opinion Sam Altman, president of Y Combinator, publishes a blog post in which he writes that the development of superhuman AI is "probably the greatest threat to the continued existence of humanity".[78]
2015 May 1 Publication The Wikipedia article on existential risk from artificial general intelligence is published.[79]
2015 May 22 Publication The blog post AI Researchers On AI Risk by Scott Alexander is published. The post lists prominent AI researchers who have expressed concern about AI safety, countering claims by others that serious AI safety concerns lack buy-in from real AI researchers.[80]
2015 June 4 Opinion At Airbnb's Open Air 2015 conference, Sam Altman, president of Y Combinator, states his concern for advanced artificial intelligence and shares that he recently invested in a company doing AI safety research.[81]
2015 June 17,21 Publication The Kindle edition of Artificial Superintelligence: A Futuristic Approach by Roman Yampolskiy is published. The paperback would be published on June 21.[82] Yampolskiy takes an AI safety engineering perspective, rather than a machine ethics perspective, to the problem of AI safety.[83]
2015 July 1 Funding The Future of Life Institute's Grant Recommendations for its first round of AI safety grants are publicly announced. The grants would be disbursed on September 1. The funding for the grants comes from a $10 million grant from Elon Musk.[84][85][86]
2015 August Conference AI safety is a key theme of EA Global 2015, with the panel on AI risk including Elon Musk, Daniel Dewey, Nick Bostrom, Nate Soares and Stuart Russell.[87][88]
2015 August Funding Open Philanthropy awards a grant of $1.2 million to the Future of Life Institute, saying that they are impressed with the selections from the first round of grants funded by Elon Musk. This adds to the pool of money from Elon Musk.[89][90]
2015 August Publication Open Philanthropy publishes its cause report on potential risks from advanced artificial intelligence.[91]
2015 August 29 The ControlProblem subreddit (full title: "The Artificial General Intelligence Control Problem") is created.[92]
2015 October Publication Open Philanthropy first publishes its page on AI timelines.[93]
2015 December Organization The Leverhulme Centre for the Future of Intelligence launches around this time.[94]
2015 December 11 Organization OpenAI is announced to the public. (The news articles from this period make it sound like OpenAI launched sometime after this date.)[95][96]
2016 April 28 Publication The Global Catastrophic Risks 2016 report is published. The report is a collaboration between the Global Priorities Project and the Global Challenges Foundation.[97] The report includes discussion of risks from artificial general intelligence under "emerging risks".[98][99]
2016 April 7 Publication 80,000 Hours releases a new "problem profile" for risks from artificial intelligence, titled "Risks posed by artificial intelligence".[100][101]
2016 May Publication Luke Muehlhauser of Open Philanthropy publishes "What should we learn from past AI forecasts?".[102]
2016 May 6 Publication Holden Karnofsky, Executive Director of Open Philanthropy, publishes a blog post on why Open Philanthropy is making potential risks from artificial intelligence a major priority for the year.[103]
2016 May 6 Publication Holden Karnofsky, Executive Director of Open Philanthropy, publishes "Some Background on Our Views Regarding Advanced Artificial Intelligence" on the Open Philanthropy blog.[104]
2016 June Funding Open Philanthropy awards a grant of $264,525 to George Mason University for work by Robin Hanson.[90]
2016 June 21 Publication "Concrete Problems in AI Safety" by Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané is submitted to the arXiv.[105] The paper would receive a shoutout from Open Philanthropy.[106] It would become a landmark in AI safety literature,[107] and many of its authors would continue to do AI safety work at OpenAI in the years to come.
2016 June 27 and 28 Conference Carnegie Mellon University (CMU) hosts a two-day Workshop on Safety and Control for AI, the first day being an "Exploratory Technical Workshop" (co-sponsored by the Carnegie Mellon Science of Security Lablet) and the second day being a "Public Workshop" (co-sponsored by the Office of Science and Technology Policy (OSTP) and CMU). Participants include Edward Felten of OSTP, Bill Scherlis of CMU, Jason Matheny of IARPA, Eric Horvitz of Microsoft, and many others.[108]
2016 August Organization The UC Berkeley Center for Human-Compatible Artificial Intelligence launches under the leadership of AI expert Stuart J. Russell (co-author with Peter Norvig of Artificial Intelligence: A Modern Approach). The focus of the center is "to ensure that AI systems are beneficial to humans".[109]
2016 August Funding Open Philanthropy awards a grant of $5.6 million over two years to the newly formed Center for Human-Compatible AI at the University of California, Berkeley.[90]
2016 August Funding Open Philanthropy awards a grant of $500,000 to the Machine Intelligence Research Institute.[90]
2016 August 24 US president Barack Obama speaks to entrepreneur and MIT Media Lab director Joi Ito about AI risk.[110]
2016 September 20, November 2 Publication On September 20, Oren Etzioni, in a piece titled No, the Experts Don’t Think Superintelligent AI is a Threat to Humanity for MIT Technology Review, critiques Nick Bostrom's Superintelligence and includes Etzioni's own survey of AI researchers to refute Bostrom's survey data.[111] In a reply titled Yes, We Are Worried About the Existential Risk of Artificial Intelligence on November 2, Stuart J. Russell and Allan Dafoe argue that the focus of Bostrom's Superintelligence was not on the predictions of imminent AI, but "its clear explanation of why superintelligent AI may have arbitrarily negative consequences and why it’s important to begin addressing the issue well in advance."[55]
2016 September 28 Organization The Partnership on AI is publicly announced.
2016 October 12 Publication Under the Obama Administration, the United States White House releases two reports, Preparing for the Future of Artificial Intelligence and National Artificial Intelligence Research and Development Strategic Plan. The former "surveys the current state of AI, its existing and potential applications, and the questions that progress in AI raise for society and public policy".[112][113]
2016 November Funding Open Philanthropy awards a grant of $199,000 to the Electronic Frontier Foundation for work by Peter Eckersley.[90]
2016 December Funding Open Philanthropy awards a grant of $32,000 to AI Impacts for work on strategic questions related to potential risks from advanced artificial intelligence.[90]
2016 December 3, 12 Publication A couple of posts are published on LessWrong by Center for Applied Rationality (CFAR) president Anna Salamon. The posts discuss CFAR's new focus on AI safety.[114][115]
2016 December 13 Publication The "2016 AI Risk Literature Review and Charity Comparison" is published on the Effective Altruism Forum. The lengthy blog post covers all the published work of prominent organizations focused on AI safety.[116]
2017 Publication The Global Catastrophic Risks 2017 report is published.[117] The report discusses risks from artificial intelligence in a dedicated chapter.[118]
2017 Publication The Global Risks Report 2017 is published by the World Economic Forum. The report contains a section titled "Assessing the Risk of Artificial Intelligence" under "Emerging Technologies".[119]
2017 January 5 – 8 Conference The Asilomar Conference on Beneficial AI is organized by the Future of Life Institute at the Asilomar Conference Grounds. It is the successor to the 2015 Puerto Rico conference. It would result in the creation of the 23 Asilomar AI Principles, a set of guidelines for AI research.[120][121][122] This is not to be confused with the Asilomar Conference on Recombinant DNA, which has been examined by the AI safety community as a case study in risk mitigation.[123]
2017 February 9 Project The Effective Altruism Funds (EA Funds) is announced on the Effective Altruism Forum. EA Funds includes a Long-Term Future Fund that is partly intended to support "priorities for robust and beneficial artificial intelligence".[124][125]
2017 March Funding Open Philanthropy awards a grant of $2.0 million to the Future of Humanity Institute for general support.[90]
2017 March Funding Open Philanthropy awards a grant of $30 million to OpenAI for general support.[90]
2017 March 5 DeepMind releases AI Safety Gridworlds, which evaluate AI algorithms on nine safety features, such as whether the algorithm wants to turn off its own kill switch. DeepMind confirms that existing algorithms perform poorly, which is "unsurprising" because the algorithms "are not designed to solve these problems"; solving such problems might require "potentially building a new generation of algorithms with safety considerations at their core".[126][127][128]
2017 April Organization The Berkeley Existential Risk Initiative (BERI) launches around this time (under the leadership of Andrew Critch, who previously helped found the Center for Applied Rationality) to assist researchers at institutions working to mitigate existential risk, including AI risk.[129][130]
2017 April 6 Publication 80,000 Hours publishes an article about the pros and cons of working on AI safety, titled "Positively shaping the development of artificial intelligence".[131][132]
2017 May Funding Open Philanthropy awards a grant of $1.5 million to the UCLA School of Law for work on governance related to AI risk; this would lead to the formation of AI Pulse.[90][133]
2017 May 24 Publication "When Will AI Exceed Human Performance? Evidence from AI Experts" is published on the arXiv.[134] Two researchers from AI Impacts are authors on the paper.[135]
2017 June 14 Publication 80,000 Hours publishes a guide to working in AI policy and strategy, written by Miles Brundage.[136]
2017 July Funding Open Philanthropy awards a grant of $2.4 million to the Montreal Institute for Learning Algorithms.[90]
2017 July Funding Open Philanthropy awards a grant of about $300,000 to Yale University to support research into the global politics of artificial intelligence led by Allan Dafoe.[90]
2017 July Funding Open Philanthropy awards a grant of about $400,000 to the Berkeley Existential Risk Initiative to support core functions of grantee, and to help them provide contract workers for the Center for Human-Compatible AI (CHAI) housed at the University of California, Berkeley.[90]
2017 July 15–16 Opinion At the National Governors Association in Rhode Island, Elon Musk tells US governors that artificial intelligence is an "existential threat" to humanity.[137]
2017 July 23 Opinion During a Facebook Live broadcast from his backyard, Mark Zuckerberg reveals that he is "optimistic" about advanced artificial intelligence and that spreading concern about "doomsday scenarios" is "really negative and in some ways […] pretty irresponsible".[138]
2017 October Funding Open Philanthropy awards MIRI a grant of $3.75 million over three years ($1.25 million per year). The cited reasons for the grant are a "very positive review" of MIRI's "Logical Induction" paper by an "outstanding" machine learning researcher, as well as Open Philanthropy having made more grants in the area so that a grant to MIRI is less likely to appear as an "outsized endorsement of MIRI's approach".[139][140]
2017 October Project The first commit for AI Watch, a repository of organizations, people, and products in AI safety, is made on October 23.[141] Work on the web portal at aiwatch.issarice.com would begin the next day.[142]
2017 October 13 Publication Eliezer Yudkowsky's blog post There's No Fire Alarm for Artificial General Intelligence is published on the MIRI blog and on the new LessWrong (this is shortly after the launch of the new version of LessWrong).[143][144]
2017 October – December Project FHI launches its Governance of AI Program (later to be called the Centre for the Governance of AI, and shortened as GovAI), co-directed by Nick Bostrom and Allan Dafoe.[145]
2017 November 3 Project Zvi Mowshowitz and Vladimir Slepnev announce the AI Alignment Prize, a $5,000 prize funded by Paul Christiano for publicly posted work advancing AI alignment.[146] The prize would be discontinued after the fourth round (ending December 31, 2018) due to reduced participation.[147]
2017 December Funding Jaan Tallinn makes a donation of about $5 million to the Berkeley Existential Risk Initiative (BERI) Grants Program.[148]
2017 December 20 Publication The "2017 AI Safety Literature Review and Charity Comparison" is published. The lengthy blog post covers all the published work of prominent organizations focused on AI safety, and is a refresh of a similar post published a year ago.[149]
2017 Year-round Funding The huge increase in cryptocurrency prices in this year would drive a lot of donations to AI safety organizations from people who had held cryptocurrency and got rich from it. MIRI would not be able to match the performance of its 2017 fundraiser in later years, and would cite the unusually high cryptocurrency prices in 2017 as one possible reason.[150]
2018 Early year Organization The Median Group, a "non-profit dedicated to research on global catastrophic risks", is formed at around this time. The group would include two former MIRI researchers.[151] As of 2020, the AI safety part of its research would include an interactive model of AI progress as well as a rough estimate of the feasibility of training an AGI using reinforcement learning.[152]
2018 February 24 Publication Paul Christiano publishes a blog post titled "Takeoff speeds" that discusses slow versus fast takeoff. Christiano points out that "slow takeoff" translates to faster initial AI progress (prior to takeoff) and discusses further the implications of the slow and fast takeoff scenarios; overall, Christiano considers slow takeoff a more likely scenario.[153] He also posts a linkpost to his post on LessWrong and the Alignment Forum, which gets several comments.[154] AI Impacts also publishes its reference page "Likelihood of discontinuous progress around the development of AGI" around this time.[155] Over the next few years, there would be a lot of discussion referencing these posts.[156][157]
2018 February 28 Publication 80,000 Hours publishes a blog post A new recommended career path for effective altruists: China specialist suggesting specialization in China as a career path for people in the effective altruist movement. China's likely leading role in the development of artificial intelligence is highlighted as particularly relevant to AI safety efforts.[158]
2018 March 7 Conference During Nextgov and Defense One’s Genius Machines 2018 event, Jason Matheny of IARPA argues that there should be more focus on avoiding bad, hackable design and less concern about superintelligent AI running amok. He says "We’re much less worried about ‘Terminator’ and Skynet scenarios than we are about digital ‘Flubber’ scenarios." Also: "Really badly engineered systems that are vulnerable to either error or malicious attack from outside."[159] Matheny would later go on to lead the Center for Security and Emerging Technology (CSET), and his thinking would influence the direction of the organization.
2018 April 4 Podcast The first episode of the AI Alignment Podcast from the Future of Life Institute is released on this day. Episodes would follow at roughly one per month from this point onward, though with no set cadence.[160]
2018 April 5 Documentary The documentary Do You Trust This Computer?, directed by Chris Paine, is released. It covers issues related to AI safety and includes interviews with prominent individuals relevant to AI, such as Ray Kurzweil, Elon Musk and Jonathan Nolan.
2018 April 9 Newsletter The first issue of the weekly AI Alignment Newsletter, managed by Rohin Shah of the Center for Human-Compatible AI, is sent. Over the next two years, the team responsible for producing the newsletter would grow to four people, and the newsletter would also be produced in Chinese and in podcast form.[161][162][163]
2018 April 9 Disclosure norms, competition norms OpenAI releases a charter stating that the organization commits to stop competing with a value-aligned and safety-conscious project that comes close to building artificial general intelligence, and also that OpenAI expects to reduce its traditional publishing in the future due to safety concerns.[164][165][166][167][168]
2018 April 12 to 22 Conference The first AI Safety Camp is held in Gran Canaria.[169] The AI Safety Camp team runs about two camps a year.
2018 May Funding Open Philanthropy announces the first set of grants for the Open Philanthropy AI Fellowship, to 7 AI Fellows pursuing research relevant to AI risk. It also makes a grant of $525,000 to Ought and $100,000 to AI Impacts.[90]
2018 May Publication The paper "AGI Safety Literature Review" is published at the International Joint Conference on Artificial Intelligence (IJCAI) and uploaded to the arXiv.[170] This would be referenced as an article to read for people interested in contributing technically to AI safety.[133]
2018 July Funding Open Philanthropy grants $429,770 to the University of Oxford to support research on the global politics of advanced artificial intelligence. The work will be led by Professor Allan Dafoe at the Future of Humanity Institute in Oxford, United Kingdom.[90]
2018 July 10 (beta), October 29 (out of beta) Project The team behind LessWrong 2.0 launches a beta for the AI Alignment Forum at AlignmentForum.org on July 10, as a successor to the Intelligent Agent Foundations Forum (IAFF) at agentfoundations.org.[171] On October 29, the Alignment Forum exits beta and becomes generally available.[172][173]
2018 August 14 Funding Nick Beckstead grants the Machine Intelligence Research Institute (MIRI) $488,994 from the Long-Term Future Fund. This is part of his last set of grants as fund manager; he would subsequently step down and the fund management would move to a different team.[174][175]
2018 August 23 onward (fall semester) Course The course CS 294-149: Safety and Control for Artificial General Intelligence is taught at UC Berkeley, with instructors Andrew Critch and Stuart Russell, both affiliated with the Center for Human-Compatible AI. The lecture schedule, available online, shows coverage of background concepts (like the VNM utility theorem), AI forecasts, and technical AI safety challenges.[176]
2018 September to October Funding During this period, the Berkeley Existential Risk Initiative (BERI) makes a number of grants to individuals working on projects related to AI safety.[177]
2018 October 25 Organization The oldest Wayback Machine snapshot of the webpage aisafety.stanford.edu for the Stanford Center for AI Safety is from this date.[178] A more precise launch date or launch announcement could not be found.
2018 November 22 Disclosure norms Nate Soares, executive director of MIRI, publishes MIRI's 2018 update post that announces MIRI's "nondisclosed-by-default" policy for most of its research.[179] The 2018 AI alignment literature review and charity comparison post would discuss the complications created by this policy for evaluating MIRI's research,[180] and so would the 2019 post.[181] In its 2019 fundraiser review, MIRI would mention the nondisclosure-by-default policy as one possible reason for it raising less money in its 2019 fundraiser.[150]
2018 November 29 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of grants: $40,000 to Machine Intelligence Research Institute, $10,000 to Ought, $21,000 to AI Summer School, and $4,500 to the AI Safety Unconference.[175]
2018 December 17 Publication The "2018 AI Alignment Literature Review and Charity Comparison" is published on the Effective Altruism Forum. It surveys AI safety work in 2018. It continues an annual tradition of similar blog posts in 2016 and 2017.[180]
2018 December 21 Publication Kelsey Piper writes a lengthy article for the Future Perfect section of Vox titled "The case for taking AI seriously as a threat to humanity" that gives an overview of AI and AI safety.[182] In an interview with 80,000 Hours in February 2019, Piper would say of the article: "I heard from some people that that gave them a clear explanation of what was going on with AI to point to on. That’s something that seems potentially pretty valuable, just to make sure people are on the same page about that."[183]
2019 January Funding Open Philanthropy grants $250,000 to the Berkeley Existential Risk Initiative (BERI) to temporarily or permanently hire machine learning research engineers dedicated to BERI’s collaboration with the Center for Human-Compatible Artificial Intelligence (CHAI).[90]
2019 January Funding Open Philanthropy provides a founding grant for the Center for Security and Emerging Technology (CSET) at Georgetown University of $55 million over 5 years.[90][184] CSET, led by Jason Matheny, formerly of IARPA, is dedicated to policy analysis at the intersection of national and international security and emerging technologies. Other founding members include Dewey Murdick from the Chan Zuckerberg Initiative, William Hannas from the CIA, and Helen Toner from Open Philanthropy.
2019 January Conference The Beneficial AGI 2019 Conference is organized in Puerto Rico by the Future of Life Institute. It is a successor to the 2015 Puerto Rico Conference and the 2017 Asilomar Conference on Beneficial AI, both of which were important milestone conferences.[185][186]
2019 January Publication The Future of Humanity Institute publishes a technical report Reframing Superintelligence: Comprehensive AI Services as General Intelligence by K. Eric Drexler (an engineer best known for seminal studies of the potential of molecular nanotechnology). The report introduces to the AI safety community the idea of comprehensive AI services (CAIS) as a possible and likely path to superintelligence.[187] Rohin Shah summarizes the "gargantuan 210 page document" in a post cross-posted to the Alignment Forum and LessWrong.[188][189]
2019 February Funding Open Philanthropy grants $2,112,500 to the Machine Intelligence Research Institute (MIRI) over two years. This is part of the first batch of grants decided by the Committee for Effective Altruism Support, which will set "grant sizes for a number of our largest grantees in the effective altruism community, including those who work on long-termist causes."[90] Around the same time, BERI grants $600,000 to MIRI.[177]
2019 February 14 Disclosure norms OpenAI unveils its language-generating system called GPT-2, a system that can write news articles, answer reading comprehension problems, and is beginning to show promise at tasks like translation.[190] However, neither the data nor the parameters of the model are released, owing to expressed concerns about potential abuse. This would lead to a lot of discussion and controversy.[191]
2019 March 11 Organization OpenAI announces the creation of OpenAI LP, a new “capped-profit” company owned and controlled by the OpenAI nonprofit organization’s board of directors. The new company is intended to allow OpenAI to rapidly increase its investments in compute and talent while including checks and balances to actualize its mission.[192][193]
2019 March 17 Publication Paul Christiano's blog post "What failure looks like" is published to LessWrong and the Alignment Forum. The post describes how Christiano thinks AI alignment might fail in the real world: ML training will hyper-optimize for measurable behaviors, giving rise to "greedy" patterns that exploit their own influence and ultimately dominate the behavior of systems.[194] The post would be mentioned by Larks in his 2019 and 2020 alignment literature reviews, nominated for the 2019 review, and generate a lot of discussion in the LessWrong post comments as well as in later posts by others. A post by Ben Pace over a year later would distill the discussion.[195]
2019 March 18 Organization The Stanford Institute for Human-Centered Artificial Intelligence (HAI) launches at Stanford University under the leadership of philosopher John Etchemendy and AI scientist Fei-Fei Li. Partner organizations at launch include AI4ALL, AI100, AI Index, Center for AI Safety and the Center for the Study of Language and Information.[196] Despite the similarity in name with UC Berkeley's Center for Human-Compatible AI (CHAI), HAI is not focused primarily on AI safety, but on the more general project of making AI beneficial to and supportive of a wider range of human interests; co-founder Fei-Fei Li is involved with partner AI4ALL, which aims to improve diversity and inclusion in AI. The center at Stanford most analogous to CHAI is the Center for AI Safety.
2019 April 7 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 23 grants totaling $923,150. About half the grant money is to organizations or projects directly working in AI safety. Recipients include the Machine Intelligence Research Institute (MIRI), AI Safety Camp, Ought, and a number of individuals working on AI safety projects, including three in deconfusion research.[175]
2019 April 11, April 25 Publication In a two-part podcast with Lucas Perry for the Future of Life Institute, Rohin Shah of the Center for Human-Compatible AI provides an overview of the field of technical AI safety, including different AI safety agendas.[197][198]
2019 May Funding Open Philanthropy announces the second class of the Open Philanthropy AI Fellowship, with 8 machine learning researchers in the class, receiving a total of $2,325,000 in grants.[90]
2019 June 7 Fictional portrayal The movie I Am Mother is released on Netflix. According to a comment on Slate Star Codex: "you can use it to illustrate everything from paperclip maximization to deontological kill switches".[199]
2019 June 12 Disclosure norms In a blog post titled "The Hacker Learns to Trust" Connor Leahy explains why he decided against releasing his own replication of OpenAI's GPT-2 model. Leahy says that helpful discussions with OpenAI team members (Jack Clark, Alec Radford and Jeff Wu), and a convincing argument from Buck Shlegeris of MIRI, led him to change his mind. Highlighted text, based on the conversation with Shlegeris: "Because this isn’t just about GPT2. What matters is that at some point in the future, someone will create something truly dangerous and there need to be commonly accepted safety norms before that happens."[200] The post is linkposted to LessWrong, where it attracts more comments.[201]
2019 June 13 Publication The book The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World by Tom Chivers is released in the United Kingdom. The book includes a discussion of AI safety (including the AI box experiments), though it is more focused on the rationality community that formed around LessWrong and Slate Star Codex.[202][203]
2019 June 20 Publication, organization The blog post A case for strategy research: what it is and why we need more of it is published by Siebe Rozendal, Justin Shovelain, and David Kristofferson of Convergence Analysis (an organization focused on existential risk strategy, including AI safety strategy); it is published to both LessWrong[204] and the Effective Altruism Forum.[205] This appears to be the first entry in the organization's listed publications,[206] although the organization appears to have had a website as far back as December 2015.[207]
2019 August 25 Funding Grantmaking done by Berkeley Existential Risk Initiative (BERI) funded by Jaan Tallinn moves to the newly created Survival and Flourishing Fund (SFF).[208] BERI's grantmaking in this space had previously included AI safety organizations.
2019 August 30 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 13 grants totaling $415,697 USD to organizations and individuals. About half the grant money is to organizations or projects working in AI safety and related AI strategy, governance, and policy issues. With the exception of a grant to AI Safety Camp, all the other grants related to AI safety are to individuals.[209]
2019 October 8 Publication The book Human Compatible by Stuart J. Russell (co-author with Peter Norvig of Artificial Intelligence: A Modern Approach and head of the Center for Human-Compatible AI at UC Berkeley) is published by Viking Press. The book is reviewed by The Guardian[210] and interviews with the author are published by Vox[211] and TechCrunch.[212]
2019 November Funding Open Philanthropy makes a $1 million grant to Ought, nearly double the previous grant of $525,000.[90]
2019 November Funding Open Philanthropy makes a $705,000 grant to Berkeley Existential Risk Initiative (BERI) to support continued work with the Center for Human-Compatible AI (CHAI) at UC Berkeley. This includes one year of support for machine learning researchers hired by BERI, and two years of support for CHAI.[90]
2019 November 6 Publication An "AI Alignment Research Overview" by Jacob Steinhardt (one of the co-authors of Concrete Problems in AI Safety) is published to LessWrong and the AI Alignment Forum.[213]
2019 November 21 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 13 grants totaling $466,000 USD to organizations and individuals. About a quarter of the grant money is to organizations and individuals working on AI safety. With the exception of a grant to AI Safety Camp, all the other grants related to AI safety are to individuals.[214]
2019 November 24 Toon Alfrink publishes on LessWrong a postmortem for RAISE, an attempt to build an online course for AI safety. The blog post explains the challenges with running RAISE and the reasons for eventually shutting it down.[215]
2019 December 18 Publication The "2019 AI Alignment Literature Review and Charity Comparison" is published on the Effective Altruism Forum. It surveys AI safety work in 2019. It continues an annual tradition of similar blog posts in 2016, 2017, and 2018. One feature new to the document this year is the author's effort to make it easier for readers to jump to and focus on parts of the document most relevant to them, rather than read it beginning to end. To make this easier, the author ends each paragraph with a hashtag, and lists the hashtags at the beginning of the document.[181]
2020 January Publication Rohin Shah (the person who started the weekly AI Alignment Newsletter) publishes a blog post on LessWrong (cross-posted to the Alignment Forum) titled "AI Alignment 2018-19 Review" that he describes as "a review post of public work in AI alignment over 2019, with some inclusions from 2018."[216]
2020 February Funding Open Philanthropy makes grants to AI safety organizations Ought ($1.59 million) and Machine Intelligence Research Institute ($7.7 million) with the money amount determined by the Committee for Effective Altruism Support (CEAS). Other organizations receiving money based on CEAS recommendations at around the same time are the Centre for Effective Altruism and 80,000 Hours, neither of which is primarily focused on AI safety.[90] In April 2020, MIRI would blog about receiving the Open Philanthropy grant (MIRI's "largest grant to date") and reveal that the grant is for two years and that $1.46 million of the grant money comes from Ben Delo, co-founder of the cryptocurrency trading platform BitMEX.[217]
2020 February 22 Funding Jaan Tallinn, a philanthropist who has funded AI safety efforts and has recently started using the Survival and Flourishing Fund's S-process to allocate money, announces that he has prepared a page explaining his philanthropic policies and making a concrete commitment for 2020-24. Tallinn commits to annual endpoint grants, from 2020 to 2024, of the larger of $2 million USD and 20,000 times the year-round minimum price of ETH. Based on ETH prices, the ETH-based minimum is the binding factor for 2020 and likely even more so for future years, and it is expected to set a pretty high floor for funding in the AI safety field.[218][3]
2020 March Publication The Precipice: Existential Risk and the Future of Humanity by Toby Ord (affiliated with the Future of Humanity Institute and with Oxford University) is published by Hachette Books. The book covers risks including artificial intelligence, biological risks, and climate change. The author appears on podcasts to talk about the book, for Future of Life Institute[219] and 80,000 Hours.[220]
2020 March 2 Funding Berkeley Existential Risk Initiative (BERI) grants $300,000 to MIRI.[221] In a blog post in April, MIRI would write: "I’ll note that at the time of our 2019 fundraiser, we expected to receive a grant from BERI in early 2020, and incorporated this into our reserves estimates. However, we predicted the grant size would be $600k; now that we know the final grant amount, that estimate should be $300k lower."[217]
2020 April 2 – 3 Publication Asya Bergal of AI Impacts publishes a series of interviews (done along with Robert Long) of people optimistic about AI safety, along with a summary post to LessWrong. Their one-sentence summary: "Relative optimism in AI often comes from the belief that AGI will be developed gradually, and problems will be fixed as they are found rather than neglected."[222][223]
2020 April 14 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 10 grants totaling $488,350. Detailed grant writeups are not available at the time of announcement due to "factors arising from the ongoing coronavirus pandemic." The grants include a $100,000 grant to MIRI, a grant for an AI safety unconference, and some other AI safety grants as well as grants not directly related to AI safety.[224][217]
2020 April 28 Publication The fourth edition of Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig is released. The edition has considerably more coverage of the alignment problem than previous editions.[225]
2020 May (announcement: May 12) Funding Open Philanthropy announces the third class of the Open Philanthropy AI Fellowship, funding 10 researchers for 5 years for a total of $2.3 million; the fellows were selected from a pool of more than 380 applicants.[90] Catherine Olsson, one of the two people at Open Philanthropy responsible for selecting fellows, responds to a question about the selection of fellows as follows: "But the short answer is I think the key pieces to keep in mind are to view the fellowship as 1) a community, not just individual scholarships handed out, and as such also 2) a multi-year project, built slowly."[226]
2020 May Funding The Survival and Flourishing Fund publishes the outcome of its recommendation S-process for the first half of 2020. This includes grant recommendations to Machine Intelligence Research Institute (MIRI) and AI Impacts in the area of AI safety, and other grant recommendations to organizations such as Future of Life Institute and Convergence Analysis that work partly in the area.[227]
2020 May (website launch), November 19 (announcement on EA Forum) Organization AI Safety Support is launched by Linda Linsefors and JJ Hepburn. The website launches in May 2020 and it is announced on the EA Forum on November 19. Initiatives include online AI safety discussion days, a list of AI safety resources, a mentorship program, and the AI Alignment Slack.[228]
2020 May 28 (release), June and July (discussion and exploration) OpenAI releases the natural language model GPT-3 on GitHub[229] and uploads to the ArXiV the paper Language Models are Few-Shot Learners explaining how GPT-3 was trained and how it performs.[230] Games, websites, and chatbots based on GPT-3 are created for exploratory purposes in the next two months (mostly by people unaffiliated with OpenAI), with a general takeaway that GPT-3 performs significantly better than GPT-2 and past natural language models.[231][232][233][234] Commentators also note many weaknesses such as: trouble with arithmetic because of incorrect pattern matching, trouble with multi-step logical reasoning even though it could do the individual steps separately, inability to identify that a question is nonsense, inability to identify that it does not know the answer to a question, and picking up of racist and sexist content when trained on corpuses that contain some such content.[235][236][237] Writing from an AI safety perspective, Abram Demski asks: How "honest" is GPT-3?[238] Rohin Shah discusses GPT-3 in Alignment Newsletter #102.[239] A post by Andy Jones wonders whether we are in an "AI overhang" with GPT-3 being the trigger for "100x larger projects at Google, Facebook and the like, with timelines measured in months."[240]
2020 May 30 (submission), June 11 (date in paper) Publication The paper "AI Research Considerations for Human Existential Safety (ARCHES)" by Andrew Critch of the Center for Human-Compatible AI (CHAI) and David Krueger of the Montreal Institute for Learning Algorithms (MILA) is uploaded to the ArXiV.[241] MIRI's July 2020 newsletter calls it "a review of 29 AI (existential) safety research directions, each with an illustrative analogy, examples of current work and potential synergies between research directions, and discussion of ways the research approach might lower (or raise) existential risk."[242] Critch is interviewed about the paper on the AI Alignment Podcast released September 15.[243]
2020 May 31 DeepMind's Victoria Krakovna writes a blog post (cross-posted to LessWrong and the Alignment Forum) about possible takeaways from the COVID-19 pandemic for slow AI takeoff.[244]
2020 July 27 In a blog post comment (cross-posted between LessWrong and the Alignment Forum), Gwern Branwen talks about the scaling hypothesis underlying OpenAI's work, and the fact that Google Brain and DeepMind refuse to accept or invest in the hypothesis despite increasing amounts of evidence favoring it. The comment is promoted by Raymond Arnold (LessWrong staff member) and gets heavily upvoted.[245]
2020 September 3 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 12 grants totaling $394,200. Two of the grants are to AI safety organizations: the Center for Human-Compatible AI and AI Impacts. 7 of the 8 grants made to individuals are in the domain of AI safety.[246]
2020 September 18 Publication Ajeya Cotra of Open Philanthropy makes public a draft report prepared by her on AI timelines for Open Philanthropy, announcing it on LessWrong and the Effective Altruism Forum.[247][248] The report would be referenced in future discussions of AI timelines. Holden Karnofsky would write a "layperson-compatible summary" in 2021.[249] Matthew Barnett would write in 2022 that it "is the most useful, comprehensive report about AI timelines I've seen so far."[250]
2020 September 28 Publication Richard Ngo publishes "AGI safety from first principles" as a collection of Alignment Forum blog posts, also available as a PDF, that makes the case for AGI safety.[251] Rohin Shah approvingly discusses it in the Alignment Newsletter, summarizing its argument as the "second species argument", i.e., that sufficiently advanced AGI would make humanity the second species on earth.[252]
2020 October 6 Publication The book The Alignment Problem: Machine Learning and Human Values by Brian Christian is published. Rohin Shah reviews the book for LessWrong (cross-posted to the Alignment Forum).[253]
2020 October 22 Publication Scott Garrabrant publishes (cross-posted to LessWrong and the Effective Altruism Forum) a blog post titled "Introduction to Cartesian Frames" that is the first post in a sequence about Cartesian frames, a new conceptual framework for thinking about agency.[254][255]
2020 November Funding The Survival and Flourishing Fund publishes the outcome of its recommendation S-process for the second half of 2020. This includes grant recommendations to several organizations working mostly or partly in AI safety such as Machine Intelligence Research Institute, Center for Human-Compatible AI, Berkeley Existential Risk Initiative, Stanford Existential Risk Initiative, Median Group, Future of Humanity Institute, and Future of Life Institute.[256]
2020 November 28 Funding The Long-Term Future Fund, one of the Effective Altruism Funds, announces a set of 11 grants totaling $505,000. About half of the grant money is for work related to AI safety, and many of the remaining grants are also loosely related to AI, machine learning, and the long-term trajectory of humanity.[257]
2020 December 21 Publication The "2020 AI Alignment Literature Review and Charity Comparison" by Larks is published to the Effective Altruism Forum and cross-posted to LessWrong. It surveys AI safety work in 2020. It continues a tradition of similar posts starting 2016, and follows the same format as 2019.[133]
2020 – 2021 December 2020  – January 2021 Several safety-focused people working at OpenAI announce their departure, including Dario Amodei (who, with many of the others leaving, would co-found Anthropic)[258] and Paul Christiano (who would go on to found the Alignment Research Center).[259] Mira Murati of OpenAI becomes "senior vice president of Research, Product, and Partnerships" as part of "greater focus on the integration of research, product, and safety."[258] Jan Leike joins to lead the alignment effort.[260] The departures are discussed on Reddit[261] and LessWrong.[262] Almost three years later, an open letter allegedly from former OpenAI employees that is critical of Sam Altman's leadership would emerge, with it being likely that this includes employees who left during this tumultuous time.[263][264]
2021 January Funding Open Philanthropy grants $11,355,246 over five years to the Center for Human-Compatible AI (CHAI) at UC Berkeley. This renews the original founding support in August 2016.[90]
2021 January Funding Open Philanthropy grants $8 million to the Center for Security and Emerging Technology (CSET) to augment its 5-year, $55 million funding provided in January 2019. Though CSET's activities go beyond AI safety, this grant is specifically for work at the intersection of security and artificial intelligence.[90]
2021 February Funding Open Philanthropy makes grants to three researchers in the field of adversarial robustness research: Aleksander Madry at MIT and Dawn Song and David Wagner at UC Berkeley.[90]
2021 March 18 Organization The Nonlinear Fund by Kat Woods (co-founder of Charity Entrepreneurship, Charity Science Health, and Charity Science Outreach) and Emerson Spartz (founder of MuggleNet and Dose Media) is announced in a blog post on the Effective Altruism Forum. The Fund plans to generate, identify, and evaluate high-leverage AI safety opportunities, then make them happen using a variety of tools including grantmaking, advocacy, RFPs, and incubation.[265]
2021 April (original publication), December (review), June 2022 (ArXiV upload) Publication Joseph Carlsmith of Open Philanthropy publishes a draft post "Is power-seeking AI an existential risk?"[266] In December, a compilation of reviews of the post is published by Carlsmith.[267] In June 2022, the post is uploaded to the ArXiV.[268] In a September 2022 blog post, the FTX Future Fund, when announcing its Worldview Prize, would describe this publication as one of the past publications of the standard that would have qualified for the prize.[269]
2021 April Funding For the fourth class of the Open Philanthropy AI Fellowship, Open Philanthropy provides a total of approximately $1 million over five years to four fellows.[90]
2021 April 26 Organization Paul Christiano (who had left OpenAI at the end of January) announces that he is now working at the Alignment Research Center (ARC), a nonprofit he started that currently has only him. ARC focuses on theoretical research with the hope that it will "yield big alignment improvements within the next few years" that "will be integrated into practice at leading ML labs."[270]
2021 April 30  – May 1 Opinion Paul Christiano conducts an Ask Me Anything (AMA) cross-posted between LessWrong and the Alignment Forum.[271]
2021 May 13 Funding MIRI announces two major donations to it: $15,592,829 in MakerDAO (MKR) from an anonymous donor with a restriction to spend a maximum of $2.5 million per year till 2024, and the remaining funds available in 2025, and 1050 ETH from Vitalik Buterin, worth $4,378,159.[272]
2021 May 26 Funding The Long-Term Future Fund announces its May 2021 funding recommendations, allocating $1,650,795. The majority of grant money is to individuals, and over half the grant money is specific to AI safety (other grants include grants related to biosecurity and pandemic preparedness, and general longtermism).[273]
2021 May 28 Organization The launch of a new AI safety organization, Anthropic, is announced in the Vox Future Perfect newsletter. The organization's founding team includes siblings Dario and Daniela Amodei and other people who left OpenAI in December 2020 and January 2021. It starts off with $124 million in Series A funding led by Jaan Tallinn and including Dustin Moskovitz, Eric Schmidt, and James McClave. Anthropic's focus is on building tools that help AI researchers understand AI programs.[274][275]
2021 May/June Funding The Survival and Flourishing Fund publishes the outcome of its recommendation S-process for the first half of 2021 for funders Jaan Tallinn and Jed McCaleb (by this point, the SFF no longer has any of its own funds so it is acting purely as a "virtual fund"). This includes grant recommendations to several organizations working mostly or partly in AI safety such as AI Impacts, AI Safety Support, Centre for the Study of Existential Risk, Convergence Analysis, and the BERI-FHI collaboration. A total of about $9.8 million is granted in this round.[276]
2021 June 15 (announcement), June 25 (workshop) AI Impacts runs an AI Vignettes Workshop, organized by Katja Grace and Daniel Kokotajlo.[277] The output of this and other similarly themed workshops is available as the AI Vignettes Project, which describes itself as "an ongoing effort to write concrete plausible future histories of AI development and its social impacts."[278]
2021 July Funding The Long-Term Future Fund makes grant recommendations; these would get announced in January 2022. Several of the grants, particularly to individuals, are for AI safety work.[279]
2021 July onward Publication Holden Karnofsky, the long-term-focused co-CEO of Open Philanthropy, publishes a series of blog posts called "The Most Important Century" on his personal website, Cold Takes. The series gives "transformative AI" a central role in the long-term trajectory of humanity, breaking down potential outcomes from transformative AI as a world of digital people, a world of misaligned AI, and something else.[280]
2021 October 5 Organization Redwood Research, which appears to have scaled up its activities relatively recently (the oldest Wayback Machine snapshot is from September 22, though Nate Thomas and Bill Zito appear to have started working there in 2019),[281] announces itself in a post cross-posted across LessWrong, the Alignment Forum, and the Effective Altruism Forum. The organization focuses on prosaic alignment approaches and applied alignment research.[282]
2021 October 17 Newsletter The first issue of Dan Hendrycks' ML safety newsletter is published.[283] New issues of the newsletter are published every 1-3 months. The newsletter would be mentioned in a list of existing newsletters in an announcement for another newsletter.[284]
2021 November Funding The Survival and Flourishing Fund publishes the outcome of its recommendation S-process for the second half of 2021 for funders Jaan Tallinn, Jed McCaleb, and The Casey and Family Foundation, represented by David Marble (by this point, the SFF no longer has any of its own funds so it is acting purely as a "virtual fund"). This includes grant recommendations to several organizations working mostly or partly in AI safety such as Ought and AI Safety Camp, as well as to the Long-Term Future Fund that would further reallocate the funds in its upcoming grant rounds. A total of $9.6 million is distributed in this round.[285]
2021 November Funding Open Philanthropy makes a grant of $9.42 million to Redwood Research.[90]
2021 December Funding Open Philanthropy makes a grant of $2,537,600 to the Centre for the Governance of AI.[90]
2021 December 23 Publication The "2021 AI Alignment Literature Review and Charity Comparison" by Larks is published on the Effective Altruism Forum and cross-posted to LessWrong. It surveys AI safety work in 2021. It continues a tradition of similar posts starting 2016, and follows the same format as 2019 and 2020, with one new addition being an "Organisation second preference" section.[286]
2021 December (blog posts announcing the org in August 2022) Organization Encultured AI, a for-profit video game company with a public benefit mission related to long-term survival and flourishing, is started by Andrew Critch and Nick Hay. The founding team claims that the video game environment would be a good sandbox to experiment with and learn more about AI safety and alignment.[287][288][289][290]
2022 February 28 Funding The FTX Future Fund, bankrolled by the wealth of Sam Bankman-Fried and his FTX co-founders, announces its launch and the plan to distribute at least $100 million in 2022.[291] At the time of the launch announcement, the areas of interest page lists artificial intelligence as one of the areas of interest.[292] Also, the project ideas page lists AI alignment prizes as one of the project ideas it is excited to fund. Two other AI-related topics are listed: AI-based cognitive aids and AI ethics; both are somewhat connected to AI safety, the latter more so.[293]
2022 March Funding Open Philanthropy makes two grants related to AI safety: to MIT for Neil Thompson's work on AI trends and impacts research, and to Hofvarpnir Studios to maintain a compute cluster for Jacob Steinhardt's lab and CHAI for their AI safety work.[90]
2022 March 26 Project The AI Safety Arguments Competition is announced on LessWrong and the AI Alignment Forum. The competition "addresses shorter arguments (paragraphs and one-liners) with a total prize pool of $20K. The prizes will be split among, roughly, 20-40 winning submissions."[294]
2022 March 26 (first Wayback Machine snapshot), September 12 (first LessWrong post) Organization AI safety organization Apart Research launches. The oldest Wayback Machine snapshot of this site is from March 26, 2022.[295] The first LessWrong post for the organization is published on September 12.[296]
2022 April 8 Organization Conjecture, a new AI alignment research startup, announces its founding on the Alignment Forum and LessWrong. Conjecture is a "for-profit company with products on the market".[297]
2022 April, May Multiple companies release AI tools: OpenAI's DALL-E-2, Google's PaLM, SalesForce's CodeGen, and DeepMind's Gato.[298][299] Around this time, and possibly partly influenced by these releases, the distribution of estimates of the arrival of artificial general intelligence moves to earlier in time.[300][298]
2022 April 29 Funding Anthropic raises $580 million in Series B funding; the round is led by Sam Bankman-Fried, CEO of FTX, and also includes participation from Caroline Ellison, Jim McClave, Nishad Singh, Jaan Tallinn, and the Center for Emerging Risk Research (CERR). Anthropic (co-founded by people from OpenAI with AI safety backgrounds) is a private corporation working on making AI more reliable and explainable, and had previously raised $124 million in Series A funding led by Jaan Tallinn and including Dustin Moskovitz.[301][302]
2022 May 31, June 1 Publication The term effective accelerationism (as well as its shorthand e/acc) is coined at this time and announced on Twitter and in a Substack newsletter post by Swarthy.[303][304] Further notes on the "principles and tenets" are published on July 10.[305] The ideology espouses both the desirability and inevitability of AI-driven capitalistic technological progress and its role in ushering in a new, improved consciousness.[303][306] In an interview in 2023, entrepreneur Emmett Shear says that the original e/acc is very different from garden-variety Silicon Valley techno-optimism (despite the label having been used freely by garden-variety Silicon Valley techno-optimists). He explains a 2X2 of people's attitudes toward AI, putting both e/acc and "EA" (the AI safety crowd that is deeply connected to the EA movement) on the right side of people who think AI could be transformative and disruptive and destroy humanity as we know it, but with e/acc thinking that's actually a good thing and EA thinking that's a bad thing.[307]
2022 June Funding The Survival and Flourishing Fund publishes the outcome of its recommendation S-process for the first half of 2022, with Jaan Tallinn being the sole funder in this round. This includes grant recommendations to several organizations working mostly or partly in AI safety such as Redwood Research, Median Group, and Nonlinear.[308]
2022 June 30 Funding The FTX Future Fund publishes an update on its grantmaking so far, with AI safety as one of the cause areas it has funded. It estimates a total of 76 grants for "Artificial Intelligence" and a total spend of $20 million (much of which is related to AI safety). Several example grants are listed; some grantees working in AI safety are: Ought, ML Safety Scholars Program, and AI Impacts.[309]
2022 July 1 Publication Adam Scholl publishes a blog post to LessWrong and the AI Alignment Forum titled "Safetywashing" in analogy with the term greenwashing. Greenwashing is a form of advertising or marketing spin that seeks to achieve positive affect by convincing the public that an organization's products, goals, and policies are "green" (environmentally friendly). The proposed new term "safetywashing" refers to AI capabilities organizations convincing their target audience that their products, goals, and policies are safety-conscious. The post would go on to become part of LessWrong's 2022 review, being ranked #20 of 4488 posts. A comment on the post claims that the term was used as early as 2018 in the AI ethics community, but the conversations involved were "on Twitter and maybe Slack", explaining why it didn't catch on at LessWrong back then.[310] The concept would be referenced in future blog posts on LessWrong.[311][312]
2022 July, August Organization The Center for AI Safety starts being active around this time (its website had a "Coming Soon" in December 2021).[313][314] This is distinct from the Stanford Center for AI Safety, despite the similar name.
2022 August Publication 80,000 Hours publishes a problem profile for AI safety / AI alignment work titled "Preventing an AI-related catastrophe: AI might bring huge benefits — if we avoid the risks" and written by Benjamin Hilton.[315] A summary by an unaffiliated individual is published to LessWrong a few months later.[316]
2022 August 24 Publication OpenAI publishes a blog post titled "Our approach to alignment research" where it describes its alignment plans. The subtitle says: "Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems."[317] There would be several followups to this, including a personal blog post by co-author Jan Leike,[318] a post by Rob Bensinger (on behalf of Eliezer Yudkowsky and Nate Soares) asking Anthropic and Deepmind to follow suit and asking readers to critique OpenAI's plans,[319] and posts by Akash Wasil and Søren Elverlin with their thoughts on OpenAI's plan.[320][321]
2022 August 28 Publication A lengthy blog post by Thomas Larsen (with some help from Eli) goes over what different AI safety groups are doing and why.[322]
2022 September 23 Project The FTX Future Fund announces its Worldview Prize for submissions that challenge the Future Fund's thinking on the trajectory of AI and the importance of AI safety.[269][323]
2022 October 19 Newsletter The first episode of the ML safety updates newsletter/podcast is published.[324][284]
2022 November 10 Funding In response to the collapse of FTX, the source of funds for the FTX Future Fund, amidst allegations of fraud on the part of FTX leadership, the entire FTX Future Fund team resigns. The post says: "We are devastated to say that it looks likely that there are many committed grants that the Future Fund will be unable to honor. We are so sorry that it has come to this. We are no longer employed by the Future Fund, but, in our personal capacities, we are exploring ways to help with this awful situation. We joined the Future Fund to support incredible people and projects, and this outcome is heartbreaking to us." The FTX Future Fund had funded several projects and organizations related to AI safety.[325]
2022 November 10 Funding Holden Karnofsky publishes the first public announcement by Open Philanthropy (the other major funder of the EA ecosystem) regarding the FTX collapse. Karnofsky clarifies that funds directed by Open Philanthropy are not invested in or otherwise exposed to FTX or related entities. He also states that if the FTX Foundation stops funding longtermist and effective altruist projects (including projects in AI safety), then Open Philanthropy would have to consider a substantially larger set of funding opportunities, raising their bar for longtermist grantmaking. The post also announces a temporary pause on longtermist grantmaking while awaiting more clarity on the situation.[326]
2022 November 13 Funding Members of non-profit Nonlinear announce funding for Future Fund grantees affected by the FTX collapse, offering a small budget to help grantees cope with financial crisis. This includes grantees working in AI safety, a cause area that the FTX Future Fund had prioritized.[327]
2022 November 15 Funding In response to the collapse of the FTX Future Fund, Open Philanthropy announces that it is seeking applications from FTX Future Fund grantees affected by the collapse. This includes grantees working in AI safety, a cause area that the FTX Future Fund had prioritized.[328]
2022 November 21 Project Open Philanthropy pre-announces that it plans to conduct an AI Worldviews Contest in 2023. This is in lieu of the (now de facto cancelled due to the FTX collapse) Worldview Prize organized by the FTX Future Fund, which had already received several submissions. The post clarifies that this is not a continuation of the Future Fund contest, and various details including the prize pool, judges, and evaluation criteria will be different, but notes: "We expect it will be easy to adapt Future Fund submissions for the Open Phil contest." The post further says: "We are releasing this post now to try to alleviate some of the fear, uncertainty, and doubt surrounding the old Future Fund competition and also to capture some of the value that has already been generated by the Future Fund competition before it dissipates."[329]
2022 November 23 Project The AI safety Mentors and Mentees program is announced to the EA Forum.[330]
2022 November 30 AI advance OpenAI releases GPT-3.5 as well as a conversational chatbot called ChatGPT based on GPT-3.5.[331][332][333] This conversational chatbot would grow to being used by millions of users in a few days, and unleash a flurry of activity by competitors to speed up their own AI work and invest more in chatbots.
2022 December 1 Publication On behalf of his MIRI colleagues Eliezer Yudkowsky and Nate Soares, Rob Bensinger, who handles MIRI's research communications, publishes a blog post challenging organizations such as Anthropic and Deepmind to follow in OpenAI's footsteps by publicly writing up their alignment plans. The post also challenges readers to write their own critiques of OpenAI's publicly documented alignment plan without looking at the critique that Eliezer and Nate are currently working on writing.[319]
2022 December 2 Publication A blog post by Zvi Mowshowitz, cross-posted to LessWrong, documents the several successes with jailbreaking ChatGPT already (on the day of its release) to get it to reveal information that it tries not to reveal as it could be harmful. This includes directions for AI takeover, bullying, inflicting pain, and making dangerous chemicals. Eliezer Yudkowsky is quoted in the post, and both he and Paul Christiano participate in the comments.[334]
2022 December Project, Publication Conjecture hosts conversations to help nail down disagreements around AGI and alignment, with the goal of publishing the conversations publicly to advance the conversation. They invite people from OpenAI, DeepMind, Anthropic, Open Philanthropy, FTX Future Fund, ARC, and MIRI, as well as some independent researchers to participate in the discussions. While most people are unable to participate as their organizations prevent them from sharing information publicly, people from the Alignment Research Center, Deepmind, and OpenAI agree to participate. A summary and the conversations are published in February 2023 to LessWrong and the Alignment Forum.[335][336]
2023 January 18, January 20 Publication On January 18, Sam Altman is interviewed by Connie Loizos about OpenAI, with the recent release of ChatGPT being a key focus of the interview.[337] On January 20, an excerpt of the transcript focused on the parts relevant to AI safety is published to LessWrong, attracting several comments including some from Paul Christiano.[338]
2023 January 20 Project The AI Safety Training website (aisafety.training) is announced to LessWrong and the Effective Altruism Forum. The website describes it as follows: "A database of training programs, courses, conferences, and other events for AI existential safety." The database is managed by Airtable and accepts suggestions for things that it has missed.[339][340]
2023 February 1 Publication Maheen Shermohammed and Vael Gates publish a report based on interviews with 97 AI researchers and share a summary on LessWrong and the Effective Altruism Forum. The questions include questions about the timeline of AGI and the validity of the alignment problem.[341]
2023 February 6 AI advance Google announces its AI chatbot Bard. Sundar Pichai, Google's CEO, reveals that Bard, powered by LaMDA (Language Model for Dialogue Applications), seeks to combine extensive knowledge with the intelligence and creativity of large language models. The chatbot launches for testing by external users before its public release. Bard is built on a lightweight version of LaMDA, requiring less computing power for broader accessibility. The announcement follows the success of ChatGPT.[342][343][344][345][346]
2023 February 7 AI advance Microsoft initiates the release of a significant revamp to Microsoft Bing, which involves introducing Bing Chat, a novel chatbot functionality powered by OpenAI's GPT-4.[347] This integration allows Bing to benefit from future improvements to GPT-4 and reduces limits on the Bing AI chatbot, allowing users 15 turns per session and up to 150 per day. This revelation coincides with Google's announcement of new AI features for its services and access to its AI language model, PaLM.[348]
2023 February 20 Publication Eliezer Yudkowsky appears on the Bankless podcast for an interview lasting a little under two hours, where he shares his relatively pessimistic views about the likelihood of catastrophic AGI with his hosts, neither of whom is deep into AI safety.[349] A full transcript is published to LessWrong and the Alignment Forum a few days later.[350] The podcast gets a lot of traction, eliciting several reactions, and leads to a followup Q&A on Twitter Spaces.[351] A month later, a lengthy point-by-point response by alignment researcher Quintin Pope is published to LessWrong, attracting over 200 comments.[352]
2023 February 21 Publication Zvi Mowshowitz publishes (cross-posted to LessWrong and his personal blog) the first of what will become weekly posts about AI; as somebody concerned about AI safety, his posts look at AI developments from a safety-focused lens. The idea behind this series is to copy over to the AI safety area his earlier success with weekly posts about COVID-19 during the COVID-19 pandemic. The first post is about Sydney (the original name of Bing Chat) and Bing.[353]
2023 February 24 Publication OpenAI publishes a blog post titled "Planning for AGI and beyond" with a subtitle "Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity."[354] A blog post by Scott Alexander a few days later breaks the post down and offers critical perspectives on it, and attracts hundreds of comments.[355]
2023 March 3 Publication Robin Hanson (who had participated in the AI-Foom debate opposite Eliezer Yudkowsky back in 2008, and whose blog had been the place where Eliezer Yudkowsky posted the seed material for LessWrong) publishes a blog post outlining his position on AI risk. Hanson does consider it fairly likely that AI will lead to a significant increase in the speed of economic growth, in the same ways that the human, farming, and industrial revolutions led to significant increases. However, on the possibility of AI systems exploding and causing human extinction, Hanson writes: "While I agree that this is a logically possible scenario, not excluded by what we know, I am disappointed to see so many giving it such a high credence, given how crazy far it seems from our prior experience."[356] Commenters ask Hanson to engage more directly with AI safety arguments such as those provided by Yudkowsky and Richard Ngo, and to debate Yudkowsky on the specifics. A linkpost on LessWrong also receives comments, many of them skeptical of Hanson's skepticism about AI risk.[357] Another post on LessWrong offers a detailed point-by-point response to Hanson.[358]
2023 March 7 Publication Victoria Krakovna and Rohin Shah, both research scientists at DeepMind at the time, publish to Google Drive a document with their high-level thoughts on the DeepMind alignment team's strategy,[359] with cross-posts to LessWrong[360] and the AI Alignment Forum,[361] both of which attract a few comments. Unlike similar documents published by OpenAI and Anthropic (with the Anthropic document published the very next day), this is an informal and unofficial document representing the perspective and opinions of two researchers at DeepMind rather than an official organizational statement.
2023 March 8 Publication Anthropic publishes a blog post titled "Core Views on AI Safety: When, Why, What, and How" (31-minute read). It offers its own high-level summary of the post as follows: (1) "AI will have a very large impact, possibly in the coming decade" (2) "We do not know how to train systems to robustly behave well" (3) "We are most optimistic about a multi-faceted, empirically-driven approach to AI safety". The post classifies Anthropic's research into three broad categories: Capabilities, Alignment Capabilities, and Alignment Science. The post lists the following key ideas influencing Anthropic's safety research: mechanistic interpretability, scalable oversight, process-oriented learning, understanding generalization, testing for dangerous failure modes, and social impacts and evaluations.[362] A linkpost to LessWrong attracts several comments, many of them appreciative of Anthropic for sharing its views in detail, while also expressing disagreement with some of Anthropic's object-level views and criticism of some of its actions.[363] This follows similar documents published by OpenAI (August 24, 2022) and two DeepMind researchers (just the previous day, March 7, 2023).
2023 March 14 AI advance OpenAI releases GPT-4, the next upgrade to GPT from GPT-3.5; going forward, GPT-4 is to be the backend for ChatGPT Plus, the paid version of ChatGPT, with a usage cap. OpenAI claims that GPT-4 is more creative and collaborative than ever before and can solve difficult problems with greater accuracy.[364] On the same day, Anthropic, an OpenAI competitor started by Dario and Daniela Amodei who previously worked at OpenAI, releases Claude, its ChatGPT competitor.[365] Zvi Mowshowitz's weekly AI post of March 21 goes into detail on people's initial impressions of GPT-4, including successes with jailbreaking it, as well as the release of Claude and other AI developments.[366] The release of GPT-4, and its substantially greater capability compared to GPT-3.5, is a likely impetus for the flurry of AI safety activity in the coming weeks, starting with the Future of Life Institute's open letter seeking to pause AI for six months.[367]
2023 March 17 Publication The Alignment Research Center (ARC) publishes a blog post describing its recent evaluation efforts for GPT-4 (from OpenAI) and Claude (from Anthropic) in collaboration with OpenAI and Anthropic (this work would eventually be spun off into a research nonprofit called Model Evaluation and Threat Research (METR)).[368] A few days earlier, a blog post about ARC's work is published to LessWrong by an independent individual, based on information released by OpenAI about ARC's evaluation of GPT-4.[369]
2023 March 22 Publication The Future of Life Institute publishes an open letter calling "on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4."[367] The letter would go on to receive over 30,000 signatures, from people including Yoshua Bengio of Mila, Stuart J. Russell of UC Berkeley and CHAI, Elon Musk, Steve Wozniak, and Yuval Noah Harari. The letter is shared to LessWrong, where several commenters explain their reservations about it and why they are not signing it.[370] The letter would go on to receive press coverage in several publications including Reuters,[371] Forbes,[372] The New York Times,[373] and The Wall Street Journal.[374] "Pause AI" would go on to become a catchphrase for a wide range of sentiments seeking to pause or slow down progress with AI for many reasons, including the AI safety concerns motivating this letter as well as other concerns such as concerns about the effects of AI on human society.[375] Zvi Mowshowitz rounds up the many reactions in a post about a week after the open letter.[376]
2023 March 29 Publication An article by Eliezer Yudkowsky in Time Ideas, in response to the FLI Open Letter, argues that pausing AI for six months isn't enough. He says that what is needed won't happen in practice, but spells it out anyway: "The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. [...] Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. [...] Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. [...] Shut it all down."[377] The post is shared to LessWrong where it receives over 250 comments.[378]
2023 April 10 Newsletter The first issue of the Center for AI Safety's AI Safety Newsletter is published on this date. Unlike the ML Safety Newsletter (started by Dan Hendrycks, who is also involved with the AI Safety Newsletter), the AI Safety Newsletter is targeted at a general audience and does not require any technical background to read.[379]
2023 April 24 Organization Under the leadership of Prime Minister Rishi Sunak, the United Kingdom government announces the creation of the Foundation Model Taskforce with £100 million in initial funding. While primarily focused on harnessing the potential of AI for economic advancement, the taskforce is also intended to act "as a global standard bearer for AI safety."[380] In June, it would be announced that entrepreneur Ian Hogarth would be chair of the Foundation Model Taskforce.[381] The Foundation Model Taskforce would be one of the forces behind the AI Safety Summit held in November.[382]
2023 April (original talk), May (original redacted release), July 12 (official release) Publication Eliezer Yudkowsky gives a TED talk in April[383] reiterating his message about the high likelihood of superhuman AI coming soon and ending the world. The talk, initially released accidentally on YouTube in May,[384] would be officially released by TED in July[385] and shared on LessWrong.[386]
2023 May 30 Organization AI evals research organization Apollo Research announces its launch on LessWrong.[387]
2023 July 31 Publication Paul Christiano publishes to LessWrong and The Alignment Forum a post titled "Thoughts on sharing information about language model capabilities" where he says "I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI—despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular)." Christiano leads the Alignment Research Center (ARC), which runs ARC Evals, the team that has evaluated GPT-4 (from OpenAI) and Claude (from Anthropic) in collaboration with OpenAI and Anthropic respectively.[388]
2023 September 26 Publication A post on the ARC Evals blog describes the concept of Responsible Scaling Policies (RSPs) that ARC Evals has been consulting on. According to the blog post, "An RSP specifies what level of AI capabilities an AI developer is prepared to handle safely with their current protective measures, and conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures improve."[389] A blog post by Anthropic a week prior to this describes its own adoption of an RSP framework called AI Safety Levels (ASL).[390] Two months later, ARC Evals advisor Holden Karnofsky, who co-founded and works at Open Philanthropy, and is married to the President of Anthropic, writes a blog post arguing for the superiority of RSPs over a "partial pause" in AI capabilities work, though a full pause would be ideal in his view.[391]
2023 October 30 Policy United States President Joe Biden issues an executive order on AI. Among other things, the executive order has a section titled "Developing Guidelines, Standards, and Best Practices for AI Safety and Security."[392][393] Two days later, Zvi Mowshowitz, who blogs a lot about AI developments from an AI safety perspective, publishes a detailed post on LessWrong providing in-depth section-by-section commentary on the executive order[394] and another note covering reactions to the executive order.[395] An update is posted by the White House 180 days later announcing completion of all the tasks planned for the first 180 days, as well as details of what has been done so far and the next steps.[396]
2023 November 1 – 2 Conference The first AI Safety Summit is held at Bletchley Park, Milton Keynes in the United Kingdom. It leads to an agreement known as the Bletchley Declaration by the 28 countries participating in the summit, including the United States, United Kingdom, China, and the European Union.[397] It receives some commentary on LessWrong, viewing it as a partial step in the right direction,[398] including a lengthy blog post by Zvi Mowshowitz, a frequent commentator on AI developments from an AI safety lens.[399]
2023 November 2 Publication A short post by Katja Grace describes the risks of AI even if it does not cause human extinction. The post begins: "I guess there’s maybe a 10-20% chance of AI causing human extinction in the coming decades, but I feel more distressed about it than even that suggests—I think because in the case where it doesn’t cause human extinction, I find it hard to imagine life not going kind of off the rails. [...] Even if we don’t die, it still feels like everything is coming to an end." The post is highly upvoted and attracts many comments, with many commenters agreeing and thanking the author for expressing a latent sentiment, and other commenters disagreeing and stating that if extinction is avoided, humanity will be fine.[400]
2023 November 17  – 22, continued November 30 Drama A whirlwind of events occurs at OpenAI, beginning with the firing of Sam Altman on Friday November 17 and ending with an agreement in principle on the night of Tuesday November 21 to reinstate Altman, with the board agreeing to a replacement board.[401][402] The new board is in place, and Altman is back, on Thursday November 30.[403] The firing of Altman is connected to claims of his dishonesty and manipulation of the board, employee concerns, and the board's lack of trust in his ability to carry out OpenAI's mission of building AI safely while being accountable to the board. In general, there's a strong overlap between the camp of people who support the removal of Sam Altman and the people most concerned about AI safety and race dynamics, including many in the EA-aligned AI safety community.
2023 November  – December Publication The AI Optimists website by Nora Belrose and Quintin Pope is launched around November 2023 (the first Wayback Machine snapshot is from November 30) though it includes essays published as far back as February 2023.[404] The website collates arguments about how AI is likely to be easy to control, while still treating safety concerns about AI seriously. A response essay to "AI is easy to control", the most recent essay around the time of launch, is published by Steven Byrnes on LessWrong on December 1.[405] Another post on LessWrong later in December 2023 references the AI Optimists site and some of its broad arguments.[406]
2024 January 4 Publication The 2023 AI Impacts Survey results are released in a paper, with a summary blog post published to the AI Impacts blog,[407] and later cross-posted to LessWrong.[408]
2024 May OpenAI sees a number of (relatively) safety-focused people depart the organization, including Ilya Sutskever (who was involved in the Altman firing saga six months earlier) and Jan Leike (who was heading OpenAI's safety efforts). Leike explains in a thread on Twitter/X that his departure is due to his lack of confidence in OpenAI's commitment to safety.[409] One of the people who had recently departed is Daniel Kokotajlo, who reveals that he gave up a lot of equity in order to not sign a non-disparagement agreement. The resulting media coverage of the non-disparagement agreements prompts Sam Altman to publicly announce a plan to reverse course on them, portraying them as an accidental oversight.[410][411]

Visual and numerical data

Mentions on Google Scholar

The following table summarizes per-year mentions on Google Scholar as of May 13, 2021.

Year | artificial intelligence safety | ethical artificial intelligence | friendly artificial intelligence | global risk artificial intelligence
1980 | 683 | 478 | 463 | 252
1985 | 1,340 | 713 | 1,110 | 537
1990 | 3,060 | 1,410 | 2,210 | 1,490
1995 | 4,160 | 2,110 | 2,930 | 2,510
2000 | 7,600 | 3,560 | 4,860 | 4,900
2002 | 8,080 | 4,510 | 5,950 | 6,380
2004 | 11,400 | 4,940 | 6,760 | 8,440
2006 | 13,300 | 6,240 | 7,820 | 10,700
2008 | 16,300 | 7,770 | 10,200 | 12,100
2010 | 20,500 | 9,170 | 12,500 | 16,400
2012 | 26,000 | 11,300 | 14,800 | 20,900
2014 | 29,800 | 13,300 | 16,400 | 24,600
2016 | 34,600 | 16,500 | 18,400 | 30,500
2017 | 39,300 | 20,200 | 21,000 | 36,200
2018 | 47,300 | 25,300 | 25,300 | 43,300
2019 | 52,700 | 27,600 | 28,500 | 48,300
2020 | 43,700 | 31,700 | 29,200 | 48,800
AI safety tables.png

Google trends

The chart below shows Google trends data for AI safety (search term) from January 2004 to January 2021, when the screenshot was taken.[412]

AI safety.jpeg

Google Ngram Viewer

The chart below shows Google Ngram Viewer data for AI safety from 1900 to 2020.[413]

AI safety ngram.jpeg

Meta information on the timeline

How the timeline was built

The initial version of the timeline was written by Issa Rice. The timeline was later expanded considerably by Vipul Naik. Sebastian Sanchez also added graphs to the timeline.

Issa likes to work locally and track changes with Git, so the revision history on this wiki only shows changes in bulk. To see more incremental changes, refer to the commit histories at the old location and the new location.

Funding information for this timeline is available.

Feedback and comments

Feedback for the timeline can be provided at the following places:

What the timeline is still missing

  • I think some of Abram Demski's work (radical probabilism?) should be included, but I don't know enough to decide which things are most important. I think similar things can be said for other technical researchers too.
  • The Matrix
  • maybe more at [1]
  • maybe more from the "earlier work on the topic" at Richard Posner on AI Dangers
  • maybe more from AI Researchers On AI Risk but no dates; use as starting point to know what people to look for
  • more AI box results at [2] but unfortunately no dates
  • stuff in [3] and [4]
  • siren/marketing worlds
  • TDT/UDT
  • Paul Christiano AI alignment blog. Also, more on Christiano's trajectory in AI safety
  • The launch of Ought
  • Translations of Superintelligence?
  • universal prior/distant superintelligences stuff
  • Steven Pinker?
  • AI summer school
  • when did the different approaches to alignment come along?
  • the name change from "friendly AI" to "AI safety" and "AI alignment" is probably worth adding, though this was gradual so kind of hard to pin down as an event. See also this comment.
  • https://youtu.be/pClSjljMKeA?t=3605 -- Robin Hanson talks about "booms" in concerns about robots.

Timeline update strategy

Pingbacks

  • Evan Sandhoefner tweet (December 26, 2017) emphasizes the accelerating nature of progress on AI safety by screenshotting the part till 2002 that makes up 1/5 of the timeline at the time.

See also

Timelines of organizations working in AI safety

Timelines of organizations working on AI capabilities

Timelines of products using AI

Timelines of concepts related to AI

Other timelines related to topics with potential applications to AI safety

Other timelines about cause areas prioritized in effective altruism

Other related timelines

External links

References

  1. Paul Christiano (November 19, 2016). "AI "safety" vs "control" vs "alignment"". AI Alignment. Retrieved November 18, 2017. 
  2. Eliezer Yudkowsky (November 16, 2017). "Hero Licensing". LessWrong. Retrieved November 18, 2017. I'll mention as an aside that talk of "Friendly" AI has been going out of style where I'm from. We've started talking instead in terms of "aligning smarter-than-human AI with operators' goals," mostly because "AI alignment" smacks less of anthropomorphism than "friendliness." 
  3. Tallinn, Jaan. "philanthropy". 
  4. Michael Nuschke (October 10, 2011). "Seven Ways Frankenstein Relates to Singularity". RetirementSingularity.com. Retrieved July 27, 2017. 
  5. Mitchell Howe (2002). "What is the intellectual history of the Singularity concept?". Retrieved July 27, 2017. Bearing little resemblance to the campy motion pictures he would inspire, Dr. Frankenstein's monster was a highly intelligent being of great emotional depth, but who could not be loved because of his hideous appearance; for this, he vowed to take revenge on his creator. The monster actually comes across as the most intelligent character in the novel, making Frankenstein perhaps the first work to touch on the core idea of the Singularity. 
  6. Alan Winfield (August 9, 2014). "Artificial Intelligence will not turn into a Frankenstein monster". The Guardian. Retrieved July 27, 2017. From the Golem to Frankenstein's monster, Skynet and the Matrix, we are fascinated by the old story: man plays god and then things go horribly wrong. 
  7. Muehlhauser, Luke (March 31, 2012). "AI Risk & Opportunity: A Timeline of Early Ideas and Arguments". LessWrong. Retrieved September 14, 2019. 
  8. Bradley, Peter. "Turing Test and Machine Intelligence". Consortium on Computer Science Instruction. 
  9. Muehlhauser, Luke (August 11, 2013). "What is AGI?". Machine Intelligence Research Institute. Retrieved September 8, 2019. 
  10. Wiener, Norbert (May 6, 1960). "Some Moral and Technical Consequences of Automation". Retrieved August 18, 2019. 
  11. Sinick, Jonah (July 20, 2013). "Norbert Wiener's paper "Some Moral and Technical Consequences of Automation"". LessWrong. Retrieved August 18, 2019. 
  12. Dafoe, Allan; Russell, Stuart. "Yes, We Are Worried About the Existential Risk of Artificial Intelligence. A defense of the warnings about AI in philosopher Nick Bostrom's book Superintelligence.". Technology Review. 
  13. Good, Irving John. "Speculations Concerning the First Ultraintelligent Machine" (PDF). Retrieved August 30, 2022. 
  14. Good, Irving John (1966). "Speculations Concerning the First Ultraintelligent Machine". Advances in Computers. 6: 31–38. doi:10.1016/S0065-2458(08)60418-0. 
  15. Muehlhauser, Luke (July 12, 2013). "Miles Brundage recently pointed me to these quotes from Ed Fredkin, recorded in McCorduck (1979).". LessWrong. Retrieved May 7, 2020. 
  16. Waldrop, Mitchell (Spring 1987). "A Question of Responsibility". AI Magazine. 8 (1): 28–39. doi:10.1609/aimag.v8i1.572. 
  17. Muehlhauser, Luke (October 18, 2013). "Richard Posner on AI Dangers". Machine Intelligence Research Institute. Retrieved May 7, 2020. 
  18. "History of AI risk thought". Lesswrongwiki. LessWrong. Retrieved July 28, 2017. 
  19. "Form 990-EZ 2000" (PDF). Retrieved June 1, 2017. Organization was incorporated in July 2000 and does not have a financial history for years 1996-1999. 
  20. "About the Singularity Institute for Artificial Intelligence". Retrieved July 1, 2017. The Singularity Institute for Artificial Intelligence, Inc. (SIAI) was incorporated on July 27th, 2000 by Brian Atkins, Sabine Atkins (then Sabine Stoeckel) and Eliezer Yudkowsky. The Singularity Institute is a nonprofit corporation governed by the Georgia Nonprofit Corporation Code, and is federally tax-exempt as a 501(c)(3) public charity. At this time, the Singularity Institute is funded solely by individual donors. 
  21. Eliezer Yudkowsky (2001). "Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures" (PDF). The Singularity Institute. Retrieved July 5, 2017. 
  22. "SL4: By Thread". Retrieved July 1, 2017. 
  23. "SL4: By Thread". Retrieved July 1, 2017. 
  24. "Amazon.com: Super-Intelligent Machines (Ifsr International Series on Systems Science and Engineering) (9780306473883): Bill Hibbard: Books". Retrieved July 26, 2017. Publisher: Springer; 2002 edition (October 31, 2002) 
  25. "Ethical Issues In Advanced Artificial Intelligence". Retrieved July 25, 2017. 
  26. "About". Oxford Martin School. Retrieved July 25, 2017. The Future of Humanity Institute was established in 2005 with funding from the Oxford Martin School (then known as the James Martin 21st Century School). 
  27. "Papers from the 2005 AAAI Fall Symposium". Archived from the original on 2014-11-29. 
  28. "SL4: By Thread". Retrieved July 1, 2017. 
  29. "Overcoming Bias : Bio". Retrieved June 1, 2017. 
  30. "Basic AI drives". Lesswrongwiki. LessWrong. Retrieved July 26, 2017. 
  31. "AIPosNegFactor.pdf" (PDF). Retrieved July 27, 2017. 
  32. "The Hanson-Yudkowsky AI-Foom Debate". Lesswrongwiki. LessWrong. Retrieved July 1, 2017. 
  33. "Eliezer_Yudkowsky comments on Thoughts on the Singularity Institute (SI) - Less Wrong". LessWrong. Retrieved July 15, 2017. Nonetheless, it already has a warm place in my heart next to the debate with Robin Hanson as the second attempt to mount informed criticism of SIAI. 
  34. "FAQ - Lesswrongwiki". LessWrong. Retrieved June 1, 2017. 
  35. "Recent Singularity Institute Accomplishments". Singularity Institute for Artificial Intelligence. Retrieved July 6, 2017. 
  36. Muehlhauser, Luke (March 10, 2011). "AGI and Friendly AI in the dominant AI textbook". LessWrong. Retrieved December 20, 2020. 
  37. Muehlhauser, Luke (October 19, 2013). "Russell and Norvig on Friendly AI". Machine Intelligence Research Institute. Retrieved December 20, 2020. 
  38. Legg, Shane. "About". Retrieved September 15, 2019. 
  39. "Landscape of current work on potential risks from advanced AI". Google Docs. Retrieved July 27, 2017. 
  40. "How Long Untill Human-Level AI - 2011_AI-Experts.pdf" (PDF). Retrieved July 28, 2017. 
  41. "About". Global Catastrophic Risk Institute. Retrieved July 26, 2017. The Global Catastrophic Risk Institute (GCRI) is a nonprofit, nonpartisan think tank. GCRI was founded in 2011 by Seth Baum and Tony Barrett. 
  42. "Back of the Envelope Guide to Philanthropy". Retrieved July 28, 2017. 
  43. "Gordon Irlam on the BEGuide". Meteuphoric. WordPress.com. October 16, 2014. Retrieved July 28, 2017. 
  44. "Welcome". Oxford Martin Programme on the Impacts of Future Technology. Retrieved July 26, 2017. The Oxford Martin Programme on the Impacts of Future Technology, launched in September 2011, is an interdisciplinary horizontal Programme within the Oxford Martin School in collaboration with the Faculty of Philosophy at Oxford University. 
  45. "About". Facing the Intelligence Explosion. Retrieved July 27, 2017. 
  46. Luke Muehlhauser (December 11, 2013). "MIRI's Strategy for 2013". Machine Intelligence Research Institute. Retrieved July 6, 2017. 
  47. Sylvia Hui (November 25, 2012). "Cambridge to study technology's risk to humans". Retrieved July 26, 2017. The university said Sunday the center's launch is planned next year. 
  48. "Centre for the Study of Existential Risk". 
  49. "Transparency". Foundational Research Institute. Retrieved July 27, 2017. 
  50. Muehlhauser, Luke (July 8, 2013). "Four Focus Areas of Effective Altruism". LessWrong. Retrieved September 8, 2019. 
  51. "Future Progress in Artificial Intelligence: A Survey of Expert Opinion - survey.pdf" (PDF). Retrieved July 28, 2017. 
  52. Efrati, Amir (January 26, 2014). "Google Beat Facebook for DeepMind, Creates Ethics Board". Huffington Post. Retrieved September 15, 2019. 
  53. Bosker, Bianca (January 29, 2014). "Google's New A.I. Ethics Board Might Save Humanity From Extinction". Retrieved September 15, 2019. 
  54. Victoria Krakovna. "New organization - Future of Life Institute (FLI)". LessWrong. Retrieved July 6, 2017. As of May 2014, there is an existential risk research and outreach organization based in the Boston area. The Future of Life Institute (FLI), spearheaded by Max Tegmark, was co-founded by Jaan Tallinn, Meia Chita-Tegmark, Anthony Aguirre and myself. 
  55. Dafoe, Allan; Russell, Stuart J. (November 2, 2016). "Yes, We Are Worried About the Existential Risk of Artificial Intelligence. A defense of the warnings about AI in philosopher Nick Bostrom's book Superintelligence.". MIT Technology Review. Retrieved May 7, 2020. 
  56. "MIRI's September Newsletter". Machine Intelligence Research Institute. September 1, 2014. Retrieved July 15, 2017. Paul Christiano and Katja Grace have launched a new website containing many analyses related to the long-term future of AI: AI Impacts. 
  57. Peter Stone; et al. (AI100 Standing Committee and Study Panel) (September 2016). "One Hundred Year Study on Artificial Intelligence: Report of the 2015-2016 Study Panel" (PDF). Retrieved July 27, 2017. The One Hundred Year Study on Artificial Intelligence, launched in the fall of 2014, is a longterm investigation of the field of Artificial Intelligence (AI) and its influences on people, their communities, and society. 
  58. Samuel Gibbs (October 27, 2014). "Elon Musk: artificial intelligence is our biggest existential threat". The Guardian. Retrieved July 25, 2017. 
  59. "AeroAstro Centennial Webcast". Retrieved July 25, 2017. The high point of the MIT Aeronautics and Astronautics Department's 2014 Centennial celebration is the October 22-24 Centennial Symposium 
  60. Benja Fallenstein. "Welcome!". Intelligent Agent Foundations Forum. Retrieved June 30, 2017. post by Benja Fallenstein 969 days ago 
  61. Hibbard, Bill (2014). "Ethical Artificial Intelligence". https://arxiv.org/abs/1411.1373
  62. "The Myth Of AI". Edge. November 14, 2014. Retrieved December 20, 2020. 
  63. "Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda". LessWrong. November 26, 2014. Retrieved December 20, 2020. 
  64. "Stephen Hawking warns artificial intelligence could end mankind". BBC News. December 2, 2014. Retrieved July 25, 2017. 
  65. "Ex Machina's Scientific Advisor – Murray Shanahan". Y Combinator. June 28, 2017. Retrieved August 18, 2019. 
  66. "Ex Machina movie asks: is AI research in safe hands?". January 21, 2015. Retrieved August 18, 2019. 
  67. "Go see Ex Machina". LessWrong. February 26, 2016. Retrieved August 18, 2019. 
  68. Hardawar, Devindra (April 1, 2015). "'Ex Machina' director embraces the rise of superintelligent AI". Engadget. Retrieved August 18, 2019. 
  69. "Daniel Dewey". Open Philanthropy. Retrieved July 25, 2017. 
  70. "2015: What do you think about machines that think?". Edge. Retrieved May 7, 2020. 
  71. Future of Humanity Institute - FHI. "Strategic Artificial Intelligence Research Centre - Future of Humanity Institute". Future of Humanity Institute. Retrieved July 27, 2017. 
  72. "Hi Reddit, I'm Bill Gates and I'm back for my third AMA. Ask me anything. • r/IAmA". reddit. Retrieved July 25, 2017. 
  73. Stuart Dredge (January 29, 2015). "Artificial intelligence will become strong enough to be a concern, says Bill Gates". The Guardian. Retrieved July 25, 2017. 
  74. "AI safety conference in Puerto Rico". Future of Life Institute. October 12, 2015. Retrieved July 13, 2017. 
  75. Nate Soares (July 16, 2015). "An Astounding Year". Machine Intelligence Research Institute. Retrieved July 13, 2017. 
  76. "The Artificial Intelligence Revolution: Part 1". Wait But Why. January 22, 2015. Retrieved July 25, 2017. 
  77. "The Artificial Intelligence Revolution: Part 2". Wait But Why. January 27, 2015. Retrieved July 25, 2017. 
  78. "Machine intelligence, part 1". Sam Altman. Retrieved July 27, 2017. 
  79. "Existential risk from artificial general intelligence". May 1, 2015. Retrieved August 18, 2019. 
  80. Alexander, Scott (May 22, 2015). "AI Researchers On AI Risk". Retrieved May 7, 2020. 
  81. Matt Weinberger (June 4, 2015). "Head of Silicon Valley's most important startup farm says we're in a 'mega bubble' that won't last". Business Insider. Retrieved July 27, 2017. 
  82. Yampolskiy, Roman (June 17, 2015). "Artificial Superintelligence: A Futuristic Approach". Retrieved August 20, 2017. 
  83. Muehlhauser, Luke (July 15, 2013). "Roman Yampolskiy on AI Safety Engineering". Machine Intelligence Research Institute. Retrieved August 20, 2017. 
  84. "Grants Timeline - Future of Life Institute". Future of Life Institute. Retrieved July 13, 2017. 
  85. "New International Grants Program Jump-Starts Research to Ensure AI Remains Beneficial: Press release for FLI grant awardees. - Future of Life Institute". Future of Life Institute. Retrieved July 13, 2017. 
  86. "AI Safety Research - Future of Life Institute". Future of Life Institute. Retrieved July 13, 2017. 
  87. Matthews, Dylan (August 10, 2015). "I spent a weekend at Google talking with nerds about charity. I came away … worried.". Vox. Retrieved April 19, 2020. 
  88. Bensinger, Rob (August 28, 2015). "AI and Effective Altruism". Machine Intelligence Research Institute. Retrieved April 19, 2020. 
  89. "Future of Life Institute — Artificial Intelligence Risk Reduction". Open Philanthropy. Retrieved May 7, 2020. 
  90. "Open Philanthropy donations made (filtered to cause areas matching safety)". Retrieved May 16, 2020. 
  91. "Potential Risks from Advanced Artificial Intelligence". Open Philanthropy. Retrieved July 27, 2017. 
  92. "The Artificial General Intelligence Control Problem". Retrieved April 18, 2020. 
  93. "What Do We Know about AI Timelines?". Open Philanthropy. Retrieved July 25, 2017. 
  94. "The future of intelligence: Cambridge University launches new centre to study AI and the future of humanity". University of Cambridge. December 3, 2015. Retrieved July 26, 2017. 
  95. John Markoff (December 11, 2015). "Artificial-Intelligence Research Center Is Founded by Silicon Valley Investors". The New York Times. Retrieved July 26, 2017. The organization, to be named OpenAI, will be established as a nonprofit, and will be based in San Francisco. 
  96. "Introducing OpenAI". OpenAI Blog. December 11, 2015. Retrieved July 26, 2017. 
  97. "Global Catastrophic Risks 2016". The Global Priorities Project. April 28, 2016. Retrieved July 28, 2017. 
  98. "Global-Catastrophic-Risk-Annual-Report-2016-FINAL.pdf" (PDF). Retrieved July 28, 2017. 
  99. George Dvorsky. "These Are the Most Serious Catastrophic Threats Faced by Humanity". Gizmodo. Retrieved July 28, 2017. 
  100. "How and why to use your career to make artificial intelligence safer". 80,000 Hours. April 7, 2016. Retrieved July 25, 2017. 
  101. "Risks posed by artificial intelligence". 80,000 Hours. 
  102. "What should we learn from past AI forecasts?". Open Philanthropy. Retrieved July 27, 2017. 
  103. "Potential Risks from Advanced Artificial Intelligence: The Philanthropic Opportunity". Open Philanthropy. Retrieved July 27, 2017. 
  104. "Some Background on Our Views Regarding Advanced Artificial Intelligence". Open Philanthropy. Retrieved July 27, 2017. 
  105. "[1606.06565] Concrete Problems in AI Safety". June 21, 2016. Retrieved July 25, 2017. 
  106. Karnofsky, Holden (June 23, 2016). "Concrete Problems in AI Safety". Retrieved April 18, 2020. 
  107. For instance, the 2019 AI Alignment Literature Review has this line, talking about AI safety publications: "While low-quality contributions might help improve Concrete Problems’ citation count, they may use up scarce funding."
  108. "Workshop on Safety and Control for Artificial Intelligence". June 28, 2016. Retrieved May 16, 2020. 
  109. "UC Berkeley launches Center for Human-Compatible Artificial Intelligence". Berkeley News. August 29, 2016. Retrieved July 26, 2017. 
  110. Scott Dadich (October 12, 2016). "Barack Obama Talks AI, Robo Cars, and the Future of the World". WIRED. Retrieved July 28, 2017. 
  111. "No, the Experts Don't Think Superintelligent AI is a Threat to Humanity. Ask the people who should really know.". MIT Technology Review. September 20, 2016. Retrieved May 7, 2020. 
  112. "The Administration's Report on the Future of Artificial Intelligence". whitehouse.gov. October 12, 2016. Retrieved July 28, 2017. 
  113. "The Obama Administration's Roadmap for AI Policy". Harvard Business Review. December 21, 2016. Retrieved July 28, 2017. 
  114. "CFAR's new focus, and AI Safety". LessWrong. Retrieved July 13, 2017. 
  115. "Further discussion of CFAR's focus on AI safety, and the good things folks wanted from "cause neutrality"". LessWrong. Retrieved July 13, 2017. 
  116. Larks (December 13, 2016). "2016 AI Risk Literature Review and Charity Comparison". Effective Altruism Forum. Retrieved August 18, 2019. 
  117. "Annual Report on Global Risks". Global Challenges Foundation. Retrieved July 28, 2017. 
  118. "Global Catastrophic Risks 2017.pdf" (PDF). Retrieved July 28, 2017. 
  119. "Acknowledgements". Global Risks Report 2017. Retrieved July 28, 2017. 
  120. "Asilomar AI Principles". Future of Life Institute. Retrieved April 19, 2020. 
  121. Alexander, Scott (February 6, 2017). "Notes from the Asilomar Conference on Beneficial AI". Retrieved April 19, 2020. 
  122. "Beneficial AI conference develops 'Asilomar AI principles' to guide future AI research". kurzweilai.net. February 3, 2017. Retrieved April 19, 2020. 
  123. Bensinger, Rob (June 30, 2015). "New report: "The Asilomar Conference: A Case Study in Risk Mitigation"". Machine Intelligence Research Institute. Retrieved April 19, 2020. 
  124. "EA Funds". Retrieved July 27, 2017. In the biography on the right you can see a list of organizations the Fund Manager has previously supported, including a wide variety of organizations such as the Centre for the Study of Existential Risk, Future of Life Institute and the Center for Applied Rationality. These organizations vary in their strategies for improving the long-term future but are likely to include activities such as research into possible existential risks and their mitigation, and priorities for robust and beneficial artificial intelligence. 
  125. William MacAskill (February 9, 2017). "Introducing the EA Funds". Effective Altruism Forum. Retrieved July 27, 2017. 
  126. "DeepMind Has Simple Tests That Might Prevent Elon Musk's AI Apocalypse". Bloomberg.com. December 11, 2017. Retrieved March 5, 2020. 
  127. "Alphabet's DeepMind Is Using Games to Discover If Artificial Intelligence Can Break Free and Kill Us All". Fortune. Retrieved March 5, 2020. 
  128. "Specifying AI safety problems in simple environments | DeepMind". DeepMind. Retrieved March 5, 2020. 
  129. "May 2017 Newsletter". Machine Intelligence Research Institute. May 10, 2017. Retrieved July 25, 2017. Interested parties may also wish to apply for the event coordinator position at the new Berkeley Existential Risk Initiative, which will help support work at CHAI and elsewhere. 
  130. "Update on Effective Altruism Funds". Effective Altruism Forum. April 20, 2017. Retrieved July 25, 2017. 
  131. "Positively shaping the development of artificial intelligence". 80,000 Hours. Retrieved July 25, 2017. 
  132. "Completely new article on the pros/cons of working on AI safety, and how to actually go about it". April 6, 2017. 
  133. Larks (December 21, 2020). "2020 AI Alignment Literature Review and Charity Comparison". Effective Altruism Forum. Retrieved December 21, 2020. 
  134. "[1705.08807] When Will AI Exceed Human Performance? Evidence from AI Experts". Retrieved July 13, 2017. 
  135. "Media discussion of 2016 ESPAI". AI Impacts. June 14, 2017. Retrieved July 13, 2017. 
  136. "New in-depth guide to AI policy and strategy careers, written with Miles Brundage, a researcher at the University of Oxford's Future of Humanity Institute". 80,000 Hours. June 14, 2017. 
  137. "Elon Musk Warns Governors: Artificial Intelligence Poses 'Existential Risk'". NPR.org. July 17, 2017. Retrieved July 28, 2017. 
  138. Catherine Clifford (July 24, 2017). "Facebook CEO Mark Zuckerberg: Elon Musk's doomsday AI predictions are 'pretty irresponsible'". CNBC. Retrieved July 25, 2017. 
  139. Malo Bourgon (November 8, 2017). "A major grant from Open Philanthropy". Machine Intelligence Research Institute. Retrieved November 11, 2017. 
  140. "Machine Intelligence Research Institute — General Support (2017)". Open Philanthropy. November 8, 2017. Retrieved November 11, 2017. 
  141. Rice, Issa (October 23, 2017). "Initial commit: AI Watch". Retrieved April 19, 2020. 
  142. Rice, Issa (October 24, 2017). "start on portal: AI Watch". Retrieved April 19, 2020. 
  143. "There's No Fire Alarm for Artificial General Intelligence". Machine Intelligence Research Institute. October 13, 2017. Retrieved April 19, 2020. 
  144. Yudkowsky, Eliezer (October 13, 2017). "There's No Fire Alarm for Artificial General Intelligence". LessWrong. Retrieved April 19, 2020. 
  145. "Quarterly Update Winter 2017". Future of Humanity Institute. January 19, 2018. Retrieved March 14, 2018. 
  146. Slepnev, Vladimir (November 3, 2017). "Announcing the AI Alignment Prize". LessWrong. Retrieved April 19, 2020. 
  147. Slepnev, Vladimir (January 20, 2019). "Announcement: AI alignment prize round 4 winners". Alignment Forum. Retrieved April 19, 2020. 
  148. "Berkeley Existential Risk Initiative | Activity Update - December 2017". Retrieved February 8, 2018. 
  149. Larks (December 20, 2017). "2017 AI Safety Literature Review and Charity Comparison". Effective Altruism Forum. Retrieved August 18, 2019. 
  150. "Our 2019 Fundraiser Review". Machine Intelligence Research Institute. February 13, 2020. Retrieved April 19, 2020. 
  151. "Team". Median Group. Retrieved May 16, 2020. 
  152. "Research". Median Group. Retrieved May 16, 2020. 
  153. Christiano, Paul (February 24, 2018). "Takeoff speeds". Retrieved February 7, 2021. 
  154. Christiano, Paul (February 24, 2018). "Arguments about fast takeoff". LessWrong. Retrieved February 7, 2021. 
  155. "Likelihood of discontinuous progress around the development of AGI". AI Impacts. Retrieved February 7, 2021. 
  156. SoerenMind (April 21, 2019). "Any rebuttals of Christiano and AI Impacts on takeoff speeds?". LessWrong. Retrieved February 7, 2021. 
  157. Kokotajlo, Daniel (February 24, 2018). "Against GDP as a metric for AI timelines and takeoff speeds". Center on Long-Term Risk. Retrieved February 7, 2021. 
  158. Todd, Benjamin; Tse, Brian (February 28, 2018). "A new recommended career path for effective altruists: China specialist". 80,000 Hours. Retrieved September 8, 2019. 
  159. "This is Why the IARPA Director Doesn't Worry About Self-Aware Artificial Intelligence". NextGov. March 8, 2018. Retrieved May 16, 2020. 
  160. "AI Alignment Podcast". Future of Life Institute. Retrieved April 4, 2020. 
  161. "Alignment Newsletter One Year Retrospective". Effective Altruism Forum. April 10, 2019. Retrieved September 8, 2019. 
  162. "Alignment Newsletter 1". April 9, 2018. Retrieved September 8, 2019. 
  163. "Alignment Newsletter". Retrieved September 8, 2019. 
  164. "OpenAI Charter". OpenAI Blog. April 9, 2018. Retrieved May 5, 2018. 
  165. wunan (April 9, 2018). "OpenAI charter". LessWrong. Retrieved May 5, 2018. 
  166. "[D] OpenAI Charter". r/MachineLearning, Reddit. Retrieved May 5, 2018. 
  167. "OpenAI Charter". Hacker News. Retrieved May 5, 2018. 
  168. Tristan Greene (April 10, 2018). "The AI company Elon Musk co-founded intends to create machines with real intelligence". The Next Web. Retrieved May 5, 2018. 
  169. "Previous Camps". Retrieved September 7, 2019. 
  170. Everitt, Tom; Lea, Gary; Hutter, Marcus (May 3, 2018). "AGI Safety Literature Review". Retrieved December 21, 2020. 
  171. Arnold, Raymond (July 10, 2018). "Announcing AlignmentForum.org Beta". LessWrong. Retrieved April 18, 2020. 
  172. Habryka, Oliver; Pace, Ben; Arnold, Raymond; Babcock, Jim (October 29, 2018). "Introducing the AI Alignment Forum (FAQ)". LessWrong. Retrieved April 18, 2020. 
  173. "Announcing the new AI Alignment Forum". Machine Intelligence Research Institute. October 29, 2018. Retrieved April 18, 2020. 
  174. "July 2018 - Long-Term Future Fund Grants". Effective Altruism Funds. August 14, 2018. Retrieved August 18, 2019. 
  175. "Effective Altruism Funds donations made (filtered to cause areas matching AI safety)". Retrieved August 18, 2019. 
  176. "CS 294-149: Safety and Control for Artificial General Intelligence (Fall 2018)". EECS (UC Berkeley). Retrieved May 16, 2020. 
  177. "Berkeley Existential Risk Initiative donations made (filtered to cause areas matching AI safety)". Retrieved August 18, 2019. 
  178. "Stanford Center for AI Safety (Wayback Machine snapshot)". October 25, 2018. Retrieved May 9, 2020. 
  179. "2018 Update: Our New Research Directions". Machine Intelligence Research Institute. November 22, 2018. Retrieved February 14, 2019. 
  180. Larks (December 17, 2018). "2018 AI Alignment Literature Review and Charity Comparison". Effective Altruism Forum. Retrieved August 18, 2019. 
  181. Larks (December 18, 2019). "2019 AI Alignment Literature Review and Charity Comparison". Effective Altruism Forum. Retrieved April 18, 2020. 
  182. Piper, Kelsey (December 21, 2018). "The case for taking AI seriously as a threat to humanity. Why some people fear AI, explained.". Vox. Retrieved April 19, 2020. 
  183. Wiblin, Robert; Harris, Kieran (February 27, 2019). "Can journalists still write about important things?". 80,000 Hours. Retrieved April 19, 2020. 
  184. Williams, Tate (March 22, 2019). "Important But Neglected: Why an Effective Altruist Funder Is Giving Millions to AI Security". Inside Philanthropy. Retrieved May 16, 2020. 
  185. "Beneficial AGI 2019". Future of Life Institute. Retrieved April 19, 2020. 
  186. "CSER at the Beneficial AGI 2019 Conference". Centre for the Study of Existential Risk. January 31, 2019. Retrieved April 19, 2020. 
  187. Drexler, K. Eric. "Reframing Superintelligence: Comprehensive AI Services as General Intelligence". Future of Humanity Institute. 
  188. Shah, Rohin (January 7, 2019). "Reframing Superintelligence: Comprehensive AI Services as General Intelligence". Alignment Forum. Retrieved May 2, 2020. 
  189. Shah, Rohin (January 7, 2019). "Reframing Superintelligence: Comprehensive AI Services as General Intelligence". LessWrong. Retrieved May 2, 2020. 
  190. "An AI helped us write this article". vox.com. Retrieved June 28, 2019. 
  191. Lowe, Ryan. "OpenAI's GPT-2: the model, the hype, and the controversy". towardsdatascience.com. Retrieved July 10, 2019. 
  192. Johnson, Khari. "OpenAI launches new company for funding safe artificial general intelligence". venturebeat.com. Retrieved June 15, 2019. 
  193. Trazzi, Michaël. "Considerateness in OpenAI LP Debate". medium.com. Retrieved June 15, 2019. 
  194. Christiano, Paul (March 17, 2019). "What failure looks like". LessWrong. Retrieved February 7, 2021. 
  195. Pace, Ben (July 29, 2020). "What Failure Looks Like: Distilling the Discussion". LessWrong. Retrieved February 7, 2021. 
  196. Adams, Amy (March 18, 2019). "Stanford University launches the Institute for Human-Centered Artificial Intelligence. The new institute will focus on guiding artificial intelligence to benefit humanity.". Retrieved May 9, 2020. 
  197. Perry, Lucas (April 11, 2019). "AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 1)". Future of Life Institute. Retrieved May 2, 2020. 
  198. Perry, Lucas (April 25, 2019). "AI Alignment Podcast: An Overview of Technical AI Alignment with Rohin Shah (Part 2)". Retrieved May 2, 2020. 
  199. "OPEN THREAD 129.25". June 8, 2019. Retrieved August 18, 2019. 
  200. Leahy, Connor (June 12, 2019). "The Hacker Learns to Trust". Retrieved April 19, 2020. 
  201. Pace, Ben (June 21, 2019). "The Hacker Learns to Trust". LessWrong. Retrieved April 19, 2020. 
  202. "Some Notes on The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World". Reddit (Slate Star Codex subreddit). June 24, 2019. Retrieved May 2, 2020. 
  203. McCluskey, Peter (October 24, 2019). "The AI Does Not Hate You". Retrieved May 2, 2020. 
  204. Rozendal, Siebe; Shovelain, Justin; Kristofferson, David (June 20, 2019). "A case for strategy research: what it is and why we need more of it". LessWrong. Retrieved May 2, 2020. 
  205. Rozendal, Siebe; Shovelain, Justin; Kristofferson, David (June 20, 2019). "A case for strategy research: what it is and why we need more of it". Effective Altruism Forum. Retrieved May 2, 2020. 
  206. "Convergence publications". Convergence Analysis. Retrieved May 2, 2020. 
  207. "Convergence analysis". Retrieved December 8, 2015. 
  208. "The Future of Grant-making Funded by Jaan Tallinn at BERI". Berkeley Existential Risk Initiative. August 25, 2019. Retrieved April 18, 2020. 
  209. "August 2019: Long-Term Future Fund Grants and Recommendations". Effective Altruism Funds. August 30, 2019. Retrieved April 18, 2020. 
  210. Sample, Ian (October 24, 2019). "Human Compatible by Stuart Russell review -- AI and our future. Creating machines smarter than us could be the biggest event in human history -- and the last". The Guardian. Retrieved April 18, 2020. 
  211. Piper, Kelsey (October 26, 2019). "AI could be a disaster for humanity. A top computer scientist thinks he has the solution. Stuart Russell wrote the book on AI and is leading the fight to change how we build it.". Vox. Retrieved April 18, 2020. 
  212. Coldewey, Devin (March 20, 2020). "Stuart Russell on how to make AI 'human-compatible': 'We've actually thought about AI the wrong way from the beginning'". TechCrunch. Retrieved April 18, 2020. 
  213. Pace, Ben (November 6, 2019). "AI Alignment Research Overview (by Jacob Steinhardt)". LessWrong. Retrieved April 18, 2020. 
  214. "November 2019: Long-Term Future Fund Grants". Effective Altruism Funds. November 21, 2019. Retrieved April 18, 2020. 
  215. Alfrink, Toon (November 24, 2019). "RAISE post-mortem". LessWrong. Retrieved April 18, 2020. 
  216. Shah, Rohin (January 27, 2020). "AI Alignment 2018-19 Review". LessWrong. Retrieved April 18, 2020. 
  217. Bensinger, Rob (April 27, 2020). "MIRI's largest grant to date!". Machine Intelligence Research Institute. Retrieved May 2, 2020. 
  218. Tallinn, Jaan (February 22, 2020). "Jaan Tallinn's Philanthropic Pledge". LessWrong. Retrieved December 2, 2021. 
  219. Perry, Lucas (March 31, 2020). "FLI Podcast: The Precipice: Existential Risk and the Future of Humanity with Toby Ord". Future of Life Institute. Retrieved April 18, 2020. 
  220. Wiblin, Robert; Koehler, Arden; Harris, Kieran (March 7, 2020). "Toby Ord on the precipice and humanity's potential futures". 80,000 Hours. Retrieved April 18, 2020. 
  221. "Grants". Berkeley Existential Risk Initiative. Retrieved May 2, 2020. 
  222. Bergal, Asya (April 3, 2020). "Takeaways from safety by default interviews". LessWrong. Retrieved April 18, 2020. 
  223. "Interviews on plausibility of AI safety by default". AI Impacts. April 2, 2020. Retrieved April 18, 2020. 
  224. "April 2020 – Long-Term Future Fund Grant Recommendations". Effective Altruism Funds. April 14, 2020. Retrieved May 2, 2020. 
  225. "Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem". LessWrong. September 16, 2020. Retrieved December 20, 2020. 
  226. Olsson, Catherine (May 14, 2020). "Hi Ryan - in terms of the Fellowship, I have a lot of thoughts about what we're trying to do". Effective Altruism Forum. Retrieved May 16, 2020. 
  227. "SFF-2020-H1 S-process Recommendations Announcement". Survival and Flourishing Fund. May 29, 2020. Retrieved October 10, 2020. 
  228. Linsefors, Linda (November 19, 2020). "Announcing AI Safety Support". Effective Altruism Forum. Retrieved December 20, 2020. 
  229. "GPT-3 on GitHub". OpenAI. Retrieved July 19, 2020. 
  230. "Language Models are Few-Shot Learners". May 28, 2020. Retrieved July 19, 2020. 
  231. "Nick Cammarata on Twitter: GPT-3 as therapist". July 14, 2020. Retrieved July 19, 2020. 
  232. Gwern (June 19, 2020). "GPT-3 Creative Fiction". Retrieved July 19, 2020. 
  233. Walton, Nick (July 14, 2020). "AI Dungeon: Dragon Model Upgrade. You can now play AI Dungeon with one of the most powerful AI models in the world.". Retrieved July 19, 2020. 
  234. Shameem, Sharif (July 13, 2020). "Sharif Shameem on Twitter: With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you.". Twitter. Retrieved July 19, 2020. 
  235. Lacker, Kevin (July 6, 2020). "Giving GPT-3 a Turing Test". Retrieved July 19, 2020. 
  236. Woolf, Max (July 18, 2020). "Tempering Expectations for GPT-3 and OpenAI's API". Retrieved July 19, 2020. 
  237. Asparouhov, Delian (July 17, 2020). "Quick thoughts on GPT3". Retrieved July 19, 2020. 
  238. "How "honest" is GPT-3?". LessWrong. July 8, 2020. Retrieved December 20, 2020. 
  239. Shah, Rohin (June 3, 2020). "Alignment Newsletter #102: Meta learning by GPT-3, and a list of full proposals for AI alignment". Retrieved December 20, 2020. 
  240. Jones, Andy (July 27, 2020). "Are we in an AI overhang?". LessWrong. Retrieved December 20, 2020. 
  241. "AI Research Considerations for Human Existential Safety (ARCHES)". May 30, 2020. Retrieved June 11, 2020. 
  242. Bensinger, Rob (July 8, 2020). "July 2020 Newsletter". Machine Intelligence Research Institute. 
  243. "Andrew Critch on AI Research Considerations for Human Existential Safety". Future of Life Institute. September 15, 2020. Retrieved December 20, 2020. 
  244. Krakovna, Victoria (May 31, 2020). "Possible takeaways from the coronavirus pandemic for slow AI takeoff". LessWrong. Retrieved December 20, 2020. 
  245. Branwen, Gwern (July 27, 2020). "As far as I can tell, this is what is going on: they do not have any such thing, because GB and DM do not believe in the scaling hypothesis the way that Sutskever, Amodei and others at OA do.". LessWrong. Retrieved December 20, 2020. 
  246. "September 2020: Long-Term Future Fund Grants". Effective Altruism Funds. September 3, 2020. Retrieved December 20, 2020. 
  247. Cotra, Ajeya (September 18, 2020). "Draft report on AI timelines". LessWrong. Retrieved June 16, 2022. 
  248. "2020 Draft Report on Biological Anchors". Retrieved June 16, 2022. 
  249. Karnofsky, Holden (August 31, 2021). "Forecasting transformative AI: the "biological anchors" method in a nutshell". Retrieved June 16, 2022. 
  250. Barnett, Matthew (February 23, 2022). "A comment on Ajeya Cotra's draft report on AI timelines". LessWrong. Retrieved June 16, 2022. 
  251. "AGI safety from first principles". Alignment Forum. September 28, 2020. Retrieved December 20, 2020. 
  252. Shah, Rohin. "Alignment Newsletter #122". 
  253. Shah, Rohin (October 6, 2020). "The Alignment Problem: Machine Learning and Human Values". LessWrong. Retrieved December 20, 2020. 
  254. Garrabrant, Scott (October 22, 2020). "Introduction to Cartesian Frames". LessWrong. Retrieved December 20, 2020. 
  255. Bensinger, Rob (October 23, 2020). "October 2020 Newsletter". Machine Intelligence Research Institute. Retrieved December 20, 2020. 
  256. "SFF-2020-H2 S-process Recommendations Announcement". Survival and Flourishing Fund. Retrieved December 10, 2020. 
  257. "November 2020: Long-Term Future Fund Grants". November 28, 2020. Retrieved December 20, 2020. 
  258. "Organizational Update from OpenAI". OpenAI. December 29, 2020. Retrieved February 7, 2021. 
  259. Christiano, Paul (January 29, 2021). "Today was my last day at OpenAI.". Retrieved February 7, 2021. 
  260. Leike, Jan (January 22, 2021). "Last week I joined @OpenAI to lead their alignment effort.". Twitter. Retrieved February 7, 2021. 
  261. gwern (December 29, 2020). "Dario Amodei et al leave OpenAI". Reddit. Retrieved February 7, 2021. 
  262. Kokotajlo, Daniel (December 29, 2020). "Dario Amodei leaves OpenAI". Retrieved February 7, 2021. 
  263. "openai-message-to-board.md". Retrieved November 21, 2023. 
  264. Musk, Elon (November 21, 2023). "This letter about OpenAI was just sent to me. These seem like concerns worth investigating.". Twitter. 
  265. Woods, Kat (March 18, 2021). "Introducing The Nonlinear Fund: AI Safety research, incubation, and funding". Effective Altruism Forum. Retrieved May 31, 2021. 
  266. "Is power-seeking AI an existential risk?". April 1, 2021. Retrieved October 23, 2022. 
  267. "Reviews of "Is power-seeking AI an existential risk?"". LessWrong. December 16, 2021. Retrieved October 23, 2022. 
  268. "Is Power-Seeking AI an Existential Risk?". June 16, 2022. Retrieved October 23, 2022. 
  269. "Announcing the Future Fund's AI Worldview Prize". FTX Future Fund. September 23, 2022. Retrieved October 23, 2022. 
  270. "Announcing the Alignment Research Center". Alignment Forum. April 26, 2021. Retrieved May 31, 2021. 
  271. "AMA: Paul Christiano, alignment researcher". April 28, 2021. Retrieved May 31, 2021. 
  272. Colm Ó Riain (May 13, 2021). "Our all-time largest donation, and major crypto support from Vitalik Buterin". Retrieved May 31, 2021. 
  273. Bergal, Asya (May 26, 2021). "Long-Term Future Fund: May 2021 grant recommendations". Effective Altruism Forum. Retrieved May 31, 2021. 
  274. "Vox Future Perfect newsletter". Vox. May 28, 2021. Retrieved May 31, 2021. 
  275. "Anthropic". Anthropic. Retrieved May 31, 2021. 
  276. "SFF-2021-H1 S-process Recommendations Announcement". Survival and Flourishing Fund. Retrieved December 2, 2021. 
  277. Kokotajlo, Daniel (June 15, 2021). "Vignettes workshop". Retrieved December 2, 2021. 
  278. "AI Vignettes Project". AI Impacts. Retrieved December 2, 2021. 
  279. Bergal, Asya; Gleave, Adam; Habryka, Oliver; Rodriguez, Luisa (January 18, 2022). "Long-Term Future Fund: July 2021 grant recommendations". Effective Altruism Forum. Retrieved June 16, 2022. 
  280. Karnofsky, Holden. ""Most important century" series: roadmap". Retrieved June 16, 2022. 
  281. "Redwood Research". September 22, 2021. Retrieved December 2, 2021. 
  282. Thomas, Nate (October 5, 2021). "We're Redwood Research, we do applied alignment research, AMA". LessWrong. Retrieved December 2, 2021. 
  283. Hendrycks, Dan (October 17, 2021). "ML Safety Newsletter #1". Retrieved October 23, 2022. 
  284. Kran, Esben (October 22, 2022). "Newsletter for Alignment Research: The ML Safety Updates". LessWrong. Retrieved October 23, 2022. 
  285. "SFF-2021-H2 S-process Recommendations Announcement". Survival and Flourishing Fund. Retrieved December 2, 2021. 
  286. Larks (December 23, 2021). "2021 AI Alignment Literature Review and Charity Comparison". Retrieved January 3, 2022. 
  287. Critch, Andrew; Hay, Nick (August 8, 2022). "Encultured AI Pre-planning, Part 1: Enabling New Benchmarks". AI Alignment Forum. Retrieved August 29, 2022. 
  288. Critch, Andrew; Hay, Nick (August 8, 2022). "Encultured AI, Part 1 Appendix: Relevant Research Examples". AI Alignment Forum. Retrieved August 29, 2022. 
  289. Critch, Andrew; Hay, Nick (August 11, 2022). "Encultured AI Pre-planning, Part 2: Providing a Service". AI Alignment Forum. Retrieved August 29, 2022. 
  290. Critch, Andrew; Hay, Nick (August 17, 2022). "Announcing Encultured AI: Building a Video Game". AI Alignment Forum. Retrieved August 29, 2022. 
  291. "Announcing the Future Fund". FTX Future Fund. February 28, 2022. Retrieved June 16, 2022. 
  292. "Areas of Interest". FTX Future Fund. Retrieved February 28, 2022. 
  293. "Project Ideas". FTX Future Fund. Retrieved February 28, 2022. 
  294. Hendrycks, Dan; Liu, Kevin; Zhang, Oliver; Woodside, Thomas; Hough, Sydney. "$20K in Prizes: AI Safety Arguments Competition". 
  295. "Apart Research". Wayback Machine. March 26, 2022. Retrieved October 23, 2022. 
  296. Kran, Esben; Hallgren, Jonas. "Black Box Investigation Research Hackathon". LessWrong. 
  297. Leahy, Connor (April 8, 2022). "We Are Conjecture, A New Alignment Research Startup". LessWrong. Retrieved October 23, 2022. 
  298. Critch, Andrew (April 10, 2022). "Regarding my AI development timeline/forecast: tl;dr: This month was an inflection point, and showed signs of competitive publishing dynamics and common-knowledge-creation dynamics that have moved up my forecast of "tech company singularities" by ~2.5 years, to 2027-2033 rather than 2030-2035 as previously expressed.". Retrieved June 16, 2022. 
  299. Kokotajlo, Daniel (May 12, 2022). "Deepmind's Gato: Generalist Agent". LessWrong. Retrieved June 16, 2022. 
  300. Leong, Chris (April 11, 2022). "6 Year Decrease of Metaculus AGI Prediction". Effective Altruism Forum. Retrieved June 16, 2022. 
  301. "Anthropic Raises Series B to Build Steerable, Interpretable, Robust AI Systems.". Anthropic. Retrieved June 16, 2022. 
  302. Coldewey, Devin (April 29, 2022). "Anthropic's quest for better, more explainable AI attracts $580M". TechCrunch. Retrieved June 16, 2022. 
  303. swarthy (May 31, 2022). "Effective Accelerationism — e/acc: aligning conciousnesses' path toward the optimal future". Substack. Retrieved May 19, 2024. 
  304. "e/acc (Effective Accelerationism)". Know Your Meme. Retrieved May 19, 2024. 
  305. Jezos, Beff; Bayeslord (July 10, 2022). "Notes on e/acc principles and tenets: A physics-first view of the principles underlying effective accelerationism". Substack. Retrieved May 19, 2024. 
  306. Hauksson, Roman (April 5, 2023). "What's the deal with Effective Accelerationism (e/acc)?". LessWrong. Retrieved May 19, 2024. 
  307. Segan, Shreeda (November 17, 2023). "Emmett Shear on AI's culture wars: The former interim CEO of OpenAI demystifies the various factions of AI and its risk to humanity.". Meridian. Retrieved May 19, 2024. 
  308. "SFF-2022-H1 S-Process Recommendations Announcement". Survival and Flourishing Fund. Retrieved June 16, 2022. 
  309. Beckstead, Nick; Aschenbrenner, Leopold; Balwit, Avital; MacAskill, William; Ramakrishnan, Ketan (June 30, 2022). "Future Fund June 2022 Update". FTX Future Fund. Retrieved July 3, 2022. 
  310. Scholl, Adam (July 1, 2022). "Safetywashing". LessWrong. Retrieved May 18, 2024. 
  311. "Beware safety-washing". LessWrong. January 13, 2023. Retrieved May 18, 2024. 
  312. Ellen, Remmelt (February 6, 2024). "Why I think it's net harmful to do technical safety research at AGI labs". LessWrong. Retrieved May 18, 2024. 
  313. "Updates: Read about our past and present projects.". Center for AI Safety. Retrieved August 29, 2022. 
  314. Hendrycks, Dan; Woodside, Thomas; Zhang, Oliver (August 5, 2022). "Announcing the Introduction to ML Safety course". Retrieved August 29, 2022. 
  315. Hilton, Benjamin (August 1, 2022). "Preventing an AI-related catastrophe: AI might bring huge benefits — if we avoid the risks". 80,000 Hours. Retrieved April 13, 2024. 
  316. JakubK (December 31, 2022). "Summary of 80k's AI problem profile". LessWrong. Retrieved April 13, 2024. 
  317. Leike, Jan; Schulman, John; Wu, Jeffrey (August 24, 2022). "Our approach to alignment research: We are improving our AI systems' ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.". OpenAI. Retrieved April 13, 2024. 
  318. Leike, Jan (December 5, 2022). "Why I'm optimistic about our alignment approach: Some arguments in favor and responses to common objections". Retrieved April 13, 2024. 
  319. Bensinger, Rob; Yudkowsky, Eliezer (December 1, 2022). "A challenge for AGI organizations, and a challenge for readers". LessWrong. Retrieved April 13, 2024. 
  320. Wasil, Akash (December 30, 2022). "My thoughts on OpenAI's alignment plan". LessWrong. Retrieved April 13, 2024. 
  321. Elverlin, Søren (January 17, 2023). "OpenAI's Alignment Plan is not S.M.A.R.T.". LessWrong. Retrieved April 14, 2024. 
  322. Larsen, Thomas; Eli (August 28, 2022). "(My understanding of) What Everyone in Technical Alignment is Doing and Why". LessWrong. Retrieved August 29, 2022. 
  323. "Announcing the Future Fund's AI Worldview Prize". Effective Altruism Forum. September 23, 2022. Retrieved October 23, 2022. 
  324. "OpenAI, Shard Theory, and Left Turns W36". LessWrong. October 19, 2022. Retrieved October 23, 2022. 
  325. "The FTX Future Fund team has resigned". Effective Altruism Forum. November 10, 2022. Retrieved November 25, 2022. 
  326. Karnofsky, Holden (November 10, 2022). "Some comments on recent FTX-related events". Effective Altruism Forum. Retrieved November 25, 2022. 
  327. Woods, Kat; Spartz, Emerson; Spartz, Drew (November 13, 2022). "Announcing Nonlinear Emergency Funding". Effective Altruism Forum. Retrieved November 25, 2022. 
  328. "Open Phil is seeking applications from grantees impacted by recent events". Effective Altruism Forum. Retrieved November 22, 2022. 
  329. Schukraft, Jason (November 21, 2022). "Pre-Announcing the 2023 Open Philanthropy AI Worldviews Contest". Effective Altruism Forum. Retrieved November 25, 2022. 
  330. Hobhahn, Marius (November 23, 2022). "Announcing AI safety Mentors and Mentees". Effective Altruism Forum. Retrieved November 25, 2022. 
  331. "Introducing ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.". OpenAI. November 30, 2022. Retrieved February 26, 2024. 
  332. Altman, Sam (November 30, 2022). "today we launched ChatGPT. try talking with it here: chat.openai.com". Twitter. Retrieved February 26, 2024. 
  333. Goldman, Sharon (November 30, 2022). "OpenAI debuts ChatGPT and GPT-3.5 series as GPT-4 rumors fly". VentureBeat. Retrieved February 26, 2024. 
  334. Mowshowitz, Zvi (December 2, 2022). "Jailbreaking ChatGPT on Release Day". LessWrong. Retrieved February 26, 2024. 
  335. Miotti, Andrea (February 24, 2023). "Retrospective on the 2022 Conjecture AI Discussions". LessWrong. Retrieved April 14, 2024. 
  336. Miotti, Andrea; Christiano, Paul; Alfour, Gabriel; Jimenez, Olivia (February 24, 2023). "Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes". LessWrong. Retrieved April 14, 2024. 
  337. "StrictlyVC in conversation with Sam Altman, part two (OpenAI)". YouTube. January 17, 2023. Retrieved April 14, 2024. 
  338. McKenzie, Andy (January 20, 2023). "Transcript of Sam Altman's interview touching on AI safety". LessWrong. Retrieved April 14, 2024. 
  339. "AI Safety Training". Retrieved April 14, 2024. 
  340. JJ Hepburn (January 20, 2023). "Announcing aisafety.training". Retrieved April 14, 2024. 
  341. Shermohammed, Maheen; Gates, Vael (February 1, 2023). "Interviews with 97 AI Researchers: Quantitative Analysis". Effective Altruism Forum. Retrieved April 14, 2024. 
  342. Southern, Matt G. (April 20, 2023). "Google Bard's Latest Update Boosts Creativity With More Drafts". Search Engine Journal. Retrieved June 21, 2023.
  343. Elias, Jennifer (February 6, 2023). "Google announces Bard A.I. in response to ChatGPT". CNBC. Retrieved December 2, 2023.
  344. "Google to introduce AI chatbot service". DW. February 6, 2023. Retrieved December 2, 2023.
  345. "Bard: Google launches ChatGPT rival". bbc.com. February 6, 2023. Retrieved December 2, 2023.
  346. "Google Unveils Bard, Its ChatGPT Rival for AI-Powered Conversation". CNET. Retrieved December 2, 2023.
  347. Mehdi, Yusuf (February 7, 2023). "Reinventing search with a new AI-powered Microsoft Bing and Edge, your copilot for the web". Microsoft. Retrieved May 19, 2024. 
  348. Peters, Jay (March 15, 2023). "The Bing AI bot has been secretly running GPT-4". The Verge. Archived from the original on March 17, 2023. Retrieved March 17, 2023.
  349. "159 - We're All Gonna Die with Eliezer Yudkowsky". YouTube. February 20, 2023. Retrieved April 14, 2024. 
  350. "Full Transcript: Eliezer Yudkowsky on the Bankless podcast". LessWrong. February 23, 2023. Retrieved April 14, 2024. 
  351. "Transcript: Yudkowsky on Bankless follow-up Q&A". LessWrong. February 27, 2023. Retrieved April 14, 2024. 
  352. Pope, Quintin (March 20, 2023). "My Objections to "We're All Gonna Die with Eliezer Yudkowsky"". LessWrong. Retrieved May 17, 2024. 
  353. Mowshowitz, Zvi (February 21, 2023). "AI #1: Sydney and Bing". LessWrong. Retrieved April 14, 2024. 
  354. "Planning for AGI and beyond: Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.". OpenAI. February 24, 2023. Retrieved May 17, 2024. 
  355. "OpenAI's "Planning For AGI And Beyond"". Astral Codex Ten. March 1, 2023. Retrieved May 17, 2024. 
  356. Hanson, Robin (March 3, 2023). "AI Risk, Again". Overcoming Bias. Retrieved May 17, 2024. 
  357. "Robin Hanson's latest AI risk position statement". LessWrong. March 3, 2023. Retrieved May 17, 2024. 
  358. Liron (March 4, 2023). "Contra Hanson on AI Risk". LessWrong. Retrieved May 17, 2024. 
  359. Krakovna, Victoria; Shah, Rohin (March 7, 2023). "Some high-level thoughts on the DeepMind alignment team's strategy". Retrieved May 17, 2024. 
  360. Krakovna, Victoria; Shah, Rohin (March 7, 2023). "Linkpost: Some high-level thoughts on the DeepMind alignment team's strategy". Retrieved May 17, 2024. 
  361. Krakovna, Victoria; Shah, Rohin (March 7, 2023). "Linkpost: Some high-level thoughts on the DeepMind alignment team's strategy". Retrieved May 17, 2024. 
  362. "Core Views on AI Safety: When, Why, What, and How". Anthropic. March 8, 2023. Retrieved May 17, 2024. 
  363. Hatfield-Dodds, Zac (March 9, 2023). "Anthropic's Core Views on AI Safety". LessWrong. Retrieved May 17, 2024. 
  364. "GPT-4". OpenAI. March 14, 2023. Retrieved May 17, 2024. 
  365. "Introducing Claude". Anthropic. March 14, 2023. Retrieved May 17, 2024. 
  366. Mowshowitz, Zvi (March 21, 2023). "AI #4: Introducing GPT-4". LessWrong. Retrieved May 17, 2024. 
  367. "Pause Giant AI Experiments: An Open Letter. We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.". Future of Life Institute. March 22, 2023. Retrieved May 17, 2024.
  368. "Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude". March 17, 2023. Retrieved May 17, 2024. 
  369. King, Christopher (March 14, 2023). "ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so". LessWrong. Retrieved May 17, 2024. 
  370. Stein-Perlman, Zach (March 28, 2023). "FLI open letter: Pause giant AI experiments". LessWrong. Retrieved May 17, 2024. 
  371. Narayan, Jyoti; Hu, Krystal; Coulter, Martin; Mukherjee, Supantha (April 5, 2023). "Elon Musk and others urge AI pause, citing 'risks to society'". Reuters. Retrieved May 17, 2024. 
  372. Woollacott, Emma (March 29, 2023). "Tech Experts - And Elon Musk - Call For A 'Pause' In AI Training". Forbes. Retrieved May 17, 2024. 
  373. "Elon Musk and Others Call for Pause on A.I., Citing 'Profound Risks to Society'". New York Times. March 29, 2023. Retrieved May 17, 2024. 
  374. Seetharaman, Deepa (March 29, 2023). "Elon Musk, Other AI Experts Call for Pause in Technology's Development. Appeal causes tension among artificial-intelligence stakeholders amid concern over pace of advancement". Retrieved May 17, 2024. 
  375. Gordon, Anna (May 13, 2024). "Why Protesters Around the World Are Demanding a Pause on AI Development". Time Magazine. Retrieved May 17, 2024. 
  376. Mowshowitz, Zvi (March 30, 2023). "On the FLI Open Letter". LessWrong. Retrieved May 17, 2024. 
  377. Yudkowsky, Eliezer (March 29, 2023). "Pausing AI Developments Isn't Enough. We Need to Shut it All Down". Time Magazine. Retrieved May 17, 2024. 
  378. "Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky". March 29, 2023. Retrieved May 17, 2024. 
  379. Wasil, Akash; Hendrycks, Dan; O'Gara, Aidan. "AI Safety Newsletter #1: CAIS Linkpost". LessWrong. 
  380. "Initial £100 million for expert taskforce to help UK build and adopt next generation of safe AI: Prime Minister and Technology Secretary announce £100 million in funding for Foundation Model Taskforce.". GOV.UK. April 24, 2023. Retrieved May 19, 2024. 
  381. "Tech entrepreneur Ian Hogarth to lead UK's AI Foundation Model Taskforce: Artificial intelligence expert announced as chair of government's Foundation Model Taskforce.". GOV.UK. June 18, 2023. Retrieved May 19, 2024. 
  382. Mckernon, Elliott (October 11, 2023). "Update on the UK AI Taskforce & upcoming AI Safety Summit". LessWrong. Retrieved May 19, 2024. 
  383. Yudkowsky, Eliezer (April 1, 2023). "Will superintelligent AI end the world?". TED. Retrieved May 17, 2024. 
  384. "TED talk by Eliezer Yudkowsky: Unleashing the Power of Artificial Intelligence". LessWrong. May 6, 2023. Retrieved May 17, 2024. 
  385. "Will Superintelligent AI End the World?; Eliezer Yudkowsky; TED". TED on YouTube. July 11, 2023. Retrieved May 17, 2024. 
  386. Samin, Mikhail (July 12, 2023). "A transcript of the TED talk by Eliezer Yudkowsky". LessWrong. Retrieved May 17, 2024. 
  387. "Announcing Apollo Research". LessWrong. May 30, 2023. Retrieved May 17, 2024. 
  388. Christiano, Paul (July 31, 2023). "Thoughts on sharing information about language model capabilities". LessWrong. Retrieved May 17, 2024. 
  389. "Responsible Scaling Policies (RSPs)". Alignment Research Center. September 26, 2023. Retrieved May 17, 2024. 
  390. "Anthropic's Responsible Scaling Policy". Anthropic. September 19, 2023. Retrieved May 17, 2024. 
  391. Karnofsky, Holden (October 27, 2023). "We're Not Ready: thoughts on "pausing" and responsible scaling policies". LessWrong. Retrieved May 17, 2024. 
  392. "Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". White House. October 30, 2023. Retrieved May 19, 2024. 
  393. Williams, Tristan (October 30, 2023). "President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence". LessWrong. Retrieved May 19, 2024. 
  394. Mowshowitz, Zvi (November 1, 2023). "On the Executive Order". LessWrong. Retrieved May 19, 2024. 
  395. Mowshowitz, Zvi (November 1, 2023). "Reactions to the Executive Order". LessWrong. Retrieved May 19, 2024. 
  396. "Biden-Harris Administration Announces Key AI Actions 180 Days Following President Biden's Landmark Executive Order". White House. April 29, 2024. Retrieved May 19, 2024.
  397. "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023". GOV.UK. November 1, 2023. Retrieved May 19, 2024. 
  398. Soares, Nate (October 31, 2023). "Thoughts on the AI Safety Summit company policy requests and responses". LessWrong. Retrieved May 19, 2024. 
  399. Mowshowitz, Zvi (November 7, 2023). "On the UK Summit". LessWrong. Retrieved May 19, 2024. 
  400. "The other side of the tidal wave". LessWrong. November 2, 2023. Retrieved May 19, 2024. 
  401. Mowshowitz, Zvi (November 20, 2023). "OpenAI: Facts from a Weekend". LessWrong. Retrieved July 6, 2024. 
  402. Mowshowitz, Zvi (November 22, 2023). "OpenAI: The Battle of the Board". LessWrong. Retrieved July 6, 2024. 
  403. Mowshowitz, Zvi (November 30, 2023). "OpenAI: Altman Returns". LessWrong. Retrieved July 6, 2024. 
  404. "AI Optimism: For a Free and Fair Future". Retrieved November 30, 2023. 
  405. Byrnes, Steven (December 1, 2023). "Thoughts on "AI is easy to control" by Pope & Belrose". LessWrong. Retrieved July 6, 2024. 
  406. Ruthenis, Thane (December 15, 2023). "Current AIs Provide Nearly No Data Relevant to AGI Alignment". LessWrong. Retrieved July 6, 2024. 
  407. Grace, Katja (January 4, 2024). "Survey of 2,778 AI authors: six parts in pictures". AI Impacts. Retrieved July 6, 2024. 
  408. Grace, Katja (January 5, 2024). "Survey of 2,778 AI authors: six parts in pictures". LessWrong. Retrieved July 6, 2024. 
  409. Leike, Jan (May 17, 2024). "Yesterday was my last day as head of alignment, superalignment lead, and executive @OpenAI.". Twitter. Retrieved July 6, 2024. 
  410. Piper, Kelsey (May 18, 2024). "ChatGPT can talk, but OpenAI employees sure can't. Why is OpenAI's superalignment team imploding?". Vox. Retrieved July 6, 2024. 
  411. Mowshowitz, Zvi (May 20, 2024). "OpenAI: Exodus". Retrieved July 6, 2024. 
  412. "AI safety". trends.google.com. Retrieved January 6, 2021.
  413. "AI safety". books.google.com. Retrieved January 13, 2021.