Timeline of AI in medicine

This is a timeline of AI in medicine, which covers the history of artificial intelligence (AI) applied to medicine and healthcare, from the earliest rule-based expert systems of the 1960s and 1970s through the deep learning revolution of the 2010s and the emergence of large language models in the 2020s. It documents research findings, regulatory milestones, clinical deployments, policy frameworks, and controversies that have shaped the field across more than seven decades.

AI in medicine spans diagnostic decision support systems, medical imaging analysis, predictive models trained on electronic health records, large language models for clinical documentation, and generative AI for drug discovery. The field has been shaped as much by its failures as its successes: early expert systems rarely achieved widespread deployment; IBM Watson for Oncology collapsed under scrutiny; and research from 2018 onward exposed racial bias, poor generalization, and systematic underdiagnosis in deployed systems. The timeline traces genuine breakthroughs — AlphaFold's solution to the protein folding problem, the first FDA-cleared autonomous diagnostic systems, the first generative AI drug in clinical trials — alongside the recurrent gap between laboratory performance and real-world clinical impact.

Sample questions

The following are some interesting questions that can be answered by reading this timeline:

  • How has AI in medicine evolved from early conceptual foundations to deployed clinical tools?
  • What types of AI systems have received regulatory authorization for clinical use?
  • Which AI systems have been deployed at scale in real-world clinical settings?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Deployment."
    • You will find a range of deployment contexts including population-level public health surveillance (BlueDot, 2019), consumer cardiac monitoring (Apple Watch, 2018), large language models in electronic health records (Epic/Microsoft, 2023), and diabetic retinopathy screening across low-resource settings in India (Aravind Eye Hospital, 2019).
  • What policy and regulatory frameworks govern AI in medicine globally?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Policy/legislation."
    • You will find the HITECH Act (2009) creating the data infrastructure for medical AI, China's national AI strategy (2017), the EU AI Act (2024), and World Health Organization guidance on large multimodal models (2024) — illustrating the emergence of governance frameworks from multiple jurisdictions.
  • What research findings have most shaped understanding of AI performance and limitations in medicine?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Research finding."
    • You will see landmark papers across decades: the first demonstrations of AI outperforming clinicians on diagnostic tasks (de Dombal, 1972; Golub, 1999), the deep learning results that matched specialist performance in imaging (2016–2019), and the bias and generalization failure papers that exposed limitations of deployed systems (Zech 2018, Obermeyer 2019, Seyyed-Kalantari 2021).
  • How have controversies and commercial failures influenced the trajectory of AI in medicine?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Controversy."
    • You will find the 2017 STAT News investigation into IBM Watson for Oncology — which revealed that a widely marketed clinical AI system derived its recommendations from a small group of physicians rather than independent data analysis — marking the end of the first wave of AI hype in medicine and directly shaping subsequent attitudes toward clinical validation requirements.
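The "sort by Event type" steps above can also be reproduced offline. The following is a minimal sketch, assuming the timeline rows have been copied into Python tuples; the `ROWS` excerpt, its short labels, and both function names are hypothetical illustrations and not part of the wiki itself.

```python
from collections import defaultdict

# Hypothetical excerpt of the full timeline: (year, field, event type, short label).
# The labels are abbreviations chosen for this sketch, not the table's full text.
ROWS = [
    (1972, "Infectious diseases", "Research initiative", "MYCIN"),
    (1972, "Emergency medicine", "Research finding", "de Dombal acute abdominal pain trial"),
    (1978, "Ophthalmology", "Deployment", "CASNET glaucoma consultation program"),
    (1986, "Diagnostic medicine", "Deployment", "DXplain"),
    (2009, "Health informatics", "Policy/legislation", "HITECH Act"),
    (2014, "Cardiology", "Regulatory milestone", "HeartFlow FFRCT De Novo clearance"),
]


def group_by_event_type(rows):
    """Sort rows by event type, then year, and collect them per event type."""
    groups = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[2], r[0])):
        groups[row[2]].append(row)
    return dict(groups)


def rows_with_event_type(rows, event_type):
    """Return the rows whose event type matches exactly, in year order."""
    return group_by_event_type(rows).get(event_type, [])


if __name__ == "__main__":
    # Mirrors the "Deployment" question in the list above.
    for year, field, event_type, label in rows_with_event_type(ROWS, "Deployment"):
        print(year, label)
```

Matching is exact and case-sensitive, so the value passed in must be spelled as it appears in the "Event type" column (for example "Policy/legislation").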

Big picture

Time period Development summary More details
1950s–1970s Early foundations AI in medicine begins with the development of rule-based expert systems that mimic clinical reasoning using logic and symbolic inference. In the 1950s and 1960s, pioneers like Alan Turing envision machines performing intelligent tasks. By the 1970s, Stanford's MYCIN system demonstrates that computers can assist in diagnosing bacterial infections and recommending treatments, using a database of if-then rules. Though MYCIN never sees clinical use due to concerns about liability and clinical trust, it marks a foundational moment in AI-driven healthcare. The focus at this time is on replicating human decision-making through knowledge engineering, but systems struggle with ambiguity, learning, and real-world complexity.[1]
1980s–1990s AI Winter and clinical integration Interest in medical AI declines during the broader "AI Winter," a period marked by limited computing power, high development costs, and unmet expectations. Early expert systems like INTERNIST-I and DXplain aim to support clinical diagnostics but face difficulties scaling beyond narrow domains. Researchers begin shifting from rigid logic systems to probabilistic models, such as Bayesian networks, which allow for better uncertainty management. Despite setbacks, AI continues to influence medical education and decision support in limited environments. This period focuses more on integration with hospital information systems than major technological breakthroughs, laying groundwork for more adaptable AI in future decades.
2000s–2010s Machine learning emerges The 2000s see a major shift from hand-coded rules to machine learning approaches, powered by growing clinical datasets and improved computing resources. With the spread of electronic health records and the internet, machine learning becomes viable for pattern recognition, predictive modeling, and risk assessment. Algorithms are increasingly used in radiology, oncology, and hospital operations. IBM Watson Health gains global attention for its potential in cancer diagnostics[2], although its real-world performance lags behind expectations, as a 2017 STAT News investigation reveals.[3] Despite some overhype, this period marks a turning point where AI moves from theoretical promise to practical application, enabled by statistical models learning from large-scale clinical data.
2015–2020s Deep learning revolution With the rise of deep learning, especially convolutional neural networks (CNNs), AI makes dramatic advances in fields requiring image analysis, such as dermatology, ophthalmology, and radiology. AI systems begin matching or exceeding expert-level accuracy in diagnosing diseases from medical scans, such as diabetic retinopathy and skin cancer. This success is enabled by access to large labeled datasets and advances in GPU-based computing. Regulatory bodies like the FDA begin approving AI-based diagnostic tools for clinical use. The era marks a growing acceptance of AI as a decision support tool, though interpretability and validation across populations remain key challenges.
2020s–present Generative AI and multimodal models The release of large language models (LLMs) like GPT-3 and GPT-4 sparks a new phase in AI and medicine, emphasizing generative capabilities and human-like reasoning. These models are now used for summarizing patient records, assisting in clinical documentation, triaging, and even drafting research or guideline summaries. Meanwhile, multimodal AI systems — combining text, image, lab, and genetic data — enable more holistic, personalized diagnostics and treatment planning. Models like Med-PaLM and BioGPT demonstrate promising results in medical QA and education. However, challenges remain around transparency, clinical safety, bias, and regulatory frameworks. Still, AI becomes an integral assistant in healthcare workflows rather than just a predictive tool.

Full timeline

Inclusion criteria

We include:

  • Major research findings demonstrating AI performance on clinical or biological tasks, particularly those establishing new benchmarks, outperforming human clinicians, or exposing important limitations such as bias or poor generalization.
  • Regulatory milestones: FDA clearances, De Novo authorizations, and legislative frameworks governing AI in medicine.
  • Clinical deployments of AI systems at meaningful scale, including both successful deployments and notable commercial failures.
  • Policy and governance events at national or international level directly shaping the development or regulation of AI in medicine.
  • Foundational AI research with direct and demonstrated relevance to medicine, including model architectures, training paradigms, and datasets that enabled subsequent medical applications.
  • Awards recognizing AI work with direct medical implications.

We exclude:

  • Minor software updates, parameter tweaks, or incremental version releases of medical AI products without documented clinical significance.
  • Organizational and business events (funding rounds, acquisitions, leadership changes) unless directly tied to a clinically significant milestone.
  • AI research with only speculative or indirect medical relevance.
  • Events covered only in related timelines such as Timeline of artificial intelligence or Timeline of machine learning.
Year Field of medicine Event type Event description
1950 Theoretical artificial intelligence; machine reasoning Research initiative British mathematician and logician Alan Turing publishes "Computing Machinery and Intelligence" in the philosophical journal Mind, opening with the question "Can machines think?" and proposing the "imitation game" (later known as the Turing Test) as an operational criterion for machine intelligence. Turing argues that a suitably programmed digital computer could in principle exhibit any behavior a human could, and conjectures that by 2000 a machine would fool a human interrogator 30% of the time. The paper does not address medicine directly, but its framing of machine intelligence as a practical engineering problem rather than a philosophical impossibility is foundational for all subsequent AI development; Turing's conjecture that machines could learn from experience directly anticipates the machine learning paradigm that would eventually power diagnostic AI and clinical decision support. Turing is convicted by the British government in 1952 for homosexuality and subjected to chemical castration; he dies in 1954 at 41. The paper is published six years before the Dartmouth Conference (1956) that formally names artificial intelligence as a field.[4]
1956 General medicine; medical informatics Research initiative The Dartmouth Summer Research Project on Artificial Intelligence, a six-to-eight-week workshop at Dartmouth College, is widely regarded as the founding event of artificial intelligence as a field. Organized by mathematician John McCarthy (who coins the term "artificial intelligence"), information theorist Claude Shannon of Bell Labs, and researchers Marvin Minsky and Nathaniel Rochester, it brings together roughly twenty researchers including Herbert Simon, Allen Newell, Ray Solomonoff, and John Nash. The 1955 proposal to the Rockefeller Foundation states the conjecture that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it," listing natural language processing, neural networks, and self-improvement as target areas, a research agenda still recognizable seventy years later. Only McCarthy, Minsky, and Solomonoff stay for the full duration; the workshop produces no immediate breakthroughs. Its lasting significance is institutional: it establishes AI as a named discipline distinct from cybernetics and automata theory, and for medicine it is the conceptual origin of the expert systems paradigm that directly produces MYCIN (1972), INTERNIST-I (1974), CASNET (1978), and DXplain (1986).[5][6]
1965 Biomedical research; computational chemistry; molecular biology Research initiative Edward Feigenbaum, molecular biologist and Nobel laureate Joshua Lederberg, and chemist Carl Djerassi begin the DENDRAL project at Stanford University, considered the first expert system and the first use of artificial intelligence in biomedical research. Motivated by Lederberg's NASA work designing a mass spectrometer to detect signs of life on Mars, DENDRAL takes mass spectrometry data from unknown organic compounds and generates hypotheses about their molecular structure using heuristic rules distilled from expert chemists. By 1968 its performance rivals that of expert chemists. DENDRAL establishes the knowledge engineering paradigm (capturing human expert reasoning in formal rules and applying it computationally) that directly underlies MYCIN (1972), INTERNIST-I (1974), CASNET (1978), and subsequent medical expert systems. Bruce Buchanan later develops Meta-DENDRAL, a system capable of forming its own hypotheses, which becomes a template for later AI systems.[7][8]
1971 Biomedicine; medical informatics Research initiative Saul Amarel, founder of the Rutgers University Department of Computer Science and a pioneer of artificial intelligence research since the mid-1950s, organizes the Rutgers Research Resource on Computers in Biomedicine, funded by the National Institutes of Health. The initiative supports research on problem-solving approaches in the life sciences, pattern recognition models of clinical decision-making, and early techniques for modeling molecular interactions in drug design. Amarel directs the resource until 1984, and it goes on to host the first NIH-sponsored Artificial Intelligence in Medicine workshop in 1975 and to support the development of the CASNET model for glaucoma diagnosis (1978). The resource represents one of the earliest institutionalized efforts to apply AI methods systematically to biomedical problems, helping establish the field of AI in medicine as a legitimate research discipline with dedicated funding and infrastructure.[9][10]
1972 Infectious diseases; clinical decision support Research initiative Stanford University develops MYCIN, one of the earliest expert systems in artificial intelligence, designed to assist in diagnosing and treating blood infections. Using a knowledge base of approximately 500 "if-then" rules, MYCIN analyzes patient symptoms and test results, requests additional information if needed, and recommends treatments. It can also explain the reasoning behind its conclusions. MYCIN operates at a level comparable to medical specialists and outperforms general practitioners. As a pioneering rule-based system, it demonstrates the potential of AI in clinical decision-making and helps establish the foundation for future medical expert systems and AI applications in healthcare.[11]
1972 Emergency medicine; surgery; diagnostic medicine Research finding F.T. de Dombal, D.J. Leaper, J.R. Staniland, A.P. McCann, and J.C. Horrocks at the University of Leeds publish the first controlled prospective real-time trial of computer-aided diagnosis for acute abdominal pain in the British Medical Journal. Using Bayesian probabilistic reasoning trained on real patient cases, the system achieves diagnostic accuracy of 91.8% in a consecutive series of 304 patients, significantly above the 79.6% of the most senior clinician seeing each case. A follow-up multicentre deployment demonstrates that junior staff accuracy rises by 10–15%, appendicitis perforation rates fall from 27% to 12.5%, and surgical bed-nights decrease by 15%. The de Dombal system is one of the earliest demonstrations that a computer can outperform clinicians on a real diagnostic task, and one of the first AI systems to show measurable patient outcome improvements, anticipating by decades the clinical validation methodology that would become standard for medical AI evaluation.[12]
1974 Internal medicine; diagnostic medicine Research initiative INTERNIST-I, a broad-based diagnostic expert system for internal medicine, is completed at the University of Pittsburgh by AI pioneer Harry Pople, designed to capture the diagnostic expertise of Jack D. Myers, chairman of internal medicine at the University of Pittsburgh School of Medicine. Unlike earlier narrow expert systems like MYCIN, INTERNIST-I aims to cover the breadth of internal medicine, eventually encompassing 70–80% of all possible diagnoses after fifteen person-years of development by 1982. It uses a ranking algorithm and partitioning logic to produce ranked differential diagnoses from symptoms, laboratory results, and patient history, but handles multiple concurrent diseases poorly due to its hierarchical decision-tree logic. Consultation sessions take 30 to 90 minutes and the interface proves too unwieldy for routine clinical use, so the system never advances beyond a research setting. In the mid-1980s it is succeeded by Quick Medical Reference (QMR). Despite never reaching clinical deployment, INTERNIST-I establishes a template for broad-coverage diagnostic AI that later systems — including DXplain and eventually LLM-based diagnostic tools — build upon.[13]
1975 (June 14) General medicine; medical informatics Research initiative The Rutgers Research Resource on Computers in Biomedicine hosts the first workshop dedicated to Artificial Intelligence in Medicine (AIM), held in New Brunswick, New Jersey from June 14 to 17, 1975. Organized by Casimir Kulikowski under the overall direction of Saul Amarel and with technical direction by N.S. Sridharan, the workshop brings together researchers and clinicians working on prototype AI systems for clinical decision support, including presentations on CASNET, MYCIN, and INTERNIST-I. It is the first in a series of National Institutes of Health-sponsored AI in Medicine workshops that establish a community of practice across Rutgers University, Stanford University, University of Pittsburgh, and Tufts-Harvard-MIT. The workshop directly connects to the SUMEX-AIM network at Stanford, which provides the shared computational infrastructure enabling collaborative biomedical AI research across institutions.[14]
1978 Ophthalmology; clinical decision support Deployment The CASNET (causal-associational network) framework is introduced as one of the first AI systems based on causal models in medicine. Developed by Weiss, Kulikowski, Amarel, and Safir, CASNET models disease mechanisms through three components: patient observations, intermediate pathophysiological states, and disease classifications. Observations are linked to states, which form causal chains that map to specific diseases. These diagnostic conclusions then trigger general treatment recommendations, while detailed treatment strategies are tailored to the individual patient profile. The method is successfully applied to a consultation program for diagnosing and managing glaucoma, demonstrating CASNET's value in clinical decision support.[15]
1986 Diagnostic medicine; internal medicine Deployment G. Octo Barnett and colleagues at Massachusetts General Hospital / Harvard Medical School release DXplain, a computer-based diagnostic decision support system that accepts clinical manifestations — symptoms, signs, and laboratory results — and generates a ranked list of differential diagnoses with supporting references. Developed with American Medical Association support and distributed through AMA/NET nationwide, DXplain is one of the first AI diagnostic systems designed for broad clinical distribution rather than single-institution research use. The knowledge base initially covers approximately 500 diseases and expands to over 2,000. Unlike INTERNIST-I, which never achieves clinical deployment, DXplain is designed from the outset for practicing physicians with no computing background. It later transitions to a web-based platform, remains in use for medical education into the 2020s, and is among the longest-lived clinical AI systems ever deployed — outlasting IBM Watson Health by decades.[16]
1992 (July) Pathology; diagnostic decision support Research finding David Heckerman, Eric Horvitz, and Bharat Nathwani publish the first part of their account of the Pathfinder Project in Methods of Information in Medicine, describing a normative expert system for diagnosing lymph node diseases, a domain with over 60 possible diagnoses requiring integration of many uncertain clinical and histology findings. The team initially explores non-probabilistic, rule-based approaches to uncertainty and finds them inadequate, then demonstrates through controlled evaluation that a Bayesian network achieves diagnostic accuracy superior to a panel of expert pathologists on 53 benchmark cases. The result provides the strongest evidence to date that probabilistic graphical models outperform rule-based expert systems on real clinical diagnostic tasks, directly influencing the subsequent adoption of Bayesian networks in medical decision support. Heckerman's and Horvitz's subsequent careers at Microsoft Research further accelerate the integration of Bayesian methods into AI broadly.[17]
1999 (October 15) Oncology; haematology; molecular diagnostics Research finding Todd Golub, Eric Lander, and colleagues at the Whitehead Institute/MIT Center for Genome Research publish "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring" in Science, demonstrating for the first time that machine learning applied to DNA microarray gene expression data can discover previously unknown cancer subtypes and accurately predict the class of new cases without prior biological knowledge. From 7,129 gene expression measurements across 72 patient samples, the system automatically discovers the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The distinction is clinically critical, as the two require different chemotherapy regimens and misclassification can be fatal. Among the most cited papers in the history of cancer biology, it establishes the template in which large-scale molecular data combined with machine learning yields clinically actionable classification, underpinning modern computational oncology and precision medicine.[18]
2002 Infectious diseases (HIV/AIDS); precision medicine; clinical decision support Research initiative The HIV Resistance Response Database Initiative (RDI), founded in the United Kingdom by virologist Brendan Larder and colleagues, begins collecting clinical and virological data from HIV patients worldwide to train machine learning models predicting individual patient responses to antiretroviral drug combinations. With approximately 30 approved antiretroviral drugs across six drug classes, selecting the optimal combination for a patient with drug-resistant virus is too complex for unaided clinical judgment; the RDI addresses this by drawing on data from over 250,000 patients across roughly 50 countries. In 2010 the RDI launches HIV-TRePS (HIV Treatment Response Prediction System), a free online tool grounded in over one million patient-years of treatment experience, described as possibly the first AI-based system for medical decision-making successfully developed, tested, and used in routine clinical practice at scale. As antiretroviral therapy improves and resistance becomes less clinically challenging, the need diminishes; the RDI withdraws HIV-TRePS in March 2024.[19]
2006 Medical imaging; computational medicine Research finding Geoffrey Hinton and his team introduce Deep Belief Networks (DBNs), marking a major breakthrough in the development of deep learning, and setting the stage for modern artificial neural network applications in medical imaging. DBNs use an unsupervised layer-wise training method, making it possible to train deep neural networks more effectively than before. This approach enables the models to learn complex, high-level features from data, significantly improving tasks like object detection and speech recognition. The success of DBNs would transform deep neural networks from theoretical research topics into practical tools for solving real-world problems, laying the foundation for the rapid growth of modern deep learning applications across many fields.[20]
2009 Health informatics; clinical data infrastructure Policy/legislation The Health Information Technology for Economic and Clinical Health (HITECH) Act is enacted as part of the American Recovery and Reinvestment Act, committing $25.9 billion to promote electronic health record (EHR) adoption across the US healthcare system. The act establishes "meaningful use" as the governing standard: providers must demonstrate that EHRs achieve measurable improvements in care quality. It offers incentives of up to $44,000 per physician under Medicare and $63,750 under Medicaid, with penalties for non-adopters from 2015. At the time of passage it is described as the most significant healthcare legislation in two to three decades. The HITECH Act proves foundational for AI in medicine: by driving near-universal EHR adoption, it creates the large-scale clinical datasets (patient records, lab results, imaging reports, medication histories) that machine learning and deep learning systems require to train, validate, and generalize, without which the wave of AI diagnostic tools emerging from 2012 onward would have lacked the data needed to demonstrate real-world performance.[21][22]
2011 (February) Medical informatics; clinical decision support Research initiative IBM Watson, a question-answering computer system developed by a research team led by David Ferrucci under IBM's DeepQA project, defeats Jeopardy! all-time champions Ken Jennings and Brad Rutter in a televised match watched by millions, winning the $1 million first-place prize. Watson's DeepQA architecture combines over 100 techniques for natural language processing, hypothesis generation, evidence evaluation, and answer ranking, processing 200 million pages of locally stored information to answer questions in seconds without internet access. The Jeopardy! victory is a public demonstration of unprecedented natural language understanding by a machine and immediately turns IBM's attention to healthcare as the next target domain: within months IBM announces plans to apply Watson's technology to clinical medicine, culminating in the 2012 partnership with Memorial Sloan Kettering Cancer Center. The win also demonstrates the limits of the approach: Watson's reasoning is statistical pattern matching rather than genuine comprehension, a distinction that would become important when the system is applied to the nuanced, high-stakes domain of oncology. Apple Inc. launches Siri, the first mainstream AI voice assistant, in the same year, further accelerating public and institutional interest in conversational AI for patient interaction and healthcare communication.[23]
2011 Neonatology; neonatal intensive care; infectious disease Research finding J. Randall Moorman, Karen Fairchild, and colleagues at the University of Virginia and eight other US neonatal intensive care units publish results of the largest randomized clinical trial ever conducted in very low birthweight infants in the Journal of Pediatrics, demonstrating that display of the HeRO (Heart Rate Observation) score, an FDA-cleared machine learning system that detects heart rate abnormalities preceding sepsis by hours, reduces all-cause mortality from 10.2% to 8.1%, a 22% relative reduction. Among extremely low birthweight infants under 1,000 grams the benefit is larger, with mortality falling from 17.6% to 13.2%. Neonatal sepsis is a leading cause of death in premature infants; conventional assessment misses early cases because presenting signs are nonspecific and late. The HeRO trial is one of the earliest RCTs to demonstrate that a machine learning-based clinical decision support system reduces mortality in a defined patient population, predating by years the clinical AI validation wave that would follow AlexNet and deep learning.[24]
2012 Oncology; clinical decision support Research initiative Memorial Sloan Kettering Cancer Center (MSKCC) and IBM announce a collaboration to apply IBM Watson technology to cancer care. The partnership aims to create an AI-powered clinical decision support tool that helps oncologists worldwide make personalized, evidence-based diagnostic and treatment decisions. By combining Watson's natural language processing and rapid data analysis with MSKCC's clinical expertise and vast cancer database, the system will deliver updated recommendations tailored to individual patients. This initiative addresses the growing complexity of cancer treatment and aims to accelerate the spread of cutting-edge oncology knowledge to improve outcomes across diverse healthcare settings. The partnership would later face serious public scrutiny: a 2017 STAT News investigation finds that Watson for Oncology's recommendations derive entirely from training by a small group of Memorial Sloan Kettering physicians rather than from independent data analysis, and that concordance with local physicians at hospitals in Denmark reaches only 33%; IBM Watson Health is eventually sold off in 2022.[25]
2012 (September 30) Medical imaging (foundational) Research finding AlexNet, a deep convolutional neural network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, wins the ImageNet Large Scale Visual Recognition Challenge by a margin of 9.8 percentage points, the largest in the competition's history. Trained on ImageNet, a dataset of 1.2 million labeled images assembled by computer scientist Fei-Fei Li, AlexNet achieves a top-5 error rate below 25%, demonstrating for the first time that deep CNNs trained on large datasets and graphics processing unit (GPU) hardware can dramatically outperform prior approaches to image classification. Three enabling factors converge: the ImageNet dataset, Nvidia's CUDA platform enabling GPU-accelerated training of the model's 60 million parameters, and architectural innovations including rectified linear unit (ReLU) activation functions and dropout regularization. The medical imaging implications are immediate: the same CNN architecture proves directly transferable to radiographs, pathology slides, retinal scans, and dermatological images, underpinning nearly every major medical imaging AI milestone that follows, including the Stanford skin cancer detector (2017), the DeepMind eye disease system (2018), and FDA-cleared diagnostic tools for diabetic retinopathy.[26][27]
2013 Primary care; patient-facing AI; digital health Research initiative Babylon Health is founded in London by Ali Parsa with the stated ambition of putting an AI-powered "doctor in your pocket" accessible to every human on the planet. The company combines an AI symptom checker with video consultations with general practitioners, and in 2015 launches GP at Hand, the first National Health Service-contracted app-based GP service, reaching over 100,000 registered NHS patients by 2021. Babylon simultaneously deploys its Babyl service in Rwanda from 2016, reaching one million consultations by 2020. In 2018 Parsa claims Babylon's symptom checker outperforms human doctors on Royal College of General Practitioners exam questions, a claim immediately disputed by clinicians who demonstrate it misses obvious diagnoses including heart attacks. Babylon raises $1.2 billion and reaches a peak valuation of $4.2 billion on its New York Stock Exchange listing in October 2021, before collapsing into bankruptcy in August 2023. Babylon's arc from revolutionary promise to bankruptcy becomes one of the defining cautionary tales of consumer-facing medical AI.[28][29]
2014 (October) Radiology; diagnostic imaging Research initiative AI startup Enlitic raises $2 million in seed funding to develop deep learning tools for medical diagnosis. Founded by Jeremy Howard, Enlitic aims to analyze large archives of medical images, such as CT scans and X-rays, to assist doctors in identifying diseases more accurately and efficiently. Using artificial neural networks, the system learns patterns from existing medical data and generates predictions for new cases. With partnerships in the U.S., Brazil, China, and India, Enlitic also builds an "imaging analytics toolbox" to speed up algorithm development. The company sees particular potential in low-resource settings, where AI tools can greatly enhance diagnostic access.[30]
2014 (December 1) Cardiology; coronary artery disease diagnostics Regulatory milestone HeartFlow Inc. receives De Novo clearance from the Food and Drug Administration for its FFRCT technology, the first non-invasive imaging technology cleared to assess both the anatomical extent of coronary artery disease and its functional impact on blood flow. HeartFlow FFRCT takes coronary CT angiography (CCTA) images and solves millions of equations simulating blood flow through a patient-specific 3D model, generating fractional flow reserve (FFR) values without cardiac catheterization. Coronary artery disease is the leading cause of death globally; the clearance addresses a longstanding clinical problem in which anatomical imaging alone cannot reliably determine whether a stenosis needs treatment, leading to either unnecessary invasive procedures or undertreated disease. The pivotal NXT study demonstrates FFRCT diagnostic accuracy of 86% versus 65% for CT angiography alone. HeartFlow subsequently receives CE marking and National Institute for Health and Care Excellence endorsement, becoming the only such technology with all three major regulatory approvals.[31]
2015 Surgery; surgical informatics Research initiative Verb Surgical, a joint venture between Johnson & Johnson's Ethicon division and Verily Life Sciences (formerly Google Life Sciences, a subsidiary of Alphabet Inc.), is incorporated in August 2015 with the explicit mission of combining machine learning, robotics, advanced visualization, and data analytics into a single surgical platform, the first major industry initiative to treat AI and machine learning as foundational components of surgical robotics rather than optional add-ons. Existing commercial surgical robots such as Intuitive Surgical's da Vinci surgical system are surgeon-controlled tools rather than genuinely intelligent systems; Verb sets out to build a platform in which data from surgical procedures trains machine learning models to improve outcomes, assist decision-making, and eventually support autonomous surgical tasks. Verily contributes image and data analytics software while Ethicon contributes surgical device expertise and commercial infrastructure. Johnson & Johnson acquires Verb's remaining stake from Verily in December 2019; the underlying technology and ambition subsequently inform J&J's OTTAVA robotic surgical system, submitted to the Food and Drug Administration for De Novo classification in January 2026.[32][33]
2016 (November 29) Ophthalmology; medical imaging Research finding Google researchers led by Varun Gulshan and Lily Peng publish a landmark study in JAMA demonstrating that a deep convolutional neural network trained on 128,175 retinal fundus images graded by 54 ophthalmologists can detect referable diabetic retinopathy with sensitivity of 97.5% and specificity of 93.4% — matching or exceeding the performance of board-certified ophthalmologists. Diabetic retinopathy affects approximately 28.5% of people with diabetes in the US and is a leading cause of preventable blindness; screening rates are inadequate due to the global shortage of ophthalmologists. The study is the first large-scale prospective validation of a deep learning diagnostic system against a multi-physician reference standard, and directly informs the development of IDx-DR, which receives FDA clearance as the first autonomous AI diagnostic system in 2018.[34]
2017 Dermatology; medical imaging Research finding Stanford University researchers develop an AI system capable of detecting skin cancer with accuracy comparable to 21 expert dermatologists. Trained on a dataset of 129,450 images representing over 2,000 skin diseases, the deep learning algorithm successfully identifies various types of skin cancer, including melanoma and keratinocyte carcinoma. While at the time reliant on high-quality clinical images, the system holds potential for mobile deployment, expanding access to early diagnosis. Experts caution that more training with smartphone-quality images is needed. The study underscores AI's broader promise in medical imaging, with similar efforts underway in ophthalmology, oncology, and cardiovascular prediction.[35]
2017 (January 9) Cardiology; radiology; medical imaging Regulatory milestone Arterys, a San Francisco-based medical imaging company, receives Food and Drug Administration 510(k) clearance for Arterys Cardio DL, the first technology cleared by the FDA leveraging cloud computing and deep learning in a clinical setting. The application analyzes cardiac MRI images and produces automated, editable ventricular segmentations with accuracy comparable to experienced physicians — reducing a manual process taking up to 45 minutes to seconds. The clearance is significant beyond cardiology: it establishes that the FDA is willing to clear AI systems running on cloud infrastructure rather than local hospital hardware, a prerequisite for scalable, continuously improving medical AI, and opens the regulatory pathway that subsequent cloud-based AI diagnostic tools follow.[36]
2017 (June 6) Psychiatry; mental health; clinical psychology Research finding Kathleen Kara Fitzpatrick, Alison Darcy, and Molly Vierhile at Stanford School of Medicine and Woebot Labs publish the first randomized controlled trial of a fully automated AI conversational agent delivering cognitive behavioral therapy (CBT) in JMIR Mental Health. In the trial of 70 participants aged 18–28, those assigned to Woebot — a text-based chatbot delivering up to 20 CBT-derived sessions over two weeks — show significantly greater reductions in depression symptoms compared with controls, with participants interacting with the system on average 12 out of 14 days. The study is the first to demonstrate measurable therapeutic effects from an AI system in a controlled trial, rather than merely engagement or satisfaction. Mental health care faces a severe global treatment gap due to workforce shortages, cost, stigma, and geographic barriers; Woebot represents early evidence that AI may partially address it. Woebot Health subsequently receives FDA Breakthrough Device Designation for a digital therapeutic for major depressive disorder in 2021.[37]
2017 (July 20) All medical AI fields; health informatics; medical devices Policy/legislation China's State Council issues the New Generation Artificial Intelligence Development Plan (State Council Document No. 35), the first comprehensive national strategy for AI development by a major power, setting out a phased roadmap to make China the world's leading AI innovation center by 2030. The plan explicitly identifies intelligent medicine as a priority sector, calling for the development of AI-powered medical imaging analysis, clinical decision support, drug discovery, medical robotics, and personalized medicine, and for the establishment of a national medical AI system integrating electronic health records, genomics, and imaging data at population scale. The three-stage timeline sets targets for 2020 (matching world-leading levels in key AI applications), 2025 (major AI theory breakthroughs and large-scale industrial deployment), and 2030 (global AI innovation leadership). The plan is accompanied by substantial state funding, the designation of national AI champions including Baidu (autonomous systems), Alibaba (smart cities), Tencent (medical imaging), and iFlytek (voice recognition in healthcare), and the rapid expansion of AI medical device regulation by China's National Medical Products Administration (NMPA). China's combination of a large patient population, relatively permissive data environment, and strong state coordination enables rapid deployment of AI medical systems: by 2020 over 100 Chinese hospitals are using AI-assisted CT scan reading for COVID-19, and Chinese AI medical device approvals grow from near zero to over 200 by 2023. The plan is widely credited with triggering a global AI strategy race, with the EU, UK, Canada, and other jurisdictions issuing their own national AI strategies within two years.[38][39]
2017 (September 5) Oncology; clinical decision support Controversy A STAT News investigation by Casey Ross and Ike Swetlitz finds that IBM Watson for Oncology, deployed across more than 50 hospitals on five continents, is failing to deliver on its promises and is "artificially intelligent only in the most rudimentary sense of the term." The investigation reveals that Watson's treatment recommendations derive entirely from training by a few dozen physicians at Memorial Sloan Kettering Cancer Center rather than from independent data analysis — making it essentially "Memorial Sloan Kettering in a portable box." Hospitals in Denmark abandon the system after concordance with local doctors reaches only 33%; IBM has published no peer-reviewed clinical trials of Watson's effect on patient outcomes. MD Anderson Cancer Center had already cancelled its Watson partnership after spending more than $60 million over three years without producing a usable system. The investigation marks the end of the first wave of AI hype in medicine and shapes regulatory and institutional attitudes toward clinical AI validation for years afterward. IBM Watson Health is eventually sold off in 2022.[40]
2017 (September 14) Psychiatry; addiction medicine; substance use disorder Regulatory milestone Pear Therapeutics receives De Novo clearance from the Food and Drug Administration for reSET, the first prescription digital therapeutic (PDT) to receive FDA marketing authorization with claims to improve clinical outcomes in a disease, and the first time the FDA clears a software-only product intended to treat rather than merely monitor a medical condition. reSET is a 12-week smartphone-based CBT program for substance use disorder involving alcohol, cocaine, marijuana, and stimulants, used alongside outpatient treatment. The clearance creates a new regulatory classification for prescription digital therapeutics, sitting between unregulated wellness apps and pharmaceutical drugs, requiring clinical evidence of efficacy delivered via software rather than a molecule. Pear subsequently receives clearance for reSET-O for opioid use disorder in December 2018, before filing for bankruptcy in 2023, illustrating the commercial challenges of the PDT model despite regulatory success.[41]
2017 (November 27) Psychiatry; public health; crisis intervention Deployment Facebook deploys proactive AI-based suicide detection across its global platform, using machine learning classifiers trained on previously flagged posts to scan all public content — including live video — for patterns indicating suicidal ideation without requiring any user report. The system routes flagged content to specially trained human reviewers who can contact emergency services or surface crisis resources directly to the at-risk user; within one month of early testing it triggers over 100 wellness checks. The EU declines to permit the proactive version due to General Data Protection Regulation constraints on profiling users based on sensitive personal information. The system represents the first large-scale deployment of AI for real-time mental health crisis detection at population scale, marking a shift in which AI mental health applications move beyond individual clinical interactions into passive, continuous population-level surveillance — generating both public health interest and sustained ethical debate.[42]
2017 Psychiatry; mental health; suicide prevention Research finding Colin Walsh, Jessica Ribeiro, and Joseph Franklin publish one of the first demonstrations of machine learning applied to electronic health records for suicide attempt prediction in Clinical Psychological Science, developing algorithms trained on 5,167 adult patients that achieve an AUC of 0.84 for predicting future suicide attempts, substantially outperforming traditional clinical risk assessment, which performs near chance on individual-level prediction despite decades of research. Prediction accuracy improves as the attempt date approaches, and the most predictive variables shift over time, suggesting machine learning identifies dynamic risk trajectories invisible to static clinical scoring. The paper is part of a broader wave of EHR-based suicide prediction research in the 2015–2020 period that collectively establishes machine learning as a more powerful approach to suicide risk than questionnaire-based screening, while raising ethical questions about algorithmic risk scoring, false positives, and clinical responsibility. Several US health systems subsequently deploy EHR-based suicide risk prediction tools derived from this research paradigm.[43]
2017 Neurology; psychiatry; neurodegeneration; ageing Research finding James Cole, Katja Franke, and colleagues publish a landmark study establishing the Brain Age Gap Estimation (BrainAGE) framework — a machine learning approach that predicts an individual's brain age from structural MRI and uses the gap between predicted and chronological age as a biomarker of neurological health. Models trained on healthy individuals predict chronological age with high accuracy; when applied to those with neurological or psychiatric conditions, predicted brain age is systematically higher — indicating accelerated ageing. The brain age gap correlates with cognitive decline, dementia risk, and mortality, and is elevated in Alzheimer's disease, schizophrenia, HIV, epilepsy, and traumatic brain injury. By 2023 BrainAGE has become one of the most widely cited frameworks in neuroimaging, with hundreds of studies applying it across neurodegenerative, psychiatric, and systemic diseases, and the concept of a machine learning-derived biological age has expanded beyond the brain to other organ systems.[44]
2018 (February 13) Neurology; stroke; emergency medicine Regulatory milestone The Food and Drug Administration grants De Novo clearance to Viz.ai for its ContaCT application, the first FDA-cleared AI system for stroke detection and the first AI triage software of any kind to receive FDA marketing authorization. ContaCT analyzes CT angiography images, automatically detects suspected large vessel occlusions (LVOs), the most severe and time-sensitive type of ischemic stroke, and alerts the on-call neurovascular specialist's smartphone within a median of under six minutes. Every 30 minutes of delay in LVO treatment costs an average of 16.8 million neurons; real-world studies demonstrate that ContaCT significantly reduces door-to-transfer and door-to-puncture times. The clearance creates a new regulatory classification for computer-aided triage software that subsequent AI triage systems use as a 510(k) predicate. Viz.ai subsequently expands to intracranial hemorrhage, pulmonary embolism, and aortic dissection, and is deployed in over 1,700 hospitals globally by 2023.[45][46]
2018 (February 19) Cardiology; ophthalmology; preventive medicine Research finding Ryan Poplin, Lily Peng, and colleagues at Google Research and Verily Life Sciences publish a study in Nature Biomedical Engineering demonstrating that deep learning models trained on retinal fundus photographs can predict cardiovascular risk factors not previously thought to be quantifiable in retinal images, including age, sex, smoking status, and systolic blood pressure, as well as major adverse cardiac events. Trained on 284,335 patients, the model predicts major adverse cardiovascular events with AUC of 0.70, comparable to the European SCORE cardiovascular risk calculator, which requires blood tests. The result is scientifically surprising: the retina was not previously known to encode this breadth of systemic cardiovascular information. The practical implication is significant: fundus photography is already deployed at scale for diabetic retinopathy screening, meaning cardiovascular risk stratification could be embedded into existing workflows without additional tests. The paper exemplifies a broader phenomenon (AI discovering previously unknown clinical information in existing medical images) that drives subsequent research into predicting systemic disease from retinal and other imaging data.[47]
2018 (April) Ophthalmology; primary care Regulatory milestone The Food and Drug Administration grants De Novo clearance to IDx-DR, developed by ophthalmologist and AI researcher Michael Abràmoff at the University of Iowa, making it the first fully autonomous AI diagnostic system across any field of medicine to receive FDA authorization. Unlike prior FDA-cleared AI tools that assist clinicians, IDx-DR operates without any clinician oversight: it analyzes retinal images at the point of care and delivers an immediate diagnostic result autonomously. The pivotal trial at 10 primary care sites with 900 patients with diabetes demonstrates sensitivity of 87.2% and specificity of 90.7%, and is the first to prospectively assess the safety of an autonomous AI system in direct patient care. The clearance establishes a De Novo pathway that subsequent autonomous AI diagnostic systems use as their predicate, and demonstrates that the FDA is willing to authorize AI systems that make clinical decisions independently of human oversight — opening a pathway for autonomous AI across medical specialties. IDx-DR is later rebranded as LumineticsCore by Digital Diagnostics Inc.[48][49]
2018 (August 14) Cardiology; primary care; consumer health Regulatory milestone The Food and Drug Administration grants De Novo clearance to Apple Inc.'s ECG App (submission DEN180044), making it the first over-the-counter electrocardiogram application cleared by the FDA for consumer use. The app records a single-channel electrocardiogram via the Apple Watch, analyzes the 30-second recording using an onboard algorithm, and classifies the rhythm as atrial fibrillation (AFib), sinus rhythm, or inconclusive without any clinician involvement. In the pivotal study of 602 subjects, the algorithm achieves false-positive and false-negative rates of 0.4% and 1.7% respectively, meeting prespecified performance goals against physician-adjudicated 12-lead ECG. Atrial fibrillation affects tens of millions worldwide and is a leading cause of stroke; many cases go undetected because episodes are intermittent. The clearance establishes that an AI algorithm embedded in a consumer wearable device can meet FDA evidentiary standards for a medical indication, creates a new regulatory product code (QDA) for over-the-counter electrocardiograph software, and demonstrates that AI-powered passive health monitoring can serve a genuine clinical purpose. The feature reaches tens of millions of users with Apple Watch Series 4 in September 2018.[50]
2018 (September 24) Genomics; molecular diagnostics; rare disease Research finding Ryan Poplin, Pi-Chuan Chang, Mark DePristo, and colleagues at Verily Life Sciences and Google publish DeepVariant in Nature Biotechnology, the first deep learning system for variant calling, demonstrating that a convolutional neural network treating sequencing read data as images can outperform all existing hand-engineered bioinformatics tools on the task of identifying genetic variants. DeepVariant achieves more than 50% fewer errors per genome than the widely used GATK HaplotypeCaller and wins the highest performance award for single-nucleotide polymorphisms in an FDA-administered variant calling challenge. Variant calling is the essential first step in clinical genomics: every genetic diagnosis, pharmacogenomics result, and rare disease investigation depends on it. Google releases DeepVariant as open-source software, enabling rapid adoption across clinical and research genomics, and the system subsequently becomes the basis for DeepSomatic (2025), extending the approach to cancer somatic mutation detection.[51]
2018 (November 6) Radiology; pulmonology; medical AI safety Research finding John Zech, Marcus Badgeley, Eric Oermann, and colleagues at the Icahn School of Medicine at Mount Sinai publish a landmark study in PLOS Medicine demonstrating that deep learning models for pneumonia detection in chest X-rays perform substantially better on data from their own hospital than on external hospital data, exposing a fundamental generalization problem in medical AI. More strikingly, the convolutional neural networks identify which hospital system a radiograph came from with 99.95–99.98% accuracy, meaning the models learn hospital-specific imaging artifacts and equipment signatures rather than disease-relevant features. The result reveals that high performance on internal validation (the standard reported in most medical AI papers) does not guarantee real-world generalizability. The paper is among the most cited in the medical AI safety literature and directly influences the FDA's thinking about external validation requirements for AI medical devices and the design of subsequent multi-site validation studies.[52]
2018 Ophthalmology; medical imaging Research finding Researchers from DeepMind, University College London, and Moorfields Eye Hospital develop an AI system capable of detecting over 50 common eye diseases from 3D retinal scans with accuracy comparable to expert doctors. Using deep learning and optical coherence tomography (OCT) scans, the software can recommend patients for treatment based on its analysis. Although not yet approved for clinical use, it demonstrates its ability to assist in hospitals, helping prioritize urgent cases and improve diagnosis efficiency. The system's potential is seen as transformative in managing sight-threatening conditions and can significantly expand access to early and accurate eye disease diagnosis globally.[53]
2019 (January 7) Cardiology; cardiac electrophysiology Research finding Awni Hannun, Pranav Rajpurkar, Andrew Ng, and colleagues at Stanford University and iRhythm Technologies publish a study in Nature Medicine demonstrating that a 34-layer deep neural network can classify 12 cardiac arrhythmia classes from single-lead ambulatory electrocardiogram recordings with accuracy exceeding that of individual board-certified cardiologists. Trained on 91,232 single-lead ECGs from 53,549 patients using iRhythm's Zio wearable patch, the model is validated against a consensus committee of cardiologists and outperforms six individual cardiologists on average F1 score across arrhythmia classes. The study is the first to demonstrate cardiologist-level or better performance by a deep learning system across a broad range of arrhythmias (including atrial fibrillation, atrioventricular block, ventricular tachycardia, and supraventricular tachycardia) rather than a single condition. More than 300 million ECGs are obtained annually worldwide; automating their interpretation at cardiologist level has direct implications for reducing diagnostic delays, expanding access in settings without specialist availability, and enabling continuous monitoring via wearable technology. The work connects directly to iRhythm's subsequent FDA clearance of AI-based arrhythmia detection features for its Zio patch platform.[54]
2019 (February 1) Pulmonology; radiology Research initiative MIT's Laboratory for Computational Physiology releases the MIMIC-Chest X-Ray (MIMIC-CXR) database, offering over 350,000 anonymized chest radiographs collected from Beth Israel Deaconess Medical Center over five years. Free and open to academic, clinical, and industrial researchers via PhysioNet, this is the largest public repository of its kind. The database supports the development of AI models to detect conditions like pneumonia, cardiomegaly, and edema — especially beneficial for underserved areas lacking radiologists. The project aims to link imaging data with clinical records (via MIMIC-III), enabling more robust diagnostic tools. Collaboration with Stanford enhances the dataset's generalizability across different healthcare contexts.[55][56]
2019 (February 14) Medical informatics; health communication Deployment OpenAI's GPT-2 is released as a large-scale unsupervised language model with 1.5 billion parameters, trained to predict the next word using 40GB of internet text. Without task-specific training, GPT-2 achieves impressive results in language modeling, reading comprehension, summarization, translation, and question answering. It generates coherent, contextually adaptive text but still shows occasional errors. Due to concerns about potential misuse, such as disinformation, impersonation, and spam, OpenAI initially withholds the full model, opting for a staged release strategy. GPT-2 highlights the potential and risks of large language models, prompting broader discussions on responsible publication, AI policy, and the societal implications of advanced text-generation systems.[57]
2019 Gastroenterology; colorectal cancer screening Research finding Pu Wang, Tyler Berzin, and colleagues publish the first prospective randomized controlled trial of a real-time AI-assisted colonoscopy system in Gut. The AI system highlights detected polyps with a bounding box and audible alert in real time, achieving an adenoma detection rate of 29.1% versus 20.3% in the control group, a relative increase of about 43%. Colorectal cancer is the third most common cancer globally; adenoma miss rates during colonoscopy can reach 26% and are a leading cause of post-colonoscopy colorectal cancer. The study is the first RCT to demonstrate a clinically meaningful improvement in adenoma detection using real-time AI, establishing colonoscopy as one of the first procedural specialties to benefit from live deep learning assistance and triggering a wave of subsequent RCTs and regulatory submissions for computer-aided detection systems in endoscopy.[58]
2019 Ophthalmology; global health; low- and middle-income countries Deployment Google and Verily Life Sciences deploy their automated retinal disease assessment (ARDA) deep learning system for diabetic retinopathy screening at Aravind Eye Hospital in Madurai, India — the first large-scale real-world deployment of an AI diagnostic system in a low-resource clinical setting. Deployed across 45 sites in southern India, the system screens patients with diabetes for diabetic retinopathy and diabetic macular edema, reaching populations lacking access to ophthalmologists and demonstrating for the first time that a deep learning diagnostic system validated in high-income settings can be deployed at scale in a low-resource environment. By 2024 ARDA has supported over 600,000 screenings; Google subsequently licenses the model to partners in India and Thailand with a goal of six million AI-enabled screenings at no cost to patients. The Aravind deployment becomes a reference case for adapting AI diagnostic systems for global health equity.[59][60]
2019 (October 25) Health policy; primary care; health equity Research finding Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan publish "Dissecting racial bias in an algorithm used to manage the health of populations" in Science, demonstrating that a widely deployed commercial healthcare algorithm used by US health systems to identify patients for high-risk care management systematically discriminates against Black patients. The algorithm predicts future healthcare costs as a proxy for health needs, a seemingly race-neutral choice that proves deeply discriminatory: because Black patients with the same health status as white patients generate lower costs due to longstanding inequities in access to care, the algorithm underestimates their needs. The bias reduces Black patient referrals for extra care from a corrected 46.5% to only 17.7%. The manufacturer confirms the findings and reformulates the algorithm, reducing racial bias by 84–86%. The paper demonstrates that algorithmic bias in healthcare can arise from proxy variables encoding structural inequity rather than from explicitly using race, directly informing subsequent FDA guidance on algorithmic bias and policy debates around the regulation of commercial healthcare algorithms.[61]
2019 (December 31) Infectious disease; public health; pandemic preparedness Deployment BlueDot, a Toronto-based AI health surveillance company founded by infectious disease physician Dr. Kamran Khan, alerts its clients to a cluster of unusual pneumonia cases around a market in Wuhan, China — nine days before the World Health Organization issues its public statement on January 9, 2020. BlueDot's platform combines natural language processing across news reports and health communications in 65 languages with machine learning analysis of global airline ticketing data to predict pathogen spread. The system correctly identifies eleven of the first cities to receive COVID-19 cases outside China, including Bangkok, Hong Kong, Tokyo, Taipei, and Seoul. In January 2020 the BlueDot team publishes a rapid communication in the Journal of Travel Medicine on the potential for international spread via commercial air travel. The case demonstrates that AI-driven epidemiological surveillance can outpace traditional public health reporting systems, and becomes one of the most cited early examples of beneficial AI in public health.[62][63]
2020 (March 25) Cardiology; echocardiography Research finding David Ouyang, Bryan He, Andrew Ng, and colleagues at Stanford University publish EchoNet-Dynamic in Nature, the first video-based deep learning algorithm for beat-to-beat assessment of cardiac function from echocardiography videos, surpassing human expert performance on left ventricular segmentation, ejection fraction estimation, and cardiomyopathy classification. Trained on 10,030 labeled echocardiogram videos and validated at Cedars-Sinai Medical Center, the model predicts ejection fraction with a mean absolute error of 4.1% (comparable to or better than the variability between human expert cardiologists) and classifies heart failure with AUC of 0.97. Ejection fraction is the most important single measurement in cardiology, determining heart failure diagnosis, guiding chemotherapy cardiotoxicity monitoring, and driving treatment decisions for millions annually. EchoNet-Dynamic also releases its dataset as the largest public medical video dataset at the time, establishing echocardiography as a major domain for medical AI.[64]
2020 (May 28) Medical informatics; clinical documentation; patient communication Research finding OpenAI releases GPT-3, an autoregressive language model with 175 billion parameters, more than ten times larger than any previous non-sparse language model, demonstrating that scaling dramatically improves few-shot and zero-shot performance without task-specific fine-tuning. Unlike GPT-2, GPT-3 is made available via API, enabling rapid integration. In healthcare it attracts immediate attention for patient triage, clinical documentation, and question answering, while researchers identify risks including racial and gender biases, affirmation of suicidal ideation in at least one medical chatbot deployment, and unresolved HIPAA compliance questions. Microsoft acquires an exclusive license in September 2020. GPT-3 establishes the template (large-scale pretraining followed by API-based deployment) that subsequent medically-focused language models including Med-PaLM (2022) and ChatGPT (2022) follow.[65][66]
2020 (November 30) Molecular biology; drug discovery Research finding AlphaFold 2, developed by DeepMind, wins the 14th Critical Assessment of Protein Structure Prediction (CASP14) by a decisive margin, with a median backbone accuracy (RMSD_95) of less than 1 angstrom, three times more accurate than the next best system and comparable to experimental methods such as X-ray crystallography. The result is recognized by the CASP14 organizers as a solution to the protein folding problem, a grand challenge in biology that has remained open for over 50 years. Prior to AlphaFold, determining a single protein structure experimentally could take months to years and hundreds of thousands of dollars; AlphaFold 2 achieves comparable accuracy in minutes. The medical implications are immediate: proteins underpin virtually every biological process, and understanding their 3D structure is essential for drug discovery, understanding disease mechanisms, and developing targeted therapies. The methodology underlying AlphaFold 2 is subsequently published in Nature in July 2021.[67]
2021 (July 15) Molecular biology; drug discovery; medical research Research finding Nature (journal) publishes the full methodology of AlphaFold 2 in the paper "Highly accurate protein structure prediction with AlphaFold" by John Jumper, Richard Evans, Demis Hassabis, and colleagues at DeepMind. The paper describes a novel deep learning architecture incorporating physical and biological knowledge about protein structure — including multi-sequence alignments — to predict 3D protein structures with accuracy competitive with experimental methods in the majority of cases. DeepMind simultaneously open-source software the code along with 60 pages of supplementary information. The paper goes on to accumulate over 40,000 citations in scientific journals, making it one of the most cited papers in the history of biology. One week later, a companion Nature paper presents structure predictions for the entire human proteome. The publication directly enables the launch of the AlphaFold Protein Structure Database and transforms drug discovery, cancer research, and the study of infectious diseases by making accurate protein structure prediction freely and immediately accessible to researchers worldwide.[68]
2021 (September 15) Pulmonology; critical care; pandemic preparedness Research finding Ittai Dayan, Holger Roth, and colleagues publish the first large-scale real-world demonstration of federated learning in clinical medicine in Nature Medicine, reporting that the EXAM system — trained across 20 institutions in the US, Brazil, and Europe without any patient data leaving individual hospital sites — achieves sensitivity of 0.950 for predicting mechanical ventilation or death at 24 hours in COVID-19 patients. Federated learning addresses a fundamental barrier in medical AI: valuable training datasets are locked inside hospital systems by privacy regulations, yet centralizing them is legally and institutionally impossible. Rather than moving data, federated learning trains models locally and aggregates only model weight updates. The EXAM study demonstrates that a federally trained model outperforms models trained at any single institution and generalizes across sites with different patient populations and imaging equipment — the first prospective validation of this approach at global scale.[69]
2021 (September 22) Oncology; pathology; prostate cancer Regulatory milestone The Food and Drug Administration grants De Novo marketing authorization to Paige for Paige Prostate — the first AI-based pathology product to receive FDA authorization for in vitro diagnostic use — marking the entry of AI into clinical pathology, the last major medical imaging specialty to receive FDA-cleared AI diagnostic tools. Developed by Paige, a company founded on research from Memorial Sloan Kettering Cancer Center, Paige Prostate analyzes digitized whole-slide images of prostate biopsy and identifies foci suspicious for cancer, designed to serve as a "second set of eyes" for pathologists. In the pivotal clinical study submitted to the FDA, 16 pathologists examining 527 digitized prostate biopsy slide images achieve a 7.3 percentage point improvement in sensitivity when using Paige Prostate — from 89.5% to 96.8% — with a 70% reduction in false negatives and a 24% reduction in false positives. Prostate cancer is the most common cancer among men in the US and a leading cause of cancer death globally; accurate pathological grading of biopsies is essential for treatment decisions. The De Novo authorization creates a new regulatory classification for AI-based pathology software, establishing the predicate pathway for subsequent AI pathology tools. Paige subsequently develops Paige Prostate Grade & Quantify and Paige Prostate Perineural Invasion, and in 2024 publishes Virchow, the largest foundation model for computational pathology. In 2025 Paige is acquired by Tempus AI.[70]
2021 (November) Drug discovery; pulmonary medicine; rare diseases Research finding Insilico Medicine, founded by Alex Zhavoronkov, initiates the first Phase I clinical trial of INS018_055 — a small molecule inhibitor for idiopathic pulmonary fibrosis (IPF) — marking the first time a drug both discovered and designed end-to-end by generative artificial intelligence enters human clinical trials. The molecule is generated using Insilico's Pharma.AI platform and nominated as a preclinical candidate just 18 months after the project begins, a fraction of the decade-plus typically required for traditional drug discovery. Phase I trials in New Zealand and China involving 126 subjects yield positive safety data in early 2023; the FDA grants orphan drug for IPF in February 2023. In June 2023 Insilico completes the first patient dose in a Phase II trial across 40 sites in the US and China — the first Phase II trial of a generative AI-designed drug — with Phase 2a results subsequently yielding positive topline data. The full methodology is published in Nature Biotechnology in March 2024. IPF affects approximately five million people worldwide and carries a median survival of two to five years after diagnosis.[71][72][73]
2021 (December 10) Radiology; pulmonology; health equity Research finding Laleh Seyyed-Kalantari, Marzyeh Ghassemi, and colleagues at the University of Toronto, Vector Institute, and MIT publish a systematic study in Nature Medicine demonstrating that state-of-the-art AI chest X-ray classifiers consistently and selectively underdiagnose underserved patient populations — including female patients, Black patients, and patients of low socioeconomic status — across three large public radiology datasets and a combined multi-source dataset. Underdiagnosis bias is worse for intersectional subgroups such as Black female patients and persists across different model architectures. Underdiagnosis is particularly harmful: a patient labelled healthy by an AI system may be denied further evaluation entirely. The paper extends the Obermeyer et al. 2019 finding from a resource allocation algorithm to a diagnostic imaging AI, demonstrating that algorithmic bias is present even in systems directly analyzing physiological data — and that the training datasets involved are the same ones used to develop systems deployed or under consideration for clinical use globally.[74]
2022 (January 26) Surgery; gastroenterology Research finding Hamed Saeidi, Axel Krieger, and colleagues at Johns Hopkins University publish the first demonstration of autonomous robotic laparoscopic soft-tissue surgery in Science Robotics, reporting that their Smart Tissue Autonomous Robot (STAR) performs intestinal anastomosis, the surgical reconnection of two ends of bowel, with greater consistency and accuracy than expert human surgeons and robot-assisted surgery with the da Vinci surgical system. STAR combines real-time 3D tissue tracking, AI-based surgical planning, and autonomous suture execution, performing approximately 60% of the procedure fully autonomously. In phantom and porcine in vivo models it outperforms expert surgeons across metrics including needle placement, suture spacing, lumen patency, and leak pressure; animals survive with no complications and normal wound healing at seven days. The result is the first time an autonomous AI system demonstrably outperforms human surgeons on a standardized surgical task, with the authors arguing the approach could democratize surgical care by delivering consistent outcomes independent of individual surgeon skill.[75]
2022 Medical informatics; clinical reasoning Deployment Google Research introduces Med-PaLM, the first large language model (LLM) specifically designed for medical reasoning. Built on earlier models (PaLM and Flan-PaLM), it is tested using MultiMedQA, a benchmark that includes several medical question-answering datasets. Flan-PaLM performs well, scoring 67.6% on United States Medical Licensing Examination-style questions, but still shows issues with accuracy and safety. Med-PaLM is improved using a method called instruction prompt tuning, which helps it give more accurate and safer answers. In some areas, its performance is close to that of human doctors.[76]
2022 (July 28) Molecular biology; drug discovery; global health research Research initiative DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI) expand the AlphaFold Protein Structure Database to cover nearly all catalogued proteins known to science: over 200 million structures spanning virtually every organism with a sequenced genome. The database had launched in July 2021 with 350,000 structures covering the human proteome and 20 model organisms; this expansion represents a more than 500-fold increase. All structures are freely available to researchers worldwide at no cost. Within three years the database accumulates over three million users from more than 190 countries, with over 30% of citing papers related to disease research. The database is estimated to have saved hundreds of millions of research-years of experimental work. For medicine, the practical impact is concentrated in drug discovery, antimicrobial resistance research, cancer biology, and the study of neglected tropical diseases, areas where the relationship between protein structure and therapeutic targets had previously been a major bottleneck.[77]
2022 (November 30) Medical informatics; clinical documentation; medical education; patient communication Deployment OpenAI publicly launches ChatGPT, a conversational large language model built on GPT-3.5 and trained using reinforcement learning from human feedback (RLHF), making advanced generative artificial intelligence freely accessible to the general public for the first time. ChatGPT reaches 100 million users within two months, faster than any consumer application in history, and is immediately adopted by clinicians for generating clinical documentation, supporting differential diagnosis, answering patient questions, and drafting referral letters, without regulatory oversight. A survey of UK general practitioners one year after launch finds 20% report using generative AI tools in clinical practice. ChatGPT also demonstrates that LLMs can pass the United States Medical Licensing Examination at or near the passing threshold without medical fine-tuning. The launch marks a fundamental shift from AI as a specialist institutional tool to AI as a consumer product used directly by clinicians and patients, transforming the nature of the regulatory and safety challenge.[78][79]
2023 (February 1) Diagnostic medicine; emergency medicine; triage decision support Research finding A study by David Levine, Andrew Beam, and colleagues at Harvard University evaluates GPT-3's diagnostic and triage performance against 48 validated case vignettes covering both common and severe conditions, comparing results to a nationally representative sample of lay people and practicing physicians. GPT-3 correctly includes the diagnosis in its top three suggestions for 88% of cases, substantially outperforming lay individuals (54%) though trailing physicians (96%). For triage, GPT-3's accuracy (71%) is comparable to lay individuals (74%) but significantly below physicians (91%). GPT-3's confidence scores are reasonably well-calibrated, with Brier scores of 0.18 for diagnosis and 0.22 for triage. The study is the first systematic comparison of a general-purpose large language model against physicians on validated clinical vignettes, and demonstrates that without any medical fine-tuning GPT-3 performs substantially better than laypeople on diagnosis, a result that accelerates interest in LLMs as potential clinical decision support tools and raises questions about appropriate use boundaries.[80]
2023 (March 1) Nuclear medicine; radiology Research finding A study evaluates the value of domain adaptation in nuclear medicine by adapting language models to predict 5-point Deauville scores from clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. The researchers use multiple general-purpose transformer language models to classify the reports into Deauville scores 1–5, and then adapt the models to the nuclear medicine domain using masked language modeling. Domain adaptation improves the performance of all language models, and the best-performing model (domain-adapted RoBERTa) achieves a five-class accuracy of 77.4%, better than the physician's performance (66%) and the best vision model's performance (48.1%), and similar to the multimodal model's performance (77.2%).[81]
2023 (March) Medical informatics; clinical decision support; medical education Research finding Google Research introduces Med-PaLM 2 at Google Health's annual Check Up event. It becomes the first AI system to reach human expert level on United States Medical Licensing Examination-style questions, achieving 86.5% accuracy on the MedQA benchmark, above the 67.6% of the original Med-PaLM (2022), which had itself been the first AI system to surpass the USMLE passing mark. Built on Google's PaLM 2 and fine-tuned using MultiMedQA, a benchmark spanning seven medical question-answering datasets, Med-PaLM 2 is evaluated by clinicians across axes including factuality, medical consensus, reasoning, bias, and likelihood of harm; its answers are preferred over physician answers across eight of nine axes in a pairwise study. The team also introduces Med-PaLM M, a multimodal extension capable of synthesizing chest X-rays, mammograms, retinal scans, pathology slides, and genomics data alongside language. The underlying technology becomes the foundation of MedLM, Google's family of healthcare foundation models. The original Med-PaLM paper is published in Nature in July 2023. Med-PaLM 2 demonstrates that the gap between general-purpose LLMs and medical specialists on structured knowledge tasks has effectively closed, while illustrating that benchmark performance alone does not resolve the deeper questions of safety, bias, and clinical deployment.[82]
2023 (May 5) Health informatics; clinical documentation Deployment Epic Systems and Microsoft announce the integration of GPT-4 into electronic health records (EHRs), marking a significant step in applying generative artificial intelligence to healthcare. The collaboration introduces two key AI-enabled features. First, clinicians can use In Basket to generate draft responses to patient messages, improving communication efficiency. Second, SlicerDicer, Epic's data visualization tool, uses AI to suggest relevant metrics based on user queries, streamlining data analysis. These tools aim to enhance provider productivity and patient engagement, representing a practical application of advanced AI within clinical workflows. This is the first large-scale deployment of a large language model in healthcare documentation.[83]
2023 (June 9) Molecular biology; drug discovery Research finding PoET is introduced as a generative protein language model that designs new proteins with desired functions. It overcomes limitations of existing models by generating sets of related proteins as sequences-of-sequences across natural protein families. PoET can generate and score modifications for specific protein families, extrapolates well for small families, and outperforms existing models in variant effect prediction. Its transformer layer models sequential tokens within each sequence while attending between sequences in an order-invariant way. PoET improves variant effect prediction across proteins of all multiple sequence alignment depths.[84]
2023 (June 14) Radiology; medical imaging Research finding Radiology-GPT is introduced as a large language model designed for radiology. Trained through instruction tuning on a radiology-focused dataset, it outperforms general models such as StableLM, Dolly, and LLaMA in radiological diagnosis, research, and communication tasks. The system highlights the potential of specialized, privacy-compliant language models for clinical natural language processing (NLP) and suggests that hospital-localized models combining conversational ability with domain expertise could further advance healthcare AI.[85]
2023 (June 16) Clinical medicine; medical informatics Research finding ClinicalGPT is introduced as a large language model specifically designed for clinical applications. It is trained using diverse real-world data including medical records, domain-specific knowledge, and multi-round dialogue consultations. Additionally, a comprehensive evaluation framework is proposed, encompassing medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Results indicate that ClinicalGPT outperforms other models in these tasks, showcasing its effectiveness in adapting large language models to the healthcare domain.[86]
2023 (June 22) Ophthalmology; primary care Regulatory milestone Eyenuk receives Food and Drug Administration clearance to expand its EyeArt AI system, allowing its use with the Topcon NW400 retinal camera alongside previously cleared Canon models. EyeArt is the first FDA-cleared AI system compatible with multiple camera brands for autonomous detection of diabetic retinopathy (DR). The update includes Real-Time Image Quality Feedback and an enhanced image quality module, achieving best-in-class gradability without pupil dilation. Clinical trials show high accuracy: 94.4% sensitivity for mild DR and 96.8% for vision-threatening DR. With over 230,000 patients screened globally, the system aims to make AI-powered eye exams more accessible in primary care settings. EyeArt's clearance uses IDx-DR, cleared by the FDA in April 2018 as the first fully autonomous AI diagnostic system, as its regulatory predicate device, following the De Novo pathway IDx-DR established for autonomous diabetic retinopathy detection.[87]
2023 (September 13) Ophthalmology; cardiovascular diagnostics Research finding A study introduces RETFound, a foundation model designed for generalizable disease detection from retinal images. Trained on 1.6 million unlabelled retinal images using self-supervised learning, RETFound can be adapted efficiently to various diagnostic tasks with minimal labeled data. The model significantly outperforms existing approaches in detecting eye diseases and predicting systemic conditions such as heart failure and myocardial infarction. RETFound demonstrates strong potential to improve diagnostic accuracy while reducing reliance on expert annotations, offering a scalable, label-efficient framework for broad clinical applications in ophthalmology and beyond.[88]
2024 (January) Dermatology; primary care Regulatory milestone The Food and Drug Administration clears DermaSensor, the first AI-powered, noninvasive diagnostic tool for skin cancer detection at the point of care. This wireless handheld device uses spectroscopy and an FDA-cleared algorithm to analyze lesions for over 200 types of skin cancer, including melanoma, basal cell carcinoma, and squamous cell carcinoma. Clinical trials involving over 1,000 patients, led by the Mayo Clinic, demonstrate 96% sensitivity and 97% negative predictive value. A companion study with physicians shows that DermaSensor halves the number of missed cancer cases. The clearance highlights the integration of AI and spectroscopy in improving early cancer detection in primary care settings.[89]
2024 (March) Medical education; clinical decision support Research finding A study evaluates whether large language models (LLMs), such as GPT-3.5 and Llama 2, can reason effectively about complex medical questions. Using benchmarks including MedQA (United States Medical Licensing Examination), MedMCQA, and PubMedQA, and testing methods including chain-of-thought prompting and retrieval-augmented generation, the authors find that GPT-3.5 reaches passing scores on all three datasets. InstructGPT demonstrates the ability to recall and reason with expert medical knowledge. Open-source models like Llama 2 are also closing the performance gap. The findings suggest that with proper prompting, LLMs can support medical decision-making, though challenges such as uncertainty quantification and positional bias remain.[90]
2024 (May 8) Molecular biology; drug discovery; medical research Research finding Google DeepMind and Isomorphic Labs introduce AlphaFold 3, a substantially expanded version of the AlphaFold system, alongside the AlphaFold Server — a free platform providing scientists access to AlphaFold 3 predictions for non-commercial research. AlphaFold 3 represents a qualitative advance beyond AlphaFold 2: while AlphaFold 2 predicts the 3D structure of proteins from amino acid sequences, AlphaFold 3 predicts the structure and interactions of all of life's molecules — including DNA, RNA, and small molecule ligands alongside proteins. This multimodal capability is directly relevant to drug discovery, where understanding how a small molecule drug candidate binds to a protein target is the central computational challenge. Isomorphic Labs, the drug discovery company founded by Demis Hassabis in 2021 to build on AlphaFold, has by this point entered drug discovery partnerships with several major pharmaceutical companies. The model code and weights are made available for academic use. AlphaFold 3 is built using a diffusion-based architecture — the same class of generative model underlying image generation systems like Stable Diffusion — applied to molecular structure prediction, marking a further convergence between generative AI methods and structural biology.[91][92]
2024 Global health; health policy Policy/legislation The World Health Organization (WHO) releases new guidance on the ethics and governance of large multi-modal models (LMMs)—AI systems that process diverse data types (text, images, video) and are increasingly used in health care. The report offers over 40 recommendations for governments, tech companies, and healthcare providers to ensure responsible use. Benefits of LMMs include support in diagnosis, patient communication, education, and research. However, WHO warns of risks like misinformation, bias, data quality issues, automation bias, and cybersecurity threats. WHO calls for inclusive development, regulatory oversight, post-deployment audits, and public infrastructure to promote ethical and equitable AI in health.[93]
2024 (August 1) All medical AI fields; medical devices; clinical decision support; diagnostics Policy/legislation The EU Artificial Intelligence Act (Regulation (EU) 2024/1689), the world's first comprehensive legislative framework for AI, enters into force following endorsement by all 27 European Union member states. The Act establishes a risk-based classification system — unacceptable risk (banned), high risk, limited risk, and minimal risk — with graduated obligations. For medicine, AI systems used as safety components of medical devices or subject to conformity assessment under the EU Medical Device Regulation or In Vitro Diagnostic Regulation are automatically classified as high risk, covering diagnostic software, clinical decision support tools, AI-enabled imaging systems, and surgical robotics. High-risk medical AI must meet requirements for mandatory conformity assessment, risk management, data quality, transparency, human oversight, and post-market monitoring. The Act phases in progressively: bans from February 2025, general-purpose AI obligations from August 2025, high-risk AI from August 2026, and full compliance for medical devices extended to August 2027. It is the first time a major jurisdiction imposes binding pre-market compliance requirements on AI diagnostic systems based on risk profile rather than clinical indication, establishing a regulatory template other jurisdictions are expected to follow.[94][95]
2024 (October 9) Molecular biology; drug discovery; medical research Award The Royal Swedish Academy of Sciences awards the Nobel Prize in Chemistry 2024 to Demis Hassabis and John Jumper of Google DeepMind for protein structure prediction using AlphaFold, and to David Baker of the University of Washington for computational protein design. It is the first time the Nobel Prize in Chemistry is awarded for work in which artificial intelligence is the central methodological contribution. Hassabis and Jumper share one half of the 11 million Swedish krona prize; Baker receives the other. The citation recognizes that AlphaFold 2 has been used to predict the structures of virtually all 200 million catalogued proteins, with over two million researchers from 190 countries accessing the AlphaFold Protein Structure Database. The award follows the Nobel Prize in Physics the previous day, also awarded for AI-related work, to Geoffrey Hinton and John Hopfield. October 2024 is thus the first time Nobel prizes are awarded on consecutive days for AI-related work across two disciplines, validating a trajectory in which AI transitions from a narrow tool to a method capable of scientific breakthroughs with direct implications for drug discovery, antimicrobial resistance research, and therapeutic protein design.[96][97]
2024 (October) Oncology; histopathology; pathology Research finding Researchers at Paige AI, Microsoft Research, and Memorial Sloan Kettering Cancer Center publish Virchow, a 632 million parameter Vision Transformer trained using self-supervised learning on 1.5 million whole-slide images from 100,000 patients across 17 tissue types, the largest foundation model for computational pathology to date. Named after Rudolf Virchow, the 19th-century father of modern pathology, the model achieves an AUC of 0.95 across nine common and seven rare cancer types for pan-cancer detection, and outperforms tissue-specific clinical-grade AI products on some rare cancer variants, demonstrating the value of large-scale pretraining for long-tail clinical problems where labeled data is scarce. Paige, the model's developer, received the first FDA authorization for an AI pathology system in 2021, giving Virchow a direct pathway toward clinical deployment.[98]
2025 (January 15) Radiology; genomic medicine Research initiative Mayo Clinic announces major partnerships with Microsoft Research and Cerebras Systems to advance AI in healthcare. With Microsoft, Mayo develops multimodal foundation models using chest X-rays and radiology reports to enhance diagnostics, automate workflows, and improve patient care. In parallel, Mayo and Cerebras create a genomic foundation model that uses exome and genome data to personalize treatments, with early success in predicting rheumatoid arthritis therapy responses. These collaborations apply large-scale AI and computing technologies to accelerate diagnosis, improve clinical precision, and bring personalized medicine closer to everyday care.[99]
2025 (February 24) Obstetrics; maternal-fetal medicine; reproductive medicine Regulatory milestone The Food and Drug Administration grants 510(k) clearance to Sonio for Sonio Suspect, an AI-powered ultrasound module for fetal abnormality detection. The system automatically detects eight abnormal findings across seven ultrasound views of the fetal heart, brain, and abdomen from as early as 11 weeks gestation. In a multicenter study across 47 sites, Sonio Suspect improves fetal anomaly detection from 69% to 91% AUC, a 22 percentage point improvement, with consistent results regardless of clinician experience. The clinical need is substantial: up to 51% of fetal anomalies are missed during standard prenatal ultrasound, with 31% of missed cases resulting from misinterpretation of adequate-quality images rather than image quality. The system integrates with a companion quality assurance algorithm and is compatible with equipment from GE HealthCare, Samsung, and Canon, enabling deployment across diverse clinical settings.[100][101]
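
Illustrative sketch for the 2021 (September 15) federated learning entry: that entry describes training models locally and aggregating only model weight updates. The Python below is a minimal toy version of that idea, using size-weighted averaging of site weights (often called federated averaging). It is not the EXAM study's implementation; the one-feature linear model, the function names (local_update, federated_average, train_federated), the learning rate, and the synthetic site data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Weights = List[float]               # [slope, intercept] of a toy linear model
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs that never leave a site


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one site's private data.

    Only the resulting weights are returned, never the data itself.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for y_hat = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(site_weights: Sequence[Weights],
                      site_sizes: Sequence[int]) -> Weights:
    """Average site weights, weighting each site by its number of samples."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * size for w, size in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


def train_federated(sites: Sequence[Dataset], rounds: int = 200) -> Weights:
    """Alternate local training and central aggregation for several rounds."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


if __name__ == "__main__":
    # Two hypothetical hospitals holding samples of the same relation y = 2x + 1.
    site_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    site_b = [(-1.0, -1.0), (-0.5, 0.0), (0.25, 1.5), (1.0, 3.0)]
    print(train_federated([site_a, site_b]))
```

In this toy setting the central server only ever sees the two-element weight lists returned by each site, which is the property the entry highlights; real systems typically add secure aggregation, differential privacy, or other safeguards that this sketch omits.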

Meta information on the timeline

How the timeline was built

The initial version of the timeline was written by Sebastian.

Funding information for this timeline is available.

Feedback and comments

Feedback for the timeline can be provided at the following places:

  • FIXME

What the timeline is still missing

Timeline update strategy

See also

References

  1. "Heuristic DENDRAL: A program for generating explanatory hypotheses in organic chemistry". ResearchGate. NASA Technical Reports Server (NTRS). February 1968. Retrieved 21 June 2025.
  2. "IBM Watson Hard At Work: New Breakthroughs Transform Quality Care for Patients". Memorial Sloan Kettering Cancer Center. February 8, 2013. Retrieved 19 March 2026.
  3. Jo Cavallo (September 10, 2019). "Confronting the Criticisms Facing Watson for Oncology". The ASCO Post. Retrieved 19 March 2026.
  4. Turing, Alan M. (October 1950). "Computing Machinery and Intelligence". Mind. 59 (236): 433–460. doi:10.1093/mind/LIX.236.433. Retrieved 2025-06-07.
  5. McCarthy, John; Minsky, Marvin; Rochester, Nathaniel; Shannon, Claude (August 1955). "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence" (PDF). Retrieved 2025-06-07.
  6. Solomonoff, Grace (2023). "The Meeting of the Minds That Launched AI". IEEE Spectrum. Retrieved 2025-06-07.
  7. "Computers, Artificial Intelligence, and Expert Systems in Biomedical Research". NLM Profiles in Science. National Library of Medicine. Retrieved 2025-06-07.
  8. "DENDRAL". Retrieved 2025-06-07.
  9. "Saul Amarel — In Memoriam". Rutgers University Department of Computer Science. Retrieved 2025-06-07.
  10. Kulikowski, Casimir A. (2019). "Beginnings of Artificial Intelligence in Medicine (AIM): Computational Artifice Assisting Scientific Inquiry and Clinical Art". Yearbook of Medical Informatics. Retrieved 2025-06-07.
  11. B.J. Copeland. "MYCIN". Encyclopædia Britannica, Inc. Retrieved 2025-06-07.
  12. de Dombal, F.T.; Leaper, D.J.; Staniland, J.R.; McCann, A.P.; Horrocks, J.C. (1 April 1972). "Computer-aided diagnosis of acute abdominal pain". British Medical Journal. 2 (5804): 9–13. doi:10.1136/bmj.2.5804.9. Retrieved 2025-06-07.
  13. Miller, Randolph A. (19 August 1982). "INTERNIST-1: An Experimental Computer-Based Diagnostic Consultant for General Internal Medicine". New England Journal of Medicine. 307: 468–476. doi:10.1056/NEJM198208193070803.
  14. Kulikowski, Casimir A. (13 August 2015). "An Opening Chapter of the First Generation of Artificial Intelligence in Medicine: The First Rutgers AIM Workshop, June 1975". Yearbook of Medical Informatics. 10 (1): 227–233. doi:10.15265/IY-2015-016. Retrieved 2025-06-07.
  15. Weiss, Sholom M.; Kulikowski, Casimir A.; Amarel, Saul; Safir, Aran (August 1978). "A model-based method for computer-aided medical decision-making". Artificial Intelligence. 11 (1–2): 145–172. doi:10.1016/0004-3702(78)90015-2. Retrieved 2025-06-07.
  16. Barnett, G. Octo; Cimino, James J.; Hupp, Jon A.; Hoffer, Edward P. (3 July 1987). "DXplain: An evolving diagnostic decision-support system". JAMA. 258 (1): 67–74. doi:10.1001/jama.258.1.67. Retrieved 2025-06-07.
  17. Heckerman, David E.; Horvitz, Eric J.; Nathwani, Bruce N. (July 1992). "Toward Normative Expert Systems: Part I. The Pathfinder Project". Methods of Information in Medicine. 31 (2): 90–105. doi:10.1055/s-0038-1634867. Retrieved 2025-06-07.
  18. Golub, Todd R.; Slonim, Donna K.; Tamayo, Pablo; Lander, Eric S. (15 October 1999). "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring". Science. 286 (5439): 531–537. doi:10.1126/science.286.5439.531. Retrieved 2025-06-07.
  19. "The HIV Resistance Response Database Initiative (RDI)". HIV RDI. Retrieved 2025-06-07.
  20. Ahmed, Sahin (9 October 2024). "Geoffrey Hinton: The Godfather of AI Who Now Warns of Its Dangers". Medium. Retrieved 2025-06-07.
  21. Adler-Milstein, Julia; Jha, Ashish K. (August 2017). "HITECH Act Drove Large Gains in Hospital Electronic Health Record Adoption". Health Affairs. 36 (8): 1416–1422. doi:10.1377/hlthaff.2016.1651. Retrieved 2025-06-07.
  22. "Health Information Technology for Economic and Clinical Health Act" (PDF). US Government Publishing Office. Retrieved 2025-06-07.
  23. "Watson, Jeopardy! champion". IBM. Retrieved 2025-06-07.
  24. Moorman, J. Randall; Carlo, Waldemar A.; Kattwinkel, John; Schelonka, Robert L.; Porcelli, Philip J.; Fairchild, Karen D. (December 2011). "Mortality Reduction by Heart Rate Characteristic Monitoring in Very Low Birth Weight Neonates: A Randomized Trial". Journal of Pediatrics. 159 (6): 900–906. doi:10.1016/j.jpeds.2011.06.044. Retrieved 2025-06-07.
  25. "Memorial Sloan Kettering Cancer Center, IBM to Collaborate in Applying Watson Technology to Help Oncologists". Memorial Sloan Kettering Cancer Center. 22 March 2012. Retrieved 2025-06-07.
  26. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks". NeurIPS 2012. Retrieved 2025-06-07.
  27. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey. "AlexNet and ImageNet: The Birth of Deep Learning". Pinecone. Retrieved 2025-06-07.
  28. "The rise — and fall — of Babylon". Sifted. October 2023. Retrieved 2025-06-07.
  29. "Babylon Health: the failed AI wonder app that 'dazzled' politicians". The Week. Retrieved 2025-06-07.
  30. Novet, Jordan (28 October 2014). "Enlitic picks up $2 M to help diagnose diseases with deep learning". VentureBeat. Retrieved 2025-06-07.
  31. "Heartflow Secures De Novo Clearance from the U.S. Food and Drug Administration for Breakthrough FFRCT Technology". HeartFlow Inc. 1 December 2014. Retrieved 2025-06-07.
  32. "Google and Johnson & Johnson Conjugate to Create Verb Surgical". IEEE Spectrum. Retrieved 2025-06-07.
  33. "Johnson & Johnson to take over Verb Surgical, its robotics venture with Verily". Fierce Biotech. 20 December 2019. Retrieved 2025-06-07.
  34. Gulshan, Varun; Peng, Lily; Coram, Marc (29 November 2016). "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs". JAMA. 316 (22): 2402–2410. doi:10.1001/jama.2016.17216. Retrieved 2025-06-07.
  35. "This AI can spot skin cancer as accurately as a doctor". WIRED. 25 January 2017. Retrieved 2025-06-07.
  36. "Arterys Receives FDA Clearance For The First Zero-Footprint Medical Imaging Analytics Cloud Software With Deep Learning For Cardiac MRI" (Press release). PR Newswire. 9 January 2017. Retrieved 2025-06-07.
  37. Fitzpatrick, Kathleen Kara; Darcy, Alison; Vierhile, Molly (6 June 2017). "Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial". JMIR Mental Health. 4 (2): e19. doi:10.2196/mental.7785. Retrieved 2025-06-07.
  38. "Full Translation: China's 'New Generation Artificial Intelligence Development Plan' (2017)". DigiChina, Stanford University. 20 July 2017. Retrieved 2025-06-07.
  39. Han, Yu; Ceross, Aaron; Bergmann, Jeroen (29 July 2024). "Regulatory Frameworks for AI-Enabled Medical Device Software in China". Journal of Medical Internet Research. doi:10.2196/46871. Retrieved 2025-06-07.
  40. Ross, Casey; Swetlitz, Ike (5 September 2017). "IBM pitched its Watson supercomputer as a revolution in cancer care. It's nowhere close". STAT News. Retrieved 2025-06-07.
  41. "Pear Therapeutics Obtains FDA Clearance of the First Prescription Digital Therapeutic to Treat Disease" (Press release). PR Newswire. 14 September 2017. Retrieved 2025-06-07.
  42. Constine, Josh (27 November 2017). "Facebook rolls out AI to detect suicidal posts before they're reported". TechCrunch. Retrieved 2025-06-07.
  43. Walsh, Colin G.; Ribeiro, Jessica D.; Franklin, Joseph C. (2017). "Predicting Risk of Suicide Attempts Over Time Through Machine Learning". Clinical Psychological Science. 5 (3): 457–469. doi:10.1177/2167702617691560. Retrieved 2025-06-07.
  44. Cole, James H.; Franke, Katja (October 2017). "Predicting Age Using Neuroimaging: Innovative Brain Ageing Biomarkers". Trends in Neurosciences. 40 (12): 681–690. doi:10.1016/j.tins.2017.10.001. Retrieved 2025-06-07.
  45. "FDA permits marketing of clinical decision support software for alerting providers of a potential stroke in patients". US Food and Drug Administration. 13 February 2018. Retrieved 2025-06-07.
  46. "Viz.ai Granted De Novo FDA Clearance for First Artificial Intelligence Triage Software". PR Newswire. 15 February 2018. Retrieved 2025-06-07.
  47. Poplin, Ryan; Varadarajan, Avinash V.; Blumer, Katy; Liu, Yun; McConnell, Michael V.; Corrado, Greg S.; Peng, Lily; Webster, Dale R. (19 February 2018). "Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning". Nature Biomedical Engineering. 2: 158–164. doi:10.1038/s41551-018-0195-0. Retrieved 2025-06-07.
  48. "FDA Permits Marketing of IDx-DR for Automated Detection of Diabetic Retinopathy in Primary Care". University of Iowa College of Engineering. April 2018. Retrieved 2025-06-07.
  49. "Artificial Intelligence and Diabetic Retinopathy: AI Framework, Prospective Studies, Head-to-head Validation, and Cost-effectiveness". Diabetes Care. 46 (10). October 2023. Retrieved 2025-06-07.
  50. "De Novo Classification Request for ECG App (DEN180044)" (PDF). US Food and Drug Administration. 14 August 2018. Retrieved 2025-06-07.
  51. Poplin, Ryan; Chang, Pi-Chuan; Alexander, David; DePristo, Mark A. (24 September 2018). "A universal SNP and small-indel variant caller using deep neural networks". Nature Biotechnology. 36: 983–987. doi:10.1038/nbt.4235. Retrieved 2025-06-07.
  52. Zech, John R.; Badgeley, Marcus A.; Liu, Manway; Costa, Anthony B.; Titano, Joseph J.; Oermann, Eric Karl (6 November 2018). "Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study". PLOS Medicine. 15 (11): e1002683. doi:10.1371/journal.pmed.1002683. Retrieved 2025-06-07.
  53. "DeepMind's AI can detect over 50 eye diseases as accurately as a doctor". The Verge. 13 August 2018. Retrieved 2025-06-07.
  54. Hannun, Awni Y.; Rajpurkar, Pranav; Haghpanahi, Masoumeh; Tison, Geoffrey H.; Bourn, Codie; Turakhia, Mintu P.; Ng, Andrew Y. (7 January 2019). "Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network". Nature Medicine. 25: 65–69. doi:10.1038/s41591-018-0268-3. Retrieved 2025-06-07.
  55. Young, Annie (1 February 2019). "MIMIC Chest X‑Ray database to provide researchers access to over 350,000 patient radiographs". MIT News. Retrieved 2025-06-07.
  56. Johnson, Alistair E. W.; Pollard, Tom J.; Berkowitz, Seth J.; Greenbaum, Nathaniel R.; Lungren, Matthew P.; Deng, Chih‑ying; Mark, Roger G.; Horng, Steven (12 December 2019). "MIMIC‑CXR, a de‑identified publicly available database of chest radiographs with free‑text reports". Scientific Data. 6. doi:10.1038/s41597-019-0322-0. Retrieved 2025-06-07.
  57. Radford, Alec; Wu, Jeffrey; Amodei, Dario; Amodei, Daniela; Clark, Jack; Brundage, Miles; Sutskever, Ilya (14 February 2019). "Better language models and their implications". OpenAI. Retrieved 2025-06-07.
  58. Wang, Pu; Berzin, Tyler M.; Glissen Brown, Jeremy R. (2019). "Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study". Gut. doi:10.1136/gutjnl-2018-317500. Retrieved 2025-06-07.
  59. "How AI is making eyesight-saving care more accessible in resource-constrained settings". Google Blog. October 2024. Retrieved 2025-06-07.
  60. "Google works with Aravind Eye Hospital to deploy AI that can detect eye disease". VentureBeat. 25 February 2019. Retrieved 2025-06-07.
  61. Obermeyer, Ziad; Powers, Brian; Vogeli, Christine; Mullainathan, Sendhil (25 October 2019). "Dissecting racial bias in an algorithm used to manage the health of populations". Science. 366 (6464): 447–453. doi:10.1126/science.aax2342. Retrieved 2025-06-07.
  62. "How Canadian AI start-up BlueDot spotted Coronavirus before anyone else had a clue". Diginomica. 20 March 2020. Retrieved 2025-06-07.
  63. "How this Canadian start-up spotted coronavirus before everyone else knew about it". CNBC. 3 March 2020. Retrieved 2025-06-07.
  64. Ouyang, David; He, Bryan; Ghorbani, Amirata; Yuan, Neal; Ebinger, Joseph; Langlotz, Curtis P.; Heidenreich, Paul A.; Harrington, Robert A.; Liang, David H.; Ashley, Euan A.; Zou, James Y. (25 March 2020). "Video-based AI for beat-to-beat assessment of cardiac function". Nature. 580: 252–256. doi:10.1038/s41586-020-2145-8. Retrieved 2025-06-07.
  65. Brown, Tom; Mann, Benjamin; Sutskever, Ilya (2020). "Language Models are Few-Shot Learners". NeurIPS 2020. Retrieved 2025-06-07.
  66. Sezgin, Emre; Sirrianni, Joseph; Linwood, Simon L. (10 February 2022). "Operationalizing and Implementing Pretrained, Large Artificial Intelligence Linguistic Models in the US Health Care System: Outlook of Generative Pretrained Transformer 3 (GPT-3) as a Service Model". JMIR Medical Informatics. 10 (2): e32875. doi:10.2196/32875. Retrieved 2025-06-07.
  67. "AlphaFold: Accelerating breakthroughs in biology with AI". Google DeepMind. Retrieved 2025-06-07.
  68. Jumper, John; Evans, Richard; Pritzel, Alexander (15 July 2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. doi:10.1038/s41586-021-03819-2. Retrieved 2025-06-07.
  69. Dayan, Ittai; Roth, Holger R.; Zhong, Aoxiao (15 September 2021). "Federated learning for predicting clinical outcomes in patients with COVID-19". Nature Medicine. 27: 1735–1743. doi:10.1038/s41591-021-01506-3. Retrieved 2025-06-07.
  70. "Paige Receives First Ever FDA Approval for AI Product in Digital Pathology" (Press release). Business Wire. 22 September 2021. Retrieved 2025-06-07.
  71. "From Start to Phase 1 in 30 Months". Insilico Medicine. November 2021. Retrieved 2025-06-07.
  72. "First Generative AI Drug Begins Phase II Trials with Patients". Insilico Medicine. June 2023. Retrieved 2025-06-07.
  73. "A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models". Nature Biotechnology. March 2024. Retrieved 2025-06-07.
  74. Seyyed-Kalantari, Laleh; Zhang, Haoran; McDermott, Matthew B. A.; Chen, Irene Y.; Ghassemi, Marzyeh (10 December 2021). "Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations". Nature Medicine. 27: 2176–2182. doi:10.1038/s41591-021-01595-0. Retrieved 2025-06-07.
  75. Saeidi, Hamed; Opfermann, Justin D.; Kam, Michael; Krieger, Axel (26 January 2022). "Autonomous robotic laparoscopic surgery for intestinal anastomosis". Science Robotics. 7 (62). doi:10.1126/scirobotics.abj2908. Retrieved 2025-06-07.
  76. "Med-PaLM: A Medical Large Language Model". Google Research. Retrieved 2025-06-07.
  77. "AlphaFold: Accelerating breakthroughs in biology with AI". Google DeepMind. Retrieved 2025-06-07.
  78. Mesko, Bertalan (22 June 2023). "The ChatGPT (Generative Artificial Intelligence) Revolution Has Made Artificial Intelligence Approachable for Medical Professionals". JMIR Medical Education. doi:10.2196/48392. Retrieved 2025-06-07.
  79. "Generative artificial intelligence in primary care: an online survey of UK general practitioners". BJGP Open. 2024. Retrieved 2025-06-07.
  80. Levine, David M.; Tuwani, Rudraksh; Kompa, Benjamin; Varma, Amita; Finlayson, Samuel G.; Mehrotra, Ateev; Beam, Andrew (August 2024). "The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study". Lancet Digital Health. 6 (8): e555–e561. doi:10.1016/S2589-7500(24)00097-9. Retrieved 2025-06-07.
  81. Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler (1 March 2023). "Domain-adapted large language models for classifying nuclear medicine reports". arXiv:2303.01258 [cs]. doi:10.48550/arXiv.2303.01258.
  82. "Med-PaLM: A large language model from Google Research, designed for the medical domain". Google Research. Retrieved 2025-06-07.
  83. "Epic and Microsoft Bring GPT-4 to EHRs". Epic. 5 May 2023. Retrieved 2025-06-07.
  84. Truong, Timothy F.; Bepler, Tristan (2023). "PoET: A generative model of protein families as sequences-of-sequences". doi:10.48550/arXiv.2306.06156.
  85. Liu, Zhengliang; Zhong, Aoxiao; Li, Yiwei; Yang, Longtao; Ju, Chao; Wu, Zihao; Ma, Chong; Shu, Peng; Chen, Cheng; Kim, Sekeun; Dai, Haixing; Zhao, Lin; Zhu, Dajiang; Liu, Jun; Liu, Wei; Shen, Dinggang; Li, Xiang; Li, Quanzheng; Liu, Tianming (2023). "Radiology-GPT: A Large Language Model for Radiology". doi:10.48550/arXiv.2306.08666.
  86. Wang, Guangyu; Yang, Guoxing; Du, Zongxin; Fan, Longjun; Li, Xiaohu (2023). "ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation". doi:10.48550/arXiv.2306.09968.
  87. "New FDA Clearance Makes Eyenuk the First Company with Multiple Cameras for Autonomous AI Detection of Diabetic Retinopathy". Eyenuk. 22 June 2023. Retrieved 2025-06-07.
  88. Zhou, Yukun; Chia, Mark A.; Wagner, Siegfried K.; Ayhan, Murat S.; Williamson, Dominic J.; Struyven, Robbert R.; Liu, Timing; Xu, Moucheng; Lozano, Mateo G.; Woodward‑Court, Peter; Kihara, Yuka (13 September 2023). "A foundation model for generalizable disease detection from retinal images". Nature. 622 (7981): 156–163. doi:10.1038/s41586-023-06555-x. Retrieved 2025-06-07.
  89. "FDA Approves First AI‑Powered Skin Cancer Diagnostic Tool". AIM at Melanoma Foundation. 18 January 2024. Retrieved 2025-06-07.
  90. Liévin, Valentin; Hother, Christoffer Egeberg; Motzfeldt, Andreas Geert; Winther, Ole (1 March 2024). "Can large language models reason about medical questions?". Patterns. 5 (3): 100943. doi:10.1016/j.patter.2024.100943. Retrieved 2025-06-07.
  91. "AlphaFold: Accelerating breakthroughs in biology with AI". Google DeepMind. Retrieved 2025-06-07.
  92. Abramson, Josh (8 May 2024). "Accurate structure prediction of biomolecular interactions with AlphaFold 3". Nature. 630: 493–500. doi:10.1038/s41586-024-07487-w. Retrieved 2025-06-07.
  93. "WHO releases AI ethics and governance guidance for large multi-modal models". World Health Organization. 18 January 2024. Retrieved 2025-06-07.
  94. "AI Act enters into force". European Commission. 1 August 2024. Retrieved 2025-06-07.
  95. "Artificial Intelligence in healthcare". European Commission — Public Health. Retrieved 2025-06-07.
  96. "Press release: The Nobel Prize in Chemistry 2024". NobelPrize.org. The Royal Swedish Academy of Sciences. 9 October 2024. Retrieved 2025-06-07.
  97. "Demis Hassabis and John Jumper awarded Nobel Prize in Chemistry". Google DeepMind. 9 October 2024. Retrieved 2025-06-07.
  98. Vorontsov, Eugene; Bozkurt, Alican; Casson, Adam; Shaikovski, George; Zelechowski, Michal; Fuchs, Thomas J. (October 2024). "A foundation model for clinical-grade computational pathology and rare cancers detection". Nature Medicine. 30 (10): 2924–2935. doi:10.1038/s41591-024-03141-0. Retrieved 2025-06-07.
  99. "Mayo Clinic partners with Microsoft Research and Cerebras to revolutionize AI in healthcare". News‑Medical. 15 January 2025. Retrieved 2025-06-07.
  100. "Sonio Announces FDA Clearance of Sonio Suspect". Sonio. 24 February 2025. Retrieved 2025-06-07.
  101. "FDA Clears AI Tool to Improve Ultrasound Detection of Fetal Anomalies". 24x7 Magazine. 24 February 2025. Retrieved 2025-06-07.