Timeline of machine learning

The content on this page is forked from the English Wikipedia page entitled "Timeline of machine learning". The original page still exists at Timeline of machine learning. The original content was released under the Creative Commons Attribution/Share-Alike License (CC-BY-SA), so this page inherits this license.

This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events are included.

Sample questions

  • How have foundational concepts from mathematics, logic, and computer science shaped the theoretical foundations of machine learning?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Concept".
    • You will see milestones ranging from early mathematical ideas such as Bayes' theorem and Boolean algebra to later concepts like Markov chains and deep learning, illustrating the theoretical foundations that underpin modern machine learning.
  • How have machine learning algorithms evolved from early theoretical computational procedures to the diverse set of methods used in modern machine learning systems?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Algorithm".
    • You will see a chronological progression beginning with early conceptual algorithms such as Ada Lovelace’s Analytical Engine procedure, followed by foundational machine learning methods including ADALINE, nearest neighbor, and expectation–maximization. Later entries introduce influential techniques such as backpropagation, decision tree learning, reinforcement learning algorithms like Q-learning, ensemble methods such as AdaBoost and random forests, kernel-based methods like support vector machines, and modern anomaly detection and data analysis approaches including local outlier factor and isolation forest. This progression illustrates the expanding methodological toolkit that underpins contemporary machine learning.
  • How have machine learning models evolved from early neural network architectures to modern deep learning systems capable of complex perception and sequence modeling?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Model development".
    • You will see a chronological progression beginning with early neural network models such as ADALINE and MADALINE, followed by architectures like Crossbar Adaptive Array and the Dehaene–Changeux model, and later advances including Long short-term memory, Hierarchical Temporal Memory, large-scale neural systems for tasks such as cat detection at Google, and high-accuracy facial recognition models like DeepFace developed at Facebook.
  • How have machine learning and data-driven methods been applied to real-world problems across different domains over time?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Application".
    • You will see a chronological progression beginning with early data-driven analysis such as John Snow’s cholera map, followed by early computer applications like Arthur Samuel’s checkers-playing program and neural-network-based echo cancellation. Later entries illustrate expanding applications in robotics, language learning, handwriting recognition, speech recognition, motion sensing interfaces such as Microsoft Kinect, financial fraud detection systems, and automated moderation tools for detecting online harassment, demonstrating the growing practical impact of machine learning across science, industry, and digital platforms.
  • How have major machine learning achievements evolved from early demonstrations of competitive performance against humans to widely deployed systems with real-world scientific, creative, and societal impact?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Achievement".
    • You will see a chronological progression beginning with early systems such as TD-Gammon rivaling human backgammon players and Deep Blue defeating Garry Kasparov in chess, followed by Watson winning Jeopardy! and AlphaGo defeating professional Go players. Later entries illustrate expanding capabilities in perception (LipNet), scientific research and healthcare applications, and large-scale generative systems such as ChatGPT, LLaMA, and GPT-4, highlighting the growing scope and societal influence of machine learning technologies.
  • How have organizations and research groups contributed to the development and institutionalization of machine learning?
    • Sort the full timeline by "Event type" and look for the group of rows with value "Organization".
    • You will see a sequence of milestones marking the founding of research groups and major AI organizations such as DeepMind and OpenAI.
  • Other events are described under the following types: "Mathematical foundation", "Milestone", "platform", "Policy / advocacy", "Precursor", and "Theoretical analysis".

Big picture

Time period Development summary More details
1950s-1970s Early days The early days of machine learning are marked by the development of statistical methods and the use of simple algorithms. In the 1950s, Arthur Samuel develops a program that learns to play checkers, and Frank Rosenblatt develops the perceptron, a simple neural network that can learn to classify patterns. However, this era also includes a period of pessimism known as the AI Winter, driven by factors including the failure of some early AI projects and the difficulty of scaling machine learning algorithms to large datasets.
1980s-1990s Resurgence The rediscovery of backpropagation causes a resurgence in machine learning research. Convolutional neural networks emerge. Support vector machines and recurrent neural networks become popular. Machine learning shifts from a knowledge-driven approach to a data-driven approach.[1]
2000s-present Modern era The modern era of machine learning begins in the 2000s, when the development of deep learning makes it possible to train neural networks on even larger datasets. This leads to a resurgence of interest in neural networks, which are now used in a wide variety of applications, including image recognition, natural language processing, speech recognition, machine translation, medical diagnosis, financial trading, and self-driving cars.

Summary by decade

Decade Summary
<1950s Statistical methods are discovered and refined.
1950s Pioneering machine learning research is conducted using simple algorithms.
1960s The field of neural network research experiences a notable development with the discovery and utilization of multilayers.[2] Neural networks of this period are primarily shallow in structure, consisting of only a few layers of interconnected neurons. These shallow networks are limited in handling complex problems that require more sophisticated data representations. However, they lay the foundation for further advancements in neural network research and pave the way for the development of deeper and more powerful networks.[3]
1970s The AI Winter is caused by pessimism about machine learning effectiveness. Backpropagation is developed, allowing a network to adjust its hidden layers of neurons/nodes to adapt to new situations.[2]
1980s During the mid-1980s, research in machine learning shifts towards artificial neural networks (ANN), although in the 1990s statistical learning systems gain prominence and temporarily overshadow ANN. A pivotal event during this period is the emergence of convolution as a significant concept in machine learning, while the rediscovery of backpropagation leads to a resurgence in machine learning research.[4][3]
1990s There is a shift away from neural networks and towards statistical learning methods. Statistical learning methods are able to achieve comparable or better performance than neural networks on a wider range of tasks. However, neural networks continue to be used for some specific tasks, such as natural language processing and image recognition.[5][6][7]
2000s Deep learning becomes feasible and neural networks see widespread commercial use.[3]
2010s Machine learning becomes integral to many widely used software services and receives great publicity.

Full timeline

Inclusion criteria

The following events are selected for inclusion in the timeline:

  • Foundational theoretical and mathematical contributions underpinning machine learning, including statistical methods, learning theory, and neural network models.
  • Development of major algorithms, architectures, and techniques that significantly advanced the field (e.g., perceptron, backpropagation, support vector machines, deep learning).
  • Conceptual and paradigm shifts, such as transitions from symbolic AI to statistical learning and from shallow models to deep learning.
  • Notable experimental demonstrations and benchmark achievements showing substantial improvements in performance or capabilities.
  • Technological and infrastructure advances enabling large-scale machine learning (e.g., distributed computing, GPUs, large datasets).
  • Influential applications and deployments that demonstrate real-world impact across domains.

We do not include:

  • Incremental algorithmic improvements without clear field-wide impact.
  • Narrow or domain-specific applications lacking broader relevance.
  • Redundant entries describing similar techniques or results.
  • Speculative ideas or minor publications without lasting influence.

Timeline

Year Event Type Caption Event
1642 Precursor Pascal's arithmetic machine At the age of 19, French child prodigy Blaise Pascal creates an "arithmetic machine" for his father, a tax collector. This machine has the capability to perform addition, subtraction, multiplication, and division. Though purely mechanical, Pascal's machine represents one of the earliest attempts to automate calculation, a conceptual step that would eventually lead, through centuries of incremental advancement in computing machinery, to the programmable computers on which modern machine learning runs.[8]
1679 Concept Binary number system Gottfried Wilhelm Leibniz, a German mathematician, philosopher, and sometimes poet, is credited with inventing the binary code system, which serves as the basis for contemporary computing. The binary system's reduction of all values to 0s and 1s would prove foundational not just for computing hardware, but for the logical operations underlying all machine learning algorithms, which ultimately manipulate binary representations of data at the hardware level.[8]
1763 Concept The Underpinnings of Bayes' Theorem Thomas Bayes's work An Essay towards solving a Problem in the Doctrine of Chances is published two years after his death, having been amended and edited by a friend of Bayes, Richard Price.[9] The essay presents work which underpins Bayes' theorem. Bayes' theorem would become one of the most widely applied mathematical tools in machine learning, forming the foundation of probabilistic classifiers, spam filters, and Bayesian inference methods that allow models to update their beliefs as new data arrives.
1801 Precursor Jacquard programmable loom French weaver and merchant Joseph-Marie Jacquard introduces a groundbreaking innovation in data storage through the invention of a programmable weaving loom. The loom utilizes punched cards to control the movement of warp threads, enabling the creation of intricate patterns in fabric. The idea that a physical medium could encode instructions to control a machine's behavior would directly inspire Charles Babbage's work on programmable computing, and more broadly anticipates the concept of stored programs that underlies all modern computing and machine learning systems.[10][11]
1805 Mathematical foundation Least squares method Adrien-Marie Legendre describes the "méthode des moindres carrés", known in English as the least squares method. The least squares method would become one of the most fundamental tools in machine learning, underpinning linear regression and remaining central to parameter estimation in a wide range of models, from simple predictors to components of more complex neural network training procedures.[12]
1812 Concept Formalization of Bayes' theorem Pierre-Simon Laplace publishes Théorie Analytique des Probabilités, in which he expands upon the work of Bayes and defines what is now known as Bayes' Theorem.[13] By giving the theorem its formal mathematical expression, Laplace makes it practically applicable, enabling later generations of researchers to build probabilistic models of learning and inference that would become central to machine learning, particularly in areas such as Bayesian networks and probabilistic graphical models.
1834 Concept Babbage's Analytical Engine English polymath Charles Babbage, known as the father of the computer, envisions a machine that could be programmed using punch cards. Although the device would never be constructed, its logical framework — including separate units for computation and memory, and the ability to execute conditional branches — forms the basis for all modern computers, and by extension for the hardware on which machine learning algorithms would eventually run.[14][8]
1842 Algorithm Lovelace's Analytical Engine algorithm English mathematician and writer Ada Lovelace becomes the world's first computer programmer. She develops an algorithm that outlines a series of steps for solving mathematical problems on Charles Babbage's theoretical punch-card machine. Beyond her immediate contribution, Lovelace is among the first to conceive of a machine capable of general-purpose computation, anticipating the idea that algorithms — the backbone of all machine learning — could be expressed and executed mechanically. Her pioneering work would be recognized when the US Department of Defense names a software language "Ada" in her honor.[8]
1847 Concept Boolean algebra English mathematician, philosopher, and logician George Boole devises a type of algebra that allows all values to be simplified as either "true" or "false." This concept, known as Boolean logic, would prove foundational to the design of digital circuits and computing hardware, and would also directly influence the development of decision tree algorithms and logical rule-based systems in machine learning, where binary splitting on conditions forms the basis of some of the field's most widely used models.[8][11]
1854 Application John Snow's cholera map English physician John Snow, during a deadly cholera outbreak in London, challenges the prevailing belief that cholera spreads through "bad air." Using a map, Snow plots the locations of cholera cases and identifies the regions closest to each water pump. He makes a significant discovery by finding that most deaths occurred near a specific pump on Broad Street in the Soho district. Snow deduces that the contaminated water from that pump is responsible for the outbreak. By convincing the locals to disable the pump, the epidemic is brought under control. This event marks the birth of epidemiology and serves as an early success of the nearest-neighbor algorithm, even before its official invention, nearly a century later.[15]
1890 Precursor Hollerith punched-card tabulating machine German-American statistician, inventor, and businessman Herman Hollerith develops a pioneering mechanical system that integrates punch cards with mechanical calculation methods. This groundbreaking system enables the rapid computation of statistics compiled from vast amounts of data collected from millions of individuals. Hollerith's system demonstrates for the first time that large-scale data processing is practically feasible, anticipating the data-driven paradigm that would become central to machine learning, and his company would eventually merge into what became IBM, one of the most influential organizations in the history of computing and AI.[11]
1913 Concept Markov chains Andrey Markov, a Russian mathematician, first describes techniques he uses to analyse patterns of vowels and consonants in Alexander Pushkin's novel in verse Eugene Onegin, presenting his findings to the Imperial Academy of Sciences in St. Petersburg. The techniques later become known as Markov chains. Though originally a contribution to probability theory, Markov chains would become a foundational tool in machine learning, underpinning hidden Markov models used in speech recognition, natural language processing, and reinforcement learning, where they provide a mathematical framework for modeling sequences of states and transitions.[16]
1936 Concept Turing machine English mathematician Alan Turing proposes a theory outlining how a machine could identify and carry out a predefined set of instructions.[14] His theory of computation establishes the theoretical basis for what computers can and cannot compute, a question directly relevant to machine learning, where the limits of what algorithms can learn are still studied under the lens of computational learning theory. Turing's concept of a universal machine also anticipates the idea of a single device capable of running any algorithm, which is the hardware foundation on which all modern machine learning systems operate.[17]
1943 Model development McCulloch–Pitts neuron American neurophysiologist Warren McCulloch and mathematician Walter Pitts publish "A Logical Calculus of the Ideas Immanent in Nervous Activity" in the Bulletin of Mathematical Biophysics, proposing a mathematical model of the nervous system as a network of simple logical elements (later known as artificial neurons or McCulloch–Pitts neurons) that receive inputs, perform a weighted sum, and fire an output signal based on a threshold function. By connecting these units in various configurations, they demonstrate that their model can perform all logical functions, marking the first instance of neural networks. Their work is seminal in cognitive science, computational neuroscience, and artificial intelligence, and would directly inspire later developments including the perceptron and, eventually, the deep learning architectures that underpin modern machine learning.[18]
1945 Hardware ENIAC computer ENIAC (Electronic Numerical Integrator and Computer) is completed as the first electronic general-purpose computer; it is programmed manually by setting switches and rewiring cables. Following this milestone, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 are developed, introducing the concept of storing and executing programs electronically. These machines make it possible for the first time to run iterative numerical computations at scale, a capability that would prove essential for training machine learning models, which rely on repeated mathematical operations over large datasets.[14]
1949 Concept Hebbian learning Canadian psychologist Donald Hebb introduces a pioneering concept drawn from neuropsychology in his book The Organization of Behavior: A Neuropsychological Theory, proposing that connections between neurons strengthen when both neurons fire together — a principle commonly summarized as "neurons that fire together, wire together." Known as Hebbian Learning theory, it aims to establish correlations among nodes within a recurrent neural network (RNN) and functions as a memory for future reference. The principle would prove influential in the development of unsupervised learning algorithms and associative memory models, and its core idea of strengthening connections based on co-activation remains reflected in modern neural network training procedures. References to Hebb's work increase each year, with his ideas now applied in engineering, robotics, and computer science as well as neuroscience.[19]
1950 Concept Turing's learning machine Alan Turing proposes a 'learning machine' that could learn and become artificially intelligent, and introduces the "Turing Test" as a benchmark for machine intelligence — a computer passes if it can convince a human judge that it is also human.[20] Turing's specific proposal also foreshadows genetic algorithms. The paper would become one of the most cited and influential in the history of artificial intelligence, framing the question of machine intelligence in a way that continues to shape research goals and public discourse, and the Turing Test itself would remain a cultural reference point for evaluating AI progress for decades.[7][14]
1951 Hardware SNARC neural network machine Marvin Minsky and Dean Edmonds build the SNARC (Stochastic Neural Analog Reinforcement Calculator), the first neural network machine able to learn. Constructed from vacuum tubes and motors, the SNARC simulates a network of 40 neurons and is used to model a rat finding its way through a maze. Though primitive by later standards, it represents the first physical instantiation of a learning neural network, demonstrating that the theoretical ideas of McCulloch and Pitts could be realized in hardware, and foreshadowing the neural network machines that would become central to machine learning decades later.[21]
1952 Application Machines playing checkers Arthur Samuel at IBM's Poughkeepsie Laboratory becomes one of the early pioneers of machine learning. He develops some of the first machine learning programs, starting with programs that play checkers. Samuel's program, designed for an IBM computer, analyzes winning strategies by studying gameplay, improving its performance over time by incorporating successful moves into its algorithm. His use of alpha-beta pruning enables the program to play checkers at a championship level. Beyond the game itself, Samuel's work is significant for demonstrating that computers could improve their performance through experience rather than explicit programming — a principle that would become the defining idea of machine learning as a field, and that Samuel himself would articulate when coining the term in 1959.[22][7][23]
1957 Model development Perceptron Frank Rosenblatt invents the perceptron while working at the Cornell Aeronautical Laboratory, creating the first neural network for computers capable of learning from examples. The perceptron garners significant media attention and public excitement about the prospects of thinking machines. Though its limitations would later be exposed by Minsky and Papert in 1969 — showing it could not solve problems that are not linearly separable, such as the XOR function — the perceptron would nonetheless remain a foundational model in machine learning, and its core architecture of weighted inputs feeding into a threshold function directly anticipates the artificial neurons used in modern deep learning.[24][7]
1959 Model development ADALINE and MADALINE A significant advancement in neural networks occurs when Bernard Widrow and Marcian Hoff develop two models at Stanford University. The initial model, known as ADALINE (Adaptive Linear Neuron), showcases the ability to recognize binary patterns and make predictions about the next bit in a sequence. The subsequent generation, called MADALINE (Multiple ADALINE), proves highly practical by effectively eliminating echo on phone lines — the first neural network to be applied to a real-world problem. ADALINE's training rule, known as the Widrow-Hoff learning rule or least mean squares algorithm, would prove particularly influential, anticipating gradient descent methods that remain central to training neural networks in modern machine learning.[8][2]
1959 Concept Machine learning (term coined) The term "Machine Learning" is first coined by Arthur Samuel, a pioneer in the field who had already demonstrated the concept through his checkers-playing program, defining it as the "field of study that gives computers the ability to learn without being explicitly programmed".[25] The definition proves remarkably durable, remaining the most widely cited characterization of the field decades later, and helping to establish machine learning as a discipline distinct from both traditional programming and broader artificial intelligence research.
1959 Application Neural network echo cancellation The first practical application of a neural network occurs when Bernard Widrow and Marcian Hoff's MADALINE system is utilized to address the issue of echo removal on phone lines through the implementation of an adaptive filter. This marks a pivotal moment in the history of machine learning, demonstrating for the first time that a neural network could solve a real-world engineering problem at commercial scale. The success of the application helps make the case that neural networks are not merely theoretical constructs, encouraging further investment in practical machine learning research.[14]
1962 Algorithm ADALINE algorithm U.S. professor Bernard Widrow and Marcian "Ted" Hoff formally introduce the ADALINE algorithm, a single-layer neural network that can be used for classification and regression tasks, trained using the least mean squares (LMS) method. While limited to a single layer, the LMS learning rule introduced alongside ADALINE would prove more lastingly influential than the algorithm itself, becoming a precursor to the gradient descent optimization methods that underpin the training of modern deep learning models.[2]
1963 Research funding DARPA funds AI and machine translation research United States government agencies like the Defense Advanced Research Projects Agency (DARPA) fund AI research at universities such as MIT, hoping for machines that would translate Russian instantly during a period when the Cold War makes such capabilities strategically valuable. Though the machine translation goals would prove far more difficult than anticipated, leading to a critical report in 1966 that would contribute to the first AI winter, the funding establishes a pattern of government investment in AI and machine learning research that would continue for decades, with DARPA remaining one of the most significant sources of funding for foundational machine learning research.[26]
1965 Algorithm Group method of data handling (GMDH) Soviet mathematician Alexey Ivakhnenko publishes a number of articles and books on group method of data handling (GMDH), a method for inductive inference that builds complex models from data by iteratively selecting and combining simpler functions. The GMDH algorithm is notable for being among the first to train deep multilayer networks, with each layer's outputs serving as inputs to the next — a structure that directly anticipates modern deep neural networks. Ivakhnenko's work is considered by some historians of machine learning to be an early, underrecognized predecessor of deep learning, predating the backpropagation-based approaches that would later dominate the field.[27]
1967 Algorithm Nearest neighbor algorithm Thomas M. Cover and Peter E. Hart make a significant contribution to the field of pattern recognition by introducing the nearest neighbor algorithm, which classifies new data points based on their similarity to already-classified examples. A related nearest-neighbor heuristic is used in route planning, particularly for traveling salesmen who need to visit multiple cities in a short tour. The algorithm would become one of the most widely taught and applied methods in machine learning, valued for its simplicity and interpretability. Cover and Hart also prove theoretical bounds on its error rate, showing that with enough training data it is at most twice the error rate of the optimal (Bayes) classifier, a result that establishes nearest neighbor methods as theoretically sound as well as practically useful.[7]
1969 Theoretical analysis Perceptron limitations Marvin Minsky and Seymour Papert publish their book Perceptrons, rigorously demonstrating mathematical limitations of single-layer perceptrons, including their inability to solve problems that are not linearly separable such as the XOR function. The book is widely interpreted as showing that neural networks are fundamentally limited, contributing significantly to a reduction in funding and research interest that helps trigger the first AI winter. However, the limitations Minsky and Papert identify apply specifically to single-layer networks, and the publication of backpropagation algorithms in the 1980s would show that multilayer networks could overcome these limitations, eventually leading to the deep learning revolution.[28][29]
1970 Algorithm Automatic differentiation (reverse mode) Finnish mathematician and computer scientist Seppo Linnainmaa publishes the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions, corresponding to the modern version of backpropagation though not yet named as such.[30][31] Linnainmaa's method would later be recognized as the theoretical foundation of backpropagation, the algorithm that makes training deep neural networks computationally feasible and that would become the single most important algorithmic tool in modern machine learning. Its significance would only become widely appreciated decades after its publication.[32][33]
1974 Algorithm ALOPEX algorithm E. Harth and Evangelia Micheli-Tzanakou introduce ALOPEX (ALgorithm Of Pattern EXtraction), a stochastic optimization method originally developed to determine visual receptive fields in neurophysiology by identifying correlations between stimulus patterns and neural responses.[34] Micheli-Tzanakou, who would go on to become a professor at Rutgers University, is also credited with establishing one of the first brain-to-computer interfaces using ALOPEX. Developed in the context of neuroscience research, ALOPEX is notable for being one of the earliest algorithms to draw explicitly on biological models of neural processing for machine learning purposes, and would later be applied to signal processing, image processing, pattern recognition, and neural network training.
1977 Algorithm Expectation–maximization algorithm The Expectation–maximization algorithm (EM) is formally explained and given its name in a paper by Arthur P. Dempster, Nan Laird, and Donald Rubin, three statisticians who unify a family of iterative methods for finding maximum likelihood estimates in models with missing or latent data.[35] The EM algorithm would become one of the most widely used tools in machine learning and statistics, finding applications in clustering, density estimation, natural language processing, and computer vision, particularly in models such as Gaussian mixture models and hidden Markov models where some variables cannot be directly observed.
1979 Application Stanford Cart autonomous navigation Students at Stanford University develop the Stanford Cart, a remote-controlled robot that successfully navigates a room filled with obstacles without human intervention, using camera images to plan its path. The Cart takes approximately 15 minutes to move each meter, pausing frequently to process visual information, but nonetheless demonstrates that a machine can perceive and reason about its physical environment autonomously. The project is an early milestone in the application of machine learning and computer vision to robotics, anticipating the self-driving vehicles and autonomous navigation systems that would become a major focus of machine learning research decades later.[36][7]
1980 Model development Neocognitron Japanese computer scientist Kunihiko Fukushima introduces the neocognitron, a hierarchical multilayered convolutional neural network inspired by the structure of the mammalian visual cortex. The neocognitron introduces the key architectural ideas of local receptive fields and spatial hierarchies of features, allowing it to recognize patterns regardless of their position in the visual field. These ideas would directly inspire Yann LeCun's development of convolutional neural networks (CNNs) in the late 1980s, which would in turn become the dominant architecture for image recognition tasks and a cornerstone of modern deep learning.[37][27]
1980 Algorithm Linde–Buzo–Gray algorithm The Linde–Buzo–Gray algorithm is introduced by Yoseph Linde, Andrés Buzo and Robert M. Gray as an iterative vector quantization method that refines a codebook to better represent the training data, combining Lloyd's algorithm with a splitting technique that expands codebooks by duplicating vectors with perturbations. The algorithm would become widely used in data compression and signal processing, and its centroid-based refinement step is closely related to k-means clustering, one of the most widely used unsupervised learning algorithms in machine learning.[38]
1981 Algorithm Explanation-based learning Gerald DeJong introduces Explanation Based Learning (EBL), a concept in machine learning where a computer algorithm analyzes training data to create a general rule by discarding unimportant information, allowing it to generalize from a single example by using prior domain knowledge to explain why the example is relevant. EBL represents an important early attempt to combine knowledge-driven and data-driven approaches to machine learning, anticipating later work on transfer learning and meta-learning, where models leverage existing knowledge to learn more efficiently from limited data.[7]
1981 Algorithm Backpropagation for multilayer perceptrons American social scientist and machine learning pioneer Paul Werbos presents work that elaborates on his 1974 doctoral thesis, providing a more accessible and explicit formulation of the backpropagation algorithm for training multilayer perceptrons (MLPs). Though still not widely adopted at the time, the work helps lay the groundwork for the independent rediscovery and popularization of backpropagation by Rumelhart, Hinton, and Williams in 1986, which would trigger a major resurgence of interest in neural networks and establish backpropagation as the dominant training algorithm in machine learning.[39]
1982 Model development Recurrent Neural Network John Hopfield, an American physicist, popularizes Hopfield networks, a type of recurrent neural network that can serve as content-addressable memory systems, storing patterns and retrieving them when presented with partial or noisy versions. Hopfield's work reignites interest in neural networks at a time when the field is still recovering from the pessimism triggered by Minsky and Papert's 1969 critique, demonstrating that neural networks have useful computational properties beyond simple pattern classification. Hopfield networks would also influence the development of Boltzmann machines and deep belief networks, which would become important stepping stones toward modern deep learning.[40][2]
1982 Government initiative Japan Fifth Generation Computer Systems project Japan announces its Fifth Generation Computer Systems (FGCS) project, an ambitious ten-year government initiative aimed at developing computers capable of performing artificial intelligence tasks, including natural language processing and machine reasoning. The announcement triggers alarm among Western governments and serves as a catalyst for increased American and European funding in AI and machine learning research. Though the FGCS project itself would largely fail to meet its goals by the time it concludes in 1992, its geopolitical impact proves significant, helping to sustain research funding during a period that might otherwise have seen deeper cuts, and contributing to the conditions that enabled the machine learning resurgence of the mid-1980s.[2]
1982 Model development Crossbar Adaptive Array Slovenian-American computer scientist Stevo Bozinovski introduces self-learning as a machine learning paradigm alongside a neural network capable of self-learning named Crossbar Adaptive Array (CAA). The CAA learns by interacting with its environment and receiving reinforcement signals, combining elements of neural networks and reinforcement learning in a single architecture. The work is notable for being an early explicit formulation of the idea that a machine learning system could improve its behavior through environmental interaction rather than supervised training, anticipating the reinforcement learning frameworks that would become a major branch of machine learning research.[41]
1985 Application NETtalk neural network Terrence Sejnowski, a neuroscientist at Johns Hopkins University, together with Charles Rosenberg at Princeton University, develops NETtalk, a massively parallel neural network that learns to convert English text to speech by training on a dataset of 20,000 words paired with phonetic transcriptions. NETtalk acquires correct pronunciation of words through practice, with its performance following a power law of learning similar to that observed in humans: the more words it learns, the better it generalizes to new ones. The network attracts significant public attention as a vivid demonstration that neural networks could acquire complex linguistic skills through data-driven training rather than explicit programming, helping rebuild enthusiasm for neural networks during a period of renewed interest in connectionist approaches. Its success inspires further research in speech synthesis and demonstrates the potential of neural networks for natural language processing.[42]
1985–1986 Algorithm Backpropagation for multilayer neural networks The practical training algorithm backpropagation for artificial neural networks — particularly multilayer perceptrons — is popularized when David Rumelhart, Geoffrey Hinton, and Ronald J. Williams publish their influential 1986 paper Learning representations by back-propagating errors, building on earlier conceptual work by Paul Werbos in 1974 and 1981. Unlike the earlier formulations, the Rumelhart et al. paper presents backpropagation in a form accessible to the broader research community and demonstrates its effectiveness on concrete problems, triggering a major resurgence of interest in neural networks. The algorithm would go on to become the single most important training method in machine learning, remaining central to the training of deep neural networks decades later.[39][43]
1986 Algorithm ID3 decision tree algorithm Australian computer scientist Ross Quinlan proposes the ID3 algorithm (Iterative Dichotomiser 3), a method for constructing decision trees from datasets by iteratively selecting the feature that best splits the data according to an information gain criterion. ID3 proves influential for its simplicity and interpretability, making decision tree learning accessible to a wide range of practitioners. Quinlan would later extend the algorithm into C4.5 and C5.0, which would become among the most widely used machine learning algorithms in practical applications, and decision trees would remain a foundational model in machine learning, forming the basis of powerful ensemble methods such as random forests and gradient boosting.[39]
1986 Model development Dehaene–Changeux model Cognitive neuroscientists Stanislas Dehaene and Jean-Pierre Changeux develop the Dehaene–Changeux model, a hierarchical neural network model of the human brain designed to simulate higher cognitive functions, including attention and working memory. The model provides a predictive computational framework for studying phenomena such as inattentional blindness and the solving of the Tower of London test.[44] As one of the early attempts to model complex cognitive processes using neural networks, the work contributes to the broader research program of using machine learning architectures to understand biological intelligence, a tradition that would later inform the development of attention mechanisms and transformer architectures in modern deep learning.[45]
1986 Publication Machine Learning (journal) The peer-reviewed scientific journal Machine Learning is first issued; originally published by Kluwer Academic Publishers, it is today published by Springer Nature. Arriving at a moment of renewed interest in the field following the popularization of backpropagation, the journal provides a dedicated venue for machine learning research at a time when the field is establishing itself as a discipline distinct from broader AI research. It would become one of the leading journals in the field, publishing foundational work on statistical learning theory, natural language processing, computer vision, data mining, and reinforcement learning, and helping to consolidate machine learning as a recognized scientific community with shared standards and methods.[46]
1986 Conference European Conference on Machine Learning The European Working Session on Learning (EWSL) is established as a conference focused on machine learning research in Europe, providing a dedicated regional venue at a time when the field is experiencing renewed momentum following the popularization of backpropagation. The event evolves into the European Conference on Machine Learning (ECML) in 1993, and in 2001 begins being jointly organized with the Principles and Practice of Knowledge Discovery in Databases (PKDD) conference as ECML PKDD. The combined conference would grow into one of the most significant annual gatherings for machine learning and data mining research in Europe, complementing North American venues such as NeurIPS and ICML in shaping the global machine learning research community.[47]
1986 Organization Knowledge Engineering and Machine Learning Group The Knowledge Engineering and Machine Learning Group (KEMLg) is founded at the Technical University of Catalonia (UPC) in Barcelona, Spain, becoming one of the earliest dedicated machine learning research groups in Europe. Active in the AI field since 1986, the group focuses on the analysis, design, implementation and application of AI techniques to real-world complex systems, with research spanning healthcare, environmental processes, social and internet-based systems, and the industrial sector. Its main research areas include knowledge discovery and data mining, intelligent decision support systems, multiagent systems, and supervised and unsupervised machine learning techniques. The group's founding reflects the growing institutionalization of machine learning research in European universities during the mid-1980s.[48]
1989 Algorithm Q-learning Christopher Watkins, a British researcher, introduces Q-learning, a model-free reinforcement learning algorithm that enables an agent to learn optimal actions in an environment by estimating the value of taking each action in each state, without requiring a model of the environment itself.[49] Q-learning greatly improves the practicality and feasibility of reinforcement learning, and would become one of the most widely studied and applied algorithms in the field. It would later serve as the foundation for deep Q-networks (DQN), the algorithm used by DeepMind to train agents to play Atari games at superhuman levels, marking a landmark moment in deep reinforcement learning.
1992 Application Machines Playing Backgammon Gerald Tesauro develops TD-Gammon, a computer backgammon program that utilizes an artificial neural network trained using temporal-difference learning (hence the 'TD' in the name), learning to play by playing millions of games against itself rather than from human-labeled examples. TD-Gammon is able to rival the abilities of top human backgammon players, representing a landmark achievement in reinforcement learning. Beyond its performance, TD-Gammon demonstrates for the first time that temporal-difference learning combined with neural networks could produce expert-level performance in a complex board game through self-play, a paradigm that would later be central to DeepMind's AlphaGo and AlphaZero programs.[50]
1993 Conference ICML launches The first International Conference on Machine Learning (ICML) is held, transforming a series of International Workshops on Machine Learning that had been organized for nearly a decade into a formal annual conference. The formalization of ICML reflects the growing maturity of machine learning as a distinct scientific discipline, providing a dedicated global venue at a time when the field is expanding rapidly following the backpropagation resurgence. ICML would go on to become one of the most prestigious and competitive venues in artificial intelligence research, alongside NeurIPS and ICLR, with acceptance to the conference serving as a key measure of research impact in the field.[51]
1995 Algorithm Support vector machines Corinna Cortes and Vladimir Vapnik publish Support-Vector Networks, introducing support vector machines (SVMs) for classification and regression. SVMs find a maximum-margin hyperplane separating data classes and can handle nonlinearly separable data via higher-dimensional mappings using the kernel trick. Possessing a solid theoretical foundation in statistical learning theory alongside impressive empirical results, SVMs would become one of the dominant machine learning methods of the late 1990s and 2000s, proving effective across applications such as spam filtering, image classification, and fraud detection. Their rise also triggers a productive tension within the machine learning community between advocates of SVMs and neural networks, a debate that would not be settled until the deep learning resurgence of the 2010s.[39][52]
1995 Algorithm Random Forest Algorithm Tin Kam Ho publishes the paper Random Decision Forests, introducing random decision forests, an ensemble learning method that combines multiple decision trees by randomly selecting features and thresholds, making the trees relatively independent and reducing prediction variance. Ho's paper establishes the core insight that combining diverse, somewhat randomized decision trees produces more accurate and robust predictions than any single tree. The method would be significantly extended by Leo Breiman in 2001, whose formulation of the random forest algorithm would become one of the most widely used machine learning methods across fields including image classification, natural language processing, and biomedical research.[53]
1997 Milestone Deep Blue defeats Garry Kasparov Supercomputer Deep Blue, developed by IBM, achieves a historic victory by defeating chess grandmaster Garry Kasparov in a six-game match, becoming the first computer system to defeat a reigning world chess champion under standard tournament conditions. While Deep Blue relies primarily on brute-force search and hand-crafted evaluation functions rather than machine learning, the event captures global attention and shifts public perception of what computers are capable of. The match spurs renewed interest and investment in AI research more broadly, and the question of how to achieve similar results through learning rather than explicit programming would motivate much of the machine learning research that follows.[23]
1997 Model development Long short-term memory (LSTM) Sepp Hochreiter and Jürgen Schmidhuber introduce long short-term memory (LSTM), a recurrent neural network architecture that addresses the vanishing gradient problem — a fundamental obstacle that had prevented earlier recurrent networks from learning dependencies spanning long sequences. By introducing gating mechanisms that control the flow of information through the network, LSTM enables learning of long-range dependencies over thousands of time steps. It would become one of the most widely used neural network architectures in machine learning, driving advances in speech recognition, machine translation, and text generation, and remaining a dominant approach to sequence modeling until the rise of transformer architectures in the late 2010s.[54]
1997 Algorithm AdaBoost Yoav Freund and Robert Schapire introduce AdaBoost (Adaptive Boosting), an ensemble method that combines multiple weak classifiers (models that perform only slightly better than random guessing) into a single strong classifier by iteratively training each new classifier to focus on the examples that previous classifiers got wrong. Freund and Schapire would receive the 2003 Gödel Prize for this work, recognizing its contributions to theoretical computer science. Beyond its practical effectiveness in tasks such as face detection, AdaBoost provides a theoretical framework for understanding boosting that would influence a generation of ensemble methods, including gradient boosting machines and XGBoost, which would become among the most widely used machine learning algorithms in industry and data science competitions.[39]
1998 Application Neural network ZIP code recognition Researchers at AT&T Bell Laboratories, led by Yann LeCun, develop a convolutional neural network called LeNet-5 that can accurately recognize handwritten ZIP codes and digits, trained on a dataset of 60,000 examples and achieving error rates competitive with other state-of-the-art methods. The system uses backpropagation to train a multilayer convolutional architecture, demonstrating that deep neural networks could be trained effectively on real-world recognition tasks. LeNet-5 would become one of the most influential neural network architectures in machine learning history, directly inspiring the convolutional neural networks that would dominate image recognition research following the deep learning resurgence of the 2010s.[2]
2000 Algorithm Local outlier factor In anomaly detection, the local outlier factor (LOF) algorithm is proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. Unlike earlier anomaly detection methods that rely on global statistics, LOF introduces the concept of local density, allowing it to detect outliers in datasets where different regions have very different densities. The algorithm would become one of the most widely cited and used methods in anomaly detection, finding applications in fraud detection, network intrusion detection, and medical diagnosis.[55]
2000 Algorithm LogitBoost LogitBoost, a boosting algorithm in machine learning and computational learning theory, is formulated by Jerome H. Friedman, Trevor Hastie, and Robert Tibshirani, three prominent statisticians at Stanford University. LogitBoost adapts the boosting framework to logistic regression, providing a statistically principled approach to classification that bridges the gap between machine learning and classical statistics. The algorithm contributes to a broader research program of understanding boosting methods through a statistical lens, influencing the development of gradient boosting machines that would become among the most powerful and widely used machine learning methods in practical applications.[56]
2000 Publication Journal of Machine Learning Research The Journal of Machine Learning Research (JMLR) is first published by the JMLR Foundation, established as a free and open-access alternative to existing subscription-based journals in the field. Founded at a time when open access publishing is still rare in computer science, JMLR's model of making all articles freely available online would prove influential, helping to establish open access as a norm in machine learning research and anticipating the widespread use of preprint servers such as arXiv. JMLR would become one of the leading journals in the field, publishing foundational work across statistical learning theory, natural language processing, computer vision, and reinforcement learning.[57]
2001 Algorithm Random forest Leo Breiman, an influential American statistician, introduces the random forest ensemble learning method, extending Tin Kam Ho's 1995 random decision forests work by adding the concept of bagging — training each tree on a random subset of the training data — alongside random feature selection. The combination produces an algorithm that is both highly accurate and robust to overfitting. Random forests would become one of the most widely used machine learning algorithms across industry and academia, valued for their strong out-of-the-box performance, resistance to overfitting, and ability to handle high-dimensional data, remaining competitive with more complex methods even after the rise of deep learning.[39]
2001 Algorithm iDistance indexing method The iDistance indexing and query processing technique is first proposed by Cui Yu, Beng Chin Ooi, Kian-Lee Tan and H. V. Jagadish, providing an efficient method for indexing and querying data in high-dimensional metric spaces by mapping data points to a single-dimensional space based on their distance from reference points. The technique addresses a fundamental challenge in machine learning known as the curse of dimensionality, where the computational cost of finding nearest neighbors grows prohibitively with the number of features. iDistance would prove effective for applications including image retrieval, text mining, and data mining, contributing to the broader research program of making similarity-based machine learning methods scalable to large, high-dimensional datasets.[58]
2002 (October) Software Torch machine learning library Torch is first released by Ronan Collobert, Samy Bengio, and Johnny Mariethoz as a scientific computing library designed for machine learning research, providing a flexible framework for building and training neural networks. Torch introduces a scripting language based on Lua that makes it easy to define and experiment with new model architectures. Though eventually superseded by more accessible frameworks, Torch would prove highly influential in the deep learning research community, and its core design philosophy of dynamic computation graphs would directly inspire the development of PyTorch, one of the most widely used deep learning frameworks in modern machine learning research.[59]
2003 Algorithm Manifold alignment The concept of manifold alignment is first introduced by Ji Hun Ham, Daniel D. Lee, and Lawrence K. Saul, researchers at the University of Pennsylvania working at the intersection of machine learning and dimensionality reduction, as a class of machine learning algorithms that produce projections between sets of data, given that the original data sets lie on a common manifold. The work addresses the challenge of finding meaningful correspondences between datasets that share underlying structure but differ in their representation, a problem relevant to tasks such as cross-language text analysis and multi-modal data fusion. Manifold alignment contributes to the broader field of representation learning, which seeks to discover low-dimensional structure in high-dimensional data, a research direction that would become increasingly central to machine learning with the rise of deep learning.[60]
2004 Software MapReduce Google engineers Jeffrey Dean and Sanjay Ghemawat unveil MapReduce, a distributed programming model for processing and generating large datasets by breaking them into smaller chunks that can be processed in parallel across clusters of computers, presenting their work at the 6th Symposium on Operating Systems Design and Implementation (OSDI).[61] MapReduce makes it practical to process datasets at a scale previously impossible for most organizations. For machine learning, the framework proves transformative by enabling the training of models on vastly larger datasets than had been feasible before, contributing to the data-driven paradigm shift that would characterize the field in the following decade. MapReduce would also directly inspire the development of Apache Hadoop, which would make distributed data processing accessible beyond Google.[27]
2004 Model development Hierarchical Temporal Memory Jeff Hawkins, inventor of the PalmPilot, and science writer Sandra Blakeslee introduce the concept of Hierarchical Temporal Memory (HTM) in their book On Intelligence, presenting a biologically constrained theory of intelligence based on the structure and function of the neocortex. HTM proposes that intelligence arises from the brain's ability to store and recall hierarchical sequences of patterns, and that this same principle could be implemented in machine learning systems. At its core, HTM uses learning algorithms that can store, learn, infer, and recall high-order sequences in an unsupervised process, making it well suited for prediction and anomaly detection in streaming data. Though HTM has not achieved mainstream adoption in machine learning, it represents an influential attempt to ground machine learning architectures in neuroscience, contributing to the broader research program of biologically inspired computing.[62]
2005 Research trend Neural network resurgence A third wave of interest in neural networks begins, driven by the convergence of several factors: the growing availability of large datasets (later including ImageNet, released in 2009), the increasing power of GPUs for parallel computation, and new algorithmic insights from researchers including Geoffrey Hinton, Yoshua Bengio, Yann LeCun, and Andrew Ng. Together these developments make it possible to train deeper and larger neural networks than had previously been feasible, enabling state-of-the-art results in image classification, natural language processing, and speech recognition. This resurgence would culminate in the deep learning revolution of the 2010s, fundamentally transforming machine learning research and its industrial applications.[39]
2005 Application LSTM for speech recognition Alex Graves and Jürgen Schmidhuber demonstrate a significant breakthrough in speech recognition by applying Long Short-Term Memory (LSTM) recurrent neural networks to phoneme classification and recognition on the TIMIT speech corpus, showing that bidirectional LSTM outperforms both unidirectional LSTM and conventional recurrent neural networks, and that a hybrid BLSTM-HMM system improves on equivalent traditional hidden Markov model approaches that had dominated the field for decades. The application demonstrates that LSTM's ability to model long-range temporal dependencies makes it particularly well suited to speech, where the meaning of a sound depends heavily on context spanning many time steps. This work helps establish recurrent neural networks as a serious alternative to established speech recognition methods, laying the groundwork for the deep learning-based speech recognition systems that would be deployed commercially by Google, Apple, and Amazon within a few years.[63]
2006 Concept Deep learning British-Canadian cognitive psychologist and computer scientist Geoffrey Hinton popularizes the term "deep learning" to describe a set of algorithms that enable the training of neural networks with many layers, publishing a paper demonstrating that deep belief networks could be trained effectively using a greedy layer-wise pretraining strategy. This addresses a longstanding obstacle in training deep networks and helps legitimize the pursuit of deeper architectures at a time when the machine learning mainstream still favors shallower models. The term and the associated techniques would become the dominant paradigm in machine learning within a decade, underpinning advances in image recognition, speech recognition, natural language processing, and virtually every other major application area.[7][4]
2006 Competition Face Recognition Grand Challenge The Face Recognition Grand Challenge (FRGC) is held by the National Institute of Standards and Technology (NIST) to evaluate the state of the art in face recognition technology, using a variety of data including 3D face scans, iris images, and high-resolution face images. The results demonstrate that face recognition algorithms have improved dramatically since previous evaluations in 2002 and 1995, with the best new algorithms performing significantly more accurately. The FRGC helps establish standardized benchmarks for face recognition research, accelerating progress in the field and contributing to the conditions that would make deep learning-based face recognition systems such as Facebook's DeepFace, published in 2014, possible.[2]
2006 Software Hadoop framework The Apache Hadoop framework is released as an open-source implementation of Google's MapReduce programming model, developed primarily by Doug Cutting and Mike Cafarella and initially built to support the Apache Nutch web search engine. By making distributed processing of large datasets accessible to organizations without Google's proprietary infrastructure, Hadoop democratizes large-scale data processing and enables a new generation of data-driven machine learning applications. It would become a foundational tool in the big data ecosystem, widely adopted across industry and academia, and directly enabling the large-scale dataset curation and model training pipelines that would underpin the deep learning revolution of the 2010s.[27]
2007 (June) Software Scikit-learn Scikit-learn is started by David Cournapeau as a Google Summer of Code project, and is later developed further by Gaël Varoquaux and others into a free and open-source machine learning library for Python, with its first public release following in 2010. Built on top of NumPy and SciPy, scikit-learn provides consistent, well-documented implementations of a wide range of machine learning algorithms including support vector machines, decision trees, random forests, and k-nearest neighbors. Its emphasis on ease of use and a consistent API would make it the default starting point for machine learning practitioners across academia and industry, significantly lowering the barrier to applying machine learning and helping to establish Python as the dominant programming language in the field.[64]
2007 Software Theano Theano is initially released by the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal, developed primarily by a team led by Yoshua Bengio. Theano allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, with transparent use of GPUs for computation. It becomes one of the first widely adopted deep learning frameworks, enabling researchers to train neural networks on GPUs far more efficiently than was previously possible. Theano would directly influence the design of later frameworks including TensorFlow and PyTorch, and its symbolic computation approach would remain influential in the field long after its own deprecation in 2017.[65]
2008 (January 11) Software pandas American software developer Wes McKinney, working as a researcher at AQR Capital Management, releases the first version of pandas, a Python library for data manipulation and analysis built around the DataFrame data structure, which allows intuitive handling of tabular data. The name is a play on "panel data", a term from econometrics. pandas would become one of the most widely used tools in the machine learning workflow, providing the standard interface for loading, cleaning, and transforming datasets before feeding them into machine learning models. Its adoption would help cement Python's dominance in data science and machine learning, and it remains a foundational tool in the field.[66]
2008 Algorithm Isolation Forest The Isolation Forest (iForest) algorithm is proposed by Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, introducing a novel unsupervised approach to anomaly detection that works by isolating outliers rather than modeling normal data. Unlike density-based methods such as the local outlier factor, Isolation Forest exploits the observation that anomalies are few and different, making them easier to isolate through random partitioning. The result is an algorithm that is both computationally efficient and effective on high-dimensional datasets, addressing key limitations of earlier anomaly detection methods. Isolation Forest would become one of the most widely used anomaly detection algorithms in machine learning, finding broad application in fraud detection, network security, and industrial monitoring.[67]
2008 Software Encog Encog is created by Jeff Heaton as a pure Java and C# machine learning framework designed to support a broad range of neural network technologies, including genetic programming, NEAT (NeuroEvolution of Augmenting Topologies), and HyperNEAT. Encog is notable for being one of the few frameworks of its time to support neuroevolution alongside traditional gradient-based training methods, making it a useful tool for researchers exploring evolutionary approaches to machine learning. While eventually overshadowed by Python-based deep learning frameworks, Encog contributes to the diversification of machine learning tools available to practitioners during the late 2000s.[68]
2010 (April) Platform Kaggle Kaggle is founded by Anthony Goldbloom and Ben Hamner as a platform hosting data science competitions, enabling organizations to crowdsource machine learning solutions while providing practitioners with access to real-world datasets and a community of peers. Kaggle's competition format proves highly effective at accelerating progress on specific machine learning tasks, with winning solutions frequently advancing the state of the art and being published as influential research. Acquired by Google in 2017, Kaggle would grow into the largest community platform in machine learning, providing notebooks, datasets, and pre-trained models to millions of users worldwide, and playing a significant role in democratizing access to machine learning tools and education.[69]
2010 Application Microsoft Kinect motion sensing Microsoft releases the Kinect, a motion-sensing input device developed for the Xbox 360 gaming console that uses a depth camera and machine learning to track 20 human body joints at 30 frames per second, allowing people to interact with computers through movements and gestures without a physical controller. The body tracking system is powered by a random decision forest classifier trained on a large synthetic dataset of depth images, enabling real-time pose estimation invariant to body shape, clothing, and hair style.[70] The Kinect sells eight million units in its first 60 days, earning a Guinness World Record as the fastest-selling consumer electronics device at the time. Beyond gaming, it demonstrates the commercial viability of real-time human pose estimation, spurring significant machine learning research into body tracking and gesture recognition, and making depth camera hardware widely accessible to researchers for the first time.
2010 Algorithm Constructing Skill Trees George Konidaris, Scott Kuindersma, Andrew Barto, and Roderic Grupen introduce Constructing skill trees (CST), a hierarchical reinforcement learning algorithm that automatically builds skill hierarchies from demonstration trajectories by using an incremental maximum a posteriori (MAP) change point detection algorithm to segment each trajectory into skills, then integrating the results into a skill tree. CST addresses a fundamental challenge in reinforcement learning — the difficulty of learning complex, long-horizon tasks — by decomposing them into simpler reusable subtasks. Compared to earlier approaches such as skill chaining, CST is significantly faster and can be applied to higher-dimensional policies, with even unsuccessful episodes contributing to skill improvement. The work contributes to the broader research program of hierarchical reinforcement learning, which would become increasingly important as researchers sought to apply reinforcement learning to complex real-world tasks such as robotic manipulation.[71]
2010 Organization DeepMind founded British artificial intelligence company DeepMind is founded in London by neuroscientist Demis Hassabis, Shane Legg, and Mustafa Suleyman, with a mission to develop artificial general intelligence safely and beneficially, combining neuroscience and machine learning to build systems that can learn to solve a wide range of tasks. In 2014, DeepMind is acquired by Google for a reported £400 million, marking one of the largest acquisitions of an AI company to that point and giving DeepMind access to Google's computational resources and data infrastructure. It would go on to produce landmark results including AlphaGo's defeat of professional Go players in 2016, AlphaFold's prediction of protein structures in 2020, and numerous advances in reinforcement learning and deep learning research.[2]
2011 Achievement IBM Watson wins Jeopardy! Using a combination of machine learning, natural language processing, and information retrieval techniques, IBM's Watson defeats two of the greatest human champions in the history of Jeopardy!, Ken Jennings and Brad Rutter, in a televised match. Watson processes questions as natural language and searches a vast knowledge base to generate and rank candidate answers in real time, without access to the internet. The victory attracts widespread public attention and demonstrates that machine learning systems can handle the ambiguity, wordplay, and breadth of knowledge required by natural language question answering at a level competitive with the best human players. IBM subsequently pursues commercial applications of Watson in healthcare, finance, and customer service, helping to establish the business case for large-scale applied machine learning.[72]
2012 Research experiment Google Brain cat recognition The Google Brain team, led by Andrew Ng and Jeff Dean, creates a neural network with one billion connections, running on 16,000 computer processors, that learns to recognize cats by watching unlabeled images taken from frames of YouTube videos, without being told what a cat is. The experiment demonstrates that large-scale unsupervised learning can produce high-level concept detectors from raw data alone, without human-labeled training examples. The result attracts significant public attention and helps make the case for investing in very large neural networks, contributing to the conditions that would drive the deep learning revolution and the subsequent scaling-up of neural network models across industry and academia.[73][74]
2012 Model development AlexNet Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton develop AlexNet, a deep convolutional neural network that wins the ImageNet Large Scale Visual Recognition Challenge by a large margin, achieving a top-5 error rate of 15.3% compared to 26.2% for the second-place entry. AlexNet is trained on GPUs and popularizes the ReLU activation function and dropout regularization, helping make it practical to train much deeper networks than had previously been feasible. The victory is widely regarded as the moment that triggers the modern deep learning era, demonstrating conclusively that deep neural networks trained on large datasets with GPU acceleration could dramatically outperform prior approaches, spurring a rapid shift across the machine learning community toward deep learning methods.[2][75]
2012 (March 12) Software mlpy library mlpy is released as a free and open-source Python module for machine learning, providing implementations of a wide range of algorithms including support vector machines, decision trees, random forests, and k-nearest neighbors, alongside utility functions for data manipulation and visualization. Built on top of NumPy and SciPy, mlpy aims to make machine learning accessible to Python practitioners at a time when the Python machine learning ecosystem is still consolidating around a small number of standard libraries. It contributes to the broader trend of open-source machine learning tooling that would help establish Python as the dominant language in the field.[76]
2013 Conference International Conference on Learning Representations The International Conference on Learning Representations (ICLR) holds its first edition, founded by Yann LeCun and Yoshua Bengio with a distinctive open peer review model that makes submitted papers and reviews publicly visible before acceptance decisions. This transparency represents a significant departure from the closed review processes of established venues such as NeurIPS and ICML. ICLR would rapidly become one of the most influential and competitive venues in machine learning research, with its emphasis on representation learning and deep learning reflecting and reinforcing the field's shift toward these methods. Its open review model would also influence broader discussions about publication norms in machine learning.[77][78]
2014 Model development DeepFace Facebook researchers publish their work on DeepFace, a system that uses a nine-layer deep neural network to identify faces with 97.35% accuracy on the Labeled Faces in the Wild benchmark, reducing the error rate of previous systems by more than 27% and approaching human-level performance of 97.53%. DeepFace demonstrates that the deep learning techniques that had proven transformative in image classification could be applied with equal force to face verification, one of the most practically important computer vision tasks. The work accelerates the deployment of deep learning-based face recognition in commercial applications, while also raising significant privacy and civil liberties concerns that would grow in prominence as the technology spread.[79][7]
2014 (May 26) Software Apache Spark Apache Spark is first released by Matei Zaharia and others at the AMPLab at UC Berkeley as a unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R. Unlike Hadoop's MapReduce model, Spark performs computations in memory, making it significantly faster for iterative algorithms — a critical advantage for machine learning, where models are trained through many repeated passes over data. Spark's built-in machine learning library, MLlib, makes it possible to train machine learning models on datasets too large to fit on a single machine, enabling large-scale machine learning pipelines in industry. It would become one of the most widely adopted big data processing frameworks, used by companies including Uber, Airbnb, and Netflix.[27][80]
2014 Achievement Eugene Goostman chatbot Turing test claim The chatbot "Eugene Goostman", developed by Vladimir Veselov, Eugene Demchenko, and Sergey Ulasen, is claimed to have passed the Turing Test by convincing 33% of human judges that it is human during a competition organized at the Royal Society in London. The claim attracts significant media coverage as a historic milestone. However, it is widely disputed by AI researchers, who argue that the test's conditions are too lenient: the chatbot plays the role of a 13-year-old Ukrainian boy for whom English is a second language, giving it plausible cover for confused or evasive answers. The episode reignites debate about whether the Turing Test remains a meaningful benchmark for machine intelligence.[81]
2014 Algorithm Generative adversarial networks Ian Goodfellow and colleagues including Yoshua Bengio invent Generative Adversarial Networks (GANs), a framework in which two neural networks — a generator that creates synthetic data and a discriminator that attempts to distinguish synthetic from real data — are trained simultaneously in a competitive process that drives both to improve. The idea is reportedly conceived by Goodfellow in a single evening and implemented the same night. GANs would prove to be one of the most fertile and influential ideas in machine learning of the 2010s, enabling the generation of photorealistic images, deepfake videos, synthetic voices, and artistic content, while also raising significant concerns about misinformation and the authenticity of digital media.[82]
2015 (February) Software spaCy NLP library spaCy is released by Matthew Honnibal and Ines Montani of Explosion AI as a free, open-source natural language processing library for Python, designed from the outset for production use rather than research experimentation. Unlike the dominant NLP toolkit of the time, NLTK, spaCy provides industrial-strength performance with a clean, consistent API, excelling at tasks including tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. spaCy would become the standard NLP library for production machine learning pipelines, widely adopted across industry and academia, and its emphasis on practical performance and ease of integration would influence the design of subsequent NLP tools.[83]
2015 (March 27) Software Keras Keras is released by François Chollet, a Google engineer, as an open-source Python library for building neural networks that emphasizes simplicity, modularity, and rapid experimentation. At a time when deep learning frameworks such as Theano and early TensorFlow require significant expertise to use effectively, Keras provides a high-level API that makes building and training neural networks accessible to a much broader audience. Initially running on top of Theano, Keras later integrates with TensorFlow as its default backend and is eventually absorbed into TensorFlow 2.0 as its official high-level API. Keras plays a significant role in democratizing deep learning, lowering the barrier to entry for researchers and practitioners and accelerating the widespread adoption of neural network methods across fields.[84]
2015 (June 9) Software Chainer Chainer is released by Preferred Networks, Inc. in Japan as a deep learning framework written in Python. Chainer introduces the "define-by-run" approach to building neural networks, in which the computation graph is defined dynamically during execution rather than statically before it, making it significantly more flexible and intuitive for researchers to experiment with novel architectures. This approach, sometimes called dynamic computation graphs, contrasts with the static graph approach used by TensorFlow and Theano at the time. Chainer's define-by-run paradigm would prove highly influential, directly inspiring PyTorch's design and helping to establish dynamic computation graphs as the preferred approach for deep learning research.[85]
2015 (October 8) Software Apache SINGA Apache SINGA is first released as an open-source distributed machine learning library initiated by the DB System Group at the National University of Singapore in collaboration with Zhejiang University, designed to facilitate the training of large-scale deep learning models across clusters of machines. SINGA is notable for being one of the first distributed deep learning frameworks to be developed outside of major technology companies, reflecting the growing internationalization of machine learning infrastructure research. It would be adopted by organizations including Citigroup, NetEase, and Singapore General Hospital, demonstrating the applicability of distributed deep learning to real-world problems in finance and healthcare.[86]
2015 Achievement Beating Humans in Go Google DeepMind's AlphaGo program becomes the first computer program to defeat an unhandicapped professional human player in the ancient Chinese board game Go, defeating European champion Fan Hui in a five-game match. Go had long been considered a far more difficult challenge for AI than chess, due to its enormous search space (more possible positions than atoms in the observable universe) and the difficulty of evaluating board positions, which requires intuition that resists explicit programming. AlphaGo's victory, achieved by combining deep reinforcement learning with Monte Carlo tree search, is regarded as a landmark moment in machine learning, demonstrating that deep learning could master tasks previously thought to require uniquely human intuition.[87][88]
2015 Software TensorFlow Google releases TensorFlow as an open-source machine learning library derived from its internal DistBelief system, developed by the Google Brain team and designed for large-scale neural network training and deployment across diverse platforms including CPUs, GPUs, and mobile devices. TensorFlow's release makes Google's production-grade machine learning infrastructure available to the broader research community for the first time, accelerating the adoption of deep learning across academia and industry. It would rapidly become the most widely used deep learning framework in the world, and its open-source release would intensify competition among technology companies to contribute machine learning tools to the research community, helping to establish open-source software as the dominant mode of machine learning infrastructure development.[89]
2015 Platform Amazon Machine Learning Amazon launches Amazon Machine Learning (Amazon ML), a cloud-based service that allows developers to build, train, and deploy machine learning models without managing the underlying infrastructure. The launch marks Amazon's entry into the machine learning platform market alongside Google and Microsoft, reflecting the growing recognition among major technology companies that machine learning services represent a significant commercial opportunity. Amazon ML would later evolve into the more comprehensive Amazon SageMaker platform, and Amazon Web Services would become one of the dominant providers of cloud-based machine learning infrastructure, making scalable model training and deployment accessible to organizations of all sizes.[2]
2015 Software Distributed Machine Learning Toolkit The Distributed Machine Learning Toolkit (DMTK) is released by Microsoft Research as an open-source framework designed to enable the efficient distribution of machine learning problems across multiple computers, allowing models to be trained on datasets too large to fit on a single machine. DMTK introduces a parameter server architecture that allows multiple machines to share and update model parameters efficiently during training, addressing a key bottleneck in large-scale machine learning. Microsoft's decision to open-source DMTK reflects the broader trend among major technology companies of releasing machine learning infrastructure as open-source software, contributing to the rapid democratization of large-scale machine learning capabilities during the mid-2010s.[7]
2015 Policy / advocacy Autonomous weapons open letter Over 3,000 AI and robotics researchers sign an open letter warning of the dangers of autonomous weapons systems that can select and engage targets without human intervention, presented at the International Joint Conference on Artificial Intelligence in Buenos Aires. The letter is endorsed by prominent figures including Stephen Hawking, Elon Musk, and Steve Wozniak. It represents one of the first major collective statements by the machine learning research community on the ethical implications of their work, helping to legitimize AI safety and ethics as serious research concerns. However, the letter is not without critics — some AI researchers and defense analysts argue that a blanket ban on autonomous weapons is neither feasible nor necessarily desirable, contending that properly designed autonomous systems could in some circumstances reduce civilian casualties compared to human-operated weapons. The letter contributes to ongoing discussions at the United Nations about regulating lethal autonomous weapons systems, debates that would continue and intensify as machine learning capabilities advance.[7]
2015 Milestone CTC-trained LSTMs for speech recognition Google researchers demonstrate that deep Long Short-Term Memory (LSTM) recurrent neural networks trained with Connectionist Temporal Classification (CTC) match and then surpass the performance of sequence-trained context-dependent hidden Markov model acoustic models, representing a significant advance in end-to-end speech recognition.[90] CTC training allows the model to learn directly from raw audio and text transcriptions without requiring manually aligned phoneme labels, significantly simplifying the training pipeline. CTC-trained LSTMs would subsequently become the foundation of commercial speech recognition systems including Google Voice Search and Amazon Alexa, transforming voice interfaces from novelties into reliable everyday tools.
2015 Organization OpenAI founded OpenAI is founded as a non-profit research company by Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman, and others, with a commitment of one billion dollars in funding from its founders and backers. OpenAI is established in explicit response to concerns that the development of artificial general intelligence is becoming concentrated in a small number of large technology companies, particularly Google following its acquisition of DeepMind. By committing to publish its research openly, OpenAI aims to ensure that the benefits of advanced AI are broadly shared. It would go on to produce some of the most influential machine learning research of the following decade, including the GPT series of language models and the DALL-E image generation system, before transitioning to a capped-profit structure in 2019.[2]
2015 Application PayPal fraud detection using machine learning PayPal adopts a hybrid approach to fraud detection that combines human expertise with machine learning, using human investigators to identify patterns and traits associated with fraudulent behavior and then training machine learning models to detect and flag similar activity automatically at scale. Processing millions of transactions daily across more than 200 markets, PayPal's fraud detection system demonstrates the practical value of machine learning for financial security at a scale impossible to achieve through human review alone. The approach serves as an influential model for the financial industry more broadly, accelerating the adoption of machine learning for fraud detection, anti-money laundering, and risk assessment across banks and payment processors worldwide.[8]
2016 Achievement AlphaGo defeats Lee Sedol Google DeepMind's AlphaGo defeats Lee Sedol, one of the world's highest-ranked professional Go players, four games to one in a match watched by an estimated 200 million people worldwide.[91] Unlike earlier AI victories in games such as chess, AlphaGo's success does not depend on hand-crafted evaluation rules or brute-force search alone: it combines deep neural networks with Monte Carlo tree search, using value networks to evaluate board positions and policy networks to select moves, trained through a combination of supervised learning from human expert games and reinforcement learning from self-play. The match attracts unprecedented public attention to machine learning and AI, and is widely regarded as a defining moment in the history of artificial intelligence, accelerating both research investment and public discourse about the implications of increasingly capable machine learning systems. AlphaGo would go on to defeat world number one Ke Jie in 2017 before being succeeded by the even more capable AlphaGo Zero and AlphaZero.
2016 Achievement LipNet lip-reading system Researchers affiliated with the University of Oxford and DeepMind (Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas) introduce LipNet, the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. Using spatiotemporal convolutions, a recurrent neural network, and CTC loss, LipNet achieves 95.2% accuracy on the GRID corpus, outperforming both the previous word-level state of the art of 86.4% and experienced human lip-readers. Beyond its impressive performance, LipNet is significant for demonstrating the power of end-to-end learning for complex perception tasks that combine spatial and temporal reasoning, and for highlighting the potential of machine learning to augment accessibility tools for people with hearing impairments. The work also raises concerns about privacy, as accurate automated lip-reading could enable surveillance of conversations in public spaces.[92]
2016 Software FBLearner Flow Facebook details FBLearner Flow, an internal machine learning platform that allows Facebook engineers to easily share, train, and deploy machine learning algorithms at scale. Used by more than 25% of Facebook's engineers, with over one million models trained and more than six million predictions made per second, FBLearner Flow demonstrates the degree to which machine learning has become embedded in the core operations of a major technology company. The platform's scale and sophistication illustrate the growing importance of machine learning infrastructure as a discipline in its own right, and Facebook's decision to publicize its internal tooling contributes to broader industry discussions about best practices for building and managing large-scale machine learning systems.[93]
2016 (October) Software PyTorch PyTorch is first released by Adam Paszke, Soumith Chintala, and others at Facebook AI Research, built on the Torch library and adopting Chainer's define-by-run dynamic computation graph approach. PyTorch's intuitive, Pythonic design and dynamic graphs make it significantly easier to debug and experiment with novel architectures than static graph frameworks such as TensorFlow, leading to rapid adoption in the research community. Within a few years of its release PyTorch would become the dominant framework for machine learning research, used in the majority of papers published at leading venues such as NeurIPS and ICML, and its influence would eventually prompt TensorFlow to adopt dynamic computation as well in TensorFlow 2.0.[94]
2017 Application Jigsaw trolling detection system Alphabet's Jigsaw team develops Perspective, an intelligent content moderation system that uses machine learning to detect toxic comments and online harassment by analyzing millions of comments from websites including Wikipedia and the New York Times. The system assigns toxicity scores to text in real time, enabling platforms to flag or filter harmful content at a scale impossible to achieve through human moderation alone. Perspective represents one of the most prominent deployments of machine learning for online content moderation, demonstrating both the potential and the limitations of automated approaches: the system draws criticism for biases in its toxicity scores, particularly against text involving certain demographic groups, highlighting the broader challenge of fairness and bias in machine learning systems applied to social contexts.[14][8]
2018 Milestone AutoML and ethical turning point Machine learning sees several significant developments in 2018. Google releases Neural Architecture Search, demonstrating that machine learning can automate the design of neural network architectures — a process previously requiring significant human expertise — marking a major step toward automated machine learning (AutoML). DeepMind's AlphaFold achieves top results in the Critical Assessment of Protein Structure Prediction competition, signaling the potential of machine learning to transform structural biology. The year also marks a sobering ethical moment when a fatal accident involving an Uber self-driving vehicle in Arizona — the first pedestrian death caused by an autonomous vehicle — raises urgent questions about safety standards, accountability, and the pace of deploying machine learning systems in high-stakes real-world environments.[95][96]
2019 Achievement First machine-learning-generated research book Springer Nature publishes Lithium-Ion Batteries: A Machine-Generated Summary of Current Research, the first research book created using machine learning, produced by automatically summarizing and synthesizing a large corpus of scientific literature on lithium-ion batteries using natural language processing algorithms. The book is presented as a demonstration of how machine learning can help researchers navigate the rapidly growing volume of scientific publications, rather than as a replacement for human authorship. The publication attracts significant attention and debate about the role of automated systems in scientific communication, anticipating broader discussions about AI-generated content in academic publishing that would intensify with the rise of large language models in subsequent years.[97]
2020 Achievement Machine learning in COVID-19 response As the COVID-19 pandemic spreads globally, machine learning is deployed across multiple fronts in the response effort. Deep learning models are applied to CT scans and chest X-rays to assist in diagnosing COVID-19, natural language processing tools are used to mine the rapidly growing scientific literature for relevant findings, and machine learning algorithms are employed to accelerate drug discovery and predict protein structures relevant to the virus. The pandemic represents the most prominent large-scale deployment of machine learning in a global health emergency to date, demonstrating both the potential of the technology to accelerate scientific response and its limitations, including concerns about the reliability and generalizability of hastily developed diagnostic models.[98]
2021 Algorithm Player of Games and Switch Transformers DeepMind introduces "Player of Games", a general-purpose algorithm developed by Martin Schmid and colleagues that unifies guided search, self-play learning, and game-theoretic reasoning, achieving strong performance in both perfect information games such as chess and Go and imperfect information games such as poker — the first algorithm to accomplish this across both categories.[99] During the same year, Google researchers William Fedus, Barret Zoph, and Noam Shazeer introduce Switch Transformers, a sparse mixture-of-experts architecture that scales language models to over one trillion parameters while keeping computational costs constant by selectively routing each input to a single expert rather than processing it through the entire model. Switch Transformers demonstrates that dramatically scaling up model size need not require proportionally scaling up computation, influencing subsequent work on efficient large language model architectures.[100]
2022 (November 30) Achievement Release of ChatGPT OpenAI releases ChatGPT as a research preview, powered by GPT-3.5 and trained using reinforcement learning from human feedback (RLHF). ChatGPT reaches one million users within five days and 100 million monthly active users by February 2023, making it the fastest-growing consumer application in history at the time. The release marks a turning point in public awareness and understanding of machine learning, bringing large language models into mainstream use and triggering intense discussion about the implications of highly capable AI systems for education, employment, creative work, and society. ChatGPT's success accelerates competition among technology companies in generative AI, prompting major responses from Google, Microsoft, and Meta, and helping to establish large language models as the dominant paradigm in machine learning research and applications.[101]
2023 (February) Achievement Release of LLaMA Meta Platforms releases LLaMA (Large Language Model Meta AI), a family of large language models ranging from 7 to 65 billion parameters, made available to researchers under a noncommercial license. Unlike the proprietary large language models released by OpenAI and Google, LLaMA's relatively open availability enables a wave of community-driven experimentation, fine-tuning, and derivative model development, significantly democratizing access to high-performance language model research. Subsequent versions — Llama 2, 3, and 3.1 — introduce architectural improvements and increasingly permissive licensing, further accelerating open-source machine learning research. LLaMA's release is widely credited with catalyzing the open-source large language model ecosystem, providing a foundation for hundreds of derivative models and research projects.[102]
2023 (March) Achievement Release of GPT-4 OpenAI introduces GPT-4, a large multimodal language model capable of processing both text and images, representing a significant advance over its predecessor GPT-3.5 in reasoning, factual accuracy, and performance on standardized benchmarks. GPT-4 scores in the top percentiles on professional examinations including the bar exam and medical licensing tests, demonstrating a level of general reasoning capability that surprises many researchers. Trained via deep learning on a large corpus of text and refined with reinforcement learning from human feedback, GPT-4 becomes the foundation for a wide range of commercial applications and research projects. Its release intensifies debate about the pace of AI development, the adequacy of safety evaluations, and the societal implications of increasingly capable general-purpose machine learning systems.[103]
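
The isolation idea described in the 2008 Isolation Forest entry above (anomalies are few and different, so random partitioning separates them quickly) can be illustrated with a minimal sketch. This is not the published algorithm: it omits the subsampling and score normalization of the original method, and the function names, toy data, and parameter values are assumptions chosen for illustration only.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random axis-aligned splits needed to isolate `point` in `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                             # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)              # pick a random split value
    # Keep only the rows that fall on the same side of the split as `point`.
    if point[dim] < split:
        same_side = [row for row in data if row[dim] < split]
    else:
        same_side = [row for row in data if row[dim] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_isolation_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many independent random partitionings."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical toy data: a dense 2-D cluster plus one far-away point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data = cluster + [outlier]

# Compare the outlier with the cluster point closest to the origin.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)
inlier_depth = mean_isolation_depth(inlier, data)
outlier_depth = mean_isolation_depth(outlier, data)
print(f"mean depth, inlier:  {inlier_depth:.2f}")
print(f"mean depth, outlier: {outlier_depth:.2f}")
```

In this sketch a shorter average depth indicates a point that is easier to isolate, which is the signal the algorithm uses to flag anomalies. A maintained implementation, such as the one in scikit-learn, is preferable for real use.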

Visual data

The image below shows worldwide Google Trends interest in “Machine learning” from 2004 to 2026. It indicates low early attention, steady growth after the mid-2010s, and sharp increases in the 2020s, culminating in a peak around 2025–2026, reflecting accelerating public awareness, adoption, and discourse surrounding machine learning technologies and applications globally.[104]

Google Ngram Viewer

The chart below shows Google Ngram Viewer data for Machine learning, from 1950 to 2019.[105]

Wikipedia Views

The chart below shows monthly Wikipedia pageviews for the Machine learning article from 2015 to 2026, segmented by access platform (desktop, mobile web, mobile app, and spiders). It highlights overall traffic trends, seasonal fluctuations, and notable spikes, reflecting growing public interest and shifting patterns in how users access machine learning content online.[106]

See also

Meta information on the timeline

How the timeline was built

The initial version of the timeline was written by User:Issa.

Funding information for this timeline is available.

Feedback and comments

Feedback for the timeline can be provided at the following places:

  • FIXME

What the timeline is still missing

Timeline update strategy

Pingbacks

References

  1. Firican, George (31 January 2022). "The history of Machine Learning". LightsOnData. Retrieved 5 July 2023.
  2. "A Brief History of Machine Learning". dataversity.net. Retrieved 20 February 2020.
  3. "A History of Machine Learning and Deep Learning". import.io. Retrieved 21 February 2020.
  4. "A brief history of the development of machine learning algorithms". subscription.packtpub.com. Retrieved 25 February 2020.
  5. "A BRIEF HISTORY OF MACHINE LEARNING". provalisresearch.com. Retrieved 21 February 2020.
  6. "What is Machine Learning?". mlplatform.nl. Retrieved 25 February 2020.
  7. "A Short History of Machine Learning". forbes.com. Retrieved 20 February 2020.
  8. "A history of machine learning". cloud.withgoogle.com. Retrieved 21 February 2020.
  9. Bayes, Thomas (1 January 1763). "An Essay towards solving a Problem in the Doctrine of Chances" (PDF). Philosophical Transactions. 53: 370–418. doi:10.1098/rstl.1763.0053. Retrieved 15 June 2016.
  10. "Jacquard Loom, 1934 - The Henry Ford". www.thehenryford.org. Retrieved 14 June 2023.
  11. "History of Machine Learning". medium.com. Retrieved 25 February 2020.
  12. Legendre, Adrien-Marie (1805). Nouvelles méthodes pour la détermination des orbites des comètes (in French). Paris: Firmin Didot. p. viii. Retrieved 13 June 2016.
  13. O'Connor, J J; Robertson, E F. "Pierre-Simon Laplace". School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 15 June 2016.
  14. "History of Machine Learning". javatpoint.com. Retrieved 21 February 2020.
  15. Domingos, Pedro (22 September 2015). The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World (1st ed.). Basic Books.
  16. Hayes, Brian. "First Links in the Markov Chain". American Scientist (March–April 2013). Sigma Xi, The Scientific Research Society: 92. doi:10.1511/2013.101.1. Retrieved 15 June 2016.
  17. Bernhardt, Chris (2016). "Turing's Vision: The Birth of Computer Science". The MIT Press.
  18. McCulloch, Warren S.; Pitts, Walter (1943). "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
  19. Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.
  20. Turing, Alan (October 1950). "COMPUTING MACHINERY AND INTELLIGENCE". MIND. 59 (236): 433–460. doi:10.1093/mind/LIX.236.433. Retrieved 8 June 2016.
  21. Crevier 1993, pp. 34–35 and Russell & Norvig 2003, p. 17
  22. McCarthy, John; Feigenbaum, Ed. "Arthur Samuel: Pioneer in Machine Learning". AI Magazine. No. 3. Association for the Advancement of Artificial Intelligence. p. 10. Retrieved 5 June 2016.
  23. Koch, Robert (1 September 2022). "History of Machine Learning - A Journey through the Timeline". clickworker.com. Retrieved 3 July 2023.
  24. Rosenblatt, Frank (1958). "THE PERCEPTRON: A PROBABILISTIC MODEL FOR INFORMATION STORAGE AND ORGANIZATION IN THE BRAIN" (PDF). Psychological Review. 65 (6): 386–408.
  25. Bheemaiah, Kariappa; Esposito, Mark; Tse, Terence (3 May 2017). "What is machine learning?". The Conversation. Retrieved 3 July 2023.
  26. "Seventy years of highs and lows in the history of machine learning". fastcompany.com. Retrieved 25 February 2020.
  27. "History of deep machine learning". medium.com. Retrieved 21 February 2020.
  28. Cohen, Harvey. "The Perceptron". Retrieved 5 June 2016.
  29. Colner, Robert. "A brief history of machine learning". SlideShare. Retrieved 5 June 2016.
  30. Seppo Linnainmaa (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.
  31. Seppo Linnainmaa (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, 16(2), 146-160.
  32. Griewank, Andreas (2012). Who Invented the Reverse Mode of Differentiation?. Optimization Stories, Documenta Mathematica, Extra Volume ISMP (2012), 389-400.
  33. Jürgen Schmidhuber (2015). Deep learning in neural networks: An overview. Neural Networks 61 (2015): 85-117.
  34. Harth, E.; Tzanakou, E. (1974). "Alopex: a stochastic method for determining visual receptive fields". Vision Research. 14 (12): 1475–1482. doi:10.1016/0042-6989(74)90024-8. PMID 4446379.
  35. Dempster, A.P.; Laird, N.M.; Rubin, D.B. (1977). "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society, Series B. 39 (1): 1–38.
  36. "Rise of the machines". mydigitalpublication.com. Retrieved 5 July 2023.
  37. Fukushima, Kunihiko (1980). "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position". Biological Cybernetics. 36: 193–202. doi:10.1007/bf00344251.
  38. Linde, Y.; Buzo, A.; Gray, R. (1980). "An Algorithm for Vector Quantizer Design". IEEE Transactions on Communications. 28: 84–95. doi:10.1109/TCOM.1980.1094577.
  39. "Brief History of Machine Learning". erogol.com. Retrieved 24 February 2020.
  40. Hopfield, John (April 1982). "Neural networks and physical systems with emergent collective computational abilities". Proceedings of the National Academy of Sciences of the United States of America. 79: 2554–2558. doi:10.1073/pnas.79.8.2554.
  41. Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In Trappl, Robert (ed.). Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland. pp. 397–402.
  42. Sejnowski, Terrence J.; Rosenberg, Charles R. (1987). "Parallel Networks that Learn to Pronounce English Text". Complex Systems. 1: 145–168.
  43. Rumelhart, David; Hinton, Geoffrey; Williams, Ronald (9 October 1986). "Learning representations by back-propagating errors". Nature. 323: 533–536. doi:10.1038/323533a0.
  44. Dehaene S, Changeux JP. Experimental and theoretical approaches to conscious processing. Neuron. 2011 Apr 28;70(2):200-27.
  45. Changeux JP, Dehaene S. Hierarchical neuronal modeling of cognitive functions: from synaptic transmission to the Tower of London. Comptes Rendus de l'Académie des Sciences, Série III. 1998 Feb–Mar;321(2–3):241-7.
  46. "Machine Learning". springer.com. Retrieved 9 March 2020.
  47. "Past Conferences". ECML PKDD 2008. Retrieved 7 March 2026.
  48. "Knowledge Engineering and Machine Learning Group". kemlg.upc.edu. Retrieved 12 May 2026.
  49. Watkins, C.J.C.H. (1989), Learning from Delayed Rewards (PDF) (Ph.D. thesis), Cambridge University
  50. Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of the ACM. 38 (3).
  51. "Past Conferences". International Conference on Machine Learning (ICML). Retrieved 27 March 2026.
  52. Cortes, Corinna; Vapnik, Vladimir (September 1995). "Support-vector networks". Machine Learning. 20 (3): 273–297. doi:10.1007/BF00994018.
  53. Ho, Tin Kam (August 1995). "Random Decision Forests". Proceedings of the Third International Conference on Document Analysis and Recognition. 1. Montreal, Quebec: IEEE: 278–282. doi:10.1109/ICDAR.1995.598994. ISBN 0-8186-7128-9.
  54. Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735.
  55. Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based Local Outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD. pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.
  56. Friedman, Jerome; Hastie, Trevor; Tibshirani, Robert (2000). "Additive logistic regression: a statistical view of boosting". Annals of Statistics. 28 (2): 337–407. doi:10.1214/aos/1016218223.
  57. "History of the Journal of Machine Learning Research". Journal of Machine Learning Research. JMLR, Inc. Retrieved 23 March 2026.
  58. Yu, Cui; Ooi, Beng Chin; Tan, Kian-Lee; Jagadish, H. V. (2001). "Indexing the Distance: An Efficient Method to KNN Processing" (PDF). Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). Rome, Italy: Morgan Kaufmann. pp. 421–430. Retrieved 23 March 2026.
  59. Collobert, Ronan; Bengio, Samy; Mariethoz, Johnny (30 October 2002). "Torch: a modular machine learning software library" (PDF). Retrieved 5 June 2016.
  60. Ham, Ji Hun; Daniel D. Lee; Lawrence K. Saul (2003). "Learning high dimensional correspondences from low dimensional manifolds". Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003).
  61. Dean, Jeffrey; Ghemawat, Sanjay (2004). "MapReduce: Simplified Data Processing on Large Clusters" (PDF). 6th Symposium on Operating Systems Design and Implementation (OSDI 2004). San Francisco, CA: USENIX Association. pp. 137–150.
  62. Hawkins, Jeff; Blakeslee, Sandra (2004). On Intelligence. Times Books. ISBN 978-0-8050-7456-7.
  63. Graves, Alex; Schmidhuber, Jürgen (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–6): 602–610. doi:10.1016/j.neunet.2005.06.042.
  64. "What is scikit-learn?". njtrainingacademy.com. Retrieved 5 March 2020.
  65. "Sharing is Caring with Algorithms". towardsdatascience.com. Retrieved 8 March 2020.
  66. "Python's pandas library is on its way to v.1.0.0". jaxenter.com. Retrieved 9 March 2020.
  67. Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (December 2008). "Isolation Forest". 2008 Eighth IEEE International Conference on Data Mining: 413–422. doi:10.1109/ICDM.2008.17. ISBN 978-0-7695-3502-9.
  68. "Encog Machine Learning Framework". heatonresearch.com. Retrieved 8 March 2020.
  69. "About". Kaggle. Retrieved 16 June 2016.
  70. Shotton, Jamie; Fitzgibbon, Andrew; Cook, Mat; Sharp, Toby; Finocchio, Mark; Moore, Richard; Kipman, Alex; Blake, Andrew (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images". Proceedings of CVPR 2011. pp. 1297–1304.
  71. Konidaris, George; Scott Kuindersma; Andrew Barto; Roderic Grupen (2010). "Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories". Advances in Neural Information Processing Systems 23.
  72. Markoff, John (17 February 2011). "Computer Wins on 'Jeopardy!': Trivial, It's Not". New York Times. p. A1. Retrieved 5 June 2016.
  73. Le, Quoc; Ranzato, Marc'Aurelio; Monga, Rajat; Devin, Matthieu; Chen, Kai; Corrado, Greg; Dean, Jeff; Ng, Andrew (12 July 2012). "Building High-level Features Using Large Scale Unsupervised Learning". CoRR. arXiv:1112.6209.
  74. Markoff, John (26 June 2012). "How Many Computers to Identify a Cat? 16,000". New York Times. p. B1. Retrieved 5 June 2016.
  75. Mevlut Yıldız (September 1, 2023). "History of Machine Learning". Clarusway. Retrieved 25 March 2026.
  76. "mlpy". mlpy.sourceforge.net. Retrieved 8 March 2020.
  77. "Proposal for A New Publishing Model in Computer Science". yann.lecun.com.
  78. "13th Annual International Conference on Learning Representations (ICLR). 2025 Fact Sheet" (PDF). ICLR. 2025. Retrieved 2026-01-31.
  79. Taigman, Yaniv; Yang, Ming; Ranzato, Marc'Aurelio; Wolf, Lior (24 June 2014). "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". Conference on Computer Vision and Pattern Recognition. Retrieved 8 June 2016.
  80. "Popular Big Data Engine Apache Spark 2.0 Released". adtmag.com. Retrieved 8 March 2020.
  81. "The Turing Test Is Not What You Think It Is". WNYC. Retrieved 4 July 2023.
  82. Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). Generative Adversarial Networks. Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014). pp. 2672–2680.
  83. "Introducing spaCy". explosion.ai. Retrieved 5 March 2020.
  84. "Keras". news.ycombinator.com. Retrieved 5 March 2020.
  85. "Big-in-Japan AI code 'Chainer' shows how Intel will gun for GPUs". The Register. 2017-04-07. Retrieved 8 March 2020.
  86. "Apache SINGA". singa.apache.org. Retrieved 8 March 2020.
  87. "Google achieves AI 'breakthrough' by beating Go champion". BBC News. 27 January 2016.
  88. "AlphaGo". Google DeepMind.
  89. Dean, Jeff; Monga, Rajat (9 November 2015). "TensorFlow - Google's latest machine learning system, open sourced for everyone". Google Research Blog. Retrieved 5 June 2016.
  90. Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise (2015). "Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition". arXiv. arXiv:1507.06947.
  91. Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Hassabis, Demis (2016). "Mastering the game of Go with deep neural networks and tree search". Nature. 529: 484–489. doi:10.1038/nature16961.
  92. Assael, Yannis M.; Shillingford, Brendan; Whiteson, Shimon; de Freitas, Nando (2016). "LipNet: End-to-End Sentence-level Lipreading". arXiv. arXiv:1611.01599.
  93. Dunn, Jeffrey (10 May 2016). "Introducing FBLearner Flow: Facebook's AI backbone". Facebook Code. Retrieved 8 June 2016.
  94. "PyTorch Releases Major Update, Now Officially Supports Windows". medium.com. Retrieved 8 March 2020.
  95. "Neural Architecture Search with Reinforcement Learning". arXiv. Retrieved 12 May 2026.
  96. "Uber self-driving car kills pedestrian in first fatal autonomous vehicle crash". The Guardian. 19 March 2018. Retrieved 12 May 2026.
  97. Vincent, James (10 April 2019). "The first AI-generated textbook shows what robot writers are actually good at". The Verge. Retrieved 5 May 2019.
  98. Vaishya, Raju; Javaid, Mohd; Khan, Ibrahim Haleem; Haleem, Abid (1 July 2020). "Artificial Intelligence (AI) applications for COVID-19 pandemic". Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 14 (4): 337–339. doi:10.1016/j.dsx.2020.04.012. PMC 7195043. PMID 32305024.
  99. Schmid, Martin; Moravcik, Matej (2021). "Player of Games". arXiv. arXiv:2112.03178.
  100. Fedus, William; Zoph, Barret; Shazeer, Noam (2022). "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". Journal of Machine Learning Research. 23. arXiv:2101.03961.
  101. "ChatGPT Version History: Evolution Timeline". Nexos.ai. 7 September 2025. Retrieved 25 March 2026.
  102. Luís Roque; Rafael Guedes (6 September 2024). "The Evolution of Llama: From Llama 1 to Llama 3.1". Medium (Towards Data Science). Retrieved 23 March 2026.
  103. "GPT-4". OpenAI. 14 March 2023. Retrieved 23 March 2026.
  104. "Google Trends: Machine learning (worldwide search interest)". Google Trends. Google. Retrieved 24 March 2026.
  105. "Machine learning". books.google.com. Retrieved 11 March 2021.
  106. "Wikipedia Views: Machine learning page statistics". Wikipedia Views. Vipul Naik. Retrieved 24 March 2026.