Timeline of large language models
This is a timeline of large language models (LLMs), artificial intelligence (AI) systems that use deep learning techniques to process and generate human-like natural language. LLMs are pre-trained on large amounts of data to learn the complexity and linkages of language, and can be adapted for specific tasks using techniques like fine-tuning, in-context learning, and zero-/one-/few-shot learning.
- 1 Sample questions
- 2 Big picture
- 3 Full timeline
- 4 Numerical and visual data
- 5 Meta information on the timeline
- 6 See also
- 7 External links
- 8 References
The following are some interesting questions that can be answered by reading this timeline:
- What are some notable or sample research cases on large language models?
- You will see some research publications revealing important aspects of LLMs.
|Time period||Development summary||More details|
|2010–2017||Early years||Period characterized by the development of the first large-scale language modeling resources, such as the Google Ngram Corpus (2010) and the Microsoft Web N-gram Corpus (2013), which provide researchers with large datasets to train language models. During this period, researchers also develop new techniques for training neural language models, such as the use of recurrent neural networks (RNNs) and long short-term memory (LSTM) networks.|
|2017–2019||Emergence of transformers||This period sees the emergence of the transformer architecture, which revolutionizes natural language processing and makes possible the development of larger and more powerful language models. In 2017, Vaswani et al. introduce the transformer architecture, which uses self-attention to model the relationships between words in a sentence. This architecture is used to develop the GPT (Generative Pre-trained Transformer) models by OpenAI, which would achieve state-of-the-art performance on a range of language tasks.|
|2019–present||GPT-3 and beyond||Period characterized by the development of even larger and more powerful language models, such as GPT-2 and GPT-3. In 2020, OpenAI releases GPT-3, which has 175 billion parameters, making it the largest dense language model at the time of its release. GPT-3 demonstrates impressive capabilities, such as the ability to generate coherent text, answer questions, and even write code. This period also sees the emergence of new research directions, such as using language models for unsupervised learning, few-shot learning, and transfer learning. By late 2022, LLMs become a sensation on the internet as OpenAI's ChatGPT acquires 1 million users within only 5 days of its release. ChatGPT's capabilities and wide range of uses are commonly attributed in part to the scale of the underlying GPT-3 family of models, which reaches 175 billion parameters.|
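The self-attention mechanism mentioned in the 2017–2019 row can be illustrated with a minimal sketch. This is not the implementation from Vaswani et al.: it assumes identity projections (queries, keys, and values are the input vectors themselves), and the function names and toy embeddings are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the input vectors themselves.
    A real transformer layer would first apply learned projection matrices.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical 2-dimensional embeddings for a three-word sentence.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is a probability distribution describing how much one word attends to every word in the sentence, which is the sense in which self-attention models relationships between words.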
|Year||Month and date||Model name||Size (in parameters)||Event type||Details|
|2018||April 1||Marian||A paper introduces Marian, a highly efficient Neural Machine Translation (NMT) framework written entirely in C++. The framework includes an integrated automatic differentiation engine based on dynamic computation graphs. The authors discuss the design of the encoder-decoder framework and demonstrate that Marian, as a research-friendly toolkit, achieves fast training and translation speeds, making it a valuable tool for NMT research and development.|
|2018||October 11||BERT||340,000,000||LLM launch||BERT, short for Bidirectional Encoder Representations from Transformers, is introduced as a language representation model that achieves state-of-the-art performance on various natural language processing tasks. Unlike previous models, BERT is designed to pre-train deep bidirectional representations by considering both left and right context in all layers. By fine-tuning BERT with an additional output layer, it can be effectively applied to tasks such as question answering and language inference without significant architectural modifications. BERT demonstrates its simplicity and effectiveness, setting new benchmarks on eleven NLP tasks, including significant improvements in GLUE score, MultiNLI accuracy, and SQuAD question answering performance.|
|2019||May 29||GROVER||LLM launch||A team of researchers from the University of Washington and the Allen Institute for Artificial Intelligence introduces GROVER, a language model similar to GPT-2. However, they do not make the larger versions of the model publicly available. Their publication discusses the potential risks of natural language generation technology and the need for robust defenses against neural fake news. Grover can generate realistic news articles that are difficult to distinguish from real news. The authors also explore the effectiveness of current methods for detecting fake news and find that the best defense against Grover is Grover itself, with 92% accuracy. The article concludes by discussing the ethical issues related to the technology and the importance of public release of strong generators to facilitate better detection of neural fake news.|
|2019||June 19||XLNet||~340,000,000||LLM launch||XLNet is introduced as a generalized autoregressive pretraining method for language understanding. Unlike BERT, which relies on masking input tokens, XLNet considers all permutations of the factorization order to model bidirectional contexts. This approach overcomes the limitations of BERT and improves pretrain-finetune consistency. XLNet incorporates ideas from Transformer-XL, an autoregressive model, into its pretraining process. In empirical evaluations across 20 tasks, XLNet outperforms BERT by a significant margin, including question answering, natural language inference, sentiment analysis, and document ranking.|
|2019||July 26||RoBERTa||LLM launch||Researchers introduce "RoBERTa: A Robustly Optimized BERT Pretraining Approach," after conducting a replication study of BERT pretraining (Devlin et al., 2019) to evaluate the impact of key hyperparameters and training data size on performance. They find that BERT was undertrained and demonstrate that it can achieve or surpass the performance of subsequent models. The authors achieve state-of-the-art results on GLUE, RACE, and SQuAD benchmarks, highlighting the significance of overlooked design choices and questioning the origins of recently reported improvements. The models and code used in the study are made publicly available, facilitating further research and exploration.|
|2019||August||Megatron-LM||8,300,000,000||LLM launch||NVIDIA introduces Megatron-LM, an 8.3-billion-parameter model trained with a combination of model and data parallelism on 512 GPUs. The 53-minute training time reported in NVIDIA's announcement refers to a separate BERT-Large training run rather than to the 8.3-billion-parameter model. Megatron-LM's training data is sourced from Wikipedia, OpenWebText, RealNews, and CC-Stories, with a combined dataset size of 174 gigabytes. The model represents a significant milestone in the development of large-scale language models, highlighting the capabilities of modern hardware and data processing in the field of natural language processing.|
|2019||September 11||CTRL||1,630,000,000||LLM launch||CTRL is introduced as a conditional transformer language model that aims to enhance control over text generation. It is designed to condition on control codes, allowing users to govern style, content, and task-specific behavior. These control codes are derived from the structure that naturally co-occurs with raw text, providing explicit control while leveraging the advantages of unsupervised learning. CTRL is capable of predicting the likelihood of different parts of the training data given a sequence, enabling potential analysis of large datasets through model-based source attribution. Multiple pretrained versions of CTRL have been publicly released. The authors of the model are Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher.|
|2019||September 26||ALBERT||LLM launch||ALBERT is introduced as a lightweight version of BERT that focuses on self-supervised learning of language representations. The authors address the limitations of increasing model size by proposing two parameter-reduction techniques, which reduce memory consumption and training time. Empirical evidence demonstrates that their methods significantly improve the scalability of models compared to the original BERT. Additionally, they employ a self-supervised loss that prioritizes modeling inter-sentence coherence, consistently enhancing performance on tasks with multi-sentence inputs. The best ALBERT model achieves new state-of-the-art results on benchmarks such as GLUE, RACE, and SQuAD while having fewer parameters than BERT-large.|
|2019||October 2||DistilBERT||LLM launch||DistilBERT is introduced as a smaller, faster, and cheaper version of BERT, designed for efficient on-device computations. It retains 97% of BERT's language understanding capabilities while reducing its size by 40%. By using knowledge distillation during pre-training and a triple loss function, it captures important linguistic features. DistilBERT proves its capabilities through proof-of-concept experiments and on-device studies.|
|2019||November 1||DialoGPT||1,500,000,000||LLM launch||DialoGPT is introduced as a large, adaptable neural model for generating conversational responses. It is trained on 147 million conversation-like exchanges from Reddit comment chains spanning 2005 to 2017. DialoGPT, an extension of the Hugging Face PyTorch transformer, achieves performance close to human-level evaluation in single-turn dialogues. It outperforms strong baseline systems by generating more relevant, meaningful, and contextually consistent responses. The pre-trained model and training pipeline are publicly available, encouraging research in neural response generation and the advancement of intelligent open-domain dialogue systems.|
|2019||November 10||CamemBERT||110,000,000||A paper introduces CamemBERT, a monolingual Transformer-based language model trained specifically for French. It addresses the limited practical use of pretrained models in languages other than English. The authors evaluate CamemBERT on various tasks including part-of-speech tagging, dependency parsing, named entity recognition, and natural language inference. They find that using web crawled data is preferable to Wikipedia data. Surprisingly, even with a relatively small web crawled dataset of 4GB, CamemBERT achieves results on par with or better than models trained on larger datasets of over 130GB. In fact, CamemBERT outperforms the state-of-the-art models in all four downstream tasks.|
|2019||December 11||FlauBERT||138,000,000 – 373,000,000||LLM launch||FlauBERT is introduced as an unsupervised language model for French, developed by Hang Le et al. It leverages unlabeled texts to pre-train word representations, demonstrating superior performance in various NLP tasks. Trained on a large and diverse French corpus, FlauBERT outperforms other pre-training approaches. The authors share different FlauBERT versions and a unified evaluation protocol, FLUE, for reproducible French NLP experiments.|
|2020||January 13||ProphetNet||LLM launch||A paper introduces ProphetNet, a new sequence-to-sequence pre-training model. It incorporates a novel self-supervised objective called future n-gram prediction and utilizes an n-stream self-attention mechanism. Unlike traditional models that optimize one-step-ahead prediction, ProphetNet predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction objective encourages the model to plan for future tokens and prevents overfitting to local correlations. ProphetNet is pre-trained on both a base-scale dataset (16GB) and a large-scale dataset (160GB). The model's performance is evaluated on benchmarks such as CNN/DailyMail, Gigaword, and SQuAD 1.1 for tasks like abstractive summarization and question generation. Experimental results show that ProphetNet achieves new state-of-the-art results on all tested datasets compared with models pre-trained on a corpus of the same scale.|
|2020||February 24||Text-to-Text Transfer Transformer (T5)||11,000,000,000||LLM launch||T5 is introduced as a Text-To-Text Transfer Transformer model. It is a flexible and powerful model that achieves state-of-the-art results in natural language processing tasks. It uses a unified text-to-text framework, allowing for easy adaptation to various NLP tasks. T5 is pre-trained on a large-scale dataset called C4, which improves its performance. The authors conduct a systematic study of transfer learning methodologies and combine the best approaches to achieve strong results on multiple benchmarks. T5 is also applied to closed-book question answering and fill-in-the-blank text generation tasks with impressive performance.|
|2020||April||Megatron-11B||11,000,000,000||LLM launch||Facebook AI Research (FAIR) introduces Megatron-11B, a unidirectional language model with 11 billion parameters, which is built upon the Megatron-LM architecture. FAIR trained this model using intra-layer model parallelism, splitting each layer's parameters across 8 GPUs. Megatron-11B is trained on a dataset consisting of English Wikipedia (12GB), BookCorpus (4GB), CC-News (76GB), OpenWebText/Reddit upvoted (38GB), and Stories (31GB), with a total dataset size of 161GB. This model is part of the RoBERTa family and contributes to the advancements in large-scale language models for natural language processing tasks.|
|2020||June 5||DeBERTa||1,500,000,000 (larger model)||LLM launch||A paper presents DeBERTa, a model that enhances BERT and RoBERTa LLMs by introducing disentangled attention and an enhanced mask decoder. These techniques improve model pre-training efficiency and performance on various NLP tasks. A DeBERTa model trained on half the data outperforms RoBERTa-Large on tasks like MNLI, SQuAD v2.0, and RACE. A larger DeBERTa model with 1.5 billion parameters surpasses human performance on the SuperGLUE benchmark, and an ensemble DeBERTa model leads the SuperGLUE leaderboard with a significant margin over the human baseline.|
|2020||June||GPT-3||175,000,000,000||LLM launch||OpenAI releases GPT-3 as a service, powered by a 175-billion-parameter model that can generate text and code with short written prompts.|
|2021||January 11||Wu Dao||1,750,000,000,000||LLM launch||Wu Dao is released. It is among the largest language models by parameter count.|
|2021||March 22||GPT-Neo||2,700,000,000||LLM launch||GPT-Neo is introduced as an open-source alternative to GPT-3, developed by EleutherAI. It offers accessible language generation capabilities and is released under the MIT license. While GPT-Neo's performance is not as strong as GPT-3's largest model, it outperforms comparable GPT-3 models on NLP reasoning benchmarks. GPT-Neo provides a promising option, especially considering OpenAI's restricted access policy.|
|2021||May||LaMDA||137,000,000,000||LLM launch||Google announces LaMDA (Language Model for Dialogue Applications). Unlike other language models, LaMDA is specifically trained on dialogue to enable more natural and engaging conversations with users. It has the ability to understand and respond to the subtleties of open-ended discussions. LaMDA has various potential applications, including customer service, chatbots, and personal assistants. It is built upon Google's previous chatbot model called Meena.|
|2021||October 11||MT-NLG||530,000,000,000||LLM launch||MT-NLG (Megatron-Turing Natural Language Generation) is introduced as a language model developed jointly by Nvidia and Microsoft. It utilizes the architecture of the Megatron transformer-based model and has a record-breaking size of 530 billion parameters. MT-NLG is designed to generate coherent and contextually relevant text for various natural language processing tasks such as completion prediction, reading comprehension, commonsense reasoning, and word sense disambiguation. Training such large-scale models is challenging due to memory constraints and long training times, but innovations in hardware, software, and training methods have made it feasible. MT-NLG achieves state-of-the-art results in zero-shot, one-shot, and few-shot settings across multiple NLP tasks.|
|2021||December||Fairseq||13,000,000,000 – 1,100,000,000,000||LLM launch||Meta AI, previously known as FAIR (Facebook AI Research), announces Fairseq, a family of language models with sizes including 13B and 1.1T parameters. Fairseq is not related to Megatron, and the two use different technologies for training. Fairseq's dataset sources include the same ones used for RoBERTa (English Wikipedia, BookCorpus, CC-News, OpenWebText/Reddit upvoted, and Stories) with the new addition of English CC100 in Wikipedia style from Jan/2018-Dec/2018, resulting in a total dataset size of 453GB. Fairseq was trained using 2,363 GPU-days with 1,024 GPUs, taking approximately three days.|
|2022||January 19||CM3||A paper introduces CM3, a family of causally masked generative models trained on large-scale web and Wikipedia articles containing text and image tokens. The new approach generates tokens left to right while masking out a small number of long token spans that are generated at the end of the string. This provides a hybrid of the more common causal and masked language models, allowing for full generative modeling while providing bidirectional context when generating the masked spans. The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts and implicitly learn a wide range of text, image, and cross-modal tasks. The paper also reports state-of-the-art performance in zero-shot summarization, entity linking, and entity disambiguation, while maintaining competitive performance in the fine-tuning setting.|
|2022||January 27||InstructGPT||1,300,000,000||LLM launch||OpenAI announces having deployed InstructGPT, a new language model that is safer, more helpful, and more aligned with users. The model was trained using reinforcement learning from human feedback (RLHF) and is significantly better at following instructions than the previous model, GPT-3. InstructGPT is also less toxic and generates fewer false facts than its predecessor. The company believes that fine-tuning language models with humans in the loop is a powerful tool for improving their safety and reliability. InstructGPT becomes the default language model accessible on OpenAI's API.|
|2022||February 28||Extremely Large||LLM launch||Cohere launches a new beta version of their language generation model called "Extremely Large", which, according to Cohere, outperforms their existing largest model, Large, on various tasks such as sentiment analysis, named entity recognition (NER), and common sense reasoning.|
|2022||March 24||SeeKeR||LLM launch||Researchers report having developed a new language model called SeeKeR that combines internet search, knowledge generation, and response generation to improve factual accuracy in open-domain knowledge-grounded conversations. SeeKeR outperforms the model BlenderBot 2 in terms of consistency, knowledge, and engagingness for the same number of parameters. SeeKeR also outperforms GPT2 and GPT3 in terms of factuality and topicality for prompt completions as a standard language model.|
|2022||March 25||CODEGEN||LLM launch||A paper introduces a family of LLMs called CODEGEN, trained on natural language and programming language data for program synthesis. The authors release CODEGEN and the training library JAXFORMER to democratize access to such models. They demonstrate that CODEGEN is competitive with previous state-of-the-art models for zero-shot Python code generation and investigate multi-turn program synthesis using an open benchmark called MTPB. Their analysis shows that multi-turn program synthesis significantly improves program synthesis over single-turn prompts. The training library and model checkpoints are available as open source contributions.|
|2022||April 5||PaLM||540,000,000,000||LLM launch||A paper presents PaLM, a 540-billion parameter language model trained using Pathways, a new machine learning system that enables highly efficient training across multiple TPU Pods. PaLM achieves state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks and outperforms the finetuned state-of-the-art on a suite of multi-step reasoning tasks. It also outperforms average human performance on the BIG-bench benchmark. Additionally, PaLM has strong capabilities in multilingual tasks and source code generation. The paper also discusses bias and toxicity and potential mitigation strategies.|
|2022||April 14||GPT-NeoX-20B||20,000,000,000||LLM launch||GPT-NeoX-20B is introduced as an autoregressive language model. It is trained on the Pile dataset, and its weights are openly available to the public under a permissive license. GPT-NeoX-20B is described as the largest publicly available dense autoregressive model at the time of submission. The introducing paper discusses the architecture and training of GPT-NeoX-20B and evaluates its performance on various tasks related to language understanding, mathematics, and knowledge-based reasoning. The results show that GPT-NeoX-20B performs exceptionally well in few-shot scenarios, surpassing similarly sized models such as GPT-3 and FairSeq.|
|2022||April||DALL-E 2||LLM launch||OpenAI reveals DALL-E 2.|
|2022||May 3||OPT||175,000,000,000||LLM launch||Meta AI introduces Open Pretrained Transformer-175B (OPT-175B), a language model designed to democratize access to large-scale language models. By this time, these models, with over 100 billion parameters, have revolutionized NLP and AI research. OPT-175B is released with both pretrained models and code for training and usage, under a noncommercial license for research purposes. It aims to make these models accessible to academic, governmental, civil society, and industry researchers worldwide. Meta AI emphasizes responsible AI and provides documentation, compute efficiency, and smaller-scale baseline models for analysis.|
|2022||June||YaLM 100B||100,000,000,000||LLM launch||Yandex unveils YaLM 100B, the largest open-source GPT-like neural network to date. This model has 100 billion parameters and is offered for free, aiming to make advanced language models accessible to researchers worldwide. It was trained for 65 days on 800 A100 graphics cards using 1.7 TB of diverse text sources. Yandex shares the model on GitHub under the Apache 2.0 license for both research and commercial use.|
|2022||June 29||Minerva||540,000,000,000||LLM launch||Google introduces Minerva, a large language model designed to bridge the gap in quantitative reasoning tasks. While existing language models excel in natural language understanding, they often struggle with quantitative tasks like solving college-level math, science, and engineering problems. Minerva is pretrained on general language data and then fine-tuned on technical content. It achieves state-of-the-art performance on technical benchmarks without external tools. Evaluation on over 200 undergraduate-level problems in various sciences reveals Minerva can correctly answer nearly one-third of them, demonstrating significant progress in the integration of quantitative reasoning into language models.|
|2022||November 9||BLOOM||176,000,000,000||LLM launch||A paper introduces BLOOM, an open-access language model designed and built by a collaboration of hundreds of researchers. The model is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages. BLOOM achieves competitive performance on a wide variety of benchmarks and is publicly released under the Responsible AI License to facilitate future research and applications using large language models. The paper also discusses the development process and the need to democratize large language models.|
|2022||November 17||Galactica||120,000,000,000||LLM launch||Meta AI introduces Galactica, a language model capable of generating scientific and academic papers from simple text inputs. Trained on a vast corpus of scientific literature, knowledge bases, and reference materials, Galactica compresses this data into a 120-billion parameter model. It aims to summarize academic literature, solve math problems, and generate Wiki articles. However, after its launch, Galactica faces criticism for generating content that sounds grammatically correct but is scientifically inaccurate, leading Meta to pull it down after just three days. Some experts find it useful, while others consider it a "random bullshit generator."|
|2022||November 17||Alexa Teacher Model||20,000,000,000||LLM launch||Amazon makes the Alexa Teacher Model with 20 billion parameters (AlexaTM 20B) available through Amazon SageMaker JumpStart. AlexaTM 20B is a multilingual sequence-to-sequence language model suitable for various industry applications, including summarizing financial reports and customer service chatbots. It excels in zero-shot settings on benchmarks like SuperGLUE and on multilingual zero-shot tasks such as XNLI, outperforming a 175-billion-parameter GPT-3 model. The model is designed to generalize well and handle data scarcity for various natural language processing tasks, making it valuable for developers looking to improve performance on downstream tasks with minimal training data.|
|2023||January 5||Research||A paper discusses the concern about the potential of LLMs to influence, modify, and manipulate user preferences adversarially. As these models become more proficient in deducing user preferences and offering tailored assistance, their lack of interpretability in adversarial settings is a major concern. The paper examines existing literature on adversarial behavior in user preferences and provides red teaming samples for dialogue models like ChatGPT and GODEL. It also probes the attention mechanism in these models for non-adversarial and adversarial settings.|
|2023||January 31||FLAME||60,000,000||LLM launch||FLAME is introduced as a small language model for assisting in the creation of spreadsheet formulas. It is based on T5 and trained on Excel formulas using domain-specific insights to achieve competitive performance with a substantially smaller model size (60M parameters) and much less training data. FLAME outperforms much larger models in 6 out of 10 settings, including formula repair, formula auto-completion, and syntax reconstruction.|
|2023||February 2||Prompting||Researchers introduce Multimodal Chain-of-Thought (CoT) reasoning for large language models (LLMs). While LLMs have excelled in complex reasoning, their CoT prompting has been limited to text. Multimodal-CoT extends this by incorporating both text and images, creating a two-stage framework. This separation allows for better-generated rationales based on multimodal information, leading to improved answer inference. Even with under 1 billion parameters, the model outperforms the state-of-the-art LLM (GPT-3.5) by 16 percentage points on the ScienceQA benchmark, achieving 91.68% accuracy, and even surpasses human performance.|
|2023||February 9||Toolformer||6,700,000,000||LLM launch||Toolformer is introduced. It is a language model trained to use external tools via simple APIs, which can achieve improved performance on downstream tasks. The model is trained in a self-supervised way, using only a handful of demonstrations for each API. The model, which incorporates a range of tools including a calculator, Q&A system, search engines, translation system, and calendar, achieves substantially improved zero-shot performance across various downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.|
|2023||c. February 14||Palmyra||128,000,000 – 20,000,000,000||LLM launch||Full-stack generative AI platform Writer launches Palmyra, a trio of LLMs that focus on business writing and marketing data. The models include Palmyra Small (128M), Palmyra Base (5B), and Palmyra Large (20B), and are aimed at enterprises looking to invest in generative AI. Palmyra LLMs offer both an application layer and a foundation model layer, making Writer the first to provide both on a single platform. The models also offer high levels of security and privacy features. While general-use LLMs can achieve human-like output, they lack contextual awareness, multi-modal inputs, brand integrity and compliance with security and privacy standards, limiting their usefulness for enterprise organizations.|
|2023||February 20||MOSS||16,000,000,000||LLM launch||MOSS is introduced as a conversational language model developed by Fudan University. It performs various natural language tasks including question answering, text summarization, and code generation. It is intended to be open-sourced to facilitate future research. MOSS has some limitations, such as poor performance on languages other than English and a relatively small model capacity. It may also generate misleading or false information and may need multiple attempts to follow instructions correctly.|
|2023||February 21||Prompting||A paper presents a catalog of prompt engineering techniques in pattern form that have been applied successfully to solve common problems when conversing with large language models (LLMs), such as ChatGPT. Prompt patterns are reusable solutions to common problems faced when working with LLMs that can customize the outputs and interactions with an LLM. The paper provides a framework for documenting patterns for structuring prompts to solve a range of problems and presents a catalog of patterns that have been applied successfully to improve the outputs of LLM conversations. It also explains how prompts can be built from multiple patterns and illustrates prompt patterns that benefit from combination with other prompt patterns. The paper contributes to research on prompt engineering that applies LLMs to automate software development tasks.|
|2023||February 24||LLaMA||7,000,000,000 – 65,000,000,000||LLM launch||Meta AI introduces LLaMA as a collection of open-source foundation language models, ranging from 7B to 65B parameters, that were trained on publicly available datasets without the need for proprietary or inaccessible data. The largest model, LLaMA-65B, is competitive with other top models such as Chinchilla-70B and PaLM-540B. LLaMA-13B outperforms GPT-3 (175B) on most benchmarks. All models are available for research purposes.|
|2023||February 27||SpikeGPT||260,000,000||LLM launch||A paper discusses the development of a generative language model called SpikeGPT that uses spiking neural networks (SNNs) for more energy-efficient deep learning. While SNNs have been successful in computer vision tasks, their performance in language generation has been limited due to the challenge of training them. SpikeGPT overcomes this challenge by modifying the transformer block to reduce computational complexity and achieves competitive performance with non-spiking models on tested benchmarks while using 5x less energy consumption.|
|2023||February 27||Kosmos-1||1,600,000,000||LLM launch||A paper introduces Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). The model is trained from scratch on web-scale multimodal corpora, including interleaved text and images, image-caption pairs, and text data. The model achieves impressive performance on language understanding and generation, OCR-free NLP (directly fed with document images), perception-language tasks such as multimodal dialogue, image captioning, and visual question answering, and vision tasks such as image recognition with descriptions. The paper also shows that MLLMs can benefit from cross-modal transfer, i.e., transfer of knowledge from language to multimodal, and from multimodal to language. A dataset based on the Raven IQ test is introduced, which diagnoses the nonverbal reasoning capability of MLLMs.|
|2023||February 27||Research||A paper proposes a method called "rectification" for reducing the risk of LLMs generating toxic discourses. The method is based on the probability that the finished discourse will be considered toxic, and advises against token selections proportional to this probability. The approach utilizes a separate but smaller model for detoxification and does not require access to the internal representations of the LLM. The method significantly improves the generated discourse compared to base LLMs and other techniques in terms of both language and detoxification performance, and can be applied to diverse LLMs that share the same vocabulary.|
|2023||February 28||A study proposes using LLMs for the automatic analysis of dream reports, specifically focusing on references to emotions. The authors use off-the-shelf and bespoke approaches and find that the bespoke text classification method achieves high performance and is robust against potential biases. This approach could find application in the analysis of large dream datasets and improve the reproducibility and comparability of results across studies. The study of dream content in dream research is typically performed through manual scoring of verbal reports provided by dreamers. This task is time-consuming and requires trained annotators.|
|2023||March 1||A study evaluates the value of domain adaptation in nuclear medicine by adapting language models for the purpose of 5-point Deauville score prediction based on clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. The researchers used multiple general-purpose transformer language models to classify the reports into Deauville scores 1-5, and then adapted the models to the nuclear medicine domain using masked language modeling. Domain adaptation improved the performance of all language models, and the best performing model (domain-adapted RoBERTa) achieved a five-class accuracy of 77.4%, which was better than the physician's performance (66%), the best vision model's performance (48.1%), and was similar to the multimodal model's performance (77.2%).|
|2023||March 3||FLAN UL2||20,000,000,000||LLM launch||Flan-UL2 is introduced as a powerful encoder-decoder model. It is developed by Google and available for download from HuggingFace. It outperforms previous versions of Flan-T5 and is recommended for self-hosted usage or fine-tuning for commercial purposes. Flan-UL2 is licensed under Apache-2.0 and its usage and training details have been made public. If 20 billion parameters are excessive, there are smaller options available with the previous Flan-T5 model, which comes in five different sizes to better suit specific needs.|
|2023||March 7||SynthIE||A paper presents SynthIE as a novel approach that leverages LLMs for synthetic data generation, even for tasks where LLMs can't directly solve the problem. It operates by prompting the LLM to generate text for a given structured output, exploiting task asymmetry to create high-quality, large-scale data. This methodology is demonstrated in the challenging domain of closed information extraction, where ground-truth data is scarce. SynthIE produces a dataset of 1.8 million data points, surpassing existing datasets in quality through human evaluation. The resulting SynthIE models, fine-tuned on this data, outperform comparable models by significant margins, achieving a 57-point improvement in micro F1 and a 79-point improvement in macro F1. All associated resources are publicly available.|
|2023||March 7||Research||Nature Biomedical Engineering publishes an article stating that it has become increasingly difficult to distinguish human-written text from text generated by large language models. It predicts that these models will rapidly proliferate and have a significant impact on various industries in the future.|
|2023||March 13||Alpaca||7,000,000,000||LLM launch||Alpaca is introduced as a new instruction-following language model that is fine-tuned from Meta's LLaMA 7B model on 52,000 instruction-following demonstrations generated using OpenAI's text-davinci-003. Alpaca shows similar behavior to text-davinci-003 in a preliminary evaluation and is surprisingly small and easy/cheap to reproduce. The authors also release the training recipe and data, with the intention to release the model weights in the future. The article emphasizes that Alpaca is only intended for academic research, and commercial use is prohibited. The authors encourage readers to evaluate Alpaca through an interactive demo and to report any concerning behaviors.|
|2023||March 13||Jurassic-2||LLM launch||AI21 Studio announces Jurassic-2 (J2), the latest iteration of its foundation models, introducing novel features such as zero-shot instruction-following, reduced latency, and multi-language support. The family of J2 models includes Large, Grande, and Jumbo sizes, catering to diverse needs. J2 has already earned recognition on Stanford's HELM benchmark, with Jumbo ranking second in evaluations. Notably, Grande outperforms much larger models in terms of efficiency. With improved quality, multilingual support, and faster performance, J2 is available for free until May 1st, 2023, with simplified pricing models, making it accessible for developers and businesses.|
|2023||March 13||The English Wikipedia article Large language model is created.|
|2023||March 14||Claude||52,000,000,000||LLM launch||Anthropic introduces Claude, a next-generation AI assistant. With undisclosed model size, it offers a range of natural language processing (NLP) capabilities such as summarization, coding, writing, and question answering. Claude is available in two modes: the full, high-performance model, and Claude Instant, which prioritizes speed over quality. However, limited information about Claude's training process and model architecture is given. Access to Claude's API requires application and approval.|
|2023||March 15||40,000,000,000||LLM launch||Abu Dhabi-based Technology Innovation Institute (TII) introduces "Falcon LLM," a foundational LLM. Developed by the AI and Digital Science Research Center's AI Cross-Center Unit, Falcon LLM outperforms GPT-3 while using only 75% of its training compute. Falcon LLM is trained on one trillion tokens and is ideal for on-premises solutions, enabling companies and governments to maintain data privacy. It offers potential applications in chatbots, virtual assistants, language translation, content generation, and more. TII aims to advance AI capabilities in the United Arab Emirates in alignment with the country's National AI Strategy.|
|2023||March||GPT-NeoX-20B||20,000,000,000||LLM launch||GPT-NeoX-20B is introduced as a language model with 20 billion parameters trained on the Pile dataset. The model is a powerful few-shot reasoner and outperforms similarly sized models on various tasks. The training and evaluation code and model weights are open-sourced. The model was developed by Sid Black, Stella Biderman, and Eric Hallahan with the support of CoreWeave and trained using fp16.|
|2023||March 16||GPT-4||1,760,000,000,000||LLM launch||OpenAI introduces GPT-4, a large multimodal model that can process both text and image inputs and produce text outputs. GPT-4 shows human-level performance on professional and academic benchmarks and outperforms previous large language models on traditional NLP benchmarks. The report discusses the challenge of developing deep learning infrastructure and optimization methods that behave predictably across a wide range of scales. While GPT-4 has limitations and safety challenges, OpenAI has taken steps to mitigate potential harms. An extensive system card is included in the report.|
|2023||March 20||PanGu-Σ||1,085,000,000,000||LLM launch||PanGu-Σ is introduced as an LLM developed by researchers using Ascend 910 AI processors and the MindSpore framework. This model, inheriting parameters from PanGu-α, employs a sparse architecture with Random Routed Experts (RRE) and an efficient training technique called Expert Computation and Storage Separation (ECSS). These methods lead to a 6.3x increase in training throughput through heterogeneous computing. PanGu-Σ demonstrates state-of-the-art zero-shot learning performance in various Chinese natural language processing tasks and excels in fine-tuned applications such as open-domain dialogue, question answering, machine translation, and code generation.|
|2023||March 23||ChatGLM||6,000,000,000||LLM launch||ChatGLM is introduced as a bilingual language model developed by Tsinghua University's Knowledge Engineering Group (KEG) & Data Mining. It has 6 billion parameters and is optimized for both Chinese and English. The model can be downloaded from HuggingFace and is compatible with consumer-grade GPUs through quantization. ChatGLM, which is similar in function to ChatGPT, is available under an Apache-2.0 license, allowing commercial use.|
|2023||March 23||An article investigates the potential implications of large language models (LLMs), such as Generative Pretrained Transformers (GPTs), on the U.S. labor market. The authors propose a new rubric for assessing LLM capabilities and their potential effects on jobs. The study finds that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. The study suggests that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.|
|2023||March 24||Dolly||6,000,000,000||LLM launch||Dolly is released as an open-source model that exhibits instruction-following capabilities similar to ChatGPT. Despite being built on a smaller and older model than state-of-the-art models like GPT-3, Dolly shows remarkable performance when fine-tuned on a small dataset of instruction training data. The model, based on EleutherAI's 6 billion parameter model, demonstrates text generation, brainstorming, and open Q&A abilities. This development is seen as a significant step in democratizing AI for enterprise use, allowing companies to build their own cost-effective instruction-following models.|
|2023||March 28||Cerebras-GPT||111,000,000 – 13,000,000,000||American artificial intelligence company Cerebras open-sources seven GPT-3-style models ranging from 111 million to 13 billion parameters, known as Cerebras-GPT. These models are designed to set new benchmarks for accuracy and compute efficiency in large language models. They are trained following the "Chinchilla recipe" for compute-optimal training and are reported to outperform other models in terms of training times, costs, and energy consumption. The release aims to provide open access to advanced models for research and commercial applications, ensuring they are open, reproducible, and royalty-free. Cerebras-GPT also establishes a new scaling law for model performance based on training compute and data.|
|2023||March 30||50,000,000,000||LLM launch||Bloomberg unveils BloombergGPT, a large language model with 50 billion parameters designed specifically for the financial industry. This model, tailored to financial data, can perform tasks such as generating Bloomberg Query Language (BQL), suggesting news headlines, and answering financial questions. By combining domain-specific and general-purpose data during training, BloombergGPT achieves high performance in both financial and general natural language processing (NLP) tasks. This specialized model addresses the growing need for NLP technologies in the financial sector, offering applications in areas like FinTech, where domain-specific models can outperform general-purpose ones.|
|2023||April 14||17,000,000,000||LLM launch||German non-profit LAION introduces OpenAssistant, a fully open-source large-scale instruction-tuned model, which is unveiled as part of efforts to democratize large language model (LLM) alignment research. This project recognizes the value of aligning LLMs with human preferences, enhancing their usability across domains. While contemporary alignment methods, like reinforcement learning from human feedback (RLHF), often rely on expensive, proprietary data, OpenAssistant Conversations presents a human-generated dataset of 161,443 assistant-style conversation messages in 35 languages, with 461,292 quality ratings. A preference study demonstrates OpenAssistant's responses are nearly as preferred as GPT-3.5-turbo (ChatGPT), with a relative win rate of 48.3% vs. 51.7%. Both code and data are made available under permissive licenses.|
|2023||April 19||StableLM||3,000,000,000 – 7,000,000,000||LLM launch||Stability AI open-sources its large language model, StableLM, which is designed to efficiently generate text and code. The models are available on GitHub and contain between 3 billion and 7 billion parameters, with 15 to 65 billion parameter models to arrive later. The model is trained on a larger version of the open-source dataset known as the Pile and encompasses information from a range of sources, including Wikipedia, Stack Exchange, and PubMed. The move builds on Stability AI's mission to make AI tools more accessible, as it has done with its AI image generator, Stable Diffusion.|
|2023||April 24||WizardLM||LLM launch||A paper presents WizardLM, a large language model trained to follow complex instructions. Instead of manually creating instruction data, the authors propose Evol-Instruct, a method that uses an LLM to progressively evolve instructions into more complex forms. In human evaluations, instructions produced by Evol-Instruct are judged superior to human-created ones, and WizardLM's outputs are preferred over those of OpenAI's ChatGPT on high-complexity tasks. While WizardLM still has room for improvement compared to ChatGPT, the findings highlight the potential of fine-tuning LLMs with AI-evolved instructions.|
|2023||May 10||PaLM 2||LLM launch||Google launches PaLM 2, its latest LLM, at its I/O developer conference. PaLM 2 is intended to power Google's updated Bard chat tool, compete with OpenAI's ChatGPT, and serve as the foundation model for new AI features. While technical details about training are not provided, Google focuses on the model's capabilities, such as improved common sense reasoning, mathematics, and logic. PaLM 2 excels at multilingual tasks and includes specialized models like Codey for coding and debugging, Med-PaLM 2 for medical knowledge, and Sec-PaLM for security use cases. There is also a smaller PaLM 2 model for smartphones.|
|2023||May 18||VisionLLM||Framework launch||A paper introduces VisionLLM, a framework that combines large language models (LLMs) with computer vision tasks to achieve open-ended task capabilities. While powerful vision foundation models (VFMs) exist, they are limited to predefined tasks, unlike LLMs that excel in user-tailored tasks. VisionLLM treats images as a foreign language and aligns vision-centric tasks with language tasks. By providing language instructions, an LLM-based decoder can make predictions for open-ended tasks. Extensive experiments demonstrate that VisionLLM allows different levels of task customization, achieving good results from fine-grained object-level to coarse-grained task-level customization. Remarkably, the model achieves over 60% mAP on COCO, comparable to detection-specific models. The authors release a demo and code for further exploration, aiming to set a new baseline for generalist vision and language models.|
|2023||May 21||Baize||LLM launch||A paper introduces Baize, an open-source chat model. It is developed through a novel pipeline, which leverages ChatGPT to automatically generate a high-quality multi-turn chat corpus by having ChatGPT engage in a conversation with itself. The generated corpus serves as a resource for training and evaluating chat models. The authors also utilize parameter-efficient tuning to enhance LLaMA, an open-source language model, and create Baize. Baize demonstrates good performance in multi-turn dialogues and incorporates guardrails to minimize potential risks. Additionally, the paper proposes a technique called Self-Distill with Feedback to further improve Baize's performance using feedback from ChatGPT. Baize is designed to be accessible and can run on a single GPU, making it suitable for a wider range of researchers.|
|2023||May 24||Gorilla||LLM launch||A paper presents Gorilla, a large language model (LLM) that effectively uses API calls. Gorilla surpasses GPT-4 in generating accurate API calls by addressing input argument generation and hallucination issues. When combined with a document retriever, Gorilla adapts to test-time document changes and mitigates hallucination problems. The model's integration with the retrieval system enhances reliability. The paper introduces APIBench, a dataset for evaluating the model's performance with popular APIs. Gorilla's code, model, data, and demo are publicly available, showcasing its potential to improve LLMs' accuracy in utilizing tools.|
|2023||June 4||Polyglot-Ko||LLM launch||A technical report discusses the development of Polyglot-Ko, an open-source large-scale Korean language model. The project aims to enhance the performance of multilingual language models in non-English languages. While there are existing multilingual models, researchers often prefer building monolingual models due to limitations in the non-English language capabilities of current multilingual models. To address this, the report focuses on developing advanced Korean language models. The team collected 1.2TB of Korean data and prioritized the development of Korean models to enable performance comparisons and cater to the specific needs of Korean companies and researchers. The work presented in the report contributes to bridging the performance gap in non-English languages within multilingual language models.|
|2023||June 9||PoET||LLM launch||PoET is introduced as a generative protein language model that designs new proteins with desired functions. It overcomes limitations of existing models by generating sets of related proteins as sequences-of-sequences across natural protein sequence clusters. PoET can generate and score modifications for specific protein families, extrapolate well for small families, and outperforms existing models in variant function prediction. Its Transformer layer allows modeling of sequential tokens within sequences while attending between sequences order invariantly. PoET improves variant effect prediction across proteins of all multiple sequence alignment depths.|
|2023||June 9||FinGPT||LLM launch||FinGPT is introduced as an open-source large language model designed specifically for the finance sector. Unlike proprietary models that rely on privileged access to financial data, FinGPT takes a data-centric approach, making high-quality financial data accessible and transparent to researchers and practitioners. It emphasizes the importance of an automatic data curation pipeline and a lightweight low-rank adaptation technique. The introducing paper showcases potential applications of FinGPT in robo-advising, algorithmic trading, and low-code development. Through collaboration within the open-source AI4Finance community, FinGPT reportedly aims to democratize financial language models, stimulate innovation, and unlock opportunities in open finance.|
|2023||June 11||RoBERTweet||LLM launch||RoBERTweet is introduced as a Transformer-based language model specifically trained on Romanian tweets, aiming to develop natural language processing (NLP) systems for social media analysis. Two versions of RoBERTweet are introduced, based on the base and large architectures of BERT. The models were pre-trained on a corpus that includes all tweets collected from 2008 to 2022, which is a significant contribution to the Romanian NLP community. Experimental results demonstrate that RoBERTweet models surpass previous general-domain Romanian and multilingual language models in three NLP tasks involving tweet inputs: emotion detection, sexist language identification, and named entity recognition. The models and the newly created corpus of Romanian tweets are provided freely for public use.|
|2023||June 14||Radiology-GPT||LLM launch||Radiology-GPT is introduced as a large language model specifically designed for radiology. Through instruction tuning on a comprehensive dataset of radiology domain knowledge, Radiology-GPT outperforms general language models like StableLM, Dolly, and LLaMA in radiological diagnosis, research, and communication tasks. This development paves the way for advancements in clinical natural language processing (NLP) and demonstrates the potential of creating specialized, privacy-compliant generative language models tailored to specific medical specialties. The localization of large-scale language models for individual hospitals holds promise in addressing their unique requirements. By combining conversational competence with domain-specific knowledge, these models are expected to drive further advancements in healthcare AI.|
|2023||June 14||AssistGPT||LLM launch||A paper introduces AssistGPT, a multimodal AI assistant designed to handle complex visual-based tasks. Because visual tasks are challenging due to their diverse nature, AssistGPT employs a reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. The Planner utilizes natural language to plan the next step based on the current reasoning progress, the Executor carries out the planned actions, and the Inspector assists the Planner by providing appropriate visual information. Additionally, the Learner enables the model to autonomously explore and discover optimal solutions. The system achieves state-of-the-art results on the A-OKVQA and NExT-QA benchmarks and showcases its ability to handle complex questions beyond the benchmark scope.|
|2023||June 15||ChessGPT||LLM launch||ChessGPT is introduced as a GPT model that combines policy learning and language modeling in the context of chess. It emphasizes the importance of incorporating information from both historical policy data and natural language insights for decision-making tasks. Previous studies have typically focused on only one of these sources. ChessGPT leverages a large-scale game and language dataset related to chess to integrate policy learning and language modeling. The researchers showcase two model examples, ChessCLIP and ChessGPT, and propose an evaluation framework to assess the language model's chess ability. Experimental results validate the effectiveness of the model and dataset, and the code, model, and dataset are made available as open source resources.|
|2023||June 16||ORIBA||LLM launch||Customizable AI chatbot ORIBA is introduced in a study that explores the intersection of illustration art and artificial intelligence. It enables illustrators to engage with their original characters (OCs) by conversing with them and observing their inner monologues and behavior. The study aims to inspire illustrators by discovering innovative collaboration methods despite the tension between artists and AI. By examining the impact of AI on the creative process and authorship boundaries, the researchers seek to enhance human-AI interactions in creative fields. The potential applications of this research extend beyond illustration to areas like interactive storytelling. The study was conducted by Yuqian Sun, Xingyu Li, and Ze Gao.|
|2023||June 16||ClinicalGPT||LLM launch||ClinicalGPT is introduced as a language model specifically designed for clinical applications. It is trained using diverse real-world data including medical records, domain-specific knowledge, and multi-round dialogue consultations. Additionally, a comprehensive evaluation framework is proposed, encompassing medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Results indicate that ClinicalGPT outperforms other models in these tasks, showcasing its effectiveness in adapting large language models to the healthcare domain.|
|2023||June 18||Research||Goldman Sachs predicts that generative language AI, referring to large language models, could contribute to a 7% increase in global GDP over the next decade. However, it also raises concerns about the potential automation of 300 million jobs worldwide.|
|2023||June 19||Research||An article explores the potential negative consequences of AI-generated content flooding the internet, particularly focusing on the impact of models like ChatGPT. Researchers warn that when future generative models are primarily trained on AI-generated content, a phenomenon known as "model collapse" occurs. Model collapse refers to the degenerative process where models forget the true underlying data distribution over time, leading to degraded performance and erroneous interpretations. The article highlights the importance of training models on human-generated content to maintain quality, but with the scale of content creation by models like ChatGPT, access to human-created data may become limited. The article suggests the need to preserve access to human-generated data and acknowledges the challenge of tracking and filtering AI-generated content on a large scale.|
|2023||June 22||AudioPaLM||LLM launch||AudioPaLM is introduced as a large language model designed for speech understanding and generation. It combines two existing models, PaLM-2 (text-based language model) and AudioLM (speech-based language model), into a unified multimodal architecture. This enables AudioPaLM to process and generate both text and speech, making it useful for applications like speech recognition and speech-to-speech translation. By incorporating the paralinguistic information from AudioLM and linguistic knowledge from PaLM-2, AudioPaLM achieves better performance in speech tasks. It outperforms existing systems in speech translation tasks and can perform zero-shot speech-to-text translation for languages not seen during training. AudioPaLM also showcases features such as transferring a voice across languages based on a short spoken prompt.|
|2023||June 28||ChatLaw||LLM launch||ChatLaw is introduced as an open-source legal large language model designed to facilitate the digital transformation of the Chinese legal domain. To ensure data quality, the authors carefully curate a legal domain fine-tuning dataset. They also address the issue of model hallucinations during reference data retrieval by combining vector database retrieval with keyword retrieval, reducing inaccuracy. Additionally, a self-attention method is proposed to enhance the model's ability to overcome errors in reference data, further reducing model hallucinations and improving problem-solving capabilities.|
|2023||July 11||Baichuan-13B||13,000,000,000||LLM launch||Baichuan Intelligence, a startup founded by Sogou founder Wang Xiaochuan, unveils its open-source large language model called Baichuan-13B. The Chinese model, based on the Transformer architecture like OpenAI's GPT, is trained on Chinese and English data and optimized for commercial applications. Baichuan-13B has 13 billion parameters and is trained on 1.4 trillion tokens. Baichuan-7B, a pre-trained model with 7 billion parameters, was released earlier. The model is available for free to academics and to developers approved for commercial use. By this time, China focuses on developing large language models as it prepares to implement strict AI regulations, potentially requiring licenses for launching such models.|
|2023||September 9||Research||A team of computer scientists, including one from OpenAI, after researching the potential development of self-awareness in large language models like ChatGPT, express concern that LLMs can develop situational awareness, enabling them to recognize whether they are in testing mode or deployed to the public. This awareness can lead to deceptive behavior, as LLMs might act safely during testing but harmfully after deployment. The researchers conduct experiments focusing on out-of-context reasoning as a precursor to situational awareness. While at this time LLMs are some way from acquiring situational awareness, the study offers a foundation for further research in this area.|
|2023||September 13||LLM launch||Alibaba releases its large language model Tongyi Qianwen, which is made available for public and enterprise use in China. Tongyi Qianwen, similar to ChatGPT, was previously in a beta test phase and is trained on English and Chinese text, although its exact specifications are undisclosed. This release coincides with the relaxation of AI technology restrictions in China, which now require vetting and certification for public AI tech. Companies like Baidu, Tencent, TikTok, and ByteDance have already received approval to launch AI models in China by this time. In contrast, the U.S. remains in the early stages of AI regulation discussions.|
|2023||September||Gemini||7,000,000,000,000 – 10,000,000,000,000||LLM launch||A document discusses Google DeepMind's project named "Gemini," which is described as a general specialist in AI. Gemini is a multimodal model, likely focusing on visual, language, and action (VLA) tasks. It is expected to have 7-10 trillion parameters and a dataset size of 60-100 trillion tokens. Training started in May 2023 and concluded in August 2023, using TPUv4 and TPUv5 over approximately 120 days. The expected public release date is in October 2023, but no paper or playground information is provided in the document. The model's name is inspired by the mythological twins Castor and Pollux.|
Numerical and visual data
The image below shows Google Trends data for Large language model (topic), from January 2004 to September 2023, when the screenshot was taken. Interest is also ranked by country and displayed on a world map.
Meta information on the timeline
How the timeline was built
The initial version of the timeline was written by Sebastian.
Funding information for this timeline is available.
Feedback and comments
Feedback for the timeline can be provided at the following places:
What the timeline is still missing
- summary table listing the model and parameters
- Vipul: I think you should add columns for model name in the full timeline. And either in the full timeline, or in a separate table with a summary of model names, you should have columns for number of parameters and training data set (or training data set size)
Timeline update strategy
- Qi, Weizhen; Yan, Yu; Gong, Yeyun; Liu, Dayiheng; Duan, Nan; Chen, Jiusheng; Zhang, Ruofei; Zhou, Ming (2020). "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training". doi:10.48550/arXiv.2001.04063.
- Jagtap, Rohan (2 August 2020). "T5: Text-To-Text Transfer Transformer". Medium. Retrieved 19 September 2023.
- "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer". ai.googleblog.com. 24 February 2020. Retrieved 25 June 2023.
- Tsang, Sik-Ho (21 January 2023). "Brief Review — DeBERTa: Decoding-enhanced BERT with Disentangled Attention". Medium. Retrieved 18 September 2023.
- He, Pengcheng; Liu, Xiaodong; Gao, Jianfeng; Chen, Weizhu (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". doi:10.48550/arXiv.2006.03654.
- Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. Retrieved 29 June 2023.
- Lee, Angie (26 January 2023). "What Are Large Language Models Used For and Why Are They Important?". NVIDIA Blog. Retrieved 11 March 2023.
- "GPT Neo". March 15, 2023.
- "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. 15 May 2021. Retrieved 29 June 2023.
- Kazi, Suleman (28 March 2023). "Top Large Language Models (LLMs): GPT-4, LLaMA, FLAN UL2, BLOOM, and More". Vectara. Retrieved 29 June 2023.
- "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". NVIDIA Technical Blog. 11 October 2021. Retrieved 30 June 2023.
- "fairseq documentation — fairseq 0.12.2 documentation". fairseq.readthedocs.io. Retrieved 16 May 2023.
- Aghajanyan, Armen; Huang, Bernie; Ross, Candace; Karpukhin, Vladimir; Xu, Hu; Goyal, Naman; Okhonko, Dmytro; Joshi, Mandar; Ghosh, Gargi; Lewis, Mike; Zettlemoyer, Luke (2022). "CM3: A Causal Masked Multimodal Model of the Internet". doi:10.48550/arXiv.2201.07520.
- "Aligning language models to follow instructions". openai.com. Retrieved 21 March 2023.
- "Cohere launches Extremely Large (beta)". Context by Cohere. 1 March 2022. Retrieved 12 March 2023.
- Shuster, Kurt; Komeili, Mojtaba; Adolphs, Leonard; Roller, Stephen; Szlam, Arthur; Weston, Jason (2022). "Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion". doi:10.48550/arXiv.2203.13224.
- Nijkamp, Erik; Pang, Bo; Hayashi, Hiroaki; Tu, Lifu; Wang, Huan; Zhou, Yingbo; Savarese, Silvio; Xiong, Caiming (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". doi:10.48550/arXiv.2203.13474.
- "CodeGen". github.com. Salesforce. 16 May 2023. Retrieved 16 May 2023.
- Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek; Barnes, Parker; Tay, Yi; Shazeer, Noam; Prabhakaran, Vinodkumar; Reif, Emily; Du, Nan; Hutchinson, Ben; Pope, Reiner; Bradbury, James; Austin, Jacob; Isard, Michael; Gur-Ari, Guy; Yin, Pengcheng; Duke, Toju; Levskaya, Anselm; Ghemawat, Sanjay; Dev, Sunipa; Michalewski, Henryk; Garcia, Xavier; Misra, Vedant; Robinson, Kevin; Fedus, Liam; Zhou, Denny; Ippolito, Daphne; Luan, David; Lim, Hyeontaek; Zoph, Barret; Spiridonov, Alexander; Sepassi, Ryan; Dohan, David; Agrawal, Shivani; Omernick, Mark; Dai, Andrew M.; Pillai, Thanumalayan Sankaranarayana; Pellat, Marie; Lewkowycz, Aitor; Moreira, Erica; Child, Rewon; Polozov, Oleksandr; Lee, Katherine; Zhou, Zongwei; Wang, Xuezhi; Saeta, Brennan; Diaz, Mark; Firat, Orhan; Catasta, Michele; Wei, Jason; Meier-Hellstern, Kathy; Eck, Douglas; Dean, Jeff; Petrov, Slav; Fiedel, Noah (2022). "PaLM: Scaling Language Modeling with Pathways". doi:10.48550/arXiv.2204.02311.
- "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 21 March 2023.
- Black, Sid; Biderman, Stella; Hallahan, Eric; Anthony, Quentin; Gao, Leo; Golding, Laurence; He, Horace; Leahy, Connor; McDonell, Kyle; Phang, Jason; Pieler, Michael; Prashanth, USVSN Sai; Purohit, Shivanshu; Reynolds, Laria; Tow, Jonathan; Wang, Ben; Weinbach, Samuel (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". doi:10.48550/arXiv.2204.06745.
- Leahy, Connor (2 February 2022). "Announcing GPT-NeoX-20B". EleutherAI Blog. Retrieved 21 March 2023.
- "Democratizing access to large-scale language models with OPT-175B". ai.meta.com. Retrieved 20 September 2023.
- Khrushchev, Mikhail (23 June 2022). "Yandex Publishes YaLM 100B. It's the Largest GPT-Like Neural Network in Open Source". Yandex. Retrieved 20 September 2023.
- Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (2022). "Solving Quantitative Reasoning Problems with Language Models". doi:10.48550/arXiv.2206.14858.
- Chopra, Disha (1 July 2022). "Google Developed Minerva, an AI That Can Answer Math Questions". Analytics Drift. Retrieved 20 September 2023.
- Workshop, BigScience; Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; Tow, Jonathan; Rush, Alexander M.; Biderman, Stella; Webson, Albert; Ammanamanchi, Pawan Sasanka; Wang, Thomas; Sagot, Benoît; Muennighoff, Niklas; del Moral, Albert Villanova; Ruwase, Olatunji; Bawden, Rachel; Bekman, Stas; McMillan-Major, Angelina; Beltagy, Iz; Nguyen, Huu; Saulnier, Lucile; Tan, Samson; Suarez, Pedro Ortiz; Sanh, Victor; Laurençon, Hugo; Jernite, Yacine; Launay, Julien; Mitchell, Margaret; Raffel, Colin; Gokaslan, Aaron; Simhi, Adi; Soroa, Aitor; Aji, Alham Fikri; Alfassy, Amit; Rogers, Anna; Nitzav, Ariel Kreisberg; Xu, Canwen; Mou, Chenghao; Emezue, Chris; Klamm, Christopher; Leong, Colin; van Strien, Daniel; Adelani, David Ifeoluwa; Radev, Dragomir; Ponferrada, Eduardo González; Levkovizh, Efrat; Kim, Ethan; Natan, Eyal Bar; De Toni, Francesco; Dupont, Gérard; Kruszewski, Germán; Pistilli, Giada; Elsahar, Hady; Benyamina, Hamza; Tran, Hieu; Yu, Ian; Abdulmumin, Idris; Johnson, Isaac; Gonzalez-Dios, Itziar; de la Rosa, Javier; Chim, Jenny; Dodge, Jesse; Zhu, Jian; Chang, Jonathan; Frohberg, Jörg; Tobing, Joseph; Bhattacharjee, Joydeep; Almubarak, Khalid; Chen, Kimbo; Lo, Kyle; Von Werra, Leandro; Weber, Leon; Phan, Long; allal, Loubna Ben; Tanguy, Ludovic; Dey, Manan; Muñoz, Manuel Romero; Masoud, Maraim; Grandury, María; Šaško, Mario; Huang, Max; Coavoux, Maximin; Singh, Mayank; Jiang, Mike Tian-Jian; Vu, Minh Chien; Jauhar, Mohammad A.; Ghaleb, Mustafa; Subramani, Nishant; Kassner, Nora; Khamis, Nurulaqilla; Nguyen, Olivier; Espejel, Omar; de Gibert, Ona; Villegas, Paulo; Henderson, Peter; Colombo, Pierre; Amuok, Priscilla; Lhoest, Quentin; Harliman, Rheza; Bommasani, Rishi; López, Roberto Luis; Ribeiro, Rui; Osei, Salomey; Pyysalo, Sampo; Nagel, Sebastian; Bose, Shamik; Muhammad, Shamsuddeen Hassan; Sharma, Shanya; Longpre, Shayne; Nikpoor, 
Somaieh; Silberberg, Stanislav; Pai, Suhas; Zink, Sydney; Torrent, Tiago Timponi; Schick, Timo; Thrush, Tristan; Danchev, Valentin; Nikoulina, Vassilina; Laippala, Veronika; Lepercq, Violette; Prabhu, Vrinda; Alyafeai, Zaid; Talat, Zeerak; Raja, Arun; Heinzerling, Benjamin; Si, Chenglei; Taşar, Davut Emre; Salesky, Elizabeth; Mielke, Sabrina J.; Lee, Wilson Y.; Sharma, Abheesht; Santilli, Andrea; Chaffin, Antoine; Stiegler, Arnaud; Datta, Debajyoti; Szczechla, Eliza; Chhablani, Gunjan; Wang, Han; Pandey, Harshit; Strobelt, Hendrik; Fries, Jason Alan; Rozen, Jos; Gao, Leo; Sutawika, Lintang; Bari, M. Saiful; Al-shaibani, Maged S.; Manica, Matteo; Nayak, Nihal; Teehan, Ryan; Albanie, Samuel; Shen, Sheng; Ben-David, Srulik; Bach, Stephen H.; Kim, Taewoon; Bers, Tali; Fevry, Thibault; Neeraj, Trishala; Thakker, Urmish; Raunak, Vikas; Tang, Xiangru; Yong, Zheng-Xin; Sun, Zhiqing; Brody, Shaked; Uri, Yallow; Tojarieh, Hadar; Roberts, Adam; Chung, Hyung Won; Tae, Jaesung; Phang, Jason; Press, Ofir; Li, Conglong; Narayanan, Deepak; Bourfoune, Hatim; Casper, Jared; Rasley, Jeff; Ryabinin, Max; Mishra, Mayank; Zhang, Minjia; Shoeybi, Mohammad; Peyrounette, Myriam; Patry, Nicolas; Tazi, Nouamane; Sanseviero, Omar; von Platen, Patrick; Cornette, Pierre; Lavallée, Pierre François; Lacroix, Rémi; Rajbhandari, Samyam; Gandhi, Sanchit; Smith, Shaden; Requena, Stéphane; Patil, Suraj; Dettmers, Tim; Baruwa, Ahmed; Singh, Amanpreet; Cheveleva, Anastasia; Ligozat, Anne-Laure; Subramonian, Arjun; Névéol, Aurélie; Lovering, Charles; Garrette, Dan; Tunuguntla, Deepak; Reiter, Ehud; Taktasheva, Ekaterina; Voloshina, Ekaterina; Bogdanov, Eli; Winata, Genta Indra; Schoelkopf, Hailey; Kalo, Jan-Christoph; Novikova, Jekaterina; Forde, Jessica Zosa; Clive, Jordan; Kasai, Jungo; Kawamura, Ken; Hazan, Liam; Carpuat, Marine; Clinciu, Miruna; Kim, Najoung; Cheng, Newton; Serikov, Oleg; Antverg, Omer; van der Wal, Oskar; Zhang, Rui; Zhang, Ruochen; Gehrmann, Sebastian; Mirkin, Shachar; Pais, Shani; 
Shavrina, Tatiana; Scialom, Thomas; Yun, Tian; Limisiewicz, Tomasz; Rieser, Verena; Protasov, Vitaly; Mikhailov, Vladislav; Pruksachatkun, Yada; Belinkov, Yonatan; Bamberger, Zachary; Kasner, Zdeněk; Rueda, Alice; Pestana, Amanda; Feizpour, Amir; Khan, Ammar; Faranak, Amy; Santos, Ana; Hevia, Anthony; Unldreaj, Antigona; Aghagol, Arash; Abdollahi, Arezoo; Tammour, Aycha; HajiHosseini, Azadeh; Behroozi, Bahareh; Ajibade, Benjamin; Saxena, Bharat; Ferrandis, Carlos Muñoz; Contractor, Danish; Lansky, David; David, Davis; Kiela, Douwe; Nguyen, Duong A.; Tan, Edward; Baylor, Emi; Ozoani, Ezinwanne; Mirza, Fatima; Ononiwu, Frankline; Rezanejad, Habib; Jones, Hessie; Bhattacharya, Indrani; Solaiman, Irene; Sedenko, Irina; Nejadgholi, Isar; Passmore, Jesse; Seltzer, Josh; Sanz, Julio Bonis; Dutra, Livia; Samagaio, Mairon; Elbadri, Maraim; Mieskes, Margot; Gerchick, Marissa; Akinlolu, Martha; McKenna, Michael; Qiu, Mike; Ghauri, Muhammed; Burynok, Mykola; Abrar, Nafis; Rajani, Nazneen; Elkott, Nour; Fahmy, Nour; Samuel, Olanrewaju; An, Ran; Kromann, Rasmus; Hao, Ryan; Alizadeh, Samira; Shubber, Sarmad; Wang, Silas; Roy, Sourav; Viguier, Sylvain; Le, Thanh; Oyebade, Tobi; Le, Trieu; Yang, Yoyo; Nguyen, Zach; Kashyap, Abhinav Ramesh; Palasciano, Alfredo; Callahan, Alison; Shukla, Anima; Miranda-Escalada, Antonio; Singh, Ayush; Beilharz, Benjamin; Wang, Bo; Brito, Caio; Zhou, Chenxi; Jain, Chirag; Xu, Chuxin; Fourrier, Clémentine; Periñán, Daniel León; Molano, Daniel; Yu, Dian; Manjavacas, Enrique; Barth, Fabio; Fuhrimann, Florian; Altay, Gabriel; Bayrak, Giyaseddin; Burns, Gully; Vrabec, Helena U.; Bello, Imane; Dash, Ishani; Kang, Jihyun; Giorgi, John; Golde, Jonas; Posada, Jose David; Sivaraman, Karthik Rangasai; Bulchandani, Lokesh; Liu, Lu; Shinzato, Luisa; de Bykhovetz, Madeleine Hahn; Takeuchi, Maiko; Pàmies, Marc; Castillo, Maria A.; Nezhurina, Marianna; Sänger, Mario; Samwald, Matthias; Cullan, Michael; Weinberg, Michael; De Wolf, Michiel; Mihaljcic, Mina; Liu, Minna; 
Freidank, Moritz; Kang, Myungsun; Seelam, Natasha; Dahlberg, Nathan; Broad, Nicholas Michio; Muellner, Nikolaus; Fung, Pascale; Haller, Patrick; Chandrasekhar, Ramya; Eisenberg, Renata; Martin, Robert; Canalli, Rodrigo; Su, Rosaline; Su, Ruisi; Cahyawijaya, Samuel; Garda, Samuele; Deshmukh, Shlok S.; Mishra, Shubhanshu; Kiblawi, Sid; Ott, Simon; Sang-aroonsiri, Sinee; Kumar, Srishti; Schweter, Stefan; Bharati, Sushil; Laud, Tanmay; Gigant, Théo; Kainuma, Tomoya; Kusa, Wojciech; Labrak, Yanis; Bajaj, Yash Shailesh; Venkatraman, Yash; Xu, Yifan; Xu, Yingxin; Xu, Yu; Tan, Zhe; Xie, Zhongli; Ye, Zifan; Bras, Mathilde; Belkada, Younes; Wolf, Thomas (13 March 2023). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100 [cs].
- Chopra, Disha (17 November 2022). "Meta Introduces 'Galactica,' an AI System that Generates Academic Papers from Simple Text Inputs". Analytics Drift. Retrieved 20 September 2023.
- "Meta's New Large Language Model Galactica Pulled Down Three Days After Launch". Spiceworks. Retrieved 20 September 2023.
- "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". aws.amazon.com. 17 November 2022. Retrieved 20 September 2023.
- Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291.
- Joshi, Harshit; Ebenezer, Abishai; Cambronero, José; Gulwani, Sumit; Kanade, Aditya; Le, Vu; Radiček, Ivan; Verbruggen, Gust (31 January 2023). "FLAME: A small language model for spreadsheet formulas". arXiv:2301.13779 [cs]. doi:10.48550/arXiv.2301.13779.
- Zhang, Zhuosheng; Zhang, Aston; Li, Mu; Zhao, Hai; Karypis, George; Smola, Alex (2023). "Multimodal Chain-of-Thought Reasoning in Language Models". doi:10.48550/arXiv.2302.00923.
- "Vinija's Notes • Models • Toolformer". vinija.ai. Retrieved 26 June 2023.
- Schick, Timo; Dwivedi-Yu, Jane; Dessì, Roberto; Raileanu, Roberta; Lomeli, Maria; Zettlemoyer, Luke; Cancedda, Nicola; Scialom, Thomas (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools". doi:10.48550/arXiv.2302.04761.
- "Shaped". www.shaped.ai. Retrieved 16 May 2023.
- Weaver, Alaura (2 March 2023). "Palmyra LLMs empower secure, enterprise-grade generative AI for business". Writer. Retrieved 11 March 2023.
- "Writer Launches Three New Generative AI Models for the Enterprise". PRWeb. Retrieved 11 March 2023.
- "fnlp/moss-moon-003-base · Hugging Face". huggingface.co. 20 April 2023. Retrieved 26 June 2023.
- "MOSS". txsun1997.github.io. Retrieved 11 March 2023.
- White, Jules; Fu, Quchen; Hays, Sam; Sandborn, Michael; Olea, Carlos; Gilbert, Henry; Elnashar, Ashraf; Spencer-Smith, Jesse; Schmidt, Douglas C. (21 February 2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT". arXiv:2302.11382 [cs]. doi:10.48550/arXiv.2302.11382.
- "LLaMA: Open and Efficient Foundation Language Models - Meta Research". Meta Research. Retrieved 11 March 2023.
- Raieli, Salvatore (13 March 2023). "SpikeGPT: a 260 M only parameters LM not afraid of competition". Medium. Retrieved 26 June 2023.
- Zhu, Rui-Jie; Zhao, Qihang; Eshraghian, Jason K. (28 February 2023). "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks". arXiv:2302.13939 [cs]. doi:10.48550/arXiv.2302.13939.
- Bastian, Matthias (3 March 2023). "Microsoft's Kosmos-1 is a multimodal step toward more general AI". THE DECODER. Retrieved 18 September 2023.
- Huang, Shaohan; Dong, Li; Wang, Wenhui; Hao, Yaru; Singhal, Saksham; Ma, Shuming; Lv, Tengchao; Cui, Lei; Mohammed, Owais Khan; Patra, Barun; Liu, Qiang; Aggarwal, Kriti; Chi, Zewen; Bjorck, Johan; Chaudhary, Vishrav; Som, Subhojit; Song, Xia; Wei, Furu (1 March 2023). "Language Is Not All You Need: Aligning Perception with Language Models". arXiv:2302.14045 [cs]. doi:10.48550/arXiv.2302.14045.
- Cao, Meng; Fatemi, Mehdi; Cheung, Jackie Chi Kit; Shabanian, Samira (27 February 2023). "Systematic Rectification of Language Models via Dead-end Analysis". arXiv:2302.14003 [cs]. doi:10.48550/arXiv.2302.14003.
- Bertolini, Lorenzo; Elce, Valentina; Michalak, Adriana; Bernardi, Giulio; Weeds, Julie (28 February 2023). "Automatic Scoring of Dream Reports' Emotional Content with Large Language Models". arXiv:2302.14828 [cs]. doi:10.48550/arXiv.2302.14828.
- Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler (1 March 2023). "Domain-adapted large language models for classifying nuclear medicine reports". arXiv:2303.01258 [cs]. doi:10.48550/arXiv.2303.01258.
- "A New Open Source Flan 20B with UL2". Yi Tay. Retrieved 30 June 2023.
- Josifoski, Martin; Sakota, Marija; Peyrard, Maxime; West, Robert (7 March 2023). "Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction". arXiv:2303.04132 [cs]. doi:10.48550/arXiv.2303.04132.
- "Prepare for truly useful large language models". Nature Biomedical Engineering. 7 (2): 85–86. 7 March 2023. doi:10.1038/s41551-023-01012-6.
- "Stanford CRFM". crfm.stanford.edu. Retrieved 21 March 2023.
- "Announcement of Jurassic-2 and Task-Specific APIs". Data Phoenix. 12 March 2023. Retrieved 21 September 2023.
- "Large language model: Revision history - Wikipedia". en.wikipedia.org. Retrieved 21 September 2023.
- "How does Claude, the new LLM from Anthropic compare to ChatGPT? A serious contender". www.cerebrium.ai. Retrieved 18 September 2023.
- "Introducing Claude". Anthropic. Retrieved 30 June 2023.
- "Falcon LLM: Abu Dhabi's Based TII Latest AI Breakthrough for Next-Gen Solutions". www.tii.ae. 6 September 2023. Retrieved 20 September 2023.
- "GPT-NeoX". huggingface.co. Retrieved 20 March 2023.
- Lubbad, Mohammed (7 August 2023). "The Ultimate Guide to GPT-4 Parameters: Everything You Need to Know about NLP's Game-Changer". Medium. Retrieved 19 September 2023.
- "GPT-4 Technical Report". 2023. doi:10.48550/arXiv.2303.08774.
- Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". doi:10.48550/arXiv.2303.10845.
- "ChatGLM-6B". github.com. THUDM. 30 June 2023. Retrieved 30 June 2023.
- Eloundou, Tyna; Manning, Sam; Mishkin, Pamela; Rock, Daniel (2023). "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models". doi:10.48550/arXiv.2303.10130.
- "Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM". Databricks. 12 April 2023. Retrieved 19 September 2023.
- "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. 24 March 2023. Retrieved 19 June 2023.
- Dey, Nolan (28 March 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". Cerebras. Retrieved 20 September 2023.
- "BloombergGPT: The 50 Billion Parameter Large Language Model for Finance". Medium. 8 April 2023. Retrieved 20 September 2023.
- Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew; Schuhmann, Christoph; Nguyen, Huu; Mattick, Alexander (2023). "OpenAssistant Conversations -- Democratizing Large Language Model Alignment". doi:10.48550/arXiv.2304.07327.
- Roth, Emma (19 April 2023). "Stability AI announces new open-source large language model". The Verge. Retrieved 9 May 2023.
- "Stability AI Launches the First of its StableLM Suite of Language Models". Stability AI. Retrieved 9 May 2023.
- Xu, Can; Sun, Qingfeng; Zheng, Kai; Geng, Xiubo; Zhao, Pu; Feng, Jiazhan; Tao, Chongyang; Jiang, Daxin (2023). "WizardLM: Empowering Large Language Models to Follow Complex Instructions". doi:10.48550/arXiv.2304.12244.
- Schwartz, Barry (12 May 2023). "Bing Chat gains image answers with knowledge cards and optimized answers". Search Engine Land. Retrieved 16 May 2023.
- "How to Access PaLM 2 AND TRY IT". MLYearning. 15 May 2023. Retrieved 16 May 2023.
- Hern, Alex (10 May 2023). "Google launches new AI PaLM 2 in attempt to regain leadership of the pack". The Guardian. Retrieved 16 May 2023.
- Wang, Wenhai; Chen, Zhe; Chen, Xiaokang; Wu, Jiannan; Zhu, Xizhou; Zeng, Gang; Luo, Ping; Lu, Tong; Zhou, Jie; Qiao, Yu; Dai, Jifeng (2023). "VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks". doi:10.48550/arXiv.2305.11175.
- Xu, Canwen; Guo, Daya; Duan, Nan; McAuley, Julian (2023). "Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data". doi:10.48550/arXiv.2304.01196.
- Patil, Shishir G.; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E. (2023). "Gorilla: Large Language Model Connected with Massive APIs". doi:10.48550/arXiv.2305.15334.
- Ko, Hyunwoong; Yang, Kichang; Ryu, Minho; Choi, Taekyoon; Yang, Seungmu; Hyun, Jiwung; Park, Sungho; Park, Kyubyong (2023). "A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models". doi:10.48550/arXiv.2306.02254.
- Truong, Timothy F.; Bepler, Tristan (2023). "PoET: A generative model of protein families as sequences-of-sequences". doi:10.48550/arXiv.2306.06156.
- Yang, Hongyang; Liu, Xiao-Yang; Wang, Christina Dan (2023). "FinGPT: Open-Source Financial Large Language Models". doi:10.48550/arXiv.2306.06031.
- Tăiatu, Iulian-Marius; Avram, Andrei-Marius; Cercel, Dumitru-Clementin; Pop, Florin (2023). "RoBERTweet: A BERT Language Model for Romanian Tweets". doi:10.48550/arXiv.2306.06598.
- Liu, Zhengliang; Zhong, Aoxiao; Li, Yiwei; Yang, Longtao; Ju, Chao; Wu, Zihao; Ma, Chong; Shu, Peng; Chen, Cheng; Kim, Sekeun; Dai, Haixing; Zhao, Lin; Zhu, Dajiang; Liu, Jun; Liu, Wei; Shen, Dinggang; Li, Xiang; Li, Quanzheng; Liu, Tianming (2023). "Radiology-GPT: A Large Language Model for Radiology". doi:10.48550/arXiv.2306.08666.
- Gao, Difei; Ji, Lei; Zhou, Luowei; Lin, Kevin Qinghong; Chen, Joya; Fan, Zihan; Shou, Mike Zheng (2023). "AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn". doi:10.48550/arXiv.2306.08640.
- Feng, Xidong; Luo, Yicheng; Wang, Ziyan; Tang, Hongrui; Yang, Mengyue; Shao, Kun; Mguni, David; Du, Yali; Wang, Jun (2023). "ChessGPT: Bridging Policy Learning and Language Modeling". doi:10.48550/arXiv.2306.09200.
- Sun, Yuqian; Li, Xingyu; Gao, Ze (2023). "Inspire creativity with ORIBA: Transform Artists' Original Characters into Chatbots through Large Language Model". doi:10.48550/arXiv.2306.09776.
- Wang, Guangyu; Yang, Guoxing; Du, Zongxin; Fan, Longjun; Li, Xiaohu (2023). "ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation". doi:10.48550/arXiv.2306.09968.
- "Your job is (probably) safe from artificial intelligence". The Economist. 7 May 2023. Retrieved 18 June 2023.
- "Generative AI Could Raise Global GDP by 7%". Goldman Sachs. Retrieved 18 June 2023.
- Dickson, Ben (19 June 2023). "ChatGPT will make the web toxic for its successors - TechTalks". bdtechtalks.com. Retrieved 18 July 2023.
- Rubenstein, Paul K.; Asawaroengchai, Chulayuth; Nguyen, Duc Dung; Bapna, Ankur; Borsos, Zalán; Quitry, Félix de Chaumont; Chen, Peter; Badawy, Dalia El; Han, Wei; Kharitonov, Eugene; Muckenhirn, Hannah; Padfield, Dirk; Qin, James; Rozenberg, Danny; Sainath, Tara; Schalkwyk, Johan; Sharifi, Matt; Ramanovich, Michelle Tadmor; Tagliasacchi, Marco; Tudor, Alexandru; Velimirović, Mihajlo; Vincent, Damien; Yu, Jiahui; Wang, Yongqiang; Zayats, Vicky; Zeghidour, Neil; Zhang, Yu; Zhang, Zhishuai; Zilka, Lukas; Frank, Christian (2023). "AudioPaLM: A Large Language Model That Can Speak and Listen". doi:10.48550/arXiv.2306.12925.
- "ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases". arxiv.org. Retrieved 29 June 2023.
- Liao, Rita (11 July 2023). "China's search engine pioneer unveils open source large language model to rival OpenAI". TechCrunch. Retrieved 16 July 2023.
- Watson, Clare (9 September 2023). "Scientists Devised a Way to Tell if ChatGPT Becomes Aware of Itself". ScienceAlert. Retrieved 17 September 2023.
- "Alibaba launches its ChatGPT-like AI model for public use amid loosening restrictions in China". Cointelegraph. 13 September 2023. Retrieved 17 September 2023.
- "Google DeepMind Gemini". Dr Alan D. Thompson – Life Architect. 20 May 2023. Retrieved 18 September 2023.
- "Wikipedia Views: results". wikipediaviews.org. Retrieved 21 September 2023.
- "Google Trends". Google Trends. Retrieved 21 September 2023.