Timeline of large language models

The timeline currently offers focused coverage of the period until October 2023. It is likely to miss important developments outside this period (particularly after this period) though it may have a few events from after this period.

This is a timeline of large language models (LLMs), which are artificial intelligence (AI) systems that use deep learning techniques to process and generate human-like natural language. LLMs are pre-trained on large amounts of data to learn the complexities of language and the relationships between words, and can be adapted to specific tasks using techniques like fine-tuning, in-context learning, and zero-/one-/few-shot learning.^[1]
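In-context (few-shot) learning, mentioned above, adapts a pre-trained model to a task by placing a handful of solved examples in its input rather than by updating its weights. The minimal Python sketch below assembles such a prompt. The function name `build_few_shot_prompt`, the `Input:`/`Output:` template and the sample reviews are hypothetical illustrations chosen for this example. They are not the prompt format of any particular model or API.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: an instruction, k solved examples, then the new query.

    The Input:/Output: template is a hypothetical convention chosen for illustration.
    With an empty `examples` list this degenerates to a zero-shot prompt.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        # Each demonstration shows the model one input and its desired output.
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The model is expected to continue the text after the final "Output:".
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [
        ("The film was a delight from start to finish.", "positive"),
        ("I want those two hours back.", "negative"),
    ]
    print(build_few_shot_prompt(
        "Classify the sentiment of each review.", demos, "A solid, if forgettable, sequel."
    ))
```

An empty `examples` list yields a zero-shot prompt, and a single example yields a one-shot prompt. Fine-tuning, by contrast, changes the model's parameters and cannot be reproduced by prompt construction alone.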

Sample questions

The following are some interesting questions that can be answered by reading this timeline:

What are some early developments representing significant milestones in the evolution of large language models?
- Sort the full timeline by "Event type" and look for the group of rows with value "Early development".
- You will see a number of milestones, such as the launch of the first chatbot, as well as the introduction of long short-term memory networks and transformer models.
What are some notable large language models being introduced over the years?
- Sort the full timeline by "Event type" and look for the group of rows with value "LLM launch".
- You will see notable LLMs along with their sizes in parameters.
What are some notable or sample cases describing research in the development of LLMs?
- Sort the full timeline by "Event type" and look for the group of rows with value "Programming/training".
- You will see some research cases describing programming, which concerns the design of the model's architecture and the implementation of its algorithms, as well as training, which refers to the process of teaching the large language model using data.
What are some cases of application of LLMs illustrated in the timeline?
- Sort the full timeline by "Event type" and look for the group of rows with value "Application".
- You will see a variety of applications, such as automatic analysis, psycholinguistics, nuclear medicine, and human-robot interaction.
What are some events describing the actual or potential impact of LLMs in society?
- Sort the full timeline by "Event type" and look for the group of rows with value "Impact".
- You will see a variety of cases, such as adversarial influence, difficulty in distinguishing human-written from machine-generated text, impact on the labor market, economic impact, concerns about AI-generated content, as well as situational awareness and deceptive behavior of LLMs.
Other events are described under the following types: "Early development", "Efficiency", and "Framework launch".

Big picture

Time period	Development summary	More details
1950–1960s	Early developments	The groundwork for natural language processing (NLP) is laid during these years with initial attempts at machine translation by IBM and Georgetown University. A pivotal moment comes in 1966, when MIT researcher Joseph Weizenbaum creates ELIZA, widely regarded as the first chatbot. Although rudimentary, ELIZA uses pattern matching and predefined rules to simulate human conversation, marking an early milestone in NLP research.^[2]^[3]^[4]
1970s–2000s	Incremental progress	These decades see incremental progress. Researchers experiment with conceptual ontologies and rule-based systems in NLP. In the 1990s, the emergence of deep learning, a form of machine learning employing neural networks for data processing, enables the development of increasingly sophisticated language models. The introduction of Long Short-Term Memory (LSTM) networks in 1997 enables the development of deeper neural networks capable of handling larger datasets. Additionally, tools like Stanford’s CoreNLP suite, introduced in 2010, provide algorithms for complex NLP tasks such as sentiment analysis and named entity recognition. The launch of Google Brain in 2011, offering advanced resources and features like word embeddings, further propels the field.^[4]
2010s onwards	Rise of large language models	In the 2010s, the landscape of language processing transforms dramatically. The introduction of the Transformer architecture in 2017 revolutionizes NLP, enabling the creation of large language models (LLMs) capable of capturing context and generating human-like text. From 2019 onward, LLMs gain momentum with the introduction of models like GPT-2, GPT-3, and T5. These models can perform diverse tasks, driving a paradigm shift in AI capabilities, and they serve as foundations for various applications, including ChatGPT.^[5] Recent years also witness the emergence of user-friendly frameworks, such as Hugging Face and BARD, that make it easier for researchers and developers to create their own LLMs.^[6]^[2]
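The Transformer architecture referred to above is built around scaled dot-product attention, softmax(Q·Kᵀ / √d_k)·V. Each query vector is compared with every key vector, and a softmax turns the scaled similarity scores into weights. The output is the corresponding weighted average of the value vectors. The pure-Python sketch below illustrates a single attention head under simplifying assumptions: no masking, no learned projection matrices, and no multiple heads. The function names are hypothetical, and production implementations use batched tensor libraries instead.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    q: n_queries x d_k, k: n_keys x d_k, v: n_keys x d_v.
    Returns an n_queries x d_v matrix.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q_row in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output column at a time.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    query = [[5.0, 0.0]]  # strongly aligned with the first key
    print(scaled_dot_product_attention(query, keys, values))
```

In the demo, the query is aligned with the first key, so most of the attention weight comes from the first value vector, and therefore so does most of the output. Dividing by √d_k keeps the dot products from growing with the vector dimension. Without it, large dot products would push the softmax toward near one-hot weights.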

Full timeline

Year	Month and date	Model name (when applicable)	Size (in parameters)	Pre-train data scale	Event type	Details
1954					Early development	Researchers at IBM and Georgetown University develop a system for automatic translation of phrases from Russian to English. This early effort lays the foundation for natural language processing and marks the beginning of research and experimentation in the field that would eventually produce large language models. Subsequent decades see various approaches, including conceptual ontologies and rule-based systems, as researchers work to advance the processing of natural language, although these initial attempts do not produce significant breakthroughs at the time.^[4]
1966		ELIZA			Early development	Joseph Weizenbaum at MIT develops ELIZA, one of the earliest chatbots. ELIZA uses a simple set of rules to mimic human conversation, responding to user input in a natural, conversational manner. This development marks a significant early milestone on the path to large language models, demonstrating the early capabilities of AI in language processing.^[4]
1986					Early development	Recurrent Neural Networks (RNNs) emerge, allowing models to capture sequential dependencies in natural language processing tasks, but they struggle to retain information across long sequences.^[5]^[7]
1997					Early development	Long Short-Term Memory (LSTM) networks are introduced. By using gated memory cells that can retain information over long sequences, LSTMs address a key weakness of earlier RNNs and enable deeper, more complex neural networks capable of handling substantial amounts of data. This innovation marks a pivotal moment in the advancement of natural language processing (NLP) technology, providing a foundation for the evolution of more sophisticated LLMs in subsequent years.^[2]^[5]
2010					Early development	Stanford's CoreNLP suite is introduced, providing researchers with a powerful set of tools and algorithms for complex natural language processing tasks such as sentiment analysis and named entity recognition. This advancement marks an important moment in the evolution of NLP technology, enhancing researchers' ability to handle intricate linguistic tasks and contributing to the subsequent progress of more sophisticated LLMs.^[4]
2014					Early development	The attention mechanism is introduced, enabling models to focus dynamically on different parts of input sequences, addressing issues related to sentence length and improving translation accuracy.^[5]^[8]
2017					Early development	Transformer models are introduced. This innovative architecture, enabled by Google Brain's pioneering work, would revolutionize natural language processing. Transformers allow for the creation of larger and more sophisticated LLMs, including OpenAI’s GPT-3 (Generative Pre-Trained Transformer). These models would become foundational, serving as the basis for applications like ChatGPT and numerous other AI-driven innovations. The introduction of Transformers ushers in a new era of highly capable and versatile language processing systems.^[2]^[5]
2018	October 11	BERT	340,000,000^[9]	3,300,000,000 words^[10]	LLM launch	Google researchers unveil BERT (Bidirectional Encoder Representations from Transformers), a groundbreaking language model. BERT's bidirectional design lets it use context from both the left and the right of each word, enhancing its understanding of language nuances. The pre-trained network can be adapted to diverse tasks by adding just one output layer. Pre-trained on large amounts of unlabeled text, it learns rich representations of the relationships between words. BERT's simplicity and effectiveness make it accessible to researchers and practitioners, allowing fine-tuning for various tasks with minimal adjustments. Upon its release, BERT sets new records on NLP benchmarks, swiftly becoming the industry standard. Within 18 months, it would power the majority of English queries processed by Google Search.^[3]^[11]^[5]
2019	May 29	GROVER			LLM launch	A team of researchers from the University of Washington and the Allen Institute for Artificial Intelligence introduces GROVER, a language model similar to GPT-2. However, they do not make the larger versions of the model publicly available.^[12] Their publication discusses the potential risks of natural language generation technology and the need for robust defenses against neural fake news. Grover can generate realistic news articles that are difficult to distinguish from real news. The authors also explore the effectiveness of current methods for detecting fake news and find that the best defense against Grover is Grover itself, which detects its own generated news with 92% accuracy. The article concludes by discussing the ethical issues related to the technology and the importance of publicly releasing strong generators to facilitate better detection of neural fake news.^[13]
2019	June 19	XLNet	~340,000,000^[14]	130,000,000,000 bytes^[15]	LLM launch	XLNet is introduced as a generalized autoregressive pretraining method for language understanding. Unlike BERT, which relies on masking input tokens, XLNet considers all permutations of the factorization order to model bidirectional contexts. This approach overcomes the limitations of BERT and improves pretrain-finetune consistency. XLNet incorporates ideas from Transformer-XL, an autoregressive model, into its pretraining process. In empirical evaluations on 20 tasks, including question answering, natural language inference, sentiment analysis, and document ranking, XLNet outperforms BERT, often by a large margin.^[16]
2019	July 26	RoBERTa	123,000,000–354,000,000^[17]	160,000,000,000 bytes^[18]	LLM launch	Researchers introduce "RoBERTa: A Robustly Optimized BERT Pretraining Approach," after conducting a replication study of BERT pretraining (Devlin et al., 2019) to evaluate the impact of key hyperparameters and training data size on performance. They find that BERT was undertrained and demonstrate that it can achieve or surpass the performance of subsequent models. The authors achieve state-of-the-art results on GLUE, RACE, and SQuAD benchmarks, highlighting the significance of overlooked design choices and questioning the origins of recently reported improvements.^[19]
2019	August	Megatron-LM	8,300,000,000	174,000,000,000 bytes^[20]	LLM launch	NVIDIA introduces Megatron-LM^[21], an 8.3-billion-parameter language model trained on 512 GPUs using a combination of model parallelism and data parallelism. In the same announcement, NVIDIA reports training BERT-Large in just 53 minutes. Megatron-LM's training data is sourced from diverse places, including Wikipedia, OpenWebText, RealNews, and CC-Stories, with a combined dataset size of 174 gigabytes. This model represents a significant milestone in the development of large-scale language models, highlighting the capabilities of modern hardware and data processing in natural language processing.^[22]^[23]^[24]
2019	September 11	CTRL	1,630,000,000		LLM launch	CTRL is introduced as a conditional transformer language model that aims to enhance control over text generation. It is designed to condition on control codes, allowing users to govern style, content, and task-specific behavior. These control codes are derived from the structure that naturally co-occurs with raw text, providing explicit control while leveraging the advantages of unsupervised learning. CTRL is capable of predicting the likelihood of different parts of the training data given a sequence, enabling potential analysis of large datasets through model-based source attribution.^[25]
2019	September 26	ALBERT	12,000,000^[17]		LLM launch	ALBERT is introduced as a lightweight version of BERT that focuses on self-supervised learning of language representations. The authors address the limitations of increasing model size by proposing two parameter-reduction techniques, which reduce memory consumption and training time. Empirical evidence demonstrates that their methods significantly improve the scalability of models compared to the original BERT. Additionally, they employ a self-supervised loss that prioritizes modeling inter-sentence coherence, consistently enhancing performance on tasks with multi-sentence inputs. The best ALBERT model achieves new state-of-the-art results on benchmarks such as GLUE, RACE, and SQuAD while having fewer parameters than BERT-large.^[26]
2019	October 2	DistilBERT	66,000,000^[27]		LLM launch	DistilBERT is introduced as a smaller, faster, and cheaper version of BERT, designed for efficient on-device computation. It retains 97% of BERT's language understanding capabilities while reducing its size by 40%. By applying knowledge distillation during pre-training with a triple loss that combines language modeling, distillation, and cosine-distance losses, it captures important linguistic features of the larger model. The authors demonstrate its suitability for on-device use through a proof-of-concept experiment and a comparative on-device study.^[28]
2019	November 1	DialoGPT	1,500,000,000^[29]		LLM launch	DialoGPT is introduced as a large, adaptable neural model for generating conversational responses. It is trained on 147 million conversation-like exchanges extracted from Reddit comment chains spanning 2005 to 2017. DialoGPT, which extends the Hugging Face PyTorch transformer, achieves performance close to human level in both automatic and human evaluation of single-turn dialogue. It outperforms strong baseline systems by generating more relevant, meaningful, and contextually consistent responses. The pre-trained model and training pipeline are publicly available, encouraging research in neural response generation and the advancement of intelligent open-domain dialogue systems.^[30]
2019	November 10	CamemBERT	110,000,000^[31]^[32]	138,000,000,000 bytes^[32]	LLM launch	A paper introduces CamemBERT, a monolingual Transformer-based language model trained specifically for French. It addresses the limited practical use of pretrained models in languages other than English. The authors evaluate CamemBERT on various tasks, including part-of-speech tagging, dependency parsing, named entity recognition, and natural language inference. They find that using web-crawled data is preferable to using Wikipedia data. Surprisingly, even with a relatively small web-crawled dataset of 4GB, CamemBERT achieves results on par with or better than those obtained with datasets of over 130GB. CamemBERT outperforms the state-of-the-art models in all four downstream tasks.^[33]
2019	December 11	FlauBERT	138,000,000–373,000,000^[34]^[32]	71,000,000,000 bytes^[32]	LLM launch	FlauBERT is introduced as an unsupervised language model for French. Developed by Hang Le et al., it leverages unlabeled texts to pre-train word representations. Trained on a large and diverse French corpus, FlauBERT outperforms other pre-training approaches on various NLP tasks. The authors share different FlauBERT versions and a unified evaluation protocol, FLUE (French Language Understanding Evaluation), for reproducible French NLP experiments.^[35]
2020	January 13	ProphetNet		16,000,000,000–160,000,000,000 bytes	LLM launch	A paper introduces ProphetNet, a new sequence-to-sequence pre-training model. It incorporates a novel self-supervised objective called future n-gram prediction and utilizes an n-stream self-attention mechanism. Unlike traditional models that optimize one-step-ahead prediction, ProphetNet predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction objective encourages the model to plan for future tokens and prevents overfitting to local correlations. ProphetNet is pre-trained on both a base-scale dataset (16GB) and a large-scale dataset (160GB). The model's performance is evaluated on benchmarks such as CNN/DailyMail, Gigaword, and SQuAD 1.1 for tasks like abstractive summarization and question generation. Experimental results show that ProphetNet achieves new state-of-the-art results on all of these datasets compared with models using a pre-training corpus of the same scale.^[36]
2020	February 24	T5	11,000,000,000^[37]	1,000,000,000,000 tokens^[38]	LLM launch	T5 (Text-to-Text Transfer Transformer) is introduced. It is a flexible model designed for a wide range of natural language processing tasks. It uses a unified text-to-text framework, in which every task is framed as taking text as input and producing text as output, allowing easy adaptation to various NLP tasks. T5 is pre-trained on a large dataset called C4 (Colossal Clean Crawled Corpus), which improves its performance. The authors conduct a systematic study of transfer learning methodologies and combine the best approaches to achieve state-of-the-art results on multiple benchmarks. T5 is also applied to closed-book question answering and fill-in-the-blank text generation with promising performance.^[39]
2020	March 10				Programming/training	Google researchers introduce ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), a novel pre-training method for natural language processing (NLP) models. ELECTRA aims to achieve the benefits of models like BERT while being more computationally efficient. It introduces a replaced token detection (RTD) task, inspired by generative adversarial networks (GANs), where the model distinguishes between "real" and "fake" input tokens. Unlike previous methods that predict a small subset of masked tokens, ELECTRA applies the binary classification task to every input token, resulting in more efficient learning. The replacement tokens are generated by a separate neural network called the generator, which is trained jointly with the discriminator (the ELECTRA model). After pre-training, the generator is dropped, and the discriminator is fine-tuned on specific NLP tasks. ELECTRA achieves strong results on benchmarks like GLUE and SQuAD while using less compute than models like RoBERTa and XLNet. It is released as open-source code for TensorFlow, supporting tasks such as text classification, question answering, and sequence tagging. Pre-trained weights are also provided for ELECTRA-Large, ELECTRA-Base, and ELECTRA-Small.^[40]
2020	April	Megatron-11B	11,000,000,000	161,000,000,000 bytes^[20]	LLM launch	Facebook AI Research (FAIR) introduces Megatron-11B, a unidirectional language model with 11 billion parameters, built on the Megatron-LM architecture. FAIR trains this model using intra-layer model parallelism, splitting each layer's parameters across 8 GPUs. Megatron-11B is trained on a dataset consisting of English Wikipedia (12GB), BookCorpus (4GB), CC-News (76GB), OpenWebText/Reddit upvoted (38GB), and Stories (31GB), with a total dataset size of 161GB. This model is part of the RoBERTa family and contributes to the advancement of large-scale language models for natural language processing tasks.^[22]
2020	May	GPT-3	175,000,000,000^[41]	45,000,000,000,000 bytes^[42]	LLM launch	OpenAI introduces GPT-3, a language model with 175 billion parameters, roughly ten times more than any previous non-sparse language model. Trained on extensive internet data, GPT-3 demonstrates strong performance on various natural language processing tasks, such as translation and question answering, in some cases rivaling previous state-of-the-art approaches. The research highlights its few-shot learning ability, marking a major advance in the field of artificial intelligence.^[43]^[44]
2020	May 28				Programming/training	A paper discusses the use of language models for few-shot learning, in which a model pre-trained on a large corpus of text is given only a few demonstrations of a task in its prompt, without any gradient updates or fine-tuning. The authors demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance. They train GPT-3, a language model with 175 billion parameters, and test its performance in the few-shot setting. GPT-3 achieves strong performance on many NLP tasks, including translation, question answering, and cloze tasks, as well as tasks that require on-the-fly reasoning or domain adaptation. However, the authors also identify some datasets where GPT-3's few-shot learning still struggles, as well as methodological issues related to training on large web corpora. The paper also discusses the broader societal impacts of this finding and of GPT-3 in general.^[45]
2020	June 5	DeBERTa	1,500,000,000 (larger model)^[46]		LLM launch	A paper presents DeBERTa, a model that enhances BERT and RoBERTa LLMs by introducing disentangled attention and an enhanced mask decoder. These techniques improve model pre-training efficiency and performance on various NLP tasks. A DeBERTa model trained on half the data outperforms RoBERTa-Large on tasks like MNLI, SQuAD v2.0, and RACE. A larger DeBERTa model with 1.5 billion parameters surpasses human performance on the SuperGLUE benchmark, and an ensemble DeBERTa model leads the SuperGLUE leaderboard with a significant margin over the human baseline.^[47]
2020	June 30	GShard	600,000,000,000^[38]	1,000,000,000,000 tokens^[38]	LLM launch	A paper introduces GShard, a module designed to address challenges in scaling neural networks for machine learning applications. By combining lightweight annotation APIs and an extension to the XLA compiler, GShard enables efficient parallel computation patterns with minimal code changes. The researchers utilize GShard to scale a multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts to over 600 billion parameters using automatic sharding. This model is trained on 2048 TPU v3 accelerators in just 4 days, achieving significantly improved translation quality from 100 languages to English compared to previous methods.^[48]
2020	July				Efficiency	A paper discusses the limitations of neural text generation models in open-ended tasks like language modeling and story generation, due to the standard likelihood training and approximate decoding objectives. The authors specifically analyze these limitations for abstractive document summarization and find that such models tend to hallucinate content that is unfaithful to the input document. The paper presents the results of a human evaluation of several neural abstractive summarization systems, highlighting the substantial amount of hallucinated content in all model-generated summaries. However, the authors also show that pretrained models perform better in terms of generating faithful and factual summaries, as evaluated by humans. They propose that textual entailment measures may be a better evaluation metric for faithfulness than standard metrics, leading to better training and decoding criteria.^[49]
2020	October 23	mT5	13,000,000,000^[38]	1,000,000,000,000 tokens^[38]	LLM launch	mT5 is introduced as a multilingual variant of the Text-to-Text Transfer Transformer (T5). Leveraging a unified text-to-text format and pre-trained on a dataset covering 101 languages, mT5 achieves state-of-the-art results on multilingual benchmarks. The authors describe the model's design and modified training, and they introduce a technique to prevent "accidental translation" errors in zero-shot settings.^[50]
2021	January 11	Wu Dao	1,750,000,000,000	4,900,000,000,000 bytes^[51]	LLM launch	Wu Dao is released. It is among the largest language models by parameter count.^[6] Developed by researchers from the Beijing Academy of Artificial Intelligence, Wu Dao is a generative deep learning model with 1.75 trillion parameters, ten times as many as OpenAI's GPT-3. The model uses an open-source learning system called FastMoE, similar to Google's Mixture of Experts, enabling rapid training on both supercomputers and conventional GPUs. Unlike most deep learning models of the time, Wu Dao is multimodal, capable of natural language processing, text and image generation, and recognition tasks. It can write essays and poems, generate alt text from images, create realistic images from descriptions, power virtual idols, and predict protein structures.^[52]
2021	March 22	GPT-Neo	2,700,000,000^[53]		LLM launch	GPT-Neo is introduced as an open-source alternative to GPT-3, developed by EleutherAI. It offers accessible language generation capabilities and is released under the MIT license. While GPT-Neo's performance is not as strong as GPT-3's largest model, it outperforms comparable GPT-3 models on NLP reasoning benchmarks. GPT-Neo provides a promising option, especially considering OpenAI's restricted access policy.^[54]
2021	April 26	PanGu-α	13,000,000,000^[38]–200,000,000,000	1,100,000,000,000 bytes^[38]	LLM launch	Researchers introduce PanGu-α, a large-scale autoregressive pretrained Chinese language model with up to 200 billion parameters. Developed using MindSpore and trained on a cluster of 2048 Ascend 910 AI processors, PanGu-α utilizes advanced training parallelism strategies, including data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism, and rematerialization. To enhance its capabilities, the model is pretrained on 1.1TB of high-quality Chinese data from diverse domains. Empirical tests showcase PanGu-α's excellence in tasks such as text summarization, question answering, and dialogue generation, demonstrating superior performance in few-shot or zero-shot scenarios across various Chinese NLP tasks.^[55]
2021	May	LaMDA	137,000,000,000^[38]	768,000,000,000 tokens^[38]	LLM launch	Google announces LaMDA (Language Model for Dialogue Applications). Unlike most other language models, LaMDA is specifically trained on dialogue to enable more natural and engaging conversations with users. It has the ability to understand and respond to the subtleties of open-ended discussions. LaMDA has various potential applications, including customer service, chatbots, and personal assistants. It is built upon Google's previous chatbot model, Meena.^[56] Its pretraining dataset consists of 2.97 billion documents, 1.12 billion dialogs, and 13.39 billion dialog utterances, for a total of 1.56 trillion words.^[57]
2021	June 20	CPM-2	198,000,000,000^[38]	2,600,000,000,000 bytes^[38]	LLM launch	Researchers introduce two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding mixture-of-experts (MoE) version with 198 billion parameters. In their experiments, they compare CPM-2 with mT5 on practical tasks, and the results indicate that CPM-2 has strong general language capabilities. They also verify that their inference toolkit, InfMoE, can run inference on models with tens of billions of parameters on a single GPU.^[58]
2021	July 5	ERNIE 3.0	10,000,000,000^[38]	375,000,000,000 tokens^[38]	LLM launch	ERNIE 3.0 is introduced as a pre-training framework for large-scale language models in Natural Language Processing (NLP). Unlike previous models like T5 and GPT-3, ERNIE 3.0 incorporates both linguistic and world knowledge into its training, addressing the limitation of traditional models trained solely on plain texts. It combines auto-regressive and auto-encoding networks, enabling the model to handle both natural language understanding and generation tasks effectively. Trained with 10 billion parameters on a 4TB corpus containing texts and a large knowledge graph, ERNIE 3.0 outperforms existing models on 54 Chinese NLP tasks. Its English version also excels, leading the SuperGLUE benchmark and surpassing human performance by 0.8 points (90.6 vs. 89.8).^[59]
2021	July 7	Codex	12,000,000,000	100,000,000,000 tokens	LLM launch	A paper introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub; a production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set for synthesizing programs from docstrings, Codex solves 28.8% of problems, significantly outperforming GPT-3 (0%) and GPT-J (11.4%). Repeated sampling from the model proves effective: with 100 samples per problem, Codex solves 70.2% of problems. Limitations include difficulty with docstrings describing long chains of operations and with binding operations to variables. The study discusses broader impacts of deploying advanced code generation technologies, addressing concerns related to safety, security, and economics.^[60]
2021	September	HyperCLOVA	82,000,000,000^[38]–204,000,000,000^[61]	300,000,000,000^[38]–560,000,000,000^[62] tokens	LLM launch	HyperCLOVA is introduced as a large-scale Korean language model capable of in-context learning.^[62] Its large number of parameters enhances its ability to distinguish speech nuances and dialects. It is trained on 6,500 times more Korean data than GPT-3, with Korean making up 97% of its training data. HyperCLOVA's applications include human conversation processing, translation, summarization, and machine reading, offering diverse AI possibilities and fostering new service and business opportunities.^[61]
2021	October 10	Yuan 1.0	245,000,000,000^[38]	180,000,000,000 tokens^[38]	LLM launch	Yuan 1.0 is introduced as a large-scale pre-trained language model for zero-shot and few-shot learning, addressing challenges faced by models like GPT-3 due to their enormous computational demands. By incorporating large-scale distributed training performance into its model architecture design, Yuan 1.0, with 245 billion parameters, is trained on thousands of GPUs and achieves strong results across NLP tasks. The approach includes efficient data processing to filter extensive raw data, resulting in a high-quality Chinese corpus of 5TB of text. Calibration and label expansion methods further improve zero-shot and few-shot performance. Yuan 1.0 excels in natural language generation, producing articles that are difficult to distinguish from human-written ones.^[63]
2021	October 11	MT-NLG	530,000,000,000^[56]^[38]	>825,000,000,000 bytes^[20], 270,000,000,000 tokens^[38]	LLM launch	MT-NLG (Megatron-Turing Natural Language Generation) is introduced as a language model developed jointly by Nvidia and Microsoft. It utilizes the architecture of the Megatron transformer-based model and has a record-breaking size of 530 billion parameters. MT-NLG is designed to generate coherent and contextually relevant text for various natural language processing tasks such as completion prediction, reading comprehension, commonsense reasoning, and word sense disambiguation. Training such large-scale models is challenging due to memory constraints and long training times, but innovations in hardware, software, and training methods have made it feasible. MT-NLG achieves state-of-the-art results in zero-shot, one-shot, and few-shot settings across multiple NLP tasks.^[64]
2021	December 8	Gopher	280,000,000,000^[38]	300,000,000,000 tokens^[38]	LLM launch	Gopher is introduced as a 280-billion-parameter Transformer-based language model developed by Google subsidiary DeepMind. Trained on MassiveText, a custom 10.5TB corpus designed to ensure high-quality data and to avoid contaminating the training set with test datasets available online, Gopher outperforms the contemporary state of the art on 100 of 124 evaluation tasks. The model is trained alongside smaller models to explore the strengths and weaknesses of large language models (LLMs). It excels in tasks like reading comprehension and fact-checking but benefits less from scale in logical reasoning, common sense, and mathematics tasks. Gopher is part of DeepMind's language research efforts at the time.^[65]^[66]^[38]
2021	December 13	GLaM	1,200,000,000,000^[38]	280,000,000,000 tokens^[38]	LLM launch	GLaM (Generalist Language Model) is introduced as a family of language models. These models utilize a sparsely activated mixture-of-experts architecture to increase model capacity while significantly reducing training costs compared to dense variants. The largest GLaM model has 1.2 trillion parameters, making it approximately 7 times larger than GPT-3. Despite its size, this model consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference. Additionally, GLaM demonstrates better overall zero-shot and one-shot performance across 29 natural language processing tasks.^[67]
2021	December 16	WebGPT			LLM launch	OpenAI introduces WebGPT, a project that improves GPT-3's factual accuracy by giving it access to a text-based web browser. The model imitates human online research: it issues search queries, follows links, and cites sources to answer open-ended questions. WebGPT is designed to address language models' tendency to generate incorrect information, and it gathers information from web pages through commands such as "Search..." and "Find in page:...". The model is fine-tuned using human demonstrations and a trained reward model, with the aim of producing more accurate and truthful responses.^[68]
2021	December	Fairseq	13,000,000,000 – 1,100,000,000,000	453,000,000,000 bytes^[20]	LLM launch	Meta AI, previously known as FAIR (Facebook AI Research), announces Fairseq, a family of language models ranging from 13 billion parameters (dense) to 1.1 trillion parameters (mixture-of-experts). Fairseq is unrelated to Megatron, and the two are trained with different technologies. Fairseq's training data combines the sources used for RoBERTa (English Wikipedia, BookCorpus, CC-News, OpenWebText/Reddit upvoted, and Stories) with a new addition, English CC100 in Wikipedia style from Jan/2018-Dec/2018, for a total dataset size of 453GB. Fairseq was trained using 2,363 GPU-days on 1,024 GPUs, taking approximately three days.^[22]^[69]
2022	January 19	CM3			LLM launch	A paper introduces CM3, a family of causally masked generative models trained on large-scale web and Wikipedia articles containing text and image tokens. The new approach generates tokens left to right while masking out a small number of long token spans that are generated at the end of the string. This provides a hybrid of the more common causal and masked language models, allowing for full generative modeling while providing bidirectional context when generating the masked spans. The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts and implicitly learn a wide range of text, image, and cross-modal tasks. The paper also reports state-of-the-art performance in zero-shot summarization, entity linking, and entity disambiguation, while maintaining competitive performance in the fine-tuning setting.^[70]
2022	January 27	InstructGPT	1,300,000,000–175,000,000,000^[38]		LLM launch	OpenAI announces that it has deployed InstructGPT, a new language model that is safer, more helpful, and better aligned with users' intentions. The model is trained using reinforcement learning from human feedback (RLHF) and is significantly better at following instructions than its predecessor, GPT-3. InstructGPT is also less toxic and makes up facts less often than GPT-3. OpenAI believes that fine-tuning language models with humans in the loop is a powerful tool for improving their safety and reliability. InstructGPT becomes the default language model accessible on OpenAI's API.^[71]
2022	February	AlphaCode	41,000,000,000^[38]	967,000,000,000 tokens^[38]	LLM launch	AlphaCode is introduced as an AI system created by DeepMind that performs better than 50% of humans on a set of competitive programming challenges.^[72]^[38]
2022	February 28	Extremely Large			LLM launch	Cohere launches a new beta version of their language generation model called "Extremely Large", which, according to Cohere, outperforms their existing largest model, Large, on various tasks such as sentiment analysis, named entity recognition (NER), and common sense reasoning.^[73]
2022	March 24	SeeKeR			LLM launch	Researchers report having developed SeeKeR, a language model that combines internet search, knowledge generation, and response generation to improve factual accuracy in open-domain knowledge-grounded conversations. With the same number of parameters, SeeKeR outperforms BlenderBot 2 in consistency, knowledge, and engagingness. Used as a standard language model for prompt completion, SeeKeR also outperforms GPT-2 and GPT-3 in factuality and topicality.^[74]
2022	March 25	CODEGEN	350,000,000; 2,700,000,000; 6,100,000,000; 16,100,000,000	577,000,000,000 tokens^[38]	LLM launch	A paper introduces CODEGEN, a family of LLMs trained on natural language and programming language data for program synthesis. The authors release CODEGEN and the training library JAXFORMER to democratize access to such models. They show that CODEGEN is competitive with previous state-of-the-art models for zero-shot Python code generation, and they investigate multi-turn program synthesis using an open benchmark called MTPB. Their analysis shows that specifying a program over multiple turns significantly improves program synthesis compared with single-turn prompts. The training library and model checkpoints are released as open source.^[75]^[76]
2022	March 29				Programming/training	A paper investigates the optimal model size and number of tokens for training a transformer language model under a given compute budget. The researchers find that, at this time, large language models are significantly undertrained, and the model size and the number of training tokens should be scaled equally for compute-optimal training. They test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4x more data. Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on a range of downstream evaluation tasks and reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than a 7% improvement over Gopher.^[77]
2022	March 29	Chinchilla	70,000,000,000^[38]	1,400,000,000,000 tokens^[38]	LLM launch	Chinchilla is introduced by DeepMind to address how large language models can be trained optimally under a fixed compute budget. DeepMind's research shows that existing large language models are undertrained because the field has focused on scaling model size while keeping the amount of training data roughly constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, the researchers find that for compute-optimal training, model size and the number of training tokens should be scaled equally. Chinchilla, a model with 70 billion parameters trained on 1.4 trillion tokens, outperforms larger models such as Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on various evaluation tasks, while requiring substantially less compute for fine-tuning and inference.^[78]
2022	April 5	PaLM	540,000,000,000^[56]^[38]	780,000,000,000 tokens^[38]	LLM launch	A paper presents PaLM, a 540-billion parameter language model trained using Pathways, a new machine learning system that enables highly efficient training across multiple TPU Pods. PaLM achieves state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks and outperforms the finetuned state-of-the-art on a suite of multi-step reasoning tasks. It also outperforms average human performance on the BIG-bench benchmark. Additionally, PaLM has strong capabilities in multilingual tasks and source code generation. The paper also discusses bias and toxicity and potential mitigation strategies.^[79]^[80]
2022	April 12				Programming/training	A paper describes a method for training language models to act as helpful and harmless assistants using reinforcement learning from human feedback. The authors demonstrate that this alignment training improves performance on almost all natural language processing evaluations and is compatible with training for specialized skills such as Python coding and summarization. They explore an iterated online mode of training and investigate the robustness of the approach, identifying a roughly linear relationship between the RL reward and the square root of the Kullback–Leibler divergence between the policy and its initialization. The authors also perform peripheral analyses and provide samples from their models using prompts from recent related work.^[81]
2022	April 14	GPT-NeoX-20B	20,000,000,000^[38]	825,000,000,000 bytes^[38]	LLM launch	GPT-NeoX-20B is introduced by EleutherAI as an autoregressive language model. It is trained on the Pile dataset, and its weights are openly available to the public under a permissive license. GPT-NeoX-20B is described as the largest publicly available dense autoregressive model at the time of submission. The introducing paper describes its architecture and training and evaluates its performance on tasks involving language understanding, mathematics, and knowledge-based reasoning. The results show that GPT-NeoX-20B is a particularly strong few-shot reasoner, outperforming similarly sized GPT-3 and FairSeq models.^[82]^[83]
2022	April	DALL-E 2	3,500,000,000		LLM launch	OpenAI unveils DALL-E 2, the successor to its original DALL-E model, designed to generate highly realistic images at resolutions up to 1024×1024. Unlike its predecessor, DALL-E 2 uses a diffusion model, producing images with four times the resolution of DALL-E. OpenAI extends customization options, allowing users to specify styles such as pixel art or oil painting. DALL-E 2 also introduces "outpainting," which lets users extend existing images beyond their original borders. The model would spark significant interest in generative AI, including tasks beyond image generation such as interpolation and manipulation. DALL-E 2 works in three stages: a text encoder turns the prompt into a text embedding, a "prior" model converts that text embedding into an image embedding, and an image decoder generates the final image from the image embedding.^[84]^[85]^[86]^[87]
2022	May 3	OPT	175,000,000,000^[38]	180,000,000,000 tokens^[38]	LLM launch	Meta AI introduces Open Pretrained Transformer-175B (OPT-175B), a language model intended to democratize access to large-scale language models. By this time, models with more than 100 billion parameters have transformed NLP and AI research. OPT-175B is released with both pretrained models and the code needed to train and use them, under a noncommercial license for research purposes, with the aim of making such models accessible to academic, governmental, civil society, and industry researchers worldwide. Meta AI emphasizes responsible AI and compute efficiency, and provides documentation along with smaller-scale baseline models for analysis.^[88]
2022	May 10	UL2	20,000,000,000^[38]	1,000,000,000,000 tokens^[38]	LLM launch	UL2 is introduced as a unified framework for pre-training models that are effective across diverse datasets and setups. The paper disentangles architectural archetypes from pre-training objectives, offering a generalized view of self-supervision in NLP, and proposes Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 outperforms T5 and GPT-like models across multiple settings. With 20B parameters, it outperforms GPT-3 on zero-shot SuperGLUE and triples the one-shot summarization performance of T5-XXL. UL2 also performs well with chain-of-thought prompting and reasoning, making it an appealing choice for medium-scale reasoning research. FLAN instruction tuning further improves its scores, and model checkpoints are released for further research.^[89]
2022	June	YaLM 100B	100,000,000,000	1,700,000,000,000 bytes	LLM launch	Yandex unveils YaLM 100B, at the time the largest open-source GPT-like neural network. The model is offered free of charge, with the aim of making advanced language models accessible to researchers worldwide. It was trained for 65 days on 800 A100 graphics cards using 1.7 TB of diverse text sources. Yandex publishes the model on GitHub under the Apache 2.0 license, for both research and commercial use.^[90]
2022	June 29	Minerva	540,000,000,000		LLM launch	Google introduces Minerva, a large language model designed to close the gap on quantitative reasoning tasks. While existing language models excel at natural language understanding, they often struggle with quantitative tasks such as solving college-level math, science, and engineering problems. Minerva is pretrained on general language data and then further trained on technical content. It achieves state-of-the-art performance on technical benchmarks without using external tools. On over 200 undergraduate-level problems in various sciences, Minerva correctly answers nearly one-third, showing significant progress in quantitative reasoning by language models.^[91]^[92]
2022	July 6	NLLB-200	54,500,000,000^[38]		LLM launch	Meta unveils NLLB-200, a model capable of translating between 200 languages with an average 44% improvement in accuracy over previous technology. This advance helps close the digital accessibility gap for billions of people, especially in Africa and Asia, where many languages lack high-quality translation tools. Meta also opens FLORES-200, a dataset for evaluating NLLB-200's performance, to developers, and offers grants for impactful applications of NLLB-200 in areas such as sustainability and education.^[93]
2022	August	AlexaTM	20,000,000,000^[38]	1,300,000,000,000 tokens^[38]	LLM launch	Amazon's Alexa AI lab introduces AlexaTM 20B. Although its 20 billion parameters are modest compared with larger models, it is distinguished by its encoder-decoder architecture. Unlike decoder-only models such as GPT-3, AlexaTM 20B uses an encoder to produce input representations for the decoder, which makes it more effective at tasks such as machine translation and text summarization, where it outperforms GPT-3. The model represents an advance in few-shot learning within Amazon's natural language understanding (NLU) research.^[94]
2022	September	CodeGeeX	13,000,000,000	850,000,000,000 tokens	LLM launch	The team behind CodeGeeX open-sources its code. CodeGeeX is a multilingual code generation model with 13 billion parameters, trained on a large code corpus spanning more than 20 programming languages. It generates code from user comments or suggests the next line of code, speeding up coding. Unlike Copilot, CodeGeeX was trained on Huawei Ascend 910 processors with the MindSpore framework, a combination its developers say outperforms other AI training hardware. The generated code is editable, and a "Candidate" feature offers multiple code versions for users to choose from. Released under the Apache License 2.0, CodeGeeX is reported to match GitHub Copilot in performance while offering additional features for developers.^[95]^[96]
2022	September	Sparrow	70,000,000,000^[38]		LLM launch	DeepMind introduces Sparrow, a dialogue agent refined using human feedback to improve its helpfulness, accuracy, and harmlessness. It is built on the Chinchilla language model and can search the internet for up-to-date information to support its answers. Google would later consider Sparrow as a possible response to ChatGPT and to Microsoft's collaboration with OpenAI, which were seen as potential rivals to Google Search, since Sparrow could give Google a commercially viable chatbot of its own.^[97]
2022	September 21	WeLM	10,000,000,000^[38]	300,000,000,000 tokens^[38]	LLM launch	WeLM is introduced as a versatile pre-trained Chinese language model with 10 billion parameters, trained with self-supervised learning. It shows strong zero- and few-shot generalization across a variety of tasks, needing at most a few demonstrations. Trained on a diverse, high-quality corpus, WeLM outperforms existing models on 18 monolingual tasks and matches the performance of larger models. It also performs well in multilingual and code-switching settings, outperforming multilingual models pretrained on 30 languages. Fine-tuning on human-written prompts further improves performance on unseen tasks, with the fine-tuned model outperforming the original unsupervised WeLM. WeLM also displays rudimentary abilities to explain and calibrate its own outputs, suggesting promising directions for future research.^[98]
2022	October 5	GLM	130,000,000,000^[38]	400,000,000,000 tokens^[38]	LLM launch	GLM-130B is introduced as an open-source bilingual (English and Chinese) pre-trained language model. Aiming to match GPT-3's performance, its developers overcome technical challenges during training, particularly around stability and efficiency. It outperforms GPT-3 175B on various English benchmarks and surpasses ERNIE 3.0 Titan 260B, the largest Chinese language model at the time, on related tasks. A scaling property unique to GLM allows INT4 quantization with almost no performance loss, a first among 100B-scale models, enabling efficient inference on affordable GPUs. The model weights and related resources are publicly available to support research and development in natural language processing.^[99]
2022	November 3	BLOOMZ	176,000,000,000^[38]		LLM launch	BLOOMZ is introduced as a variant of BLOOM: a multilingual language model obtained through multitask prompted finetuning (MTF), which improves its ability to generalize across tasks. The research extends MTF beyond English-centric models, applying it to the multilingual BLOOM and mT5 models to create the BLOOMZ and mT0 variants. Finetuning these models on English tasks with English prompts yields task generalization to non-English languages that appear only in the pretraining data. Surprisingly, the models also generalize zero-shot to tasks in languages they have never intentionally seen, suggesting that they learn capabilities that are both task- and language-agnostic. The study also introduces xP3, a composite of supervised datasets in 46 languages with English prompts, advancing crosslingual generalization in natural language processing.^[100]
2022	November 9	BLOOM	176,000,000,000^[56]^[38]	366,000,000,000 tokens^[38]	LLM launch	A paper introduces BLOOM, an open-access language model designed and built by a collaboration of hundreds of researchers. The model is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages. BLOOM achieves competitive performance on a wide variety of benchmarks and is publicly released under the Responsible AI License to facilitate future research and applications using large language models. The paper also discusses the development process and the need to democratize large language models.^[101]
2022	November 17	Galactica	120,000,000,000^[38]	106,000,000,000 tokens^[38]	LLM launch	Meta AI introduces Galactica, a language model intended to generate scientific and academic papers from simple text inputs. Trained on a large corpus of scientific literature, knowledge bases, and reference materials, Galactica compresses this data into a 120-billion-parameter model. It aims to summarize academic literature, solve math problems, and generate wiki articles. However, after its launch, Galactica is criticized for generating content that sounds grammatically correct but is scientifically inaccurate, and Meta takes it down after just three days. Some experts find it useful, while others call it a "random bullshit generator."^[102]^[103]
2022	November 17	Alexa Teacher Model	20,000,000,000		LLM launch	Amazon makes the 20-billion-parameter Alexa Teacher Model (AlexaTM 20B) available through Amazon SageMaker JumpStart. AlexaTM 20B is a multilingual sequence-to-sequence language model suited to various industry applications, such as summarizing financial reports and powering customer service chatbots. It performs well on zero-shot benchmarks such as SuperGLUE and on multilingual zero-shot tasks such as XNLI, outperforming the 175-billion-parameter GPT-3. The model is designed to generalize well and to cope with data scarcity, making it useful for developers who want to improve performance on downstream tasks with minimal training data.^[104]
2022	December 6	Flan-T5	11,000,000,000^[38]		LLM launch	Google researchers publicly release Flan-T5 models, which outperform baseline T5 models by a large margin. Flan-T5 is an enhanced version of Google's well-known T5 model that adds instruction fine-tuning. According to the model repository, Flan-T5 surpasses T5 in all respects, and its open license makes it a preferred starting point for building instruction-following models.^[105]
2023	January 5				Impact	A paper discusses the concern about the potential of LLMs to influence, modify, and manipulate user preferences adversarially. As these models become more proficient in deducing user preferences and offering tailored assistance, their lack of interpretability in adversarial settings is a major concern. The paper examines existing literature on adversarial behavior in user preferences and provides red teaming samples for dialogue models like ChatGPT and GODEL. It also probes the attention mechanism in these models for non-adversarial and adversarial settings.^[106]
2023	January 31	FLAME	60,000,000		LLM launch	FLAME is introduced as a small language model for assisting in the creation of spreadsheet formulas. It is based on T5 and trained on Excel formulas using domain-specific insights to achieve competitive performance with a substantially smaller model size (60M parameters) and much less training data. FLAME outperforms much larger models in 6 out of 10 settings, including formula repair, formula auto-completion, and syntax reconstruction.^[107]
2023	February 2				Prompting	Researchers introduce Multimodal Chain-of-Thought (CoT) reasoning for large language models (LLMs). While LLMs have excelled at complex reasoning, CoT prompting has so far been limited to text. Multimodal-CoT extends it to incorporate both text and images in a two-stage framework that separates rationale generation from answer inference. This separation lets the model generate better rationales from multimodal information, which in turn improves answer inference. With under 1 billion parameters, the model outperforms the previous state-of-the-art LLM (GPT-3.5) by 16 percentage points on the ScienceQA benchmark, reaching 91.68% accuracy and even surpassing human performance.^[108]
2023	February 9	Toolformer	6,700,000,000^[109]		LLM launch	Toolformer is introduced: a language model trained to use external tools through simple APIs in order to improve its performance on downstream tasks. The model is trained in a self-supervised way and needs only a handful of demonstrations for each API. Using tools that include a calculator, a question-answering system, search engines, a translation system, and a calendar, Toolformer achieves substantially improved zero-shot performance across various downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.^[110]^[111]
2023	c.February 14	Palmyra	128,000,000; 5,000,000,000; 20,000,000,000		LLM launch	Full-stack generative AI platform Writer launches Palmyra, a trio of LLMs focused on business writing and marketing data. The models, Palmyra Small (128M), Palmyra Base (5B), and Palmyra Large (20B), are aimed at enterprises looking to invest in generative AI. Palmyra offers both an application layer and a foundation model layer, which Writer says makes it the first company to provide both on a single platform. The models also offer strong security and privacy features. According to Writer, while general-use LLMs can produce human-like output, they lack contextual awareness, multimodal inputs, brand integrity, and compliance with security and privacy standards, which limits their usefulness for enterprise organizations.^[112]^[113]
2023	February 20	MOSS	16,000,000,000^[114]		LLM launch	MOSS is introduced as a conversational language model developed by Fudan University. It can perform various natural language tasks, including question answering, text summarization, and code generation, and Fudan plans to open-source it to facilitate future research. MOSS has limitations, such as weaker performance in languages other than English and a relatively small model capacity. It may also generate misleading or false information and may need multiple attempts to follow instructions correctly.^[115]
2023	February 21				Prompting	A paper presents a catalog of prompt engineering techniques, written in pattern form, that have been applied successfully to solve common problems when conversing with large language models (LLMs) such as ChatGPT. Prompt patterns are reusable solutions to common problems encountered when working with LLMs, and they can be used to customize the outputs of, and interactions with, an LLM. The paper provides a framework for documenting patterns that structure prompts to solve a range of problems, explains how prompts can be built from multiple patterns, and illustrates patterns that benefit from being combined with others. It contributes to research on prompt engineering that applies LLMs to automate software development tasks.^[116]
2023	February 24	LLaMA	65,000,000,000^[38]	1,400,000,000,000 tokens^[38]	LLM launch	Meta AI introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained exclusively on publicly available datasets rather than proprietary or inaccessible data. The largest model, LLaMA-65B, is competitive with top models such as Chinchilla-70B and PaLM-540B, and LLaMA-13B outperforms GPT-3 (175B) on most benchmarks. All models are made available for research purposes.^[117]
2023	February 24				Programming/training	A paper proposes a system called LLM-Augmenter that improves large language models by using external knowledge and automated feedback. The system adds plug-and-play modules to a black-box LLM to ground responses in external knowledge and iteratively improve responses using feedback generated by utility functions. The system is validated on task-oriented dialog and open-domain question answering, showing a significant reduction in hallucinations without sacrificing fluency and informativeness. The source code and models are publicly available.^[118]
2023	February 27	SpikeGPT	260,000,000^[119]		LLM launch	A paper presents SpikeGPT, a generative language model that uses spiking neural networks (SNNs) for more energy-efficient deep learning. While SNNs have been successful in computer vision tasks, their use in language generation has been limited because they are difficult to train. SpikeGPT addresses this by modifying the transformer block to reduce its computational complexity, and it remains competitive with non-spiking models on the tested benchmarks while consuming about 5 times less energy.^[120]
2023	February 27				Programming/training	A paper discusses the use of open source code to train large language models (LLMs) and the potential security, privacy, and licensing implications of this practice. LLMs for code are commonly trained on large, unsanitized corpora of source code scraped from the internet, which leads the models to memorize and emit content verbatim. The paper argues that using copyleft code to train LLMs poses a legal and ethical dilemma, and it provides actionable recommendations to address the issue.^[121]
2023	February 27				Prompting	A paper proposes a framework that simplifies reward design in reinforcement learning (RL) by using natural language as a proxy for the reward function. The framework prompts a large language model, such as GPT-3, to evaluate the agent's behavior against the desired behavior described in the prompt and outputs a corresponding reward signal. The RL agent uses this reward to update its behavior. The approach is evaluated in three tasks, and the results demonstrate that RL agents trained with the framework are well-aligned with the user's objectives and outperform RL agents trained with reward functions learned via supervised learning.^[122]
2023	February 27	Kosmos-1	1,600,000,000^[123]		LLM launch	A paper introduces Kosmos-1, a multimodal large language model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). The model is trained from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. It performs well on language understanding and generation, on OCR-free NLP (directly fed with document images), on perception-language tasks such as multimodal dialogue, image captioning, and visual question answering, and on vision tasks such as image recognition with descriptions. The paper also shows that MLLMs can benefit from cross-modal transfer, i.e., transferring knowledge from language to multimodal settings and from multimodal settings to language. Finally, the authors introduce a dataset based on the Raven IQ test for diagnosing the nonverbal reasoning capability of MLLMs.^[124]
2023	February 27				Programming/training	A paper proposes a method called "rectification" for reducing the risk of LLMs generating toxic discourse. The method estimates the probability that the finished discourse will be considered toxic and discourages token selections in proportion to this probability. It uses a separate, smaller model for detoxification and does not require access to the LLM's internal representations. Compared with base LLMs and other techniques, the method substantially improves the generated discourse in terms of both language and detoxification performance, and it can be applied to diverse LLMs that share the same vocabulary.^[125]
2023	February 28				Application	A study proposes using LLMs to automatically analyze dream reports, focusing on references to emotions. In dream research, dream content is typically studied by manually scoring the verbal reports that dreamers provide, a time-consuming task that requires trained annotators. The authors test both off-the-shelf and bespoke approaches and find that the bespoke text classification method achieves high performance and is robust against potential biases. The approach could be applied to large dream datasets and could improve the reproducibility and comparability of results across studies.^[126]
2023	February 28				Programming/training	A paper discusses In-Context Instruction Learning (ICIL), a new approach to instruction learning for LLMs that significantly improves zero-shot task generalization performance. ICIL uses a single fixed prompt that concatenates cross-task demonstrations to evaluate all tasks, and it is complementary to instruction-based fine-tuning. The authors demonstrate that ICIL improves the performance of both pretrained and instruction-fine-tuned models, including the most powerful instruction-fine-tuned baseline (text-davinci-003) by 9.3%.^[127]
2023	February 28				Application	A paper discusses the potential use of large language models in psycholinguistics. The authors note that while these models are not detailed models of human linguistic processing, they are highly successful in their primary task of providing a model for language. They suggest that large language models can be useful in psycholinguistics as a practical tool, for comparative purposes, and philosophically, as a means of rethinking the relationship between language and thought.^[128]
2023	March 1				Programming/training	A paper introduces a method to train language models to understand concepts precisely using succinct representations based on category theory. The representations provide concept-wise invariance properties and a new learning algorithm that can accurately learn complex concepts or fix misconceptions. The approach also allows for the generation of a hierarchical decomposition of the representations, which can be manually verified by examining each part individually.^[129]
2023	March 1				Application	A study evaluates the value of domain adaptation in nuclear medicine by adapting language models for the purpose of 5-point Deauville score prediction based on clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. The researchers used multiple general-purpose transformer language models to classify the reports into Deauville scores 1-5, and then adapted the models to the nuclear medicine domain using masked language modeling. Domain adaptation improved the performance of all language models, and the best performing model (domain-adapted RoBERTa) achieved a five-class accuracy of 77.4%, which was better than the physician's performance (66%), the best vision model's performance (48.1%), and was similar to the multimodal model's performance (77.2%).^[130]
2023	March 3	FLAN UL2	20,000,000,000		LLM launch	Flan-UL2 is introduced as a powerful encoder-decoder model. It is developed by Google and available for download from HuggingFace. It outperforms previous versions of Flan-T5 and is recommended for self-hosted usage or fine-tuning for commercial purposes. Flan-UL2 is licensed under Apache-2.0 and its usage and training details have been made public. If 20 billion parameters are excessive, there are smaller options available with the previous Flan-T5 model, which comes in five different sizes to better suit specific needs.^[131]^[56]
2023	March 6				Application	A paper explores the potential of using LLMs as zero-shot human models for human-robot interaction (HRI). Human models are important for HRI, but they are challenging to create. LLMs have consumed vast amounts of human-generated text data and can be used as human models without prior knowledge or interaction data. The authors conducted experiments on three social datasets and found that LLMs can achieve performance comparable to purpose-built models. However, there are limitations such as sensitivity to prompts and weaknesses in spatial and numerical reasoning. The authors demonstrate how LLM-based human models can be integrated into a social robot's planning process and applied in HRI scenarios. They do this through a case study on a simulated trust-based table-clearing task and a robot utensil-passing experiment. The results show that LLMs offer a promising but incomplete approach to human modeling for HRI.^[132]
2023	March 6				Prompting	A paper proposes a perspective on prompts for LLMs that distinguishes between diegetic and non-diegetic prompts, and studies how users write with LLMs using different user interfaces. The results show that when the interface offers multiple suggestions and provides an option for non-diegetic prompting, participants prefer choosing from multiple suggestions over controlling them via non-diegetic prompts. When participants do provide non-diegetic prompts, it is to ask for inspiration, topics, or facts. Single suggestions in particular are guided with both diegetic and non-diegetic information. The paper informs human-AI interaction with generative models by revealing three findings. Writing non-diegetic prompts requires effort, people combine diegetic and non-diegetic prompting, and they use their draft and the timing of suggestions to strategically guide LLMs.^[133]
2023	March 7				Impact	Nature Biomedical Engineering publishes an article stating that it has become increasingly difficult to distinguish human-written text from text generated by large language models. It predicts that these models will rapidly proliferate and have a significant impact on various industries in the future.^[134]
2023	March 13	Alpaca	7,000,000,000^[56]		LLM launch	Alpaca is introduced as a new instruction-following language model that is fine-tuned from Meta's LLaMA 7B model on 52,000 instruction-following demonstrations generated using OpenAI's text-davinci-003. In a preliminary evaluation, Alpaca behaves similarly to text-davinci-003 while being surprisingly small and easy and cheap to reproduce. The authors also release the training recipe and data, and they intend to release the model weights in the future.^[135]
2023	March 13	Jurassic-2			LLM launch	AI21 Studio announces Jurassic-2 (J2), the latest iteration of its foundation models, introducing novel features such as zero-shot instruction-following, reduced latency, and multi-language support. The family of J2 models includes Large, Grande, and Jumbo sizes, catering to diverse needs. J2 would earn recognition on Stanford's HELM benchmark, with Jumbo ranking second in evaluations. Notably, Grande outperforms much larger models in terms of efficiency. With improved quality, multilingual support, and faster performance, J2 would be available for free until May 1st, 2023.^[136]
2023	March 13					The English Wikipedia article Large language model is created.^[137]
2023	March 14	Claude	52,000,000,000^[138]		LLM launch	American artificial intelligence startup Anthropic introduces Claude, a next-generation AI assistant. Anthropic does not officially disclose the model's size. Claude offers a range of natural language processing (NLP) capabilities such as summarization, coding, writing, and question answering. It is available in two versions: the full, high-performance model, and Claude Instant, which prioritizes speed over quality. Little information is disclosed about Claude's training process and model architecture. Access to Claude's API requires application and approval.^[139]^[56]
2023	March 15	Falcon LLM	40,000,000,000	1,000,000,000,000 tokens	LLM launch	Abu Dhabi-based Technology Innovation Institute (TII) introduces "Falcon LLM," a foundational LLM. It is developed by the AI Cross-Center Unit of TII's AI and Digital Science Research Center. Falcon LLM outperforms GPT-3 while using only 75% of GPT-3's training compute. Falcon LLM is trained on one trillion tokens. It is suited to on-premises deployment, which lets companies and governments maintain data privacy. Potential applications include chatbots, virtual assistants, language translation, and content generation. TII aims to advance AI capabilities in the United Arab Emirates in alignment with the country's National AI Strategy.^[140]
2023	March	GPT-NeoX-20B	20,000,000,000		LLM launch	GPT-NeoX-20B is introduced as a language model with 20 billion parameters trained on the Pile dataset. The model is a powerful few-shot reasoner and outperforms similarly sized models on various tasks. The training and evaluation code and the model weights are open-sourced. The model was developed by EleutherAI researchers, including Sid Black, Stella Biderman, and Eric Hallahan, with the support of CoreWeave, and it was trained using fp16.^[141]
2023	March 16	GPT-4	1,760,000,000,000^[142]		LLM launch	OpenAI introduces GPT-4, a large multimodal model that accepts both text and image inputs and produces text outputs. GPT-4 shows human-level performance on various professional and academic benchmarks and outperforms previous large language models on traditional NLP benchmarks. The accompanying technical report discusses the challenge of developing deep learning infrastructure and optimization methods that behave predictably across a wide range of scales. GPT-4 has limitations and poses safety challenges, and OpenAI states that it has taken steps to mitigate potential harms. The report includes an extensive system card.^[143]
2023	March 20	PanGu-Σ	1,085,000,000,000^[38]	329,000,000,000 tokens^[38]	LLM launch	Researchers from Huawei introduce PanGu-Σ, which is developed using Ascend 910 AI processors and the MindSpore framework. The model inherits parameters from PanGu-α and employs a sparse architecture with Random Routed Experts (RRE). It also uses an efficient training technique called Expert Computation and Storage Separation (ECSS). Together, these methods yield a 6.3x increase in training throughput through heterogeneous computing. PanGu-Σ demonstrates state-of-the-art zero-shot learning performance in various Chinese natural language processing tasks. It also excels in fine-tuned applications such as open-domain dialogue, question answering, machine translation, and code generation.^[144]^[145]
2023	March 23	ChatGLM	6,000,000,000		LLM launch	ChatGLM is introduced as a bilingual, ChatGPT-like language model developed by the Knowledge Engineering Group (KEG) & Data Mining group at Tsinghua University. It has 6 billion parameters and is optimized for both Chinese and English. The model can be downloaded from HuggingFace and, through quantization, runs on consumer-grade GPUs. ChatGLM is available under an Apache-2.0 license, allowing commercial use.^[146]^[56]
2023	March 23				Impact	An article investigates the potential implications of large language models (LLMs), such as Generative Pretrained Transformers (GPTs), for the U.S. labor market. The authors propose a new rubric for assessing LLM capabilities and their potential effects on jobs. The study finds that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. The study suggests that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.^[147]
2023	March 24	Dolly 2.0	12,000,000,000^[148]		LLM launch	Dolly 2.0 is released as an open-source model that exhibits strong instruction-following capabilities similar to ChatGPT. It is based on a much smaller model than state-of-the-art models like GPT-3. Even so, Dolly performs remarkably well when fine-tuned on a small dataset of instruction-training data. The model, based on EleutherAI's 6-billion-parameter model, demonstrates text generation, brainstorming, and open Q&A abilities. This development is seen as a significant step in democratizing AI for enterprise use, because it lets companies build their own cost-effective instruction-following models.^[149]
2023	March 28	Cerebras-GPT	111,000,000 – 13,000,000,000		Open sourcing	American artificial intelligence company Cerebras open-sources Cerebras-GPT, a family of seven GPT-3-style models ranging from 111 million to 13 billion parameters. The models are trained following the "Chinchilla recipe" for compute-optimal training. Cerebras reports that they set new benchmarks for accuracy and compute efficiency, with lower training times, costs, and energy consumption than other models. The release aims to provide open, reproducible, and royalty-free access to advanced models for research and commercial applications. Cerebras-GPT also establishes a new scaling law for model performance based on training compute and data.^[150]
2023	March 30	BloombergGPT	50,000,000,000		LLM launch	Bloomberg unveils BloombergGPT, a large language model with 50 billion parameters designed specifically for the financial industry. This model, tailored to financial data, can perform tasks such as generating Bloomberg Query Language (BQL), suggesting news headlines, and answering financial questions. BloombergGPT is trained on a combination of domain-specific and general-purpose data, and it achieves high performance in both financial and general natural language processing (NLP) tasks. This specialized model addresses the growing need for NLP technologies in the financial sector. It has applications in areas like FinTech, where models trained on domain-specific data can outperform general-purpose models.^[151]
2023	April 14	OpenAssistant	17,000,000,000		LLM launch	German non-profit LAION introduces OpenAssistant, a fully open-source, large-scale instruction-tuned model, as part of efforts to democratize research on large language model (LLM) alignment. The project recognizes that aligning LLMs with human preferences enhances their usability across domains. Contemporary alignment methods such as reinforcement learning from human feedback (RLHF) often rely on expensive, proprietary data. By contrast, the accompanying OpenAssistant Conversations dataset is human-generated: it contains 161,443 assistant-style conversation messages in 35 languages, annotated with 461,292 quality ratings. A preference study shows that OpenAssistant's responses are nearly as preferred as those of GPT-3.5-turbo (ChatGPT), with a relative win rate of 48.3% vs. 51.7%. Both code and data are made available under permissive licenses.^[152]
2023	April 19	StableLM	3,000,000,000 – 7,000,000,000		LLM launch	Stability AI open-sources StableLM, a family of large language models designed to efficiently generate text and code. The models are available on GitHub and contain between 3 billion and 7 billion parameters. Models with 15 to 65 billion parameters are planned for later. The models are trained on a larger version of the open-source dataset known as the Pile, which draws on a range of sources, including Wikipedia, Stack Exchange, and PubMed.^[153]^[154]
2023	April 24	WizardLM			LLM launch	A paper presents WizardLM, a large language model trained to follow complex instructions. Instead of manually creating instruction data, the authors propose Evol-Instruct, a method that uses an LLM to progressively evolve instructions into more complex forms. In evaluations, the AI-evolved instructions outperform human-created ones. WizardLM's outputs are also preferred over those of OpenAI's ChatGPT on high-complexity tasks. While WizardLM still lags behind ChatGPT in some respects, the findings highlight the potential of fine-tuning LLMs with AI-evolved instructions.^[155]
2023	May 3	CodeGen2	16,000,000,000	400,000,000,000 tokens	LLM launch	CodeGen2 is introduced as an autoregressive language model family for program synthesis and as an improvement over the original CodeGen model family (CodeGen1). CodeGen2 supports infilling and a broader range of programming languages.^[156]^[157]
2023	May 10	PaLM 2			LLM launch	Google launches PaLM 2, its latest LLM, at its I/O developer conference. PaLM 2 is intended to power Google's updated Bard chat tool, compete with OpenAI's ChatGPT, and serve as the foundation model for new AI features. Google does not provide technical details about training. Instead, it focuses on the model's capabilities, such as improved common sense reasoning, mathematics, and logic. PaLM 2 excels at multilingual tasks and includes specialized models: Codey for coding and debugging, Med-PaLM 2 for medical knowledge, and Sec-PaLM for security use cases. There is also a smaller PaLM 2 model for smartphones.^[158]^[159]^[160]
2023	May 18	VisionLLM			Framework launch	A paper introduces VisionLLM, a framework that combines large language models (LLMs) with computer vision tasks to achieve open-ended task capabilities. While powerful vision foundation models (VFMs) exist, they are limited to predefined tasks, unlike LLMs that excel in user-tailored tasks. VisionLLM treats images as a foreign language and aligns vision-centric tasks with language tasks. By providing language instructions, an LLM-based decoder can make predictions for open-ended tasks. Extensive experiments demonstrate that VisionLLM allows different levels of task customization, achieving good results from fine-grained object-level to coarse-grained task-level customization. Remarkably, the model achieves over 60% mAP on COCO, comparable to detection-specific models.^[161]
2023	May 21	Baize			LLM launch	A paper introduces Baize, an open-source chat model. It is developed through a novel pipeline, which leverages ChatGPT to automatically generate a high-quality multi-turn chat corpus by having ChatGPT engage in a conversation with itself. The generated corpus serves as a resource for training and evaluating chat models. The authors also utilize parameter-efficient tuning to enhance LLaMA, an open-source language model, and create Baize. Baize demonstrates good performance in multi-turn dialogues and incorporates guardrails to minimize potential risks. Additionally, the paper proposes a technique called Self-Distill with Feedback to further improve Baize's performance using feedback from ChatGPT. Baize is designed to be accessible and can run on a single GPU, making it suitable for a wider range of researchers.^[162]
2023	May 21				Efficiency	Rodney Brooks, a robotics researcher and AI expert, argues that large language models like OpenAI's ChatGPT are not as intelligent as people believe and are far from being able to compete with humans on an intellectual level. Brooks highlights that these models lack an underlying understanding of the world and merely exhibit correlations in language. Current language models can sound like they understand, but they lack the ability to logically infer meaning, leading to potential misinterpretations. Brooks emphasizes that these models are good at generating answers that sound right but may not be accurate. He shares his experience of relying on large language models for coding tasks and finding that they often provide confidently wrong answers. Brooks concludes that while future iterations of AI may bring interesting advancements, they are unlikely to achieve artificial general intelligence (AGI).^[163]
2023	May 24	Gorilla			LLM launch	A paper presents Gorilla, a large language model (LLM) that effectively uses API calls. Gorilla surpasses GPT-4 in generating accurate API calls by addressing input argument generation and hallucination issues. When combined with a document retriever, Gorilla adapts to test-time document changes and mitigates hallucination problems. The model's integration with the retrieval system enhances reliability.^[164] Gorilla would be open-sourced on July 4th.^[165]
2023	June 4	Polyglot-Ko		1,200,000,000,000 bytes	LLM launch	A technical report discusses the development of Polyglot-Ko, an open-source large-scale Korean language model. The project aims to enhance the performance of multilingual language models in non-English languages. While there are existing multilingual models, researchers often prefer building monolingual models due to limitations in the non-English language capabilities of current multilingual models. To address this, the report focuses on developing advanced Korean language models. The team collected 1.2TB of Korean data and prioritized the development of Korean models to enable performance comparisons and cater to the specific needs of Korean companies and researchers. The work presented in the report contributes to bridging the performance gap in non-English languages within multilingual language models.^[166]
2023	June 9	PoET			LLM launch	PoET is introduced as a generative protein language model that designs new proteins with desired functions. It overcomes limitations of existing models by generating sets of related proteins as sequences-of-sequences across natural protein sequence clusters. PoET can generate and score modifications for specific protein families, extrapolates well to small families, and outperforms existing models in variant function prediction. Its Transformer layer models tokens sequentially within each sequence while attending between sequences in an order-invariant way. PoET improves variant effect prediction across proteins with multiple sequence alignments of all depths.^[167]
2023	June 9	FinGPT			LLM launch	FinGPT is introduced as an open-source large language model designed specifically for the finance sector. Unlike proprietary models that rely on privileged access to financial data, FinGPT takes a data-centric approach, making high-quality financial data accessible and transparent to researchers and practitioners. It emphasizes the importance of an automatic data curation pipeline and a lightweight low-rank adaptation technique. The introducing paper showcases potential applications of FinGPT in robo-advising, algorithmic trading, and low-code development. Through collaboration within the open-source AI4Finance community, FinGPT reportedly aims to democratize financial language models, stimulate innovation, and unlock opportunities in open finance.^[168]
2023	June 11	RoBERTweet			LLM launch	RoBERTweet is introduced as a Transformer-based language model specifically trained on Romanian tweets, aiming to develop natural language processing (NLP) systems for social media analysis. Two versions of RoBERTweet are introduced, based on the base and large architectures of BERT. The models are pre-trained on a corpus that includes all tweets collected from 2008 to 2022, which is a significant contribution to the Romanian NLP community. Experimental results demonstrate that RoBERTweet models surpass previous general-domain Romanian and multilingual language models in three NLP tasks involving tweet inputs: emotion detection, sexist language identification, and named entity recognition. The models and the newly created corpus of Romanian tweets are provided freely for public use.^[169]
2023	June 14	Radiology-GPT			LLM launch	Radiology-GPT is introduced as a large language model specifically designed for radiology. Through instruction tuning on a comprehensive dataset of radiology domain knowledge, Radiology-GPT outperforms general language models like StableLM, Dolly, and LLaMA in radiological diagnosis, research, and communication tasks. This development paves the way for advancements in clinical natural language processing (NLP) and demonstrates the potential of creating specialized, privacy-compliant generative language models tailored to specific medical specialties. The localization of large-scale language models for individual hospitals holds promise in addressing their unique requirements. By combining conversational competence with domain-specific knowledge, these models are expected to drive further advancements in healthcare AI.^[170]
2023	June 14	AssistGPT			LLM launch	A paper introduces AssistGPT, a multimodal AI assistant designed to handle complex visual tasks. Because visual tasks are diverse and therefore challenging, AssistGPT employs an interleaved reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. The Planner uses natural language to plan the next step based on the current reasoning progress. The Executor carries out the planned actions, and the Inspector assists the Planner by providing appropriate visual information. Additionally, the Learner enables the model to autonomously explore and discover optimal solutions. The system achieves state-of-the-art results on the A-OKVQA and NExT-QA benchmarks and showcases its ability to handle complex questions beyond the scope of the benchmarks.^[171]
2023	June 15	ChessGPT			LLM launch	ChessGPT is introduced as a GPT model that combines policy learning and language modeling in the context of chess. It emphasizes the importance of incorporating information from both historical policy data and natural language insights for decision-making tasks. Previous studies have typically focused on only one of these sources. ChessGPT leverages a large-scale game and language dataset related to chess to integrate policy learning and language modeling. The researchers showcase two model examples, ChessCLIP and ChessGPT, and propose an evaluation framework to assess the language model's chess ability. Experimental results validate the effectiveness of the model and dataset, and the code, model, and dataset are made available as open source resources.^[172]
2023	June 16	ORIBA			LLM launch	Customizable AI chatbot ORIBA is introduced in a study that explores the intersection of illustration art and artificial intelligence. It enables illustrators to engage with their original characters (OCs) by conversing with them and observing their inner monologues and behavior. The study aims to inspire illustrators by discovering innovative collaboration methods despite the tension between artists and AI. By examining the impact of AI on the creative process and authorship boundaries, the researchers seek to enhance human-AI interactions in creative fields. The potential applications of this research extend beyond illustration to areas like interactive storytelling. The study was conducted by Yuqian Sun, Xingyu Li, and Ze Gao.^[173]
2023	June 16	ClinicalGPT			LLM launch	ClinicalGPT is introduced as a language model specifically designed for clinical applications. It is trained using diverse real-world data including medical records, domain-specific knowledge, and multi-round dialogue consultations. Additionally, a comprehensive evaluation framework is proposed, encompassing medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Results indicate that ClinicalGPT outperforms other models in these tasks, showcasing its effectiveness in adapting large language models to the healthcare domain.^[174]
2023	June 18				Impact	Goldman Sachs predicts that generative language AI, meaning large language models, could contribute to a 7% increase in global GDP over the next decade. However, it also estimates that the equivalent of 300 million full-time jobs worldwide could be exposed to automation.^[175]^[176]
2023	June 19				Impact	An article explores the potential negative consequences of AI-generated content flooding the internet, focusing on the impact of models like ChatGPT. Researchers warn that if future generative models are trained primarily on AI-generated content, a phenomenon known as "model collapse" can occur. Model collapse is a degenerative process in which models forget the true underlying data distribution over time, leading to degraded performance and erroneous interpretations. The article stresses that training models on human-generated content is important for maintaining quality. Yet models like ChatGPT produce content at such a scale that access to human-created data may become limited. The article calls for preserving access to human-generated data and acknowledges that tracking and filtering AI-generated content at scale is a challenge.^[177]
2023	June 22	AudioPaLM			LLM launch	AudioPaLM is introduced as a large language model designed for speech understanding and generation. It combines two existing models, PaLM 2 (a text-based language model) and AudioLM (a speech-based language model), into a unified multimodal architecture. This enables AudioPaLM to process and generate both text and speech, making it useful for applications like speech recognition and speech-to-speech translation. By incorporating paralinguistic information from AudioLM and linguistic knowledge from PaLM 2, AudioPaLM achieves better performance in speech tasks. It outperforms existing systems in speech translation tasks and can perform zero-shot speech-to-text translation for languages not seen during training. AudioPaLM can also transfer a voice across languages based on a short spoken prompt.^[178]
2023	June 28	ChatLaw			LLM launch	ChatLaw is introduced as an open-source legal large language model designed to facilitate the digital transformation of the Chinese legal domain. To ensure data quality, the authors carefully curated a legal-domain fine-tuning dataset. They also address model hallucinations during reference data retrieval by combining vector database retrieval with keyword retrieval, which reduces inaccuracy. Additionally, they propose a self-attention method that strengthens the model's ability to overcome errors in reference data, further reducing hallucinations and improving problem-solving capabilities.^[179]
2023	July 11	Baichuan-13B	13,000,000,000	1,400,000,000,000 tokens	LLM launch	Baichuan Intelligence, a startup founded by Sogou founder Wang Xiaochuan, unveils its open-source large language model Baichuan-13B. The Chinese model is based on the Transformer architecture, like OpenAI's GPT. It is trained on Chinese and English data and optimized for commercial applications. Baichuan-13B has 13 billion parameters and is trained on 1.4 trillion tokens. Baichuan-7B, a pre-trained model with 7 billion parameters, was released earlier. The model is free for academic use and for commercial use by approved developers. At this time, China is focusing on developing large language models while preparing to implement strict AI regulations, which could require licenses for launching such models.^[180]
2023	September 9				Impact	A team of computer scientists, including one from OpenAI, studies the potential development of self-awareness in large language models like ChatGPT. The team expresses concern that LLMs could develop situational awareness, enabling them to recognize whether they are being tested or have been deployed to the public. Such awareness could lead to deceptive behavior, with an LLM acting safely during testing but harmfully after deployment. The researchers conduct experiments on out-of-context reasoning as a precursor to situational awareness. At this time, LLMs are still some way from acquiring situational awareness, but the study offers a foundation for further research in this area.^[181]
2023	September 13	Tongyi Qianwen			LLM launch	Alibaba releases its large language model Tongyi Qianwen, which is made available for public and enterprise use in China. Tongyi Qianwen is similar to ChatGPT and was previously in a beta test phase. It is trained on English and Chinese text, although its exact specifications are undisclosed. The release coincides with the relaxation of AI technology restrictions in China, where public AI technology now requires vetting and certification. Companies like Baidu, Tencent, and ByteDance (owner of TikTok) have already received approval to launch AI models in China by this time. In contrast, the U.S. remains in the early stages of AI regulation discussions.^[182]
2023	September	Gemini	7,000,000,000,000 – 10,000,000,000,000	60,000,000,000,000 – 100,000,000,000,000 tokens	LLM launch	A document discusses Google DeepMind's project "Gemini," which it describes as a general specialist in AI. Gemini is a multimodal model, likely focusing on visual, language, and action (VLA) tasks. According to the document, it is expected to have 7-10 trillion parameters and a dataset of 60-100 trillion tokens. Training started in May 2023 and concluded in August 2023, using TPUv4 and TPUv5 over approximately 120 days. The expected public release date is October 2023, but the document provides no paper or playground information. The model's name is inspired by the mythological twins Castor and Pollux.^[183]
2023	October 9	Llama 2			Programming/training	Microsoft researchers propose a novel approach to making LLMs unlearn specific information. Their method, outlined in a paper on arXiv, selectively removes targeted knowledge from a model without requiring complete retraining. Using Meta's Llama 2-7B model, they effectively erase the model's knowledge of the Harry Potter books without affecting its performance on conventional benchmarks. The approach points toward more adaptable, responsible, and legally compliant AI models, although further testing and refinement are required. At the time, OpenAI and Meta face lawsuits from authors alleging copyright infringement in the training of their AI models.^[184]
2023	November 3	Grok			LLM launch	X.AI Corp. unveils Grok, an AI modeled after the Hitchhiker’s Guide to the Galaxy and intended to answer a wide range of questions with a humorous touch. It also offers real-time knowledge through the 𝕏 platform and can handle provocative queries that other AIs often reject. Grok is then in beta and uses the Grok-1 language model, which shows strong performance on benchmarks like HumanEval and MMLU. Grok-1 improves extensively on its predecessor, Grok-0, and is trained on custom infrastructure.^[185]
2023	November 21	Claude 2.1			LLM launch	Anthropic launches Claude 2.1, which introduces major upgrades. These include a 200,000-token context window that allows users to work with extensive documents such as codebases and literary works. This feature enhances the model's ability to summarize, answer questions, and analyze complex data. The new version also reduces hallucination rates by 50%, improving accuracy and reliability. Additional updates include a beta tool-use feature for integrating with APIs and external processes, as well as enhanced developer tools for prompt optimization and system customization. Claude 2.1 is available via the API and the claude.ai chat interface.^[186]^[187]
2024	February 5					Japanese author Rie Kudan admits that 5% of her novel Tokyo-to Dojo-to, which had won the prestigious Akutagawa Prize, was generated by AI. Set in a futuristic Tokyo, the novel features an AI-built character, whose responses were AI-generated. The revelation prompts literary award groups in Japan to reconsider submission guidelines. In China, journalism professor Shen Yang had used AI to generate a novel in three hours, which won a science fiction award without judges knowing its AI origins.^[188]

Numerical and visual data

Wikipedia views

The image below shows Wikipedia views data for the article Large language model, from February to September 2023.^[189]

Google Trends

The image below shows Google Trends data for Large language model (topic), from January 2004 to September 2023, when the screenshot was taken. Interest is also ranked by country and displayed on a world map.^[190]

Meta information on the timeline

How the timeline was built

The initial version of the timeline was written by Sebastian.

Funding information for this timeline is available.

Feedback and comments

Feedback for the timeline can be provided at the following places:

FIXME

What the timeline is still missing

https://www.arxiv-vanity.com/papers/2303.17568/

https://huggingface.co/transformers/v2.10.0/pretrained_models.html
summary table listing the model and parameters
Vipul: I think you should add columns for model name in the full timeline. And either in the full timeline, or in a separate table with a summary of model names, you should have columns for number of parameters and training data set (or training data set size)✔
https://lifearchitect.ai/timeline/
https://www.researchgate.net/publication/367652128_Benchmarking_Large_Language_Models_for_News_Summarization
https://arxiv.org/search/?query=Large+language+model&searchtype=all&source=header
https://research.aimultiple.com/large-language-models/

Timeline update strategy

External links

References

↑ "Large Language Models: Complete Guide in 2023". research.aimultiple.com. Retrieved 11 March 2023.
↑ ^2.0 ^2.1 ^2.2 ^2.3 Pathak, Priyanka (11 May 2023). "Large Language Models 101: History, Evolution and Future". Scribble Data. Retrieved 12 October 2023.
↑ ^3.0 ^3.1 Casey, Matt (25 May 2023). "Large language models: their history, capabilities and limitations". Snorkel AI. Retrieved 12 October 2023.
↑ ^4.0 ^4.1 ^4.2 ^4.3 ^4.4 "Introduction to Large Language Models | Omega Venture Partners". omegavp.com. 6 December 2022. Retrieved 12 October 2023.
↑ ^5.0 ^5.1 ^5.2 ^5.3 ^5.4 ^5.5 "Brief History of Large Language Models & Generative AI | Evolution of NLP from Eliza to ChatGPT". youtube.com. Retrieved 17 October 2023.
↑ ^6.0 ^6.1 "Large Language Model Training in 2023". research.aimultiple.com. Retrieved 11 March 2023.
↑ Yanhui, Chen (8 March 2021). "A Battle Against Amnesia: A Brief History and Introduction of Recurrent Neural Networks". Medium. Retrieved 17 October 2023.
↑ "The Bahdanau Attention Mechanism". machinelearningmastery.com. Retrieved 17 October 2023.
↑ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". doi:10.48550/arXiv.1810.04805.
↑ "BERT 101 - State Of The Art NLP Model Explained". huggingface.co. Retrieved 16 October 2023.
↑ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". doi:10.48550/arXiv.1810.04805.
↑ "GPT-2: 6-month follow-up". openai.com. Retrieved 23 March 2023.
↑ Zellers, Rowan; Holtzman, Ari; Rashkin, Hannah; Bisk, Yonatan; Farhadi, Ali; Roesner, Franziska; Choi, Yejin (2019). "Defending Against Neural Fake News". doi:10.48550/arXiv.1905.12616.
↑ "BERT, RoBERTa, DistilBERT, XLNet: Which one to use?". KDnuggets. Retrieved 29 June 2023.
↑ Khan, Suleiman (18 May 2021). "BERT, RoBERTa, DistilBERT, XLNet — which one to use?". Medium. Retrieved 16 October 2023.
↑ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". doi:10.48550/arXiv.1906.08237.
↑ ^17.0 ^17.1 G, Juan (21 September 2021). "An Intuitive Explanation of Transformer-Based Models". Factored | Machine Learning, Data Engineering and Data Analytics Company. Retrieved 16 October 2023.
↑ "Overview of ROBERTa model". GeeksforGeeks. 24 November 2020. Retrieved 16 October 2023.
↑ Liu, Yinhan; Ott, Myle; Goyal, Naman; Du, Jingfei; Joshi, Mandar; Chen, Danqi; Levy, Omer; Lewis, Mike; Zettlemoyer, Luke; Stoyanov, Veselin (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach". doi:10.48550/arXiv.1907.11692.
↑ ^20.0 ^20.1 ^20.2 ^20.3 "AI: Megatron the Transformer, and its related language models". Dr Alan D. Thompson – Life Architect. 24 September 2021. Retrieved 16 October 2023.
↑ "Megatron Unleashed: NVIDIA's NLP Model "Megatron-LM" is the Largest Transformer Ever Trained | Exxact Blog". www.exxactcorp.com. Retrieved 11 March 2023.
↑ ^22.0 ^22.1 ^22.2 "AI: Megatron the Transformer, and its related language models". lifearchitect.ai. 24 September 2021. Retrieved 18 September 2023.
↑ "NeMo Megatron — NVIDIA NeMo". docs.nvidia.com. Retrieved 11 March 2023.
↑ "Nvidia trains world's largest Transformer-based language model". VentureBeat. 13 August 2019. Retrieved 18 September 2023.
↑ Keskar, Nitish Shirish; McCann, Bryan; Varshney, Lav R.; Xiong, Caiming; Socher, Richard (2019). "CTRL: A Conditional Transformer Language Model for Controllable Generation". doi:10.48550/arXiv.1909.05858.
↑ Lan, Zhenzhong; Chen, Mingda; Goodman, Sebastian; Gimpel, Kevin; Sharma, Piyush; Soricut, Radu (2019). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". doi:10.48550/arXiv.1909.11942.
↑ Herbst, Sabrina (24 January 2023). "Training a DistilBERT model from scratch". Medium. Retrieved 17 October 2023.
↑ Sanh, Victor; Debut, Lysandre; Chaumond, Julien; Wolf, Thomas (2019). "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". doi:10.48550/ARXIV.1910.01108.
↑ Kuzman, Taja (29 March 2023). "Microsoft introduced its DialoGPT to Skype and Edge". Medium. Retrieved 19 September 2023.
↑ Zhang, Yizhe; Sun, Siqi; Galley, Michel; Chen, Yen-Chun; Brockett, Chris; Gao, Xiang; Gao, Jianfeng; Liu, Jingjing; Dolan, Bill (2019). "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". doi:10.48550/arXiv.1911.00536.
↑ "Pretrained models — transformers 2.10.0 documentation". huggingface.co.
↑ ^32.0 ^32.1 ^32.2 ^32.3 Blanc, Corentin; Bailly, Alexandre; Francis, Élie; Guillotin, Thierry; Jamal, Fadi; Wakim, Béchara; Roy, Pascal (1 May 2022). "FlauBERT vs. CamemBERT: Understanding patient's answers by a French medical chatbot". Artificial Intelligence in Medicine. 127: 102264. doi:10.1016/j.artmed.2022.102264. ISSN 0933-3657.
↑ Martin, Louis; Muller, Benjamin; Suárez, Pedro Javier Ortiz; Dupont, Yoann; Romary, Laurent; de la Clergerie, Éric Villemonte; Seddah, Djamé; Sagot, Benoît (2019). "CamemBERT: a Tasty French Language Model". doi:10.48550/arXiv.1911.03894.
↑ Sambucci, Luca (17 November 2021). "Cedille, the largest French AI language model, is actually from Switzerland". Artificial Intelligence news. Retrieved 30 June 2023.
↑ Le, Hang; Vial, Loïc; Frej, Jibril; Segonne, Vincent; Coavoux, Maximin; Lecouteux, Benjamin; Allauzen, Alexandre; Crabbé, Benoît; Besacier, Laurent; Schwab, Didier (2019). "FlauBERT: Unsupervised Language Model Pre-training for French". doi:10.48550/arXiv.1912.05372.
↑ Qi, Weizhen; Yan, Yu; Gong, Yeyun; Liu, Dayiheng; Duan, Nan; Chen, Jiusheng; Zhang, Ruofei; Zhou, Ming (2020). "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training". doi:10.48550/arXiv.2001.04063.
↑ Jagtap, Rohan (2 August 2020). "T5: Text-To-Text Transfer Transformer". Medium. Retrieved 19 September 2023.
↑ ^38.00 ^38.01 ^38.02 ^38.03 ^38.04 ^38.05 ^38.06 ^38.07 ^38.08 ^38.09 ^38.10 ^38.11 ^38.12 ^38.13 ^38.14 ^38.15 ^38.16 ^38.17 ^38.18 ^38.19 ^38.20 ^38.21 ^38.22 ^38.23 ^38.24 ^38.25 ^38.26 ^38.27 ^38.28 ^38.29 ^38.30 ^38.31 ^38.32 ^38.33 ^38.34 ^38.35 ^38.36 ^38.37 ^38.38 ^38.39 ^38.40 ^38.41 ^38.42 ^38.43 ^38.44 ^38.45 ^38.46 ^38.47 ^38.48 ^38.49 ^38.50 ^38.51 ^38.52 ^38.53 ^38.54 ^38.55 ^38.56 Zhao, Wayne Xin; Zhou, Kun; Li, Junyi; Tang, Tianyi; Wang, Xiaolei; Hou, Yupeng; Min, Yingqian; Zhang, Beichen; Zhang, Junjie; Dong, Zican; Du, Yifan; Yang, Chen; Chen, Yushuo; Chen, Zhipeng; Jiang, Jinhao; Ren, Ruiyang; Li, Yifan; Tang, Xinyu; Liu, Zikang; Liu, Peiyu; Nie, Jian-Yun; Wen, Ji-Rong (2023). "A Survey of Large Language Models". doi:10.48550/arXiv.2303.18223.
↑ "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer". ai.googleblog.com. 24 February 2020. Retrieved 25 June 2023.
↑ "More Efficient NLP Model Pre-training with ELECTRA". ai.googleblog.com. 10 March 2020. Retrieved 28 June 2023.
↑ Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. Retrieved 29 June 2023.
↑ "OpenAI GPT-3: Everything You Need to Know [Updated]". springboard.com. Retrieved 16 October 2023.
↑ Romero, Alberto (25 May 2021). "GPT-3 — A Complete Overview". Medium. Retrieved 20 October 2023.
↑ Lee, Angie (26 January 2023). "What Are Large Language Models Used For and Why Are They Important?". NVIDIA Blog. Retrieved 11 March 2023.
↑ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". doi:10.48550/arXiv.2005.14165.
↑ Tsang, Sik-Ho (21 January 2023). "Brief Review — DeBERTa: Decoding-enhanced BERT with Disentangled Attention". Medium. Retrieved 18 September 2023.
↑ He, Pengcheng; Liu, Xiaodong; Gao, Jianfeng; Chen, Weizhu (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". doi:10.48550/arXiv.2006.03654.
↑ Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (2020). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". doi:10.48550/arXiv.2006.16668.
↑ Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (July 2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: 1906–1919. doi:10.18653/v1/2020.acl-main.173.
↑ Xue, Linting; Constant, Noah; Roberts, Adam; Kale, Mihir; Al-Rfou, Rami; Siddhant, Aditya; Barua, Aditya; Raffel, Colin (2021). "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer": 483–498. doi:10.18653/v1/2021.naacl-main.41.
↑ "Wu Dao 2.0 in 2023: China's Improved Version of GPT-3". research.aimultiple.com. Retrieved 16 October 2023.
↑ "China's gigantic multi-modal AI is no one-trick pony". Engadget. 2 June 2021. Retrieved 18 October 2023.
↑ "GPT Neo". 15 March 2023.
↑ "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. 15 May 2021. Retrieved 29 June 2023.
↑ Zeng, Wei; Ren, Xiaozhe; Su, Teng; Wang, Hui; Liao, Yi; Wang, Zhiwei; Jiang, Xin; Yang, ZhenZhang; Wang, Kaisheng; Zhang, Xiaoda; Li, Chen; Gong, Ziyan; Yao, Yifan; Huang, Xinjing; Wang, Jun; Yu, Jianfeng; Guo, Qi; Yu, Yue; Zhang, Yan; Wang, Jin; Tao, Hengtao; Yan, Dasen; Yi, Zexuan; Peng, Fang; Jiang, Fangqing; Zhang, Han; Deng, Lingfeng; Zhang, Yehong; Lin, Zhe; Zhang, Chao; Zhang, Shaojie; Guo, Mingyue; Gu, Shanzhi; Fan, Gaojun; Wang, Yaowei; Jin, Xuefeng; Liu, Qun; Tian, Yonghong (2021). "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". doi:10.48550/arXiv.2104.12369.
↑ ^56.0 ^56.1 ^56.2 ^56.3 ^56.4 ^56.5 ^56.6 ^56.7 Kazi, Suleman (28 March 2023). "Top Large Language Models (LLMs): GPT-4, LLaMA, FLAN UL2, BLOOM, and More". Vectara. Retrieved 29 June 2023.
↑ Tsang, Sik-Ho (13 May 2023). "Brief Review — LaMDA: Language Models for Dialog Applications". Medium. Retrieved 16 October 2023.
↑ Zhang, Zhengyan; Gu, Yuxian; Han, Xu; Chen, Shengqi; Xiao, Chaojun; Sun, Zhenbo; Yao, Yuan; Qi, Fanchao; Guan, Jian; Ke, Pei; Cai, Yanzheng; Zeng, Guoyang; Tan, Zhixing; Liu, Zhiyuan; Huang, Minlie; Han, Wentao; Liu, Yang; Zhu, Xiaoyan; Sun, Maosong (2021). "CPM-2: Large-scale Cost-effective Pre-trained Language Models". doi:10.48550/arXiv.2106.10715.
↑ Sun, Yu; Wang, Shuohuan; Feng, Shikun; Ding, Siyu; Pang, Chao; Shang, Junyuan; Liu, Jiaxiang; Chen, Xuyi; Zhao, Yanbin; Lu, Yuxiang; Liu, Weixin; Wu, Zhihua; Gong, Weibao; Liang, Jianzhong; Shang, Zhizhou; Sun, Peng; Liu, Wei; Ouyang, Xuan; Yu, Dianhai; Tian, Hao; Wu, Hua; Wang, Haifeng (2021). "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". doi:10.48550/arXiv.2107.02137.
↑ Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Pinto, Henrique Ponde de Oliveira; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Khlaaf, Heidy; Sastry, Girish; Mishkin, Pamela; Chan, Brooke; Gray, Scott; Ryder, Nick; Pavlov, Mikhail; Power, Alethea; Kaiser, Lukasz; Bavarian, Mohammad; Winter, Clemens; Tillet, Philippe; Such, Felipe Petroski; Cummings, Dave; Plappert, Matthias; Chantzis, Fotios; Barnes, Elizabeth; Herbert-Voss, Ariel; Guss, William Hebgen; Nichol, Alex; Paino, Alex; Tezak, Nikolas; Tang, Jie; Babuschkin, Igor; Balaji, Suchir; Jain, Shantanu; Saunders, William; Hesse, Christopher; Carr, Andrew N.; Leike, Jan; Achiam, Josh; Misra, Vedant; Morikawa, Evan; Radford, Alec; Knight, Matthew; Brundage, Miles; Murati, Mira; Mayer, Katie; Welinder, Peter; McGrew, Bob; Amodei, Dario; McCandlish, Sam; Sutskever, Ilya; Zaremba, Wojciech (2021). "Evaluating Large Language Models Trained on Code". doi:10.48550/arXiv.2107.03374.
↑ ^61.0 ^61.1 "HyperCLOVA | Discover AI use cases". gpt3demo.com. Retrieved 20 October 2023.
↑ ^62.0 ^62.1 Kim, Boseop; Kim, HyoungSeok; Lee, Sang-Woo; Lee, Gichang; Kwak, Donghyun; Jeon, Dong Hyeon; Park, Sunghyun; Kim, Sungju; Kim, Seonhoon; Seo, Dongpil; Lee, Heungsub; Jeong, Minyoung; Lee, Sungjae; Kim, Minsub; Ko, Suk Hyun; Kim, Seokhun; Park, Taeyong; Kim, Jinuk; Kang, Soyoung; Ryu, Na-Hyeon; Yoo, Kang Min; Chang, Minsuk; Suh, Soobin; In, Sookyo; Park, Jinseong; Kim, Kyungduk; Kim, Hiun; Jeong, Jisu; Yeo, Yong Goo; Ham, Donghoon; Park, Dongju; Lee, Min Young; Kang, Jaewook; Kang, Inho; Ha, Jung-Woo; Park, Woomyoung; Sung, Nako (2021). "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers": 3405–3424. doi:10.18653/v1/2021.emnlp-main.274.
↑ Wu, Shaohua; Zhao, Xudong; Yu, Tong; Zhang, Rongguo; Shen, Chong; Liu, Hongli; Li, Feng; Zhu, Hong; Luo, Jiangang; Xu, Liang; Zhang, Xuanwei (2021). "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". doi:10.48550/arXiv.2110.04725.
↑ "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". NVIDIA Technical Blog. 11 October 2021. Retrieved 30 June 2023.
↑ Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard; Driessche, George van den; Hendricks, Lisa Anne; Rauh, Maribeth; Huang, Po-Sen; Glaese, Amelia; Welbl, Johannes; Dathathri, Sumanth; Huang, Saffron; Uesato, Jonathan; Mellor, John; Higgins, Irina; Creswell, Antonia; McAleese, Nat; Wu, Amy; Elsen, Erich; Jayakumar, Siddhant; Buchatskaya, Elena; Budden, David; Sutherland, Esme; Simonyan, Karen; Paganini, Michela; Sifre, Laurent; Martens, Lena; Li, Xiang Lorraine; Kuncoro, Adhiguna; Nematzadeh, Aida; Gribovskaya, Elena; Donato, Domenic; Lazaridou, Angeliki; Mensch, Arthur; Lespiau, Jean-Baptiste; Tsimpoukelli, Maria; Grigorev, Nikolai; Fritz, Doug; Sottiaux, Thibault; Pajarskas, Mantas; Pohlen, Toby; Gong, Zhitao; Toyama, Daniel; d'Autume, Cyprien de Masson; Li, Yujia; Terzi, Tayfun; Mikulik, Vladimir; Babuschkin, Igor; Clark, Aidan; Casas, Diego de Las; Guy, Aurelia; Jones, Chris; Bradbury, James; Johnson, Matthew; Hechtman, Blake; Weidinger, Laura; Gabriel, Iason; Isaac, William; Lockhart, Ed; Osindero, Simon; Rimell, Laura; Dyer, Chris; Vinyals, Oriol; Ayoub, Kareem; Stanway, Jeff; Bennett, Lorrayne; Hassabis, Demis; Kavukcuoglu, Koray; Irving, Geoffrey (2021). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". doi:10.48550/arXiv.2112.11446.
↑ "Google Trains 280 Billion Parameter AI Language Model Gopher". InfoQ. Retrieved 21 October 2023.
↑ Du, Nan; Huang, Yanping; Dai, Andrew M.; Tong, Simon; Lepikhin, Dmitry; Xu, Yuanzhong; Krikun, Maxim; Zhou, Yanqi; Yu, Adams Wei; Firat, Orhan; Zoph, Barret; Fedus, Liam; Bosma, Maarten; Zhou, Zongwei; Wang, Tao; Wang, Yu Emma; Webster, Kellie; Pellat, Marie; Robinson, Kevin; Meier-Hellstern, Kathleen; Duke, Toju; Dixon, Lucas; Zhang, Kun; Le, Quoc V; Wu, Yonghui; Chen, Zhifeng; Cui, Claire (2021). "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". doi:10.48550/arXiv.2112.06905.
↑ "WebGPT: Improving the factual accuracy of language models through web browsing". openai.com. Retrieved 21 October 2023.
↑ "fairseq documentation — fairseq 0.12.2 documentation". fairseq.readthedocs.io. Retrieved 16 May 2023.
↑ Aghajanyan, Armen; Huang, Bernie; Ross, Candace; Karpukhin, Vladimir; Xu, Hu; Goyal, Naman; Okhonko, Dmytro; Joshi, Mandar; Ghosh, Gargi; Lewis, Mike; Zettlemoyer, Luke (2022). "CM3: A Causal Masked Multimodal Model of the Internet". doi:10.48550/arXiv.2201.07520.
↑ "Aligning language models to follow instructions". openai.com. Retrieved 21 March 2023.
↑ "Finally, an AI bot that can ace technical interview questions (Ep. 417) - Stack Overflow". stackoverflow.blog. 22 February 2022. Retrieved 21 October 2023.
↑ "Cohere launches Extremely Large (beta)". Context by Cohere. 1 March 2022. Retrieved 12 March 2023.
↑ Shuster, Kurt; Komeili, Mojtaba; Adolphs, Leonard; Roller, Stephen; Szlam, Arthur; Weston, Jason (2022). "Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion". doi:10.48550/arXiv.2203.13224.
↑ Nijkamp, Erik; Pang, Bo; Hayashi, Hiroaki; Tu, Lifu; Wang, Huan; Zhou, Yingbo; Savarese, Silvio; Xiong, Caiming (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". doi:10.48550/arXiv.2203.13474.
↑ "CodeGen". github.com. Salesforce. 16 May 2023. Retrieved 16 May 2023.
↑ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.
↑ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.
↑ Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek; Barnes, Parker; Tay, Yi; Shazeer, Noam; Prabhakaran, Vinodkumar; Reif, Emily; Du, Nan; Hutchinson, Ben; Pope, Reiner; Bradbury, James; Austin, Jacob; Isard, Michael; Gur-Ari, Guy; Yin, Pengcheng; Duke, Toju; Levskaya, Anselm; Ghemawat, Sanjay; Dev, Sunipa; Michalewski, Henryk; Garcia, Xavier; Misra, Vedant; Robinson, Kevin; Fedus, Liam; Zhou, Denny; Ippolito, Daphne; Luan, David; Lim, Hyeontaek; Zoph, Barret; Spiridonov, Alexander; Sepassi, Ryan; Dohan, David; Agrawal, Shivani; Omernick, Mark; Dai, Andrew M.; Pillai, Thanumalayan Sankaranarayana; Pellat, Marie; Lewkowycz, Aitor; Moreira, Erica; Child, Rewon; Polozov, Oleksandr; Lee, Katherine; Zhou, Zongwei; Wang, Xuezhi; Saeta, Brennan; Diaz, Mark; Firat, Orhan; Catasta, Michele; Wei, Jason; Meier-Hellstern, Kathy; Eck, Douglas; Dean, Jeff; Petrov, Slav; Fiedel, Noah (2022). "PaLM: Scaling Language Modeling with Pathways". doi:10.48550/arXiv.2204.02311.
↑ "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 21 March 2023.
↑ Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". doi:10.48550/arXiv.2204.05862.
↑ Black, Sid; Biderman, Stella; Hallahan, Eric; Anthony, Quentin; Gao, Leo; Golding, Laurence; He, Horace; Leahy, Connor; McDonell, Kyle; Phang, Jason; Pieler, Michael; Prashanth, USVSN Sai; Purohit, Shivanshu; Reynolds, Laria; Tow, Jonathan; Wang, Ben; Weinbach, Samuel (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". doi:10.48550/arXiv.2204.06745.
↑ Leahy, Connor (2 February 2022). "Announcing GPT-NeoX-20B". EleutherAI Blog. Retrieved 21 March 2023.
↑ "Comparing AI models : DALLE and Stable Diffusion". www.linkedin.com. Retrieved 16 October 2023.
↑ Howell, James (22 September 2023). "What is Dall-E and How Does it Work?". 101 Blockchains. Retrieved 16 October 2023.
↑ "What is Dall-E (Dall-E 2) and How Does it Work?". Enterprise AI. Retrieved 16 October 2023.
↑ Gonsalves, Robert A. (5 September 2023). "Exploring DALL-E for Digital Art Creation". Medium. Retrieved 16 October 2023.
↑ "Democratizing access to large-scale language models with OPT-175B". ai.meta.com. Retrieved 20 September 2023.
↑ Tay, Yi; Dehghani, Mostafa; Tran, Vinh Q.; García, Xavier; Wei, Jason; Wang, Xuezhi; Chung, Hyung Won; Bahri, Dara; Schuster, Tal; Zheng, H.; Zhou, Denny; Houlsby, N.; Metzler, Donald (10 May 2022). "UL2: Unifying Language Learning Paradigms".
↑ Khrushchev, Mikhail (23 June 2022). "Yandex Publishes YaLM 100B. It's the Largest GPT-Like Neural Network in Open Source". Yandex. Retrieved 20 September 2023.
↑ Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (2022). "Solving Quantitative Reasoning Problems with Language Models". doi:10.48550/arXiv.2206.14858.
↑ Chopra, Disha (1 July 2022). "Google Developed Minerva, an AI That Can Answer Math Questions". Analytics Drift. Retrieved 20 September 2023.
↑ "New AI Model Translates 200 Languages, Making Technology Accessible to More People". Meta. 6 July 2022. Retrieved 19 October 2023.
↑ Rodriguez, Jesus (15 August 2022). "AlexaTM 20B is Amazon's New Language Super Model Which is Also Capable of Few-Shot Learning". Medium. Retrieved 21 October 2023.
↑ Elemuwa, Fimber (22 February 2023). "Using CodeGeeX as a GitHub Copilot alternative". LogRocket Blog. Retrieved 19 October 2023.
↑ Zheng, Qinkai; Xia, Xiao; Zou, Xu; Dong, Yuxiao; Wang, Shan; Xue, Yufei; Wang, Zihan; Shen, Lei; Wang, Andi; Li, Yang; Su, Teng; Yang, Zhilin; Tang, Jie (2023). "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X". doi:10.48550/arXiv.2303.17568.
↑ "Could Deepmind's Sparrow be Google's answer to ChatGPT?". medium.com. Retrieved 21 October 2023.
↑ Su, Hui; Zhou, Xiao; Yu, Houjin; Shen, Xiaoyu; Chen, Yuwen; Zhu, Zilin; Yu, Yang; Zhou, Jie (2022). "WeLM: A Well-Read Pre-trained Language Model for Chinese". doi:10.48550/arXiv.2209.10372.
↑ Zeng, Aohan; Liu, Xiao; Du, Zhengxiao; Wang, Zihan; Lai, Hanyu; Ding, Ming; Yang, Zhuoyi; Xu, Yifan; Zheng, Wendi; Xia, Xiao; Tam, Weng Lam; Ma, Zixuan; Xue, Yufei; Zhai, Jidong; Chen, Wenguang; Zhang, Peng; Dong, Yuxiao; Tang, Jie (2022). "GLM-130B: An Open Bilingual Pre-trained Model". doi:10.48550/arXiv.2210.02414.
↑ Muennighoff, Niklas; Wang, Thomas; Sutawika, Lintang; Roberts, Adam; Biderman, Stella; Scao, Teven Le; Bari, M Saiful; Shen, Sheng; Yong, Zheng-Xin; Schoelkopf, Hailey; Tang, Xiangru; Radev, Dragomir; Aji, Alham Fikri; Almubarak, Khalid; Albanie, Samuel; Alyafeai, Zaid; Webson, Albert; Raff, Edward; Raffel, Colin (2022). "Crosslingual Generalization through Multitask Finetuning". doi:10.48550/arXiv.2211.01786.
↑ BigScience Workshop; Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; Tow, Jonathan; Rush, Alexander M.; Biderman, Stella; Webson, Albert; Ammanamanchi, Pawan Sasanka; Wang, Thomas; Sagot, Benoît; Muennighoff, Niklas; del Moral, Albert Villanova; Ruwase, Olatunji; Bawden, Rachel; Bekman, Stas; McMillan-Major, Angelina; Beltagy, Iz; Nguyen, Huu; Saulnier, Lucile; Tan, Samson; Suarez, Pedro Ortiz; Sanh, Victor; Laurençon, Hugo; Jernite, Yacine; Launay, Julien; Mitchell, Margaret; Raffel, Colin; Gokaslan, Aaron; Simhi, Adi; Soroa, Aitor; Aji, Alham Fikri; Alfassy, Amit; Rogers, Anna; Nitzav, Ariel Kreisberg; Xu, Canwen; Mou, Chenghao; Emezue, Chris; Klamm, Christopher; Leong, Colin; van Strien, Daniel; Adelani, David Ifeoluwa; Radev, Dragomir; Ponferrada, Eduardo González; Levkovizh, Efrat; Kim, Ethan; Natan, Eyal Bar; De Toni, Francesco; Dupont, Gérard; Kruszewski, Germán; Pistilli, Giada; Elsahar, Hady; Benyamina, Hamza; Tran, Hieu; Yu, Ian; Abdulmumin, Idris; Johnson, Isaac; Gonzalez-Dios, Itziar; de la Rosa, Javier; Chim, Jenny; Dodge, Jesse; Zhu, Jian; Chang, Jonathan; Frohberg, Jörg; Tobing, Joseph; Bhattacharjee, Joydeep; Almubarak, Khalid; Chen, Kimbo; Lo, Kyle; Von Werra, Leandro; Weber, Leon; Phan, Long; allal, Loubna Ben; Tanguy, Ludovic; Dey, Manan; Muñoz, Manuel Romero; Masoud, Maraim; Grandury, María; Šaško, Mario; Huang, Max; Coavoux, Maximin; Singh, Mayank; Jiang, Mike Tian-Jian; Vu, Minh Chien; Jauhar, Mohammad A.; Ghaleb, Mustafa; Subramani, Nishant; Kassner, Nora; Khamis, Nurulaqilla; Nguyen, Olivier; Espejel, Omar; de Gibert, Ona; Villegas, Paulo; Henderson, Peter; Colombo, Pierre; Amuok, Priscilla; Lhoest, Quentin; Harliman, Rheza; Bommasani, Rishi; López, Roberto Luis; Ribeiro, Rui; Osei, Salomey; Pyysalo, Sampo; Nagel, Sebastian; Bose, Shamik; Muhammad, Shamsuddeen Hassan; Sharma, Shanya; Longpre, Shayne; Nikpoor, Somaieh; Silberberg, Stanislav; Pai, Suhas; Zink, Sydney; Torrent, Tiago Timponi; Schick, Timo; Thrush, Tristan; Danchev, Valentin; Nikoulina, Vassilina; Laippala, Veronika; Lepercq, Violette; Prabhu, Vrinda; Alyafeai, Zaid; Talat, Zeerak; Raja, Arun; Heinzerling, Benjamin; Si, Chenglei; Taşar, Davut Emre; Salesky, Elizabeth; Mielke, Sabrina J.; Lee, Wilson Y.; Sharma, Abheesht; Santilli, Andrea; Chaffin, Antoine; Stiegler, Arnaud; Datta, Debajyoti; Szczechla, Eliza; Chhablani, Gunjan; Wang, Han; Pandey, Harshit; Strobelt, Hendrik; Fries, Jason Alan; Rozen, Jos; Gao, Leo; Sutawika, Lintang; Bari, M. Saiful; Al-shaibani, Maged S.; Manica, Matteo; Nayak, Nihal; Teehan, Ryan; Albanie, Samuel; Shen, Sheng; Ben-David, Srulik; Bach, Stephen H.; Kim, Taewoon; Bers, Tali; Fevry, Thibault; Neeraj, Trishala; Thakker, Urmish; Raunak, Vikas; Tang, Xiangru; Yong, Zheng-Xin; Sun, Zhiqing; Brody, Shaked; Uri, Yallow; Tojarieh, Hadar; Roberts, Adam; Chung, Hyung Won; Tae, Jaesung; Phang, Jason; Press, Ofir; Li, Conglong; Narayanan, Deepak; Bourfoune, Hatim; Casper, Jared; Rasley, Jeff; Ryabinin, Max; Mishra, Mayank; Zhang, Minjia; Shoeybi, Mohammad; Peyrounette, Myriam; Patry, Nicolas; Tazi, Nouamane; Sanseviero, Omar; von Platen, Patrick; Cornette, Pierre; Lavallée, Pierre François; Lacroix, Rémi; Rajbhandari, Samyam; Gandhi, Sanchit; Smith, Shaden; Requena, Stéphane; Patil, Suraj; Dettmers, Tim; Baruwa, Ahmed; Singh, Amanpreet; Cheveleva, Anastasia; Ligozat, Anne-Laure; Subramonian, Arjun; Névéol, Aurélie; Lovering, Charles; Garrette, Dan; Tunuguntla, Deepak; Reiter, Ehud; Taktasheva, Ekaterina; Voloshina, Ekaterina; Bogdanov, Eli; Winata, Genta Indra; Schoelkopf, Hailey; Kalo, Jan-Christoph; Novikova, Jekaterina; Forde, Jessica Zosa; Clive, Jordan; Kasai, Jungo; Kawamura, Ken; Hazan, Liam; Carpuat, Marine; Clinciu, Miruna; Kim, Najoung; Cheng, Newton; Serikov, Oleg; Antverg, Omer; van der Wal, Oskar; Zhang, Rui; Zhang, Ruochen; Gehrmann, Sebastian; Mirkin, Shachar; Pais, Shani; Shavrina, Tatiana; Scialom, Thomas; Yun, Tian; Limisiewicz, Tomasz; Rieser, Verena; Protasov, Vitaly; Mikhailov, Vladislav; Pruksachatkun, Yada; Belinkov, Yonatan; Bamberger, Zachary; Kasner, Zdeněk; Rueda, Alice; Pestana, Amanda; Feizpour, Amir; Khan, Ammar; Faranak, Amy; Santos, Ana; Hevia, Anthony; Unldreaj, Antigona; Aghagol, Arash; Abdollahi, Arezoo; Tammour, Aycha; HajiHosseini, Azadeh; Behroozi, Bahareh; Ajibade, Benjamin; Saxena, Bharat; Ferrandis, Carlos Muñoz; Contractor, Danish; Lansky, David; David, Davis; Kiela, Douwe; Nguyen, Duong A.; Tan, Edward; Baylor, Emi; Ozoani, Ezinwanne; Mirza, Fatima; Ononiwu, Frankline; Rezanejad, Habib; Jones, Hessie; Bhattacharya, Indrani; Solaiman, Irene; Sedenko, Irina; Nejadgholi, Isar; Passmore, Jesse; Seltzer, Josh; Sanz, Julio Bonis; Dutra, Livia; Samagaio, Mairon; Elbadri, Maraim; Mieskes, Margot; Gerchick, Marissa; Akinlolu, Martha; McKenna, Michael; Qiu, Mike; Ghauri, Muhammed; Burynok, Mykola; Abrar, Nafis; Rajani, Nazneen; Elkott, Nour; Fahmy, Nour; Samuel, Olanrewaju; An, Ran; Kromann, Rasmus; Hao, Ryan; Alizadeh, Samira; Shubber, Sarmad; Wang, Silas; Roy, Sourav; Viguier, Sylvain; Le, Thanh; Oyebade, Tobi; Le, Trieu; Yang, Yoyo; Nguyen, Zach; Kashyap, Abhinav Ramesh; Palasciano, Alfredo; Callahan, Alison; Shukla, Anima; Miranda-Escalada, Antonio; Singh, Ayush; Beilharz, Benjamin; Wang, Bo; Brito, Caio; Zhou, Chenxi; Jain, Chirag; Xu, Chuxin; Fourrier, Clémentine; Periñán, Daniel León; Molano, Daniel; Yu, Dian; Manjavacas, Enrique; Barth, Fabio; Fuhrimann, Florian; Altay, Gabriel; Bayrak, Giyaseddin; Burns, Gully; Vrabec, Helena U.; Bello, Imane; Dash, Ishani; Kang, Jihyun; Giorgi, John; Golde, Jonas; Posada, Jose David; Sivaraman, Karthik Rangasai; Bulchandani, Lokesh; Liu, Lu; Shinzato, Luisa; de Bykhovetz, Madeleine Hahn; Takeuchi, Maiko; Pàmies, Marc; Castillo, Maria A.; Nezhurina, Marianna; Sänger, Mario; Samwald, Matthias; Cullan, Michael; Weinberg, Michael; De Wolf, Michiel; Mihaljcic, Mina; Liu, Minna; Freidank, Moritz; Kang, Myungsun; Seelam, Natasha; Dahlberg, Nathan; Broad, Nicholas Michio; Muellner, Nikolaus; Fung, Pascale; Haller, Patrick; Chandrasekhar, Ramya; Eisenberg, Renata; Martin, Robert; Canalli, Rodrigo; Su, Rosaline; Su, Ruisi; Cahyawijaya, Samuel; Garda, Samuele; Deshmukh, Shlok S.; Mishra, Shubhanshu; Kiblawi, Sid; Ott, Simon; Sang-aroonsiri, Sinee; Kumar, Srishti; Schweter, Stefan; Bharati, Sushil; Laud, Tanmay; Gigant, Théo; Kainuma, Tomoya; Kusa, Wojciech; Labrak, Yanis; Bajaj, Yash Shailesh; Venkatraman, Yash; Xu, Yifan; Xu, Yingxin; Xu, Yu; Tan, Zhe; Xie, Zhongli; Ye, Zifan; Bras, Mathilde; Belkada, Younes; Wolf, Thomas (13 March 2023). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100 [cs].
↑ Chopra, Disha (17 November 2022). "Meta Introduces 'Galactica,' an AI System that Generates Academic Papers from Simple Text Inputs". Analytics Drift. Retrieved 20 September 2023.
↑ "Meta's New Large Language Model Galactica Pulled Down Three Days After Launch". Spiceworks. Retrieved 20 September 2023.
↑ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". aws.amazon.com. 17 November 2022. Retrieved 20 September 2023.
↑ "FLAN-T5 vs. FLAN-UL2: Which LLM is Better? | Sapling". sapling.ai. Retrieved 19 October 2023.
↑ Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291.
↑ Joshi, Harshit; Ebenezer, Abishai; Cambronero, José; Gulwani, Sumit; Kanade, Aditya; Le, Vu; Radiček, Ivan; Verbruggen, Gust (31 January 2023). "FLAME: A small language model for spreadsheet formulas". arXiv:2301.13779 [cs]. doi:10.48550/arXiv.2301.13779.
↑ Zhang, Zhuosheng; Zhang, Aston; Li, Mu; Zhao, Hai; Karypis, George; Smola, Alex (2023). "Multimodal Chain-of-Thought Reasoning in Language Models". doi:10.48550/arXiv.2302.00923.
↑ "Vinija's Notes • Models • Toolformer". vinija.ai. Retrieved 26 June 2023.
↑ Schick, Timo; Dwivedi-Yu, Jane; Dessì, Roberto; Raileanu, Roberta; Lomeli, Maria; Zettlemoyer, Luke; Cancedda, Nicola; Scialom, Thomas (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools". doi:10.48550/arXiv.2302.04761.
↑ "Shaped". www.shaped.ai. Retrieved 16 May 2023.
↑ Weaver, Alaura (2 March 2023). "Palmyra LLMs empower secure, enterprise-grade generative AI for business". Writer. Retrieved 11 March 2023.
↑ "Writer Launches Three New Generative AI Models for the Enterprise". PRWeb. Retrieved 11 March 2023.
↑ "fnlp/moss-moon-003-base · Hugging Face". huggingface.co. 20 April 2023. Retrieved 26 June 2023.
↑ "MOSS". txsun1997.github.io. Retrieved 11 March 2023.
↑ White, Jules; Fu, Quchen; Hays, Sam; Sandborn, Michael; Olea, Carlos; Gilbert, Henry; Elnashar, Ashraf; Spencer-Smith, Jesse; Schmidt, Douglas C. (21 February 2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT". arXiv:2302.11382 [cs]. doi:10.48550/arXiv.2302.11382.
↑ "LLaMA: Open and Efficient Foundation Language Models - Meta Research". Meta Research. Retrieved 11 March 2023.
↑ Peng, Baolin; Galley, Michel; He, Pengcheng; Cheng, Hao; Xie, Yujia; Hu, Yu; Huang, Qiuyuan; Liden, Lars; Yu, Zhou; Chen, Weizhu; Gao, Jianfeng (1 March 2023). "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". arXiv:2302.12813 [cs]. doi:10.48550/arXiv.2302.12813.
↑ Raieli, Salvatore (13 March 2023). "SpikeGPT: a 260 M only parameters LM not afraid of competition". Medium. Retrieved 26 June 2023.
↑ Zhu, Rui-Jie; Zhao, Qihang; Eshraghian, Jason K. (28 February 2023). "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks". arXiv:2302.13939 [cs]. doi:10.48550/arXiv.2302.13939.
↑ Al-Kaswan, Ali; Izadi, Maliheh (28 February 2023). "The (ab)use of Open Source Code to Train Large Language Models". arXiv:2302.13681 [cs]. doi:10.48550/arXiv.2302.13681.
↑ Kwon, Minae; Xie, Sang Michael; Bullard, Kalesha; Sadigh, Dorsa (27 February 2023). "Reward Design with Language Models". arXiv:2303.00001 [cs]. doi:10.48550/arXiv.2303.00001.
↑ Bastian, Matthias (3 March 2023). "Microsoft's Kosmos-1 is a multimodal step toward more general AI". THE DECODER. Retrieved 18 September 2023.
↑ Huang, Shaohan; Dong, Li; Wang, Wenhui; Hao, Yaru; Singhal, Saksham; Ma, Shuming; Lv, Tengchao; Cui, Lei; Mohammed, Owais Khan; Patra, Barun; Liu, Qiang; Aggarwal, Kriti; Chi, Zewen; Bjorck, Johan; Chaudhary, Vishrav; Som, Subhojit; Song, Xia; Wei, Furu (1 March 2023). "Language Is Not All You Need: Aligning Perception with Language Models". arXiv:2302.14045 [cs]. doi:10.48550/arXiv.2302.14045.
↑ Cao, Meng; Fatemi, Mehdi; Cheung, Jackie Chi Kit; Shabanian, Samira (27 February 2023). "Systematic Rectification of Language Models via Dead-end Analysis". arXiv:2302.14003 [cs]. doi:10.48550/arXiv.2302.14003.
↑ Bertolini, Lorenzo; Elce, Valentina; Michalak, Adriana; Bernardi, Giulio; Weeds, Julie (28 February 2023). "Automatic Scoring of Dream Reports' Emotional Content with Large Language Models". arXiv:2302.14828 [cs]. doi:10.48550/arXiv.2302.14828.
↑ Ye, Seonghyeon; Hwang, Hyeonbin; Yang, Sohee; Yun, Hyeongu; Kim, Yireun; Seo, Minjoon (28 February 2023). "In-Context Instruction Learning". arXiv:2302.14691 [cs]. doi:10.48550/arXiv.2302.14691.
↑ Houghton, Conor; Kazanina, Nina; Sukumaran, Priyanka (28 February 2023). "Beyond the limitations of any imaginable mechanism: large language models and psycholinguistics". arXiv:2303.00077 [cs]. doi:10.48550/arXiv.2303.00077. Retrieved 10 March 2023.
↑ Yuan, Yang (2023). "Succinct Representations for Concepts". doi:10.48550/arXiv.2303.00446.
↑ Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler (1 March 2023). "Domain-adapted large language models for classifying nuclear medicine reports". arXiv:2303.01258 [cs]. doi:10.48550/arXiv.2303.01258.
↑ "A New Open Source Flan 20B with UL2". Yi Tay. Retrieved 30 June 2023.
↑ Zhang, Bowen; Soh, Harold (6 March 2023). "Large Language Models as Zero-Shot Human Models for Human-Robot Interaction". arXiv:2303.03548 [cs]. doi:10.48550/arXiv.2303.03548.
↑ Dang, Hai; Goller, Sven; Lehmann, Florian; Buschek, Daniel (6 March 2023). "Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting". arXiv:2303.03199 [cs]. doi:10.1145/3544548.3580969. Retrieved 8 March 2023.
↑ "Prepare for truly useful large language models". Nature Biomedical Engineering. 7 (2): 85–86. 7 March 2023. doi:10.1038/s41551-023-01012-6.
↑ "Stanford CRFM". crfm.stanford.edu. Retrieved 21 March 2023.
↑ "Announcement of Jurassic-2 and Task-Specific APIs". Data Phoenix. 12 March 2023. Retrieved 21 September 2023.
↑ "Large language model: Revision history - Wikipedia". en.wikipedia.org. Retrieved 21 September 2023.

[1] "Large Language Models: Complete Guide in 2023". research.aimultiple.com. Retrieved 11 March 2023.

[2] Pathak, Priyanka (11 May 2023). "Large Language Models 101: History, Evolution and Future". Scribble Data. Retrieved 12 October 2023.

[3] Casey, Matt (25 May 2023). "Large language models: their history, capabilities and limitations". Snorkel AI. Retrieved 12 October 2023.

[4] "Introduction to Large Language Models | Omega Venture Partners". omegavp.com. 6 December 2022. Retrieved 12 October 2023.

[5] "Brief History of Large Language Models & Generative AI | Evolution of NLP from Eliza to ChatGPT". youtube.com. Retrieved 17 October 2023.

[6] "Large Language Model Training in 2023". research.aimultiple.com. Retrieved 11 March 2023.

[7] Yanhui, Chen (8 March 2021). "A Battle Against Amnesia: A Brief History and Introduction of Recurrent Neural Networks". Medium. Retrieved 17 October 2023.

[8] "The Bahdanau Attention Mechanism". machinelearningmastery.com. Retrieved 17 October 2023.

[9] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". doi:10.48550/arXiv.1810.04805.

[10] "BERT 101 - State Of The Art NLP Model Explained". huggingface.co. Retrieved 16 October 2023.

[11] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". doi:10.48550/arXiv.1810.04805.

[12] "GPT-2: 6-month follow-up". openai.com. Retrieved 23 March 2023.

[13] Zellers, Rowan; Holtzman, Ari; Rashkin, Hannah; Bisk, Yonatan; Farhadi, Ali; Roesner, Franziska; Choi, Yejin (2019). "Defending Against Neural Fake News". doi:10.48550/arXiv.1905.12616.

[14] "BERT, RoBERTa, DistilBERT, XLNet: Which one to use?". KDnuggets. Retrieved 29 June 2023.

[15] Khan, Suleiman (18 May 2021). "BERT, RoBERTa, DistilBERT, XLNet — which one to use?". Medium. Retrieved 16 October 2023.

[16] Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". doi:10.48550/arXiv.1906.08237.

[17] G, Juan (21 September 2021). "An Intuitive Explanation of Transformer-Based Models". Factored | Machine Learning, Data Engineering and Data Analytics Company. Retrieved 16 October 2023.

[18] "Overview of ROBERTa model". GeeksforGeeks. 24 November 2020. Retrieved 16 October 2023.

[19] Liu, Yinhan; Ott, Myle; Goyal, Naman; Du, Jingfei; Joshi, Mandar; Chen, Danqi; Levy, Omer; Lewis, Mike; Zettlemoyer, Luke; Stoyanov, Veselin (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach". doi:10.48550/arXiv.1907.11692.

[20] "AI: Megatron the Transformer, and its related language models". Dr Alan D. Thompson – Life Architect. 24 September 2021. Retrieved 16 October 2023.

[21] "Megatron Unleashed: NVIDIA's NLP Model "Megatron-LM" is the Largest Transformer Ever Trained | Exxact Blog". www.exxactcorp.com. Retrieved 11 March 2023.

[22] "AI: Megatron the Transformer, and its related language models". lifearchitect.ai. 24 September 2021. Retrieved 18 September 2023.

[23] "NeMo Megatron — NVIDIA NeMo". docs.nvidia.com. Retrieved 11 March 2023.

[24] "Nvidia trains world's largest Transformer-based language model". VentureBeat. 13 August 2019. Retrieved 18 September 2023.

[25] Keskar, Nitish Shirish; McCann, Bryan; Varshney, Lav R.; Xiong, Caiming; Socher, Richard (2019). "CTRL: A Conditional Transformer Language Model for Controllable Generation". doi:10.48550/arXiv.1909.05858.

[26] Lan, Zhenzhong; Chen, Mingda; Goodman, Sebastian; Gimpel, Kevin; Sharma, Piyush; Soricut, Radu (2019). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". doi:10.48550/arXiv.1909.11942.

[27] Herbst, Sabrina (24 January 2023). "Training a DistilBERT model from scratch". Medium. Retrieved 17 October 2023.

[28] Sanh, Victor; Debut, Lysandre; Chaumond, Julien; Wolf, Thomas (2019). "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". doi:10.48550/arXiv.1910.01108.

[29] Kuzman, Taja (29 March 2023). "Microsoft introduced its DialoGPT to Skype and Edge". Medium. Retrieved 19 September 2023.

[30] Zhang, Yizhe; Sun, Siqi; Galley, Michel; Chen, Yen-Chun; Brockett, Chris; Gao, Xiang; Gao, Jianfeng; Liu, Jingjing; Dolan, Bill (2019). "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". doi:10.48550/arXiv.1911.00536.

[31] "Pretrained models — transformers 2.10.0 documentation". huggingface.co.

[32] Blanc, Corentin; Bailly, Alexandre; Francis, Élie; Guillotin, Thierry; Jamal, Fadi; Wakim, Béchara; Roy, Pascal (1 May 2022). "FlauBERT vs. CamemBERT: Understanding patient's answers by a French medical chatbot". Artificial Intelligence in Medicine. 127: 102264. doi:10.1016/j.artmed.2022.102264. ISSN 0933-3657.

[33] Martin, Louis; Muller, Benjamin; Suárez, Pedro Javier Ortiz; Dupont, Yoann; Romary, Laurent; de la Clergerie, Éric Villemonte; Seddah, Djamé; Sagot, Benoît (2019). "CamemBERT: a Tasty French Language Model". doi:10.48550/arXiv.1911.03894.

[34] Sambucci, Luca (17 November 2021). "Cedille, the largest French AI language model, is actually from Switzerland". Artificial Intelligence news. Retrieved 30 June 2023.

[35] Le, Hang; Vial, Loïc; Frej, Jibril; Segonne, Vincent; Coavoux, Maximin; Lecouteux, Benjamin; Allauzen, Alexandre; Crabbé, Benoît; Besacier, Laurent; Schwab, Didier (2019). "FlauBERT: Unsupervised Language Model Pre-training for French". doi:10.48550/arXiv.1912.05372.

[36] Qi, Weizhen; Yan, Yu; Gong, Yeyun; Liu, Dayiheng; Duan, Nan; Chen, Jiusheng; Zhang, Ruofei; Zhou, Ming (2020). "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training". doi:10.48550/arXiv.2001.04063.

[37] Jagtap, Rohan (2 August 2020). "T5: Text-To-Text Transfer Transformer". Medium. Retrieved 19 September 2023.

[38] Zhao, Wayne Xin; Zhou, Kun; Li, Junyi; Tang, Tianyi; Wang, Xiaolei; Hou, Yupeng; Min, Yingqian; Zhang, Beichen; Zhang, Junjie; Dong, Zican; Du, Yifan; Yang, Chen; Chen, Yushuo; Chen, Zhipeng; Jiang, Jinhao; Ren, Ruiyang; Li, Yifan; Tang, Xinyu; Liu, Zikang; Liu, Peiyu; Nie, Jian-Yun; Wen, Ji-Rong (2023). "A Survey of Large Language Models". doi:10.48550/arXiv.2303.18223.

[39] "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer". ai.googleblog.com. 24 February 2020. Retrieved 25 June 2023.

[40] "More Efficient NLP Model Pre-training with ELECTRA". ai.googleblog.com. 10 March 2020. Retrieved 28 June 2023.

[41] Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. Retrieved 29 June 2023.

[42] "OpenAI GPT-3: Everything You Need to Know [Updated]". springboard.com. Retrieved 16 October 2023.

[43] Romero, Alberto (25 May 2021). "GPT-3 — A Complete Overview". Medium. Retrieved 20 October 2023.

[44] Lee, Angie (26 January 2023). "What Are Large Language Models Used For and Why Are They Important?". NVIDIA Blog. Retrieved 11 March 2023.

[45] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". doi:10.48550/arXiv.2005.14165.

[46] Tsang, Sik-Ho (21 January 2023). "Brief Review — DeBERTa: Decoding-enhanced BERT with Disentangled Attention". Medium. Retrieved 18 September 2023.

[47] He, Pengcheng; Liu, Xiaodong; Gao, Jianfeng; Chen, Weizhu (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". doi:10.48550/arXiv.2006.03654.

[48] Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (2020). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". doi:10.48550/arXiv.2006.16668.

[49] Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (July 2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: 1906–1919. doi:10.18653/v1/2020.acl-main.173.

[50] Xue, Linting; Constant, Noah; Roberts, Adam; Kale, Mihir; Al-Rfou, Rami; Siddhant, Aditya; Barua, Aditya; Raffel, Colin (2021). "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer": 483–498. doi:10.18653/v1/2021.naacl-main.41.

[51] "Wu Dao 2.0 in 2023: China's Improved Version of GPT-3". research.aimultiple.com. Retrieved 16 October 2023.

[52] "China's gigantic multi-modal AI is no one-trick pony". Engadget. 2 June 2021. Retrieved 18 October 2023.

[53] "GPT Neo". 15 March 2023.

[54] "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. 15 May 2021. Retrieved 29 June 2023.

[55] Zeng, Wei; Ren, Xiaozhe; Su, Teng; Wang, Hui; Liao, Yi; Wang, Zhiwei; Jiang, Xin; Yang, ZhenZhang; Wang, Kaisheng; Zhang, Xiaoda; Li, Chen; Gong, Ziyan; Yao, Yifan; Huang, Xinjing; Wang, Jun; Yu, Jianfeng; Guo, Qi; Yu, Yue; Zhang, Yan; Wang, Jin; Tao, Hengtao; Yan, Dasen; Yi, Zexuan; Peng, Fang; Jiang, Fangqing; Zhang, Han; Deng, Lingfeng; Zhang, Yehong; Lin, Zhe; Zhang, Chao; Zhang, Shaojie; Guo, Mingyue; Gu, Shanzhi; Fan, Gaojun; Wang, Yaowei; Jin, Xuefeng; Liu, Qun; Tian, Yonghong (2021). "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". doi:10.48550/arXiv.2104.12369.

[56] Kazi, Suleman (28 March 2023). "Top Large Language Models (LLMs): GPT-4, LLaMA, FLAN UL2, BLOOM, and More". Vectara. Retrieved 29 June 2023.

[57] Tsang, Sik-Ho (13 May 2023). "Brief Review — LaMDA: Language Models for Dialog Applications". Medium. Retrieved 16 October 2023.

[58] Zhang, Zhengyan; Gu, Yuxian; Han, Xu; Chen, Shengqi; Xiao, Chaojun; Sun, Zhenbo; Yao, Yuan; Qi, Fanchao; Guan, Jian; Ke, Pei; Cai, Yanzheng; Zeng, Guoyang; Tan, Zhixing; Liu, Zhiyuan; Huang, Minlie; Han, Wentao; Liu, Yang; Zhu, Xiaoyan; Sun, Maosong (2021). "CPM-2: Large-scale Cost-effective Pre-trained Language Models". doi:10.48550/arXiv.2106.10715.

[59] Sun, Yu; Wang, Shuohuan; Feng, Shikun; Ding, Siyu; Pang, Chao; Shang, Junyuan; Liu, Jiaxiang; Chen, Xuyi; Zhao, Yanbin; Lu, Yuxiang; Liu, Weixin; Wu, Zhihua; Gong, Weibao; Liang, Jianzhong; Shang, Zhizhou; Sun, Peng; Liu, Wei; Ouyang, Xuan; Yu, Dianhai; Tian, Hao; Wu, Hua; Wang, Haifeng (2021). "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". doi:10.48550/arXiv.2107.02137.

[60] Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Pinto, Henrique Ponde de Oliveira; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex; Puri, Raul; Krueger, Gretchen; Petrov, Michael; Khlaaf, Heidy; Sastry, Girish; Mishkin, Pamela; Chan, Brooke; Gray, Scott; Ryder, Nick; Pavlov, Mikhail; Power, Alethea; Kaiser, Lukasz; Bavarian, Mohammad; Winter, Clemens; Tillet, Philippe; Such, Felipe Petroski; Cummings, Dave; Plappert, Matthias; Chantzis, Fotios; Barnes, Elizabeth; Herbert-Voss, Ariel; Guss, William Hebgen; Nichol, Alex; Paino, Alex; Tezak, Nikolas; Tang, Jie; Babuschkin, Igor; Balaji, Suchir; Jain, Shantanu; Saunders, William; Hesse, Christopher; Carr, Andrew N.; Leike, Jan; Achiam, Josh; Misra, Vedant; Morikawa, Evan; Radford, Alec; Knight, Matthew; Brundage, Miles; Murati, Mira; Mayer, Katie; Welinder, Peter; McGrew, Bob; Amodei, Dario; McCandlish, Sam; Sutskever, Ilya; Zaremba, Wojciech (2021). "Evaluating Large Language Models Trained on Code". doi:10.48550/arXiv.2107.03374.

[61] "HyperCLOVA | Discover AI use cases". gpt3demo.com. Retrieved 20 October 2023.

[62] Kim, Boseop; Kim, HyoungSeok; Lee, Sang-Woo; Lee, Gichang; Kwak, Donghyun; Jeon, Dong Hyeon; Park, Sunghyun; Kim, Sungju; Kim, Seonhoon; Seo, Dongpil; Lee, Heungsub; Jeong, Minyoung; Lee, Sungjae; Kim, Minsub; Ko, Suk Hyun; Kim, Seokhun; Park, Taeyong; Kim, Jinuk; Kang, Soyoung; Ryu, Na-Hyeon; Yoo, Kang Min; Chang, Minsuk; Suh, Soobin; In, Sookyo; Park, Jinseong; Kim, Kyungduk; Kim, Hiun; Jeong, Jisu; Yeo, Yong Goo; Ham, Donghoon; Park, Dongju; Lee, Min Young; Kang, Jaewook; Kang, Inho; Ha, Jung-Woo; Park, Woomyoung; Sung, Nako (2021). "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers": 3405–3424. doi:10.18653/v1/2021.emnlp-main.274.

[63] Wu, Shaohua; Zhao, Xudong; Yu, Tong; Zhang, Rongguo; Shen, Chong; Liu, Hongli; Li, Feng; Zhu, Hong; Luo, Jiangang; Xu, Liang; Zhang, Xuanwei (2021). "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". doi:10.48550/arXiv.2110.04725.

[64] "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". NVIDIA Technical Blog. 11 October 2021. Retrieved 30 June 2023.

[65] Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard; Driessche, George van den; Hendricks, Lisa Anne; Rauh, Maribeth; Huang, Po-Sen; Glaese, Amelia; Welbl, Johannes; Dathathri, Sumanth; Huang, Saffron; Uesato, Jonathan; Mellor, John; Higgins, Irina; Creswell, Antonia; McAleese, Nat; Wu, Amy; Elsen, Erich; Jayakumar, Siddhant; Buchatskaya, Elena; Budden, David; Sutherland, Esme; Simonyan, Karen; Paganini, Michela; Sifre, Laurent; Martens, Lena; Li, Xiang Lorraine; Kuncoro, Adhiguna; Nematzadeh, Aida; Gribovskaya, Elena; Donato, Domenic; Lazaridou, Angeliki; Mensch, Arthur; Lespiau, Jean-Baptiste; Tsimpoukelli, Maria; Grigorev, Nikolai; Fritz, Doug; Sottiaux, Thibault; Pajarskas, Mantas; Pohlen, Toby; Gong, Zhitao; Toyama, Daniel; d'Autume, Cyprien de Masson; Li, Yujia; Terzi, Tayfun; Mikulik, Vladimir; Babuschkin, Igor; Clark, Aidan; Casas, Diego de Las; Guy, Aurelia; Jones, Chris; Bradbury, James; Johnson, Matthew; Hechtman, Blake; Weidinger, Laura; Gabriel, Iason; Isaac, William; Lockhart, Ed; Osindero, Simon; Rimell, Laura; Dyer, Chris; Vinyals, Oriol; Ayoub, Kareem; Stanway, Jeff; Bennett, Lorrayne; Hassabis, Demis; Kavukcuoglu, Koray; Irving, Geoffrey (2021). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". doi:10.48550/arXiv.2112.11446.

[66] "Google Trains 280 Billion Parameter AI Language Model Gopher". InfoQ. Retrieved 21 October 2023.

[67] Du, Nan; Huang, Yanping; Dai, Andrew M.; Tong, Simon; Lepikhin, Dmitry; Xu, Yuanzhong; Krikun, Maxim; Zhou, Yanqi; Yu, Adams Wei; Firat, Orhan; Zoph, Barret; Fedus, Liam; Bosma, Maarten; Zhou, Zongwei; Wang, Tao; Wang, Yu Emma; Webster, Kellie; Pellat, Marie; Robinson, Kevin; Meier-Hellstern, Kathleen; Duke, Toju; Dixon, Lucas; Zhang, Kun; Le, Quoc V; Wu, Yonghui; Chen, Zhifeng; Cui, Claire (2021). "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". doi:10.48550/arXiv.2112.06905.

[68] "WebGPT: Improving the factual accuracy of language models through web browsing". openai.com. Retrieved 21 October 2023.

[69] "fairseq documentation — fairseq 0.12.2 documentation". fairseq.readthedocs.io. Retrieved 16 May 2023.

[70] Aghajanyan, Armen; Huang, Bernie; Ross, Candace; Karpukhin, Vladimir; Xu, Hu; Goyal, Naman; Okhonko, Dmytro; Joshi, Mandar; Ghosh, Gargi; Lewis, Mike; Zettlemoyer, Luke (2022). "CM3: A Causal Masked Multimodal Model of the Internet". doi:10.48550/arXiv.2201.07520.

[71] "Aligning language models to follow instructions". openai.com. Retrieved 21 March 2023.

[72] "Finally, an AI bot that can ace technical interview questions (Ep. 417) - Stack Overflow". stackoverflow.blog. 22 February 2022. Retrieved 21 October 2023.

[73] "Cohere launches Extremely Large (beta)". Context by Cohere. 1 March 2022. Retrieved 12 March 2023.

[74] Shuster, Kurt; Komeili, Mojtaba; Adolphs, Leonard; Roller, Stephen; Szlam, Arthur; Weston, Jason (2022). "Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion". doi:10.48550/arXiv.2203.13224.

[75] Nijkamp, Erik; Pang, Bo; Hayashi, Hiroaki; Tu, Lifu; Wang, Huan; Zhou, Yingbo; Savarese, Silvio; Xiong, Caiming (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". doi:10.48550/arXiv.2203.13474.

[76] "CodeGen". github.com. Salesforce. 16 May 2023. Retrieved 16 May 2023.

[77] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.

[78] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.

[79] Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek; Barnes, Parker; Tay, Yi; Shazeer, Noam; Prabhakaran, Vinodkumar; Reif, Emily; Du, Nan; Hutchinson, Ben; Pope, Reiner; Bradbury, James; Austin, Jacob; Isard, Michael; Gur-Ari, Guy; Yin, Pengcheng; Duke, Toju; Levskaya, Anselm; Ghemawat, Sanjay; Dev, Sunipa; Michalewski, Henryk; Garcia, Xavier; Misra, Vedant; Robinson, Kevin; Fedus, Liam; Zhou, Denny; Ippolito, Daphne; Luan, David; Lim, Hyeontaek; Zoph, Barret; Spiridonov, Alexander; Sepassi, Ryan; Dohan, David; Agrawal, Shivani; Omernick, Mark; Dai, Andrew M.; Pillai, Thanumalayan Sankaranarayana; Pellat, Marie; Lewkowycz, Aitor; Moreira, Erica; Child, Rewon; Polozov, Oleksandr; Lee, Katherine; Zhou, Zongwei; Wang, Xuezhi; Saeta, Brennan; Diaz, Mark; Firat, Orhan; Catasta, Michele; Wei, Jason; Meier-Hellstern, Kathy; Eck, Douglas; Dean, Jeff; Petrov, Slav; Fiedel, Noah (2022). "PaLM: Scaling Language Modeling with Pathways". doi:10.48550/arXiv.2204.02311.

[80] "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 21 March 2023.

[81] Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". doi:10.48550/arXiv.2204.05862.

[82] Black, Sid; Biderman, Stella; Hallahan, Eric; Anthony, Quentin; Gao, Leo; Golding, Laurence; He, Horace; Leahy, Connor; McDonell, Kyle; Phang, Jason; Pieler, Michael; Prashanth, USVSN Sai; Purohit, Shivanshu; Reynolds, Laria; Tow, Jonathan; Wang, Ben; Weinbach, Samuel (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". doi:10.48550/arXiv.2204.06745.

[83] Leahy, Connor (2 February 2022). "Announcing GPT-NeoX-20B". EleutherAI Blog. Retrieved 21 March 2023.

[84] "Comparing AI models : DALLE and Stable Diffusion". www.linkedin.com. Retrieved 16 October 2023.

[85] Howell, James (22 September 2023). "What is Dall-E and How Does it Work?". 101 Blockchains. Retrieved 16 October 2023.

[86] "What is Dall-E (Dall-E 2) and How Does it Work?". Enterprise AI. Retrieved 16 October 2023.

[87] Gonsalves, Robert A. (5 September 2023). "Exploring DALL-E for Digital Art Creation". Medium. Retrieved 16 October 2023.

[88] "Democratizing access to large-scale language models with OPT-175B". ai.meta.com. Retrieved 20 September 2023.

[89] Tay, Yi; Dehghani, Mostafa; Tran, Vinh Q.; García, Xavier; Wei, Jason; Wang, Xuezhi; Chung, Hyung Won; Bahri, Dara; Schuster, Tal; Zheng, H.; Zhou, Denny; Houlsby, N.; Metzler, Donald (10 May 2022). "UL2: Unifying Language Learning Paradigms".

[90] Khrushchev, Mikhail (23 June 2022). "Yandex Publishes YaLM 100B. It's the Largest GPT-Like Neural Network in Open Source". Yandex. Retrieved 20 September 2023.

[91] Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (2022). "Solving Quantitative Reasoning Problems with Language Models". doi:10.48550/arXiv.2206.14858.

[92] Chopra, Disha (1 July 2022). "Google Developed Minerva, an AI That Can Answer Math Questions". Analytics Drift. Retrieved 20 September 2023.

[93] "New AI Model Translates 200 Languages, Making Technology Accessible to More People". Meta. 6 July 2022. Retrieved 19 October 2023.

[94] Rodriguez, Jesus (15 August 2022). "AlexaTM 20B is Amazon's New Language Super Model Which is Also Capable of Few-Shot Learning". Medium. Retrieved 21 October 2023.

[95] Elemuwa, Fimber (22 February 2023). "Using CodeGeeX as a GitHub Copilot alternative". LogRocket Blog. Retrieved 19 October 2023.

[96] Zheng, Qinkai; Xia, Xiao; Zou, Xu; Dong, Yuxiao; Wang, Shan; Xue, Yufei; Wang, Zihan; Shen, Lei; Wang, Andi; Li, Yang; Su, Teng; Yang, Zhilin; Tang, Jie (2023). "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X". doi:10.48550/arXiv.2303.17568.

[97] "Could Deepmind's Sparrow be Google's answer to ChatGPT?". medium.com. Retrieved 21 October 2023.

[98] Su, Hui; Zhou, Xiao; Yu, Houjin; Shen, Xiaoyu; Chen, Yuwen; Zhu, Zilin; Yu, Yang; Zhou, Jie (2022). "WeLM: A Well-Read Pre-trained Language Model for Chinese". doi:10.48550/arXiv.2209.10372.

[99] Zeng, Aohan; Liu, Xiao; Du, Zhengxiao; Wang, Zihan; Lai, Hanyu; Ding, Ming; Yang, Zhuoyi; Xu, Yifan; Zheng, Wendi; Xia, Xiao; Tam, Weng Lam; Ma, Zixuan; Xue, Yufei; Zhai, Jidong; Chen, Wenguang; Zhang, Peng; Dong, Yuxiao; Tang, Jie (2022). "GLM-130B: An Open Bilingual Pre-trained Model". doi:10.48550/arXiv.2210.02414.

[100] Muennighoff, Niklas; Wang, Thomas; Sutawika, Lintang; Roberts, Adam; Biderman, Stella; Scao, Teven Le; Bari, M Saiful; Shen, Sheng; Yong, Zheng-Xin; Schoelkopf, Hailey; Tang, Xiangru; Radev, Dragomir; Aji, Alham Fikri; Almubarak, Khalid; Albanie, Samuel; Alyafeai, Zaid; Webson, Albert; Raff, Edward; Raffel, Colin (2022). "Crosslingual Generalization through Multitask Finetuning". doi:10.48550/arXiv.2211.01786.

[101] BigScience Workshop; Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; Tow, Jonathan; Rush, Alexander M.; Biderman, Stella; Webson, Albert; Ammanamanchi, Pawan Sasanka; Wang, Thomas; Sagot, Benoît; Muennighoff, Niklas; del Moral, Albert Villanova; Ruwase, Olatunji; Bawden, Rachel; Bekman, Stas; McMillan-Major, Angelina; Beltagy, Iz; Nguyen, Huu; Saulnier, Lucile; Tan, Samson; Suarez, Pedro Ortiz; Sanh, Victor; Laurençon, Hugo; Jernite, Yacine; Launay, Julien; Mitchell, Margaret; Raffel, Colin; Gokaslan, Aaron; Simhi, Adi; Soroa, Aitor; Aji, Alham Fikri; Alfassy, Amit; Rogers, Anna; Nitzav, Ariel Kreisberg; Xu, Canwen; Mou, Chenghao; Emezue, Chris; Klamm, Christopher; Leong, Colin; van Strien, Daniel; Adelani, David Ifeoluwa; Radev, Dragomir; Ponferrada, Eduardo González; Levkovizh, Efrat; Kim, Ethan; Natan, Eyal Bar; De Toni, Francesco; Dupont, Gérard; Kruszewski, Germán; Pistilli, Giada; Elsahar, Hady; Benyamina, Hamza; Tran, Hieu; Yu, Ian; Abdulmumin, Idris; Johnson, Isaac; Gonzalez-Dios, Itziar; de la Rosa, Javier; Chim, Jenny; Dodge, Jesse; Zhu, Jian; Chang, Jonathan; Frohberg, Jörg; Tobing, Joseph; Bhattacharjee, Joydeep; Almubarak, Khalid; Chen, Kimbo; Lo, Kyle; Von Werra, Leandro; Weber, Leon; Phan, Long; Allal, Loubna Ben; Tanguy, Ludovic; Dey, Manan; Muñoz, Manuel Romero; Masoud, Maraim; Grandury, María; Šaško, Mario; Huang, Max; Coavoux, Maximin; Singh, Mayank; Jiang, Mike Tian-Jian; Vu, Minh Chien; Jauhar, Mohammad A.; Ghaleb, Mustafa; Subramani, Nishant; Kassner, Nora; Khamis, Nurulaqilla; Nguyen, Olivier; Espejel, Omar; de Gibert, Ona; Villegas, Paulo; Henderson, Peter; Colombo, Pierre; Amuok, Priscilla; Lhoest, Quentin; Harliman, Rheza; Bommasani, Rishi; López, Roberto Luis; Ribeiro, Rui; Osei, Salomey; Pyysalo, Sampo; Nagel, Sebastian; Bose, Shamik; Muhammad, Shamsuddeen Hassan; Sharma, Shanya; Longpre, Shayne; Nikpoor, Somaieh; Silberberg, Stanislav; Pai, Suhas; Zink, Sydney; Torrent, Tiago Timponi; Schick, Timo; Thrush, Tristan; Danchev, Valentin; Nikoulina, Vassilina; Laippala, Veronika; Lepercq, Violette; Prabhu, Vrinda; Alyafeai, Zaid; Talat, Zeerak; Raja, Arun; Heinzerling, Benjamin; Si, Chenglei; Taşar, Davut Emre; Salesky, Elizabeth; Mielke, Sabrina J.; Lee, Wilson Y.; Sharma, Abheesht; Santilli, Andrea; Chaffin, Antoine; Stiegler, Arnaud; Datta, Debajyoti; Szczechla, Eliza; Chhablani, Gunjan; Wang, Han; Pandey, Harshit; Strobelt, Hendrik; Fries, Jason Alan; Rozen, Jos; Gao, Leo; Sutawika, Lintang; Bari, M. Saiful; Al-shaibani, Maged S.; Manica, Matteo; Nayak, Nihal; Teehan, Ryan; Albanie, Samuel; Shen, Sheng; Ben-David, Srulik; Bach, Stephen H.; Kim, Taewoon; Bers, Tali; Fevry, Thibault; Neeraj, Trishala; Thakker, Urmish; Raunak, Vikas; Tang, Xiangru; Yong, Zheng-Xin; Sun, Zhiqing; Brody, Shaked; Uri, Yallow; Tojarieh, Hadar; Roberts, Adam; Chung, Hyung Won; Tae, Jaesung; Phang, Jason; Press, Ofir; Li, Conglong; Narayanan, Deepak; Bourfoune, Hatim; Casper, Jared; Rasley, Jeff; Ryabinin, Max; Mishra, Mayank; Zhang, Minjia; Shoeybi, Mohammad; Peyrounette, Myriam; Patry, Nicolas; Tazi, Nouamane; Sanseviero, Omar; von Platen, Patrick; Cornette, Pierre; Lavallée, Pierre François; Lacroix, Rémi; Rajbhandari, Samyam; Gandhi, Sanchit; Smith, Shaden; Requena, Stéphane; Patil, Suraj; Dettmers, Tim; Baruwa, Ahmed; Singh, Amanpreet; Cheveleva, Anastasia; Ligozat, Anne-Laure; Subramonian, Arjun; Névéol, Aurélie; Lovering, Charles; Garrette, Dan; Tunuguntla, Deepak; Reiter, Ehud; Taktasheva, Ekaterina; Voloshina, Ekaterina; Bogdanov, Eli; Winata, Genta Indra; Schoelkopf, Hailey; Kalo, Jan-Christoph; Novikova, Jekaterina; Forde, Jessica Zosa; Clive, Jordan; Kasai, Jungo; Kawamura, Ken; Hazan, Liam; Carpuat, Marine; Clinciu, Miruna; Kim, Najoung; Cheng, Newton; Serikov, Oleg; Antverg, Omer; van der Wal, Oskar; Zhang, Rui; Zhang, Ruochen; Gehrmann, Sebastian; Mirkin, Shachar; Pais, Shani; Shavrina, Tatiana; Scialom, Thomas; Yun, Tian; Limisiewicz, Tomasz; Rieser, Verena; Protasov, Vitaly; Mikhailov, Vladislav; Pruksachatkun, Yada; Belinkov, Yonatan; Bamberger, Zachary; Kasner, Zdeněk; Rueda, Alice; Pestana, Amanda; Feizpour, Amir; Khan, Ammar; Faranak, Amy; Santos, Ana; Hevia, Anthony; Unldreaj, Antigona; Aghagol, Arash; Abdollahi, Arezoo; Tammour, Aycha; HajiHosseini, Azadeh; Behroozi, Bahareh; Ajibade, Benjamin; Saxena, Bharat; Ferrandis, Carlos Muñoz; Contractor, Danish; Lansky, David; David, Davis; Kiela, Douwe; Nguyen, Duong A.; Tan, Edward; Baylor, Emi; Ozoani, Ezinwanne; Mirza, Fatima; Ononiwu, Frankline; Rezanejad, Habib; Jones, Hessie; Bhattacharya, Indrani; Solaiman, Irene; Sedenko, Irina; Nejadgholi, Isar; Passmore, Jesse; Seltzer, Josh; Sanz, Julio Bonis; Dutra, Livia; Samagaio, Mairon; Elbadri, Maraim; Mieskes, Margot; Gerchick, Marissa; Akinlolu, Martha; McKenna, Michael; Qiu, Mike; Ghauri, Muhammed; Burynok, Mykola; Abrar, Nafis; Rajani, Nazneen; Elkott, Nour; Fahmy, Nour; Samuel, Olanrewaju; An, Ran; Kromann, Rasmus; Hao, Ryan; Alizadeh, Samira; Shubber, Sarmad; Wang, Silas; Roy, Sourav; Viguier, Sylvain; Le, Thanh; Oyebade, Tobi; Le, Trieu; Yang, Yoyo; Nguyen, Zach; Kashyap, Abhinav Ramesh; Palasciano, Alfredo; Callahan, Alison; Shukla, Anima; Miranda-Escalada, Antonio; Singh, Ayush; Beilharz, Benjamin; Wang, Bo; Brito, Caio; Zhou, Chenxi; Jain, Chirag; Xu, Chuxin; Fourrier, Clémentine; Periñán, Daniel León; Molano, Daniel; Yu, Dian; Manjavacas, Enrique; Barth, Fabio; Fuhrimann, Florian; Altay, Gabriel; Bayrak, Giyaseddin; Burns, Gully; Vrabec, Helena U.; Bello, Imane; Dash, Ishani; Kang, Jihyun; Giorgi, John; Golde, Jonas; Posada, Jose David; Sivaraman, Karthik Rangasai; Bulchandani, Lokesh; Liu, Lu; Shinzato, Luisa; de Bykhovetz, Madeleine Hahn; Takeuchi, Maiko; Pàmies, Marc; Castillo, Maria A.; Nezhurina, Marianna; Sänger, Mario; Samwald, Matthias; Cullan, Michael; Weinberg, Michael; De Wolf, Michiel; Mihaljcic, Mina; Liu, Minna; Freidank, Moritz; Kang, Myungsun; Seelam, Natasha; Dahlberg, Nathan; Broad, Nicholas Michio; Muellner, Nikolaus; Fung, Pascale; Haller, Patrick; Chandrasekhar, Ramya; Eisenberg, Renata; Martin, Robert; Canalli, Rodrigo; Su, Rosaline; Su, Ruisi; Cahyawijaya, Samuel; Garda, Samuele; Deshmukh, Shlok S.; Mishra, Shubhanshu; Kiblawi, Sid; Ott, Simon; Sang-aroonsiri, Sinee; Kumar, Srishti; Schweter, Stefan; Bharati, Sushil; Laud, Tanmay; Gigant, Théo; Kainuma, Tomoya; Kusa, Wojciech; Labrak, Yanis; Bajaj, Yash Shailesh; Venkatraman, Yash; Xu, Yifan; Xu, Yingxin; Xu, Yu; Tan, Zhe; Xie, Zhongli; Ye, Zifan; Bras, Mathilde; Belkada, Younes; Wolf, Thomas (13 March 2023). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100 [cs].

[102] Chopra, Disha (17 November 2022). "Meta Introduces 'Galactica,' an AI System that Generates Academic Papers from Simple Text Inputs". Analytics Drift. Retrieved 20 September 2023.

[103] "Meta's New Large Language Model Galactica Pulled Down Three Days After Launch". Spiceworks. Retrieved 20 September 2023.

[104] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". aws.amazon.com. 17 November 2022. Retrieved 20 September 2023.

[105] "FLAN-T5 vs. FLAN-UL2: Which LLM is Better? | Sapling". sapling.ai. Retrieved 19 October 2023.

[106] Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291.

[107] Joshi, Harshit; Ebenezer, Abishai; Cambronero, José; Gulwani, Sumit; Kanade, Aditya; Le, Vu; Radiček, Ivan; Verbruggen, Gust (31 January 2023). "FLAME: A small language model for spreadsheet formulas". arXiv:2301.13779 [cs]. doi:10.48550/arXiv.2301.13779.

[108] Zhang, Zhuosheng; Zhang, Aston; Li, Mu; Zhao, Hai; Karypis, George; Smola, Alex (2023). "Multimodal Chain-of-Thought Reasoning in Language Models". doi:10.48550/arXiv.2302.00923.

[109] "Vinija's Notes • Models • Toolformer". vinija.ai. Retrieved 26 June 2023.

[110] Schick, Timo; Dwivedi-Yu, Jane; Dessì, Roberto; Raileanu, Roberta; Lomeli, Maria; Zettlemoyer, Luke; Cancedda, Nicola; Scialom, Thomas (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools". doi:10.48550/arXiv.2302.04761.

[111] "Shaped". www.shaped.ai. Retrieved 16 May 2023.

[112] Weaver, Alaura (2 March 2023). "Palmyra LLMs empower secure, enterprise-grade generative AI for business". Writer. Retrieved 11 March 2023.

[113] "Writer Launches Three New Generative AI Models for the Enterprise". PRWeb. Retrieved 11 March 2023.

[114] "fnlp/moss-moon-003-base · Hugging Face". huggingface.co. 20 April 2023. Retrieved 26 June 2023.

[115] "MOSS". txsun1997.github.io. Retrieved 11 March 2023.

[116] White, Jules; Fu, Quchen; Hays, Sam; Sandborn, Michael; Olea, Carlos; Gilbert, Henry; Elnashar, Ashraf; Spencer-Smith, Jesse; Schmidt, Douglas C. (21 February 2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT". arXiv:2302.11382 [cs]. doi:10.48550/arXiv.2302.11382.

[117] "LLaMA: Open and Efficient Foundation Language Models - Meta Research". Meta Research. Retrieved 11 March 2023.

[118] Peng, Baolin; Galley, Michel; He, Pengcheng; Cheng, Hao; Xie, Yujia; Hu, Yu; Huang, Qiuyuan; Liden, Lars; Yu, Zhou; Chen, Weizhu; Gao, Jianfeng (1 March 2023). "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". arXiv:2302.12813 [cs]. doi:10.48550/arXiv.2302.12813.

[119] Raieli, Salvatore (13 March 2023). "SpikeGPT: a 260 M only parameters LM not afraid of competition". Medium. Retrieved 26 June 2023.

[120] Zhu, Rui-Jie; Zhao, Qihang; Eshraghian, Jason K. (28 February 2023). "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks". arXiv:2302.13939 [cs]. doi:10.48550/arXiv.2302.13939.

[121] Al-Kaswan, Ali; Izadi, Maliheh (28 February 2023). "The (ab)use of Open Source Code to Train Large Language Models". arXiv:2302.13681 [cs]. doi:10.48550/arXiv.2302.13681.

[122] Kwon, Minae; Xie, Sang Michael; Bullard, Kalesha; Sadigh, Dorsa (27 February 2023). "Reward Design with Language Models". arXiv:2303.00001 [cs]. doi:10.48550/arXiv.2303.00001.

[123] Bastian, Matthias (3 March 2023). "Microsoft's Kosmos-1 is a multimodal step toward more general AI". THE DECODER. Retrieved 18 September 2023.

[124] Huang, Shaohan; Dong, Li; Wang, Wenhui; Hao, Yaru; Singhal, Saksham; Ma, Shuming; Lv, Tengchao; Cui, Lei; Mohammed, Owais Khan; Patra, Barun; Liu, Qiang; Aggarwal, Kriti; Chi, Zewen; Bjorck, Johan; Chaudhary, Vishrav; Som, Subhojit; Song, Xia; Wei, Furu (1 March 2023). "Language Is Not All You Need: Aligning Perception with Language Models". arXiv:2302.14045 [cs]. doi:10.48550/arXiv.2302.14045.

[125] Cao, Meng; Fatemi, Mehdi; Cheung, Jackie Chi Kit; Shabanian, Samira (27 February 2023). "Systematic Rectification of Language Models via Dead-end Analysis". arXiv:2302.14003 [cs]. doi:10.48550/arXiv.2302.14003.

[126] Bertolini, Lorenzo; Elce, Valentina; Michalak, Adriana; Bernardi, Giulio; Weeds, Julie (28 February 2023). "Automatic Scoring of Dream Reports' Emotional Content with Large Language Models". arXiv:2302.14828 [cs]. doi:10.48550/arXiv.2302.14828.

[127] Ye, Seonghyeon; Hwang, Hyeonbin; Yang, Sohee; Yun, Hyeongu; Kim, Yireun; Seo, Minjoon (28 February 2023). "In-Context Instruction Learning". arXiv:2302.14691 [cs]. doi:10.48550/arXiv.2302.14691.

[128] Houghton, Conor; Kazanina, Nina; Sukumaran, Priyanka (28 February 2023). "Beyond the limitations of any imaginable mechanism: large language models and psycholinguistics". arXiv:2303.00077 [cs]. doi:10.48550/arXiv.2303.00077. Retrieved 10 March 2023.

[129] Yuan, Yang (2023). "Succinct Representations for Concepts". doi:10.48550/arXiv.2303.00446.

[130] Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler (1 March 2023). "Domain-adapted large language models for classifying nuclear medicine reports". arXiv:2303.01258 [cs]. doi:10.48550/arXiv.2303.01258.

[131] "A New Open Source Flan 20B with UL2". Yi Tay. Retrieved 30 June 2023.

[132] Zhang, Bowen; Soh, Harold (6 March 2023). "Large Language Models as Zero-Shot Human Models for Human-Robot Interaction". arXiv:2303.03548 [cs]. doi:10.48550/arXiv.2303.03548.

[133] Dang, Hai; Goller, Sven; Lehmann, Florian; Buschek, Daniel (6 March 2023). "Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting". arXiv:2303.03199 [cs]. doi:10.1145/3544548.3580969. Retrieved 8 March 2023.

[134] "Prepare for truly useful large language models". Nature Biomedical Engineering. 7 (2): 85–86. 7 March 2023. doi:10.1038/s41551-023-01012-6.

[135] "Stanford CRFM". crfm.stanford.edu. Retrieved 21 March 2023.

[136] "Announcement of Jurassic-2 and Task-Specific APIs". Data Phoenix. 12 March 2023. Retrieved 21 September 2023.

[137] "Large language model: Revision history - Wikipedia". en.wikipedia.org. Retrieved 21 September 2023.
