Timeline of large language models

From Timelines
Revision as of 21:45, 31 May 2023 by Sebastian

This is a timeline of large language models (LLMs), which are artificial intelligence (AI) systems that use deep learning techniques to process and generate human-like natural language. LLMs are pre-trained on large amounts of data to learn the complexity and linkages of language, and can be adapted to specific tasks using techniques such as fine-tuning, in-context learning, and zero-, one-, or few-shot learning.[1]


Big picture

Time period Development summary More details
2010–2017 Early years Period characterized by the release of large-scale n-gram datasets, such as the Google Ngram Corpus (2010) and the Microsoft Web N-gram Corpus (2013), which provide researchers with large amounts of data to train language models. During this period, researchers also develop new techniques for training neural language models, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks.
2017–2019 Emergence of transformers This period sees the emergence of the transformer architecture, which revolutionizes natural language processing and makes possible the development of larger and more powerful language models. In 2017, Vaswani et al. introduce the transformer architecture, which uses self-attention to model the relationships between words in a sentence. This architecture is used to develop the GPT (Generative Pre-trained Transformer) models by OpenAI, which would achieve state-of-the-art performance on a range of language tasks.
2019–present GPT-3 and beyond Period characterized by the development of even larger and more powerful language models, such as GPT-2 and GPT-3. In 2020, OpenAI releases GPT-3, which has 175 billion parameters, making it the largest language model at the time. GPT-3 demonstrates impressive capabilities, such as the ability to generate coherent text, answer questions, and even write code. This period also sees the emergence of new research directions, such as using language models for unsupervised learning, few-shot learning, and transfer learning. By late 2022, LLMs become a sensation on the internet as OpenAI's ChatGPT acquires 1 million users within only 5 days of its release. ChatGPT's remarkable capabilities are commonly attributed to the scale of the underlying GPT-3 family of models, whose original model has 175 billion parameters.[2]

Full timeline

Year Month and date Model name Event type Details
2019 May 29 A team of researchers from the University of Washington and the Allen Institute for AI introduces Grover, a language model similar to GPT-2, but does not make the larger versions of the model publicly available.[3] Their publication discusses the potential risks of natural language generation technology and the need for robust defenses against neural fake news. Grover can generate realistic news articles that are difficult to distinguish from real news. The authors also explore the effectiveness of current methods for detecting fake news and find that the best defense against Grover is Grover itself, with 92% accuracy at distinguishing human-written from machine-written news. The paper concludes by discussing the ethical issues raised by the technology and argues that publicly releasing strong generators is important for better detection of neural fake news.[4]
2019 August NVIDIA introduces Megatron-LM.[5] Megatron-LM is a library optimized for efficiently training large transformer language models. Its model parallelism makes it possible to train language models with billions of parameters, which can then be used in NeMo for downstream tasks.[6]
2020 May 28 A paper titled "Language Models are Few-Shot Learners" examines few-shot learning in language models. Rather than pre-training a model on a large corpus of text and then fine-tuning it for each specific task, the authors specify tasks purely through a few text demonstrations in the prompt, with no gradient updates, and demonstrate that scaling up language models greatly improves task-agnostic few-shot performance. They train GPT-3, a language model with 175 billion parameters, and test its performance in the few-shot setting. GPT-3 achieves strong performance on many NLP tasks, including translation, question answering, and cloze tasks, as well as tasks that require on-the-fly reasoning or domain adaptation. However, the authors also identify some datasets where GPT-3's few-shot learning struggles, as well as methodological issues related to training on large web corpora. The paper also discusses the broader societal impacts of this finding and of GPT-3 in general.[7]
2020 June OpenAI releases GPT-3 as a service through its API, powered by a 175-billion-parameter model that can generate text and code from short written prompts.[8]
2020 July A paper discusses the limitations of neural text generation models in open-ended tasks like language modeling and story generation, due to the standard likelihood training and approximate decoding objectives. The authors specifically analyze these limitations for abstractive document summarization and find that such models tend to hallucinate content that is unfaithful to the input document. The paper presents the results of a human evaluation of several neural abstractive summarization systems, highlighting the substantial amount of hallucinated content in all model-generated summaries. However, the authors also show that pretrained models perform better in terms of generating faithful and factual summaries, as evaluated by humans. They propose that textual entailment measures may be a better evaluation metric for faithfulness than standard metrics, leading to better training and decoding criteria.[9]
2021 January 11 Wu Dao is released. It is among the largest language models by parameter count.[2]
2021 May Google announces its conversational language model LaMDA but does not release it publicly.
2021 December Sequence modeling toolkit[10] Meta AI, previously known as FAIR (Facebook AI Research), announces new Fairseq language models with 13 billion and 1.1 trillion parameters, released through its Fairseq sequence modeling toolkit. The Fairseq models are not related to Megatron, and the two use different training technologies. The training data includes the same sources used for RoBERTa (English Wikipedia, BookCorpus, CC-News, OpenWebText/Reddit upvoted, and Stories) with the new addition of English CC100 in Wikipedia style from January 2018 to December 2018, for a total dataset size of 453GB. Training used 2,363 GPU-days on 1,024 GPUs, taking approximately three days.[11]
2022 January 27 OpenAI announces that it has deployed InstructGPT, a new language model that is safer, more helpful, and better aligned with users' intentions. The model was trained using reinforcement learning from human feedback (RLHF) and is significantly better at following instructions than its predecessor, GPT-3. InstructGPT is also less toxic and makes up facts less often. The company believes that fine-tuning language models with humans in the loop is a powerful tool for improving their safety and reliability. InstructGPT becomes the default language model on OpenAI's API.[12]
2022 February 28 Cohere launches a beta version of a new language generation model called "Extremely Large", which, according to Cohere, outperforms its existing largest model, Large, on tasks such as sentiment analysis, named entity recognition (NER), and common sense reasoning.[13]
2022 March 21 NVIDIA and Microsoft introduce the Megatron-Turing Natural Language Generation model (MT-NLG) 530B, a 530-billion-parameter language model trained in part on The Pile dataset.[14]
2022 March 24 Researchers report having developed a new language model called SeeKeR that combines internet search, knowledge generation, and response generation to improve factual accuracy in open-domain knowledge-grounded conversations. With the same number of parameters, SeeKeR outperforms BlenderBot 2 in consistency, knowledge, and engagingness. When used as a standard language model for prompt completion, SeeKeR also outperforms GPT-2 and GPT-3 in factuality and topicality.[15]
2022 March 25 Open-source model for program synthesis[16] A paper introduces CodeGen, a family of LLMs trained on natural language and programming language data for program synthesis. The authors release CodeGen and the training library JaxFormer to democratize access to such models. They demonstrate that CodeGen is competitive with previous state-of-the-art models for zero-shot Python code generation, and they investigate multi-turn program synthesis using an open benchmark called MTPB. Their analysis shows that specifying a problem over multiple turns significantly improves program synthesis compared with single-turn prompts. The training library and model checkpoints are released as open source.[17]
2022 April 4 Google Research introduces the Pathways Language Model (PaLM), a 540-billion-parameter language model that achieves state-of-the-art few-shot performance across most tasks. PaLM's training was scaled using data parallelism at the Pod level across two Cloud TPU v4 Pods, while using standard data and model parallelism within each Pod. PaLM was trained on a combination of English and multilingual datasets that include high-quality web documents, books, Wikipedia, conversations, and GitHub code. PaLM demonstrates impressive natural language understanding and generation capabilities on several BIG-bench tasks, including distinguishing cause and effect, understanding conceptual combinations in appropriate contexts, and even guessing a movie from an emoji.[18]
2022 April 5 A paper presents PaLM, a 540-billion parameter language model trained using Pathways, a new machine learning system that enables highly efficient training across multiple TPU Pods. PaLM achieves state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks and outperforms the finetuned state-of-the-art on a suite of multi-step reasoning tasks. It also outperforms average human performance on the BIG-bench benchmark. Additionally, PaLM has strong capabilities in multilingual tasks and source code generation. The paper also discusses bias and toxicity and potential mitigation strategies.[19]
2022 April 12 A paper describes a method for training language models to act as helpful and harmless assistants using reinforcement learning from human feedback. The authors demonstrate that this alignment training improves performance on almost all natural language processing evaluations and is compatible with training for specialized skills such as Python coding and summarization. They explore an iterated online mode of training and investigate the robustness of the approach, identifying a roughly linear relationship between the RL reward and the square root of the Kullback–Leibler (KL) divergence between the policy and its initialization. The authors also perform peripheral analyses and provide samples from their models using prompts from recent related work.[20]
2022 April OpenAI reveals DALL-E 2.
2022 June 2 OpenAI publishes a blog post on best practices for organizations developing or deploying large language models. The principles include prohibiting misuse of language models, mitigating unintentional harm by evaluating models, minimizing sources of bias, and collaborating with stakeholders. These practices are meant to mitigate the risks of language models while helping realize their potential to augment human capabilities. The authors express hope that other organizations will adopt these principles and advance public discussion on language model development and deployment. Endorsements from other organizations reflect growing concern over the safety of LLMs.[21]
2022 September Competition NVIDIA announces the launch of its BioNeMo LLM service to help researchers build new artificial intelligence models for biology.[22]
2023 January 5 A paper examines concerns that LLMs could influence, modify, and manipulate user preferences adversarially. As these models become more proficient at deducing user preferences and offering tailored assistance, their lack of interpretability in adversarial settings becomes a major concern. The paper reviews existing literature on adversarial behavior toward user preferences and provides red-teaming samples for dialogue models such as ChatGPT and GODEL. It also probes the attention mechanism in these models in non-adversarial and adversarial settings.[23]
2023 January 11 OpenAI researchers collaborate with Georgetown University and the Stanford Internet Observatory to investigate how language models might be misused for disinformation campaigns. Their report outlines the threats that language models pose to the information environment if used to augment disinformation campaigns and introduces a framework for analyzing potential mitigations. The report points out that language models could drive down the cost of running influence operations, place them within reach of new actors and actor types, and enable more impactful or persuasive messaging. It also describes the key stages of the pipeline from language model to influence operation and provides a set of guiding questions for policymakers and others considering mitigations.[24]
2022 January 19 A paper introduces CM3, a family of causally masked generative models trained on large-scale web and Wikipedia articles containing text and image tokens. The new approach generates tokens left to right while masking out a small number of long token spans that are generated at the end of the string. This provides a hybrid of the more common causal and masked language models, allowing for full generative modeling while providing bidirectional context when generating the masked spans. The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts and implicitly learn a wide range of text, image, and cross-modal tasks. The paper also reports state-of-the-art performance in zero-shot summarization, entity linking, and entity disambiguation, while maintaining competitive performance in the fine-tuning setting.[25]
2022 February 2 EleutherAI introduces GPT-NeoX-20B, a 20-billion-parameter model. At the time, it is the largest publicly accessible pretrained general-purpose autoregressive language model, and its checkpoints are available for download under the Apache 2.0 license. The model was trained with the GPT-NeoX framework on GPUs provided by CoreWeave.[26]
2022 March 29 A paper investigates the optimal model size and number of tokens for training a transformer language model under a given compute budget. The researchers find that current large language models are significantly undertrained, and the model size and the number of training tokens should be scaled equally for compute-optimal training. They test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4x more data. Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on a range of downstream evaluation tasks and reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than a 7% improvement over Gopher.[27]
2022 November 9 A paper introduces BLOOM, a 176B-parameter open-access language model designed and built by a collaboration of hundreds of researchers. The model is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages. BLOOM achieves competitive performance on a wide variety of benchmarks and is publicly released under the Responsible AI License to facilitate future research and applications using large language models. The paper also discusses the development process and the need to democratize large language models.[28]
2023 January 31 LLM launch FLAME is introduced as a small language model for assisting in the creation of spreadsheet formulas. It is based on T5 and trained on Excel formulas using domain-specific insights to achieve competitive performance with a substantially smaller model size (60M parameters) and much less training data. FLAME outperforms much larger models in 6 out of 10 settings, including formula repair, formula auto-completion, and syntax reconstruction.[29]
2023 February 9 A paper presents a collaborative design framework that combines interactive evolution and LLMs to simulate the human design process. The framework uses interactive evolution to exploit user feedback, and LLMs for the complex creative task of recombining and varying ideas. The process begins with a design brief and a set of candidate designs, either generated by a language model or proposed by users. Users provide feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs. The framework was evaluated on three game design tasks with human designers collaborating remotely.[30]
2023 February 9 Framework wrapper[31] LLM launch Toolformer is introduced. It is a language model trained to use external tools via simple APIs, which can achieve improved performance on downstream tasks. The model is trained in a self-supervised way, using only a handful of demonstrations for each API. The model, which incorporates a range of tools including a calculator, Q&A system, search engines, translation system, and calendar, achieves substantially improved zero-shot performance across various downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.[32]
2023 February 14 Research A paper presents a framework called ChatCAD, which integrates LLMs with computer-aided diagnosis (CAD) networks for medical images. ChatCAD uses LLMs to enhance the output of multiple CAD networks by summarizing and reorganizing the information presented in natural language text format. This approach merges the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models. The goal is to create a more user-friendly and understandable system for patients compared to conventional CAD systems. The paper suggests that LLMs can also be used to improve the performance of vision-based medical-image CAD models in the future.[33]
2023 c.February 14 LLM launch Full-stack generative AI platform Writer launches Palmyra, a trio of LLMs focused on business writing and marketing data. The models include Palmyra Small (128M parameters), Palmyra Base (5B), and Palmyra Large (20B), and are aimed at enterprises looking to invest in generative AI. Palmyra offers both an application layer and a foundation model layer, which Writer says makes it the first company to provide both on a single platform. The models also offer security and privacy features. According to Writer, general-use LLMs can achieve human-like output but lack contextual awareness, multi-modal inputs, brand integrity, and compliance with security and privacy standards, limiting their usefulness for enterprise organizations.[34][35]
2023 February 17 Research A paper surveys the state of the art in hybrid language model architectures and strategies for complex question answering (QA, CQA, CPS). While very large language models are good at leveraging public data on standard problems, tackling more specific complex questions or problems may require specific architectures, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback. The paper identifies the key elements used with LLMs to solve complex questions or problems and discusses challenges associated with complex QA. The paper also reviews current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and others.[36]
2023 February 20 LLM launch MOSS is introduced as a conversational language model developed by Fudan University. It performs various natural language tasks, including question answering, text summarization, and code generation, and Fudan University plans to open-source it to facilitate future research. MOSS has some limitations, such as poor performance on languages other than English and a relatively small model capacity. It may also generate misleading or false information and may need multiple attempts to follow instructions correctly.[37]
2023 February 21 Research A paper presents a catalog of prompt engineering techniques in pattern form that have been applied successfully to solve common problems when conversing with large language models (LLMs), such as ChatGPT. Prompt patterns are reusable solutions to common problems faced when working with LLMs that can customize the outputs and interactions with an LLM. The paper provides a framework for documenting patterns for structuring prompts to solve a range of problems and presents a catalog of patterns that have been applied successfully to improve the outputs of LLM conversations. It also explains how prompts can be built from multiple patterns and illustrates prompt patterns that benefit from combination with other prompt patterns. The paper contributes to research on prompt engineering that applies LLMs to automate software development tasks.[38]
2023 February 24 Research A paper proposes a system called LLM-Augmenter that improves large language models by using external knowledge and automated feedback. The system adds plug-and-play modules to a black-box LLM to ground responses in external knowledge and iteratively improve responses using feedback generated by utility functions. The system is validated on task-oriented dialog and open-domain question answering, showing a significant reduction in hallucinations without sacrificing fluency and informativeness. The source code and models are publicly available.[39]
2023 February 24 Meta AI introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained exclusively on publicly available datasets rather than proprietary or inaccessible data. The largest model, LLaMA-65B, is competitive with other top models such as Chinchilla-70B and PaLM-540B, and LLaMA-13B outperforms GPT-3 (175B) on most benchmarks. All models are available for research purposes.[40]
2023 February 27 A paper proposes a framework that simplifies reward design in reinforcement learning (RL) by using natural language as a proxy for the reward function. The framework prompts a large language model, such as GPT-3, to evaluate the agent's behavior against the desired behavior described in the prompt and outputs a corresponding reward signal. The RL agent uses this reward to update its behavior. The approach is evaluated in three tasks, and the results demonstrate that RL agents trained with the framework are well-aligned with the user's objectives and outperform RL agents trained with reward functions learned via supervised learning.[41]
2023 February 27 Generative language model A paper discusses the development of a generative language model called SpikeGPT that uses spiking neural networks (SNNs) for more energy-efficient deep learning. While SNNs have been successful in computer vision tasks, their performance in language generation has been limited by the difficulty of training them. SpikeGPT overcomes this challenge by modifying the transformer block to reduce computational complexity, and achieves performance competitive with non-spiking models on the tested benchmarks while consuming 5x less energy.[42]
2023 February 27 A paper introduces Kosmos-1, a multimodal large language model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). The model is trained from scratch on web-scale multimodal corpora, including interleaved text and images, image-caption pairs, and text data. The model achieves impressive performance on language understanding and generation, OCR-free NLP (directly fed with document images), perception-language tasks such as multimodal dialogue, image captioning, and visual question answering, and vision tasks such as image recognition with descriptions. The paper also shows that MLLMs can benefit from cross-modal transfer, i.e., transferring knowledge from language to multimodal tasks and from multimodal to language tasks. The authors also introduce a dataset based on the Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs.[43]
2023 February 27 A paper proposes a method called "rectification" for reducing the risk of LLMs generating toxic discourses. The method is based on the probability that the finished discourse will be considered toxic, and advises against token selections proportional to this probability. The approach utilizes a separate but smaller model for detoxification and does not require access to the internal representations of the LLM. The method significantly improves the generated discourse compared to base LLMs and other techniques in terms of both language and detoxification performance, and can be applied to diverse LLMs that share the same vocabulary.[44]
2023 February 27 Research A paper discusses the use of open source code to train large language models (LLMs) and the potential security, privacy, and licensing implications of this practice. LLMs for code are commonly trained on large unsanitized corpora of source code scraped from the internet, leading to the memorization and verbatim emission of content by the models. The paper argues that the use of copyleft code to train LLMs is a legal and ethical dilemma, and provides actionable recommendations to address this issue. Overall, the paper highlights the importance of considering the implications of using open source code in training LLMs.[45]
2023 February 28 GPT-based metric[46] GEMBA (GPT Estimation Metric Based Assessment) is presented as a GPT-based metric for evaluating translation quality both with and without a reference translation. The authors evaluate four prompt variants in two modes and investigate seven versions of GPT models, including ChatGPT. Their method achieves state-of-the-art accuracy in both modes compared to human labels and provides insight into the usefulness of pre-trained, generative large language models for translation quality assessment.[47]
2023 February 28 Research A paper discusses the potential use of large language models in psycholinguistics. The authors note that while these models are not detailed models of human linguistic processing, they are highly successful in their primary task of providing a model for language. They suggest that large language models can be useful in psycholinguistics as a practical tool, for comparative purposes, and philosophically, as a means of rethinking the relationship between language and thought.[48]
2023 February 28 A study proposes using LLMs for the automatic analysis of dream reports, specifically focusing on references to emotions. The authors use off-the-shelf and bespoke approaches and find that the bespoke text classification method achieves high performance and is robust against potential biases. This approach could find application in the analysis of large dream datasets and improve the reproducibility and comparability of results across studies. The study of dream content in dream research is typically performed through manual scoring of verbal reports provided by dreamers. This task is time-consuming and requires trained annotators.[49]
2023 February 28 A paper discusses In-Context Instruction Learning (ICIL), a new approach to instruction learning for LLMs that significantly improves zero-shot task generalization performance. ICIL uses a single fixed prompt that concatenates cross-task demonstrations to evaluate all tasks, and it is complementary to instruction-based fine-tuning. The authors demonstrate that ICIL improves the performance of both pretrained and instruction-fine-tuned models, including the most powerful instruction-fine-tuned baseline (text-davinci-003) by 9.3%.[50]
2023 March 1 Research A paper introduces a method to train language models like ChatGPT to understand concepts precisely using succinct representations based on category theory. The representations provide concept-wise invariance properties and a new learning algorithm that can accurately learn complex concepts or fix misconceptions. The approach also allows for the generation of a hierarchical decomposition of the representations, which can be manually verified by examining each part individually.[51]
2023 March GPT-NeoX-20B is introduced as a language model with 20 billion parameters trained on the Pile dataset. The model is a powerful few-shot reasoner and outperforms similarly sized models on various tasks. The training and evaluation code and the model weights are open-sourced. The model was developed by a team including Sid Black, Stella Biderman, and Eric Hallahan, with the support of CoreWeave, and was trained using fp16.[52]
2023 March 1 A study evaluates the value of domain adaptation in nuclear medicine by adapting language models for the purpose of 5-point Deauville score prediction based on clinical 18F-fluorodeoxyglucose (FDG) PET/CT reports. The researchers used multiple general-purpose transformer language models to classify the reports into Deauville scores 1-5, and then adapted the models to the nuclear medicine domain using masked language modeling. Domain adaptation improved the performance of all language models, and the best performing model (domain-adapted RoBERTa) achieved a five-class accuracy of 77.4%, which was better than the physician's performance (66%), the best vision model's performance (48.1%), and was similar to the multimodal model's performance (77.2%).[53]
2023 March 3 Two stage framework[54] Research A paper proposes a framework called Prophet that uses answer heuristics to prompt LLMs for knowledge-based visual question answering (VQA). Previous methods used LLMs to acquire necessary knowledge for answering, but these methods did not fully activate the capacity of LLMs due to insufficient input information. Prophet trains a vanilla VQA model on a knowledge-based VQA dataset without external knowledge and extracts two types of answer heuristics: answer candidates and answer-aware examples. These answer heuristics are encoded into prompts to enhance the capacity of LLMs. Prophet outperforms existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1% and 55.7% accuracies on their testing sets, respectively.[55]
2023 March 6 Research A paper explores the potential of using LLMs as zero-shot human models for human-robot interaction (HRI). Human models are important for HRI but are challenging to create. Because LLMs have been trained on vast amounts of human-generated text, they can be used as human models without prior knowledge or interaction data. The authors conduct experiments on three social datasets and find that LLMs can achieve performance comparable to purpose-built models, but with limitations such as sensitivity to prompts and difficulties with spatial and numerical reasoning. The authors demonstrate how LLM-based human models can be integrated into a social robot's planning process and applied in HRI scenarios through a case study on a simulated trust-based table-clearing task and a robot utensil-passing experiment. The results show that LLMs offer a promising, but incomplete, approach to human modeling for HRI.[56]
2023 March 6 Research A paper proposes a perspective on prompts for LLMs that distinguishes between diegetic and non-diegetic prompts, and studies how users write with LLMs using different user interfaces. The results show that when the interface offered multiple suggestions and provided an option for non-diegetic prompting, participants preferred choosing from multiple suggestions over controlling them via non-diegetic prompts. When participants provided non-diegetic prompts it was to ask for inspiration, topics or facts. Single suggestions in particular were guided both with diegetic and non-diegetic information. The paper informs human-AI interaction with generative models by revealing that writing non-diegetic prompts requires effort, people combine diegetic and non-diegetic prompting, and they use their draft and suggestion timing to strategically guide LLMs.[57]
2023 March 7 Synthetic data generation method A paper presents SynthIE, a method for synthetic data generation that uses LLMs to generate plausible text for given structured outputs, working in the reverse direction (from output to input). The authors demonstrate the effectiveness of this approach on closed information extraction, where collecting ground-truth data is challenging and no satisfactory dataset exists to date. They synthetically generate a dataset of 1.8 million data points, show in a human evaluation that its quality is superior to that of existing datasets, and use it to fine-tune small models (220M and 770M parameters). The resulting models outperform existing baselines of comparable size by a substantial margin in micro and macro F1 scores.[58]
2023 March 13 Instruction-following language model Alpaca is introduced as a new instruction-following language model, fine-tuned from Meta's LLaMA 7B model on 52,000 instruction-following demonstrations generated with OpenAI's text-davinci-003. In a preliminary evaluation, Alpaca behaves similarly to text-davinci-003 while being surprisingly small and easy and cheap to reproduce. The authors release the training recipe and data and intend to release the model weights in the future. The announcement emphasizes that Alpaca is intended only for academic research and that commercial use is prohibited. The authors encourage readers to evaluate Alpaca through an interactive demo and to report any concerning behavior.[59]
2023 March 14 Medical language model Google shares health AI updates, including progress on Med-PaLM 2, its expert-level medical large language model, which consistently performs at an expert level on medical exam questions, scoring 85%. The company partners with Jacaranda Health and Chang Gung Memorial Hospital to build AI models that simplify acquiring and interpreting ultrasound images, identifying important information such as gestational age in expectant mothers and supporting early detection of breast cancer. Google also partners with Mayo Clinic to extend the reach of its AI model, with the goal of helping more patients receive radiotherapy treatment sooner. Additionally, Google works with partners on the ground to bring its research on AI-powered tuberculosis (TB) chest x-ray screening into the care setting.[60]
2023 March 16 Multimodal model OpenAI introduces GPT-4, a large multimodal model that accepts both text and image inputs and produces text outputs. GPT-4 exhibits human-level performance on various professional and academic benchmarks and outperforms previous large language models on traditional NLP benchmarks. The accompanying technical report discusses the challenge of developing deep learning infrastructure and optimization methods that behave predictably across a wide range of scales. While GPT-4 has limitations and poses safety challenges, OpenAI reports having taken steps to mitigate potential harms, and the report includes an extensive system card.[61]
2023 March 23 A paper investigates the potential implications of LLMs, such as Generative Pre-trained Transformers (GPTs), for the U.S. labor market. The authors propose a new rubric for assessing LLM capabilities and their potential effects on jobs. The study finds that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks affected. The study suggests that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.[62]
2023 April 19 Stability AI open-sources its large language model, StableLM, which is designed to efficiently generate text and code. The initial models are available on GitHub in versions with 3 billion and 7 billion parameters, and models with 15 to 65 billion parameters are to follow. The models are trained on a larger version of the open-source dataset known as the Pile, which encompasses information from a range of sources, including Wikipedia, Stack Exchange, and PubMed. The move builds on Stability AI's mission to make AI tools more accessible, as it has done with its AI image generator, Stable Diffusion.[63][64]
2023 May 10 Transformer-based neural network model[65], general-purpose AI model[66] Google launches PaLM 2, its latest LLM, at its I/O developer conference. PaLM 2 is intended to power Google's updated Bard chat tool, compete with OpenAI's ChatGPT, and serve as the foundation model for new AI features. While Google provides no technical details about training, it highlights the model's capabilities, such as improved common-sense reasoning, mathematics, and logic. PaLM 2 excels at multilingual tasks and includes specialized models such as Codey for coding and debugging, Med-PaLM 2 for medical knowledge, and Sec-PaLM for security use cases. There is also a smaller PaLM 2 model for smartphones.[67]
2023 May 21 Rodney Brooks, a robotics researcher and AI expert, argues that large language models like OpenAI's ChatGPT are not as intelligent as people believe and are far from being able to compete with humans on an intellectual level. Brooks highlights that these models lack an underlying understanding of the world and merely exhibit correlations in language. Current language models can sound like they understand, but they lack the ability to logically infer meaning, leading to potential misinterpretations. Brooks emphasizes that these models are good at generating answers that sound right but may not be accurate. He shares his experience of relying on large language models for coding tasks and finding that they often provide confidently wrong answers. Brooks concludes that while future iterations of AI may bring interesting advancements, they are unlikely to achieve artificial general intelligence (AGI).[68]
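The 3 March 2023 Prophet entry describes encoding two answer heuristics, answer candidates and answer-aware examples, into an LLM prompt. The Python sketch below shows one way such a prompt might be assembled. The `VQAExample` type, the `Context:`/`Candidates:` layout and the instruction sentence are hypothetical illustrations, not the template from the Prophet repository. The candidate confidences are assumed to come from the vanilla VQA model, and choosing the answer-aware examples is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VQAExample:
    """One VQA item: a caption standing in for the image, a question, answer
    candidates from a vanilla VQA model, and (for in-context examples) the gold answer."""
    caption: str
    question: str
    candidates: List[Tuple[str, float]]  # (answer, confidence) pairs
    answer: Optional[str] = None  # None for the query the LLM should answer


def format_candidates(candidates: List[Tuple[str, float]]) -> str:
    # Answer-candidate heuristic: list answers by descending confidence.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ", ".join(f"{ans} ({conf:.2f})" for ans, conf in ranked)


def format_example(ex: VQAExample) -> str:
    answer = ex.answer if ex.answer is not None else ""
    return (
        f"Context: {ex.caption}\n"
        f"Question: {ex.question}\n"
        f"Candidates: {format_candidates(ex.candidates)}\n"
        f"Answer: {answer}"
    ).rstrip()  # the query block ends with a bare "Answer:" for the LLM to complete


def build_prompt(examples: List[VQAExample], query: VQAExample) -> str:
    # Answer-aware-example heuristic: the caller passes examples selected with the
    # help of the VQA model; this sketch simply uses them in the order given.
    header = "Answer the question using the context and the answer candidates."
    blocks = [header] + [format_example(ex) for ex in examples] + [format_example(query)]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = VQAExample("A dog on a beach.", "What animal is this?",
                      [("cat", 0.1), ("dog", 0.9)], "dog")
    query = VQAExample("A man holding a racket.", "What sport is he playing?",
                       [("tennis", 0.8), ("badminton", 0.15)])
    print(build_prompt([demo], query))
```

Sorting candidates by confidence and showing the scores lets the LLM treat the VQA model's output as a hint rather than a final answer. That matches the entry's point that the heuristics supply the input information earlier methods lacked.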
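The 6 March 2023 entry distinguishes diegetic prompts (the user's own draft, which the model continues) from non-diegetic prompts (instructions to the model that are not part of the text). The minimal Python sketch below shows how a writing interface might combine the two into one model input. The `compose_writing_prompt` helper and its layout are hypothetical and do not reproduce the interface used in the study.

```python
from typing import Optional


def compose_writing_prompt(draft: str, instruction: Optional[str] = None) -> str:
    """Build the text a writing assistant sends to a language model.

    draft: the diegetic input, i.e. the story or essay written so far.
    instruction: an optional non-diegetic input, e.g. "suggest a plot twist".
    """
    if not instruction:
        # Diegetic-only prompting: the model simply continues the draft.
        return draft
    # Mixed prompting: the instruction sits outside the text, and the draft
    # still tells the model where its continuation should begin.
    return f"Instruction: {instruction}\n\nContinue the following text:\n{draft}"


if __name__ == "__main__":
    story = "The lighthouse keeper had not seen a ship in years."
    print(compose_writing_prompt(story))
    print(compose_writing_prompt(story, "Introduce a mysterious visitor."))
```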
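The 7 March 2023 SynthIE entry relies on an asymmetry: for an LLM, generating fluent text from a known set of facts is easier than extracting the facts from text. The Python sketch below illustrates the reverse-direction loop under stated assumptions. `generate_text` stands in for any LLM call, and the usage example substitutes a hypothetical stub. Facts are sampled uniformly from a toy knowledge base instead of with the paper's sampling procedure, and the prompt wording is invented.

```python
import random
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)


def triplets_to_prompt(triplets: List[Triplet]) -> str:
    # Ask the generator to verbalize exactly these facts, no more and no fewer.
    facts = "\n".join(f"- ({s}; {r}; {o})" for s, r, o in triplets)
    return "Write a short text that states exactly these facts:\n" + facts


def generate_synthetic_pairs(
    kb: List[Triplet],
    n_pairs: int,
    facts_per_text: int,
    generate_text: Callable[[str], str],
    seed: int = 0,
) -> List[Tuple[str, List[Triplet]]]:
    """Return (text, triplets) pairs for training a text-to-structure model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # 1. Start from the structured output, the hard target of extraction.
        target = rng.sample(kb, facts_per_text)
        # 2. Let the LLM solve the easier reverse task: structure -> text.
        text = generate_text(triplets_to_prompt(target))
        # 3. Keep the pair; a small model is later fine-tuned on text -> target.
        pairs.append((text, target))
    return pairs


if __name__ == "__main__":
    toy_kb = [
        ("Marie Curie", "field", "physics"),
        ("Marie Curie", "award", "Nobel Prize in Physics"),
        ("Paris", "country", "France"),
    ]

    def stub_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call.
        return "GENERATED TEXT FOR: " + prompt.splitlines()[1]

    for text, facts in generate_synthetic_pairs(toy_kb, 2, 2, stub_llm):
        print(facts, "->", text)
```

Because every pair starts from known triplets, the structured labels are exact by construction. Only the quality of the generated text needs checking, which the entry reports doing through human evaluation.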
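The 13 March 2023 Alpaca entry describes fine-tuning LLaMA 7B on 52,000 instruction-following demonstrations. The Python sketch below shows how one such demonstration might be turned into a single training string. It assumes that each demonstration has `instruction`, `input` and `output` fields, as in the public Stanford Alpaca data release. The template text is modeled on the one published with that release, but treat its exact wording and whitespace here as an assumption. `build_training_text` is a hypothetical helper.

```python
from typing import Dict

# Two prompt variants: one for demonstrations with extra context, one without.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_text(record: Dict[str, str]) -> str:
    """Turn one {"instruction", "input", "output"} demonstration into a training string."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    # The model is fine-tuned to continue the prompt with the demonstrated output.
    return prompt + record["output"]


if __name__ == "__main__":
    print(build_training_text({"instruction": "Name a primary color.", "input": "", "output": "Blue."}))
```

Using two template variants means demonstrations without extra context do not carry an empty `### Input:` section.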
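The 23 March 2023 labor-market entry reports two threshold statistics: the share of workers with at least 10% of their tasks affected and the share with at least 50% affected. The Python sketch below shows the arithmetic behind such figures on made-up data. It assumes a simple yes/no exposure flag per task and weights occupations by employment. The paper uses its own rubric, so the numbers here only illustrate the calculation and reproduce nothing from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Occupation:
    name: str
    workers: int              # employment count (toy numbers)
    task_exposed: List[bool]  # one flag per task: would an LLM affect this task?

    def exposed_share(self) -> float:
        # Fraction of this occupation's tasks flagged as exposed.
        return sum(self.task_exposed) / len(self.task_exposed)


def share_of_workers_at_or_above(occupations: List[Occupation], threshold: float) -> float:
    """Fraction of all workers whose occupation has >= threshold of its tasks exposed."""
    total = sum(o.workers for o in occupations)
    affected = sum(o.workers for o in occupations if o.exposed_share() >= threshold)
    return affected / total


if __name__ == "__main__":
    toy = [
        Occupation("clerk", 50, [True, False, False, False, False]),  # 20% of tasks exposed
        Occupation("writer", 30, [True, True, True, False]),          # 75% exposed
        Occupation("roofer", 20, [False, False, False]),              # 0% exposed
    ]
    print(share_of_workers_at_or_above(toy, 0.10))  # 0.8
    print(share_of_workers_at_or_above(toy, 0.50))  # 0.3
```

In this toy data, 80 of 100 workers clear the 10% bar (clerks and writers) but only 30 clear the 50% bar (writers). The same threshold logic yields the two very different headline shares in the entry.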

Meta information on the timeline

How the timeline was built

The initial version of the timeline was written by Sebastian.

Funding information for this timeline is available.

Feedback and comments

Feedback for the timeline can be provided at the following places:

  • FIXME

What the timeline is still missing

Timeline update strategy

See also

External links

References

  1. "Large Language Models: Complete Guide in 2023". research.aimultiple.com. Retrieved 11 March 2023. 
  2. "Large Language Model Training in 2023". research.aimultiple.com. Retrieved 11 March 2023. 
  3. "GPT-2: 6-month follow-up". openai.com. Retrieved 23 March 2023. 
  4. Zellers, Rowan; Holtzman, Ari; Rashkin, Hannah; Bisk, Yonatan; Farhadi, Ali; Roesner, Franziska; Choi, Yejin (2019). "Defending Against Neural Fake News". doi:10.48550/arXiv.1905.12616. 
  5. "Megatron Unleashed: NVIDIA's NLP Model 'Megatron-LM' is the Largest Transformer Ever Trained | Exxact Blog". www.exxactcorp.com. Retrieved 11 March 2023. 
  6. "NeMo Megatron — NVIDIA NeMo". docs.nvidia.com. Retrieved 11 March 2023. 
  7. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". doi:10.48550/arXiv.2005.14165. 
  8. Lee, Angie (26 January 2023). "What Are Large Language Models Used For and Why Are They Important?". NVIDIA Blog. Retrieved 11 March 2023. 
  9. Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (July 2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: 1906–1919. doi:10.18653/v1/2020.acl-main.173. 
  10. "fairseq documentation — fairseq 0.12.2 documentation". fairseq.readthedocs.io. Retrieved 16 May 2023. 
  11. "AI: Megatron the Transformer, and its related language models". lifearchitect.ai. 24 September 2021. Retrieved 12 March 2023. 
  12. "Aligning language models to follow instructions". openai.com. Retrieved 21 March 2023. 
  13. "Cohere launches Extremely Large (beta)". Context by Cohere. 1 March 2022. Retrieved 12 March 2023. 
  14. "AI: Megatron the Transformer, and its related language models". Dr Alan D. Thompson – Life Architect. 24 September 2021. Retrieved 11 March 2023. 
  15. Shuster, Kurt; Komeili, Mojtaba; Adolphs, Leonard; Roller, Stephen; Szlam, Arthur; Weston, Jason (2022). "Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion". doi:10.48550/arXiv.2203.13224. 
  16. "CodeGen". github.com. Salesforce. 16 May 2023. Retrieved 16 May 2023. 
  17. Nijkamp, Erik; Pang, Bo; Hayashi, Hiroaki; Tu, Lifu; Wang, Huan; Zhou, Yingbo; Savarese, Silvio; Xiong, Caiming (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". doi:10.48550/arXiv.2203.13474. 
  18. "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 21 March 2023. 
  19. Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek; Barnes, Parker; Tay, Yi; Shazeer, Noam; Prabhakaran, Vinodkumar; Reif, Emily; Du, Nan; Hutchinson, Ben; Pope, Reiner; Bradbury, James; Austin, Jacob; Isard, Michael; Gur-Ari, Guy; Yin, Pengcheng; Duke, Toju; Levskaya, Anselm; Ghemawat, Sanjay; Dev, Sunipa; Michalewski, Henryk; Garcia, Xavier; Misra, Vedant; Robinson, Kevin; Fedus, Liam; Zhou, Denny; Ippolito, Daphne; Luan, David; Lim, Hyeontaek; Zoph, Barret; Spiridonov, Alexander; Sepassi, Ryan; Dohan, David; Agrawal, Shivani; Omernick, Mark; Dai, Andrew M.; Pillai, Thanumalayan Sankaranarayana; Pellat, Marie; Lewkowycz, Aitor; Moreira, Erica; Child, Rewon; Polozov, Oleksandr; Lee, Katherine; Zhou, Zongwei; Wang, Xuezhi; Saeta, Brennan; Diaz, Mark; Firat, Orhan; Catasta, Michele; Wei, Jason; Meier-Hellstern, Kathy; Eck, Douglas; Dean, Jeff; Petrov, Slav; Fiedel, Noah (2022). "PaLM: Scaling Language Modeling with Pathways". doi:10.48550/arXiv.2204.02311. 
  20. Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". doi:10.48550/arXiv.2204.05862. 
  21. "Best practices for deploying language models". openai.com. Retrieved 17 March 2023. 
  22. "Nvidia boosts generative AI for biology with BioNeMo". VentureBeat. 12 January 2023. Retrieved 11 March 2023. 
  23. Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291. 
  24. "Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk". openai.com. Retrieved 14 March 2023. 
  25. Aghajanyan, Armen; Huang, Bernie; Ross, Candace; Karpukhin, Vladimir; Xu, Hu; Goyal, Naman; Okhonko, Dmytro; Joshi, Mandar; Ghosh, Gargi; Lewis, Mike; Zettlemoyer, Luke (2022). "CM3: A Causal Masked Multimodal Model of the Internet". doi:10.48550/arXiv.2201.07520. 
  26. Leahy, Connor (2 February 2022). "Announcing GPT-NeoX-20B". EleutherAI Blog. Retrieved 21 March 2023. 
  27. Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556. 
  28. BigScience Workshop; Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias; Tow, Jonathan; Rush, Alexander M.; Biderman, Stella; Webson, Albert; Ammanamanchi, Pawan Sasanka; Wang, Thomas; Sagot, Benoît; Muennighoff, Niklas; del Moral, Albert Villanova; Ruwase, Olatunji; Bawden, Rachel; Bekman, Stas; McMillan-Major, Angelina; Beltagy, Iz; Nguyen, Huu; Saulnier, Lucile; Tan, Samson; Suarez, Pedro Ortiz; Sanh, Victor; Laurençon, Hugo; Jernite, Yacine; Launay, Julien; Mitchell, Margaret; Raffel, Colin; Gokaslan, Aaron; Simhi, Adi; Soroa, Aitor; Aji, Alham Fikri; Alfassy, Amit; Rogers, Anna; Nitzav, Ariel Kreisberg; Xu, Canwen; Mou, Chenghao; Emezue, Chris; Klamm, Christopher; Leong, Colin; van Strien, Daniel; Adelani, David Ifeoluwa; Radev, Dragomir; Ponferrada, Eduardo González; Levkovizh, Efrat; Kim, Ethan; Natan, Eyal Bar; De Toni, Francesco; Dupont, Gérard; Kruszewski, Germán; Pistilli, Giada; Elsahar, Hady; Benyamina, Hamza; Tran, Hieu; Yu, Ian; Abdulmumin, Idris; Johnson, Isaac; Gonzalez-Dios, Itziar; de la Rosa, Javier; Chim, Jenny; Dodge, Jesse; Zhu, Jian; Chang, Jonathan; Frohberg, Jörg; Tobing, Joseph; Bhattacharjee, Joydeep; Almubarak, Khalid; Chen, Kimbo; Lo, Kyle; Von Werra, Leandro; Weber, Leon; Phan, Long; allal, Loubna Ben; Tanguy, Ludovic; Dey, Manan; Muñoz, Manuel Romero; Masoud, Maraim; Grandury, María; Šaško, Mario; Huang, Max; Coavoux, Maximin; Singh, Mayank; Jiang, Mike Tian-Jian; Vu, Minh Chien; Jauhar, Mohammad A.; Ghaleb, Mustafa; Subramani, Nishant; Kassner, Nora; Khamis, Nurulaqilla; Nguyen, Olivier; Espejel, Omar; de Gibert, Ona; Villegas, Paulo; Henderson, Peter; Colombo, Pierre; Amuok, Priscilla; Lhoest, Quentin; Harliman, Rheza; Bommasani, Rishi; López, Roberto Luis; Ribeiro, Rui; Osei, Salomey; Pyysalo, Sampo; Nagel, Sebastian; Bose, Shamik; Muhammad, Shamsuddeen Hassan; Sharma, Shanya; Longpre, Shayne; Nikpoor, Somaieh; Silberberg, Stanislav; Pai, Suhas; Zink, Sydney; Torrent, Tiago Timponi; Schick, Timo; Thrush, Tristan; Danchev, Valentin; Nikoulina, Vassilina; Laippala, Veronika; Lepercq, Violette; Prabhu, Vrinda; Alyafeai, Zaid; Talat, Zeerak; Raja, Arun; Heinzerling, Benjamin; Si, Chenglei; Taşar, Davut Emre; Salesky, Elizabeth; Mielke, Sabrina J.; Lee, Wilson Y.; Sharma, Abheesht; Santilli, Andrea; Chaffin, Antoine; Stiegler, Arnaud; Datta, Debajyoti; Szczechla, Eliza; Chhablani, Gunjan; Wang, Han; Pandey, Harshit; Strobelt, Hendrik; Fries, Jason Alan; Rozen, Jos; Gao, Leo; Sutawika, Lintang; Bari, M. Saiful; Al-shaibani, Maged S.; Manica, Matteo; Nayak, Nihal; Teehan, Ryan; Albanie, Samuel; Shen, Sheng; Ben-David, Srulik; Bach, Stephen H.; Kim, Taewoon; Bers, Tali; Fevry, Thibault; Neeraj, Trishala; Thakker, Urmish; Raunak, Vikas; Tang, Xiangru; Yong, Zheng-Xin; Sun, Zhiqing; Brody, Shaked; Uri, Yallow; Tojarieh, Hadar; Roberts, Adam; Chung, Hyung Won; Tae, Jaesung; Phang, Jason; Press, Ofir; Li, Conglong; Narayanan, Deepak; Bourfoune, Hatim; Casper, Jared; Rasley, Jeff; Ryabinin, Max; Mishra, Mayank; Zhang, Minjia; Shoeybi, Mohammad; Peyrounette, Myriam; Patry, Nicolas; Tazi, Nouamane; Sanseviero, Omar; von Platen, Patrick; Cornette, Pierre; Lavallée, Pierre François; Lacroix, Rémi; Rajbhandari, Samyam; Gandhi, Sanchit; Smith, Shaden; Requena, Stéphane; Patil, Suraj; Dettmers, Tim; Baruwa, Ahmed; Singh, Amanpreet; Cheveleva, Anastasia; Ligozat, Anne-Laure; Subramonian, Arjun; Névéol, Aurélie; Lovering, Charles; Garrette, Dan; Tunuguntla, Deepak; Reiter, Ehud; Taktasheva, Ekaterina; Voloshina, Ekaterina; Bogdanov, Eli; Winata, Genta Indra; Schoelkopf, Hailey; Kalo, Jan-Christoph; Novikova, Jekaterina; Forde, Jessica Zosa; Clive, Jordan; Kasai, Jungo; Kawamura, Ken; Hazan, Liam; Carpuat, Marine; Clinciu, Miruna; Kim, Najoung; Cheng, Newton; Serikov, Oleg; Antverg, Omer; van der Wal, Oskar; Zhang, Rui; Zhang, Ruochen; Gehrmann, Sebastian; Mirkin, Shachar; Pais, Shani; Shavrina, Tatiana; Scialom, Thomas; Yun, Tian; Limisiewicz, Tomasz; Rieser, Verena; Protasov, Vitaly; Mikhailov, Vladislav; Pruksachatkun, Yada; Belinkov, Yonatan; Bamberger, Zachary; Kasner, Zdeněk; Rueda, Alice; Pestana, Amanda; Feizpour, Amir; Khan, Ammar; Faranak, Amy; Santos, Ana; Hevia, Anthony; Unldreaj, Antigona; Aghagol, Arash; Abdollahi, Arezoo; Tammour, Aycha; HajiHosseini, Azadeh; Behroozi, Bahareh; Ajibade, Benjamin; Saxena, Bharat; Ferrandis, Carlos Muñoz; Contractor, Danish; Lansky, David; David, Davis; Kiela, Douwe; Nguyen, Duong A.; Tan, Edward; Baylor, Emi; Ozoani, Ezinwanne; Mirza, Fatima; Ononiwu, Frankline; Rezanejad, Habib; Jones, Hessie; Bhattacharya, Indrani; Solaiman, Irene; Sedenko, Irina; Nejadgholi, Isar; Passmore, Jesse; Seltzer, Josh; Sanz, Julio Bonis; Dutra, Livia; Samagaio, Mairon; Elbadri, Maraim; Mieskes, Margot; Gerchick, Marissa; Akinlolu, Martha; McKenna, Michael; Qiu, Mike; Ghauri, Muhammed; Burynok, Mykola; Abrar, Nafis; Rajani, Nazneen; Elkott, Nour; Fahmy, Nour; Samuel, Olanrewaju; An, Ran; Kromann, Rasmus; Hao, Ryan; Alizadeh, Samira; Shubber, Sarmad; Wang, Silas; Roy, Sourav; Viguier, Sylvain; Le, Thanh; Oyebade, Tobi; Le, Trieu; Yang, Yoyo; Nguyen, Zach; Kashyap, Abhinav Ramesh; Palasciano, Alfredo; Callahan, Alison; Shukla, Anima; Miranda-Escalada, Antonio; Singh, Ayush; Beilharz, Benjamin; Wang, Bo; Brito, Caio; Zhou, Chenxi; Jain, Chirag; Xu, Chuxin; Fourrier, Clémentine; Periñán, Daniel León; Molano, Daniel; Yu, Dian; Manjavacas, Enrique; Barth, Fabio; Fuhrimann, Florian; Altay, Gabriel; Bayrak, Giyaseddin; Burns, Gully; Vrabec, Helena U.; Bello, Imane; Dash, Ishani; Kang, Jihyun; Giorgi, John; Golde, Jonas; Posada, Jose David; Sivaraman, Karthik Rangasai; Bulchandani, Lokesh; Liu, Lu; Shinzato, Luisa; de Bykhovetz, Madeleine Hahn; Takeuchi, Maiko; Pàmies, Marc; Castillo, Maria A.; Nezhurina, Marianna; Sänger, Mario; Samwald, Matthias; Cullan, Michael; Weinberg, Michael; De Wolf, Michiel; Mihaljcic, Mina; Liu, Minna; Freidank, Moritz; Kang, Myungsun; Seelam, Natasha; Dahlberg, Nathan; Broad, Nicholas Michio; Muellner, Nikolaus; Fung, Pascale; Haller, Patrick; Chandrasekhar, Ramya; Eisenberg, Renata; Martin, Robert; Canalli, Rodrigo; Su, Rosaline; Su, Ruisi; Cahyawijaya, Samuel; Garda, Samuele; Deshmukh, Shlok S.; Mishra, Shubhanshu; Kiblawi, Sid; Ott, Simon; Sang-aroonsiri, Sinee; Kumar, Srishti; Schweter, Stefan; Bharati, Sushil; Laud, Tanmay; Gigant, Théo; Kainuma, Tomoya; Kusa, Wojciech; Labrak, Yanis; Bajaj, Yash Shailesh; Venkatraman, Yash; Xu, Yifan; Xu, Yingxin; Xu, Yu; Tan, Zhe; Xie, Zhongli; Ye, Zifan; Bras, Mathilde; Belkada, Younes; Wolf, Thomas (13 March 2023). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100 [cs]. doi:10.48550/arXiv.2211.05100. 
  29. Joshi, Harshit; Ebenezer, Abishai; Cambronero, José; Gulwani, Sumit; Kanade, Aditya; Le, Vu; Radiček, Ivan; Verbruggen, Gust (31 January 2023). "FLAME: A small language model for spreadsheet formulas". arXiv:2301.13779 [cs]. doi:10.48550/arXiv.2301.13779. 
  30. Lanzi, Pier Luca; Loiacono, Daniele (9 February 2023). "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design". arXiv:2303.02155 [cs]. doi:10.48550/arXiv.2303.02155. 
  31. "Shaped". www.shaped.ai. Retrieved 16 May 2023. 
  32. Schick, Timo; Dwivedi-Yu, Jane; Dessì, Roberto; Raileanu, Roberta; Lomeli, Maria; Zettlemoyer, Luke; Cancedda, Nicola; Scialom, Thomas (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools". doi:10.48550/arXiv.2302.04761. 
  33. Wang, Sheng; Zhao, Zihao; Ouyang, Xi; Wang, Qian; Shen, Dinggang (2023). "ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models". doi:10.48550/arXiv.2302.07257. 
  34. Weaver, Alaura (2 March 2023). "Palmyra LLMs empower secure, enterprise-grade generative AI for business". Writer. Retrieved 11 March 2023. 
  35. "Writer Launches Three New Generative AI Models for the Enterprise". PRWeb. Retrieved 11 March 2023. 
  36. Daull, Xavier; Bellot, Patrice; Bruno, Emmanuel; Martin, Vincent; Murisasco, Elisabeth (17 February 2023). "Complex QA and language models hybrid architectures, Survey". arXiv:2302.09051 [cs]. doi:10.48550/arXiv.2302.09051. 
  37. "MOSS". txsun1997.github.io. Retrieved 11 March 2023. 
  38. White, Jules; Fu, Quchen; Hays, Sam; Sandborn, Michael; Olea, Carlos; Gilbert, Henry; Elnashar, Ashraf; Spencer-Smith, Jesse; Schmidt, Douglas C. (21 February 2023). "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT". arXiv:2302.11382 [cs]. doi:10.48550/arXiv.2302.11382. 
  39. Peng, Baolin; Galley, Michel; He, Pengcheng; Cheng, Hao; Xie, Yujia; Hu, Yu; Huang, Qiuyuan; Liden, Lars; Yu, Zhou; Chen, Weizhu; Gao, Jianfeng (1 March 2023). "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". arXiv:2302.12813 [cs]. doi:10.48550/arXiv.2302.12813. 
  40. "LLaMA: Open and Efficient Foundation Language Models - Meta Research". Meta Research. Retrieved 11 March 2023. 
  41. Kwon, Minae; Xie, Sang Michael; Bullard, Kalesha; Sadigh, Dorsa (27 February 2023). "Reward Design with Language Models". arXiv:2303.00001 [cs]. doi:10.48550/arXiv.2303.00001. 
  42. Zhu, Rui-Jie; Zhao, Qihang; Eshraghian, Jason K. (28 February 2023). "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks". arXiv:2302.13939 [cs]. doi:10.48550/arXiv.2302.13939. 
  43. Huang, Shaohan; Dong, Li; Wang, Wenhui; Hao, Yaru; Singhal, Saksham; Ma, Shuming; Lv, Tengchao; Cui, Lei; Mohammed, Owais Khan; Patra, Barun; Liu, Qiang; Aggarwal, Kriti; Chi, Zewen; Bjorck, Johan; Chaudhary, Vishrav; Som, Subhojit; Song, Xia; Wei, Furu (1 March 2023). "Language Is Not All You Need: Aligning Perception with Language Models". arXiv:2302.14045 [cs]. doi:10.48550/arXiv.2302.14045. 
  44. Cao, Meng; Fatemi, Mehdi; Cheung, Jackie Chi Kit; Shabanian, Samira (27 February 2023). "Systematic Rectification of Language Models via Dead-end Analysis". arXiv:2302.14003 [cs]. doi:10.48550/arXiv.2302.14003. 
  45. Al-Kaswan, Ali; Izadi, Maliheh (28 February 2023). "The (ab)use of Open Source Code to Train Large Language Models". arXiv:2302.13681 [cs]. doi:10.48550/arXiv.2302.13681. 
  46. "Large Language Models Are State-of-the-Art Evaluators of Translation Quality". arxiv-vanity.com. Retrieved 16 May 2023. 
  47. Kocmi, Tom; Federmann, Christian (28 February 2023). "Large Language Models Are State-of-the-Art Evaluators of Translation Quality". arXiv:2302.14520 [cs]. doi:10.48550/arXiv.2302.14520. 
  48. Houghton, Conor; Kazanina, Nina; Sukumaran, Priyanka (28 February 2023). "Beyond the limitations of any imaginable mechanism: large language models and psycholinguistics". arXiv:2303.00077 [cs]. doi:10.48550/arXiv.2303.00077. Retrieved 10 March 2023. 
  49. Bertolini, Lorenzo; Elce, Valentina; Michalak, Adriana; Bernardi, Giulio; Weeds, Julie (28 February 2023). "Automatic Scoring of Dream Reports' Emotional Content with Large Language Models". arXiv:2302.14828 [cs]. doi:10.48550/arXiv.2302.14828. 
  50. Ye, Seonghyeon; Hwang, Hyeonbin; Yang, Sohee; Yun, Hyeongu; Kim, Yireun; Seo, Minjoon (28 February 2023). "In-Context Instruction Learning". arXiv:2302.14691 [cs]. doi:10.48550/arXiv.2302.14691. 
  51. Yuan, Yang (2023). "Succinct Representations for Concepts". doi:10.48550/arXiv.2303.00446. 
  52. "GPT-NeoX". huggingface.co. Retrieved 20 March 2023. 
  53. Huemann, Zachary; Lee, Changhee; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler (1 March 2023). "Domain-adapted large language models for classifying nuclear medicine reports". arXiv:2303.01258 [cs]. doi:10.48550/arXiv.2303.01258. 
  54. "Prophet". github.com. Vision and Language Group@ MIL. 16 May 2023. Retrieved 16 May 2023. 
  55. Shao, Zhenwei; Yu, Zhou; Wang, Meng; Yu, Jun (3 March 2023). "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering". arXiv:2303.01903 [cs]. doi:10.48550/arXiv.2303.01903. 
  56. Zhang, Bowen; Soh, Harold (6 March 2023). "Large Language Models as Zero-Shot Human Models for Human-Robot Interaction". arXiv:2303.03548 [cs]. doi:10.48550/arXiv.2303.03548. 
  57. Dang, Hai; Goller, Sven; Lehmann, Florian; Buschek, Daniel (6 March 2023). "Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting". arXiv:2303.03199 [cs]. doi:10.1145/3544548.3580969. Retrieved 8 March 2023. 
  58. Josifoski, Martin; Sakota, Marija; Peyrard, Maxime; West, Robert (7 March 2023). "Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction". arXiv:2303.04132 [cs]. doi:10.48550/arXiv.2303.04132. 
  59. "Stanford CRFM". crfm.stanford.edu. Retrieved 21 March 2023. 
  60. "Our latest health AI research updates". Google. 14 March 2023. Retrieved 21 March 2023. 
  61. OpenAI (2023). "GPT-4 Technical Report". doi:10.48550/arXiv.2303.08774. 
  62. Eloundou, Tyna; Manning, Sam; Mishkin, Pamela; Rock, Daniel (2023). "GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models". doi:10.48550/arXiv.2303.10130. 
  63. Roth, Emma (19 April 2023). "Stability AI announces new open-source large language model". The Verge. Retrieved 9 May 2023. 
  64. "Stability AI Launches the First of its StableLM Suite of Language Models". Stability AI. Retrieved 9 May 2023. 
  65. "How to Access PaLM 2 AND TRY IT". MLYearning. 15 May 2023. Retrieved 16 May 2023. 
  66. Hern, Alex (10 May 2023). "Google launches new AI PaLM 2 in attempt to regain leadership of the pack". The Guardian. Retrieved 16 May 2023. 
  67. Schwartz, Barry (12 May 2023). "Bing Chat gains image answers with knowledge cards and optimized answers". Search Engine Land. Retrieved 16 May 2023. 
  68. "AI Expert Says ChatGPT Is Way Stupider Than People Realize". Futurism. Retrieved 24 May 2023.