Talk:Timeline of large language models

Sample questions

The following are some interesting questions that can be answered by reading this timeline:

Concepts without articles on Wikipedia

BioNeMo

Year	Month and date	Model name	Number of parameters	Event type	Details
2022	March 29	Large-scale transformer language model			A paper investigates the optimal model size and number of tokens for training a transformer language model under a given compute budget. The researchers find that current large language models are significantly undertrained, and the model size and the number of training tokens should be scaled equally for compute-optimal training. They test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4x more data. Chinchilla outperforms Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on a range of downstream evaluation tasks and reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than a 7% improvement over Gopher.^[1]
2020	May 28	Large-scale language model			A paper discusses the use of language models in few-shot learning, where a model is trained on a large corpus of text and then fine-tuned for a specific task. The authors demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance. They trained GPT-3, a language model with 175 billion parameters, and tested its performance in the few-shot setting. GPT-3 achieved strong performance on many NLP tasks, including translation, question-answering, and cloze tasks, as well as tasks that require on-the-fly reasoning or domain adaptation. However, the authors also identify some datasets where GPT-3's few-shot learning struggles, as well as methodological issues related to training on large web corpora. The paper also discusses the broader societal impacts of this finding and of GPT-3 in general.^[2]
2020	July	Neural text generation model			A paper discusses the limitations of neural text generation models in open-ended tasks like language modeling and story generation, due to the standard likelihood training and approximate decoding objectives. The authors specifically analyze these limitations for abstractive document summarization and find that such models tend to hallucinate content that is unfaithful to the input document. The paper presents the results of a human evaluation of several neural abstractive summarization systems, highlighting the substantial amount of hallucinated content in all model-generated summaries. However, the authors also show that pretrained models perform better in terms of generating faithful and factual summaries, as evaluated by humans. They propose that textual entailment measures may be a better evaluation metric for faithfulness than standard metrics, leading to better training and decoding criteria.^[3]
2022	April 12	Reinforcement learning-based language model			A paper describes a method for training language models to act as helpful and harmless assistants using reinforcement learning from human feedback. The authors demonstrate that this alignment training improves performance on almost all natural language processing evaluations and is compatible with training for specialized skills such as python coding and summarization. They explore an iterated online mode of training and investigate the robustness of the approach, identifying a linear relationship between the RL reward and the square root of the Kullback–Leibler divergence between the policy and its initialization. The authors also perform peripheral analyses and provide samples from their models using prompts from recent related work.^[4]
2022	June 2				OpenAI publishes a blog post on the development of best practices for organizations developing or deploying large language models. The principles include prohibiting misuse of language models, mitigating unintentional harm by evaluating models, minimizing sources of bias, and collaborating with stakeholders. These practices are meant to mitigate the risks of language models and achieve their full potential to augment human capabilities. The authors express hope that other organizations will adopt these principles and advance public discussion on language model development and deployment. The support from other organizations shows the growing social concern over the safety of LLMs.^[5]
2022	September		Competition		Nvidia announces the launch of its BioNeMo LLM service to help researchers build new artificial intelligence models for biology.^[6]
2023	January 5				A paper discusses the concern about the potential of LLMs to influence, modify, and manipulate user preferences adversarially. As these models become more proficient in deducing user preferences and offering tailored assistance, their lack of interpretability in adversarial settings is a major concern. The paper examines existing literature on adversarial behavior in user preferences and provides red teaming samples for dialogue models like ChatGPT and GODEL. It also probes the attention mechanism in these models for non-adversarial and adversarial settings.^[7]
2023	February 17	Hybrid language model		Research	A paper surveys the state of the art of hybrid language models architectures and strategies for complex question-answering (QA, CQA, CPS). While very large language models are good at leveraging public data on standard problems, they may require specific architecture, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback to tackle more specific complex questions or problems. The paper identifies the key elements used with LLMs to solve complex questions or problems and discusses challenges associated with complex QA. The paper also reviews current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and others.^[8]
2023	Marh 14	Medical language model			Google shares health AI updates including progress on their Medical PaLM 2, expert-level medical language model (LLM) research which demonstrated consistently expert-level performance on medical exam questions, scoring 85%. The company has partnered with Jacaranda Health and Chang Gung Memorial Hospital to build AI models that can help simplify acquiring and interpreting ultrasound images to identify important information like gestational age in expecting mothers and early detection of breast cancer. They're also partners with Mayo Clinic with the purpose to extend the reach of their AI model, with the goal of helping more patients receive radiotherapy treatment sooner. Additionally, Google works with partners on the ground to bring their research on tuberculosis (TB) AI-powered chest x-ray screening into the care setting.^[9]

↑ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.
↑ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". doi:10.48550/arXiv.2005.14165.
↑ Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (July 2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: 1906–1919. doi:10.18653/v1/2020.acl-main.173.
↑ Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". doi:10.48550/arXiv.2204.05862.
↑ "Best practices for deploying language models". openai.com. Retrieved 17 March 2023.
↑ "Nvidia boosts generative AI for biology with BioNeMo". VentureBeat. 12 January 2023. Retrieved 11 March 2023.
↑ Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291.
↑ Daull, Xavier; Bellot, Patrice; Bruno, Emmanuel; Martin, Vincent; Murisasco, Elisabeth (17 February 2023). "Complex QA and language models hybrid architectures, Survey". arXiv:2302.09051 [cs]. doi:10.48550/arXiv.2302.09051.
↑ "Our latest health AI research updates". Google. 14 March 2023. Retrieved 21 March 2023.

[1] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan; Guy, Aurelia; Osindero, Simon; Simonyan, Karen; Elsen, Erich; Rae, Jack W.; Vinyals, Oriol; Sifre, Laurent (2022). "Training Compute-Optimal Large Language Models". doi:10.48550/arXiv.2203.15556.

[2] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". doi:10.48550/arXiv.2005.14165.

[3] Maynez, Joshua; Narayan, Shashi; Bohnet, Bernd; McDonald, Ryan (July 2020). "On Faithfulness and Factuality in Abstractive Summarization". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics: 1906–1919. doi:10.18653/v1/2020.acl-main.173.

[4] Bai, Yuntao; Jones, Andy; Ndousse, Kamal; Askell, Amanda; Chen, Anna; DasSarma, Nova; Drain, Dawn; Fort, Stanislav; Ganguli, Deep; Henighan, Tom; Joseph, Nicholas; Kadavath, Saurav; Kernion, Jackson; Conerly, Tom; El-Showk, Sheer; Elhage, Nelson; Hatfield-Dodds, Zac; Hernandez, Danny; Hume, Tristan; Johnston, Scott; Kravec, Shauna; Lovitt, Liane; Nanda, Neel; Olsson, Catherine; Amodei, Dario; Brown, Tom; Clark, Jack; McCandlish, Sam; Olah, Chris; Mann, Ben; Kaplan, Jared (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". doi:10.48550/arXiv.2204.05862.

[5] "Best practices for deploying language models". openai.com. Retrieved 17 March 2023.

[6] "Nvidia boosts generative AI for biology with BioNeMo". VentureBeat. 12 January 2023. Retrieved 11 March 2023.

[7] Subhash, Varshini (5 January 2023). "Can Large Language Models Change User Preference Adversarially?". arXiv:2302.10291 [cs]. doi:10.48550/arXiv.2302.10291.

[8] Daull, Xavier; Bellot, Patrice; Bruno, Emmanuel; Martin, Vincent; Murisasco, Elisabeth (17 February 2023). "Complex QA and language models hybrid architectures, Survey". arXiv:2302.09051 [cs]. doi:10.48550/arXiv.2302.09051.

[9] "Our latest health AI research updates". Google. 14 March 2023. Retrieved 21 March 2023.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

Talk:Timeline of large language models

Sample questions

Concepts without articles on Wikipedia

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools