Difference between revisions of "Talk:Timeline of large language models"
{| class="sortable wikitable"
! Year !! Month and date !! Model name !! Number of parameters !! Event type !! Details
|-
| 2022 || April 12 || Reinforcement learning-based language model || || || A paper describes a method for training language models to act as helpful and harmless assistants using {{w|reinforcement learning}} from human feedback. The authors demonstrate that this alignment training improves performance on almost all natural language processing evaluations and is compatible with training for specialized skills such as Python coding and summarization. They explore an iterated online mode of training and investigate the robustness of the approach, identifying a linear relationship between the RL reward and the square root of the {{w|Kullback–Leibler divergence}} between the policy and its initialization. The authors also perform peripheral analyses and provide samples from their models using prompts from recent related work.<ref>{{cite journal |last1=Bai |first1=Yuntao |last2=Jones |first2=Andy |last3=Ndousse |first3=Kamal |last4=Askell |first4=Amanda |last5=Chen |first5=Anna |last6=DasSarma |first6=Nova |last7=Drain |first7=Dawn |last8=Fort |first8=Stanislav |last9=Ganguli |first9=Deep |last10=Henighan |first10=Tom |last11=Joseph |first11=Nicholas |last12=Kadavath |first12=Saurav |last13=Kernion |first13=Jackson |last14=Conerly |first14=Tom |last15=El-Showk |first15=Sheer |last16=Elhage |first16=Nelson |last17=Hatfield-Dodds |first17=Zac |last18=Hernandez |first18=Danny |last19=Hume |first19=Tristan |last20=Johnston |first20=Scott |last21=Kravec |first21=Shauna |last22=Lovitt |first22=Liane |last23=Nanda |first23=Neel |last24=Olsson |first24=Catherine |last25=Amodei |first25=Dario |last26=Brown |first26=Tom |last27=Clark |first27=Jack |last28=McCandlish |first28=Sam |last29=Olah |first29=Chris |last30=Mann |first30=Ben |last31=Kaplan |first31=Jared |title=Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |date=2022 |doi=10.48550/arXiv.2204.05862}}</ref>
|-
| 2022 || June 2 || || || || {{w|OpenAI}} publishes a blog post on the development of best practices for organizations developing or deploying large language models. The principles include prohibiting misuse of language models, mitigating unintentional harm by evaluating models, minimizing sources of bias, and collaborating with stakeholders. These practices are intended to mitigate the risks of language models and help realize their full potential to augment human capabilities. The authors express hope that other organizations will adopt these principles and advance public discussion on language model development and deployment. Support from other organizations reflects growing public concern over the safety of large language models.<ref>{{cite web |title=Best practices for deploying language models |url=https://openai.com/blog/best-practices-for-deploying-language-models |website=openai.com |access-date=17 March 2023}}</ref>
|-
|}
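The April 12 entry notes that the paper finds the RL reward grows linearly with the square root of the KL divergence between the tuned policy and its initialization. The following is an illustrative numerical sketch of that relationship only; the toy distributions, the proportionality constant <code>alpha</code>, and the function names are hypothetical and not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def predicted_reward(policy, init_policy, alpha):
    """Illustrative fit: reward assumed proportional to sqrt of the KL divergence
    between the RLHF-tuned policy and its initialization (alpha is hypothetical)."""
    return alpha * math.sqrt(kl_divergence(policy, init_policy))

# Toy example: a tuned policy that has drifted from a uniform initialization.
init = [0.25, 0.25, 0.25, 0.25]
tuned = [0.40, 0.30, 0.20, 0.10]
print(round(predicted_reward(tuned, init, alpha=1.0), 4))  # prints 0.3263
```

Under this assumed linear fit, doubling the predicted reward requires quadrupling the policy's KL divergence from its initialization, which is one way to read the paper's observation that further reward gains come at increasing drift from the pretrained model.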
Revision as of 07:35, 19 June 2023