Difference between revisions of "Talk:Timeline of large language models"
From Timelines
(37 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | |||
− | + | ||
+ | == Extended Timeline == | ||
+ | |||
+ | These events were removed from the main timeline. | ||
+ | |||
+ | {| class="sortable wikitable" | ||
+ | ! Year !! Month and date !! Model name !! Number of parameters !! Event type !! Details | ||
+ | |- | ||
+ | | 2018 || April 1 || Marian || || Early development || A paper introduces Marian, a highly efficient Neural Machine Translation (NMT) framework written entirely in C++. The framework includes an integrated automatic differentiation engine based on dynamic computation graphs. The authors discuss the design of the encoder-decoder framework and demonstrate that Marian, as a research-friendly toolkit, achieves fast training and translation speeds, making it a valuable tool for NMT research and development.<ref>{{cite journal |last1=Junczys-Dowmunt |first1=Marcin |last2=Grundkiewicz |first2=Roman |last3=Dwojak |first3=Tomasz |last4=Hoang |first4=Hieu |last5=Heafield |first5=Kenneth |last6=Neckermann |first6=Tom |last7=Seide |first7=Frank |last8=Germann |first8=Ulrich |last9=Aji |first9=Alham Fikri |last10=Bogoychev |first10=Nikolay |last11=Martins |first11=André F. T. |last12=Birch |first12=Alexandra |title=Marian: Fast Neural Machine Translation in C++ |date=2018 |doi=10.48550/arXiv.1804.00344}}</ref> NMT models, like those used in Marian, form a significant component of large language models. | ||
+ | |- | ||
+ | | 2022 || June 2 || || || || {{w|OpenAI}} publishes a blog post on the development of best practices for organizations developing or deploying large language models. The principles include prohibiting misuse of language models, mitigating unintentional harm by evaluating models, minimizing sources of bias, and collaborating with stakeholders. These practices are meant to mitigate the risks of language models and achieve their full potential to augment human capabilities. The authors express hope that other organizations will adopt these principles and advance public discussion on language model development and deployment. The support from other organizations shows the growing social concern over the safety of LLMs.<ref>{{cite web |title=Best practices for deploying language models |url=https://openai.com/blog/best-practices-for-deploying-language-models |website=openai.com |access-date=17 March 2023}}</ref> | ||
+ | |- | ||
+ | | 2022 || September || || || Competition || {{w|Nvidia}} announces the launch of its {{w|BioNeMo}} LLM service to help researchers build new artificial intelligence models for biology.<ref>{{cite web |title=Nvidia boosts generative AI for biology with BioNeMo |url=https://venturebeat.com/ai/nvidia-boosts-generative-ai-for-biology-with-bionemo/#:~:text=In%20September%202022%2C%20Nvidia%20announced,yielded%20some%20strong%20early%20results. |website=VentureBeat |access-date=11 March 2023 |date=12 January 2023}}</ref> | ||
+ | |- | ||
+ | | 2023 || February 9 || || || || A paper presents a collaborative design framework that combines interactive evolution and LLMs to simulate the human design process. The framework uses interactive evolution to exploit user feedback and LLMs for a complex creative task of recombining and varying ideas. The process begins with a brief and a set of candidate designs, generated by a language model or proposed by users. Users provide feedback to an interactive {{w|genetic algorithm}} that selects, recombines, and mutates the most promising designs. The framework was evaluated on three game design tasks with human designers collaborating remotely.<ref>{{cite journal |last1=Lanzi |first1=Pier Luca |last2=Loiacono |first2=Daniele |title=ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design |journal=arXiv:2303.02155 [cs] |date=9 February 2023 |doi=10.48550/arXiv.2303.02155 |url=https://arxiv.org/abs/2303.02155}}</ref> | ||
+ | |- | ||
+ | | 2023 || February 14 || || || Research || A paper presents a framework called ChatCAD, which integrates LLMs with {{w|computer-aided diagnosis}} (CAD) networks for medical images. ChatCAD uses LLMs to enhance the output of multiple CAD networks by summarizing and reorganizing the information presented in natural language text format. This approach merges the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models. The goal is to create a more user-friendly and understandable system for patients compared to conventional CAD systems. The paper suggests that LLMs can also be used to improve the performance of vision-based medical-image CAD models in the future.<ref>{{cite journal |last1=Wang |first1=Sheng |last2=Zhao |first2=Zihao |last3=Ouyang |first3=Xi |last4=Wang |first4=Qian |last5=Shen |first5=Dinggang |title=ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models |date=2023 |doi=10.48550/arXiv.2302.07257}}</ref> | ||
+ | |- | ||
+ | | 2023 || February 17 || || || Research || A paper surveys the state of the art in hybrid language model architectures and strategies for complex question-answering (QA, CQA, CPS). While very large language models are good at leveraging public data on standard problems, they may require specific architecture, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback to tackle more specific complex questions or problems. The paper identifies the key elements used with LLMs to solve complex questions or problems and discusses challenges associated with complex QA. The paper also reviews current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, {{w|program synthesis}}, and others.<ref>{{cite journal |last1=Daull |first1=Xavier |last2=Bellot |first2=Patrice |last3=Bruno |first3=Emmanuel |last4=Martin |first4=Vincent |last5=Murisasco |first5=Elisabeth |title=Complex QA and language models hybrid architectures, Survey |journal=arXiv:2302.09051 [cs] |date=17 February 2023 |doi=10.48550/arXiv.2302.09051 |url=https://arxiv.org/abs/2302.09051}}</ref> | ||
+ | |- | ||
+ | | 2023 || February 28 || || || || GEMBA (GPT Estimation Metric Based Assessment) is presented as a GPT-based metric for evaluating translation quality both with and without a reference translation. The authors evaluate four prompt variants in two modes and investigate seven versions of GPT models, including ChatGPT. Their method achieves state-of-the-art accuracy in both modes compared to human labels and provides insight into the usefulness of pre-trained, generative large language models for translation quality assessment.<ref>{{cite journal |last1=Kocmi |first1=Tom |last2=Federmann |first2=Christian |title=Large Language Models Are State-of-the-Art Evaluators of Translation Quality |journal=arXiv:2302.14520 [cs] |date=28 February 2023 |doi=10.48550/arXiv.2302.14520 |url=https://arxiv.org/abs/2302.14520}}</ref><ref>{{cite web |title=Large Language Models Are State-of-the-Art Evaluators of Translation Quality |url=https://www.arxiv-vanity.com/papers/2302.14520/ |website=arxiv-vanity.com |access-date=16 May 2023}}</ref> | ||
+ | |- | ||
+ | | 2023 || March 3 || Two stage framework<ref>{{cite web |title=Prophet |url=https://github.com/MILVLG/prophet |website=github.com |publisher=Vision and Language Group@ MIL |access-date=16 May 2023 |date=16 May 2023}}</ref> || || Research || A paper proposes a framework called Prophet that uses answer heuristics to prompt LLMs for knowledge-based visual question answering (VQA). Previous methods used LLMs to acquire necessary knowledge for answering, but these methods did not fully activate the capacity of LLMs due to insufficient input information. Prophet trains a vanilla VQA model on a knowledge-based VQA dataset without external knowledge and extracts two types of answer heuristics: answer candidates and answer-aware examples. These answer heuristics are encoded into prompts to enhance the capacity of LLMs. Prophet outperforms existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1% and 55.7% accuracies on their testing sets, respectively.<ref>{{cite journal |last1=Shao |first1=Zhenwei |last2=Yu |first2=Zhou |last3=Wang |first3=Meng |last4=Yu |first4=Jun |title=Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering |journal=arXiv:2303.01903 [cs] |date=3 March 2023 |doi=10.48550/arXiv.2303.01903 |url=https://arxiv.org/abs/2303.01903}}</ref> | ||
+ | |- | ||
+ | | 2023 || March 7 || SynthIE || || || A paper presents SynthIE as a novel approach that leverages LLMs for synthetic data generation, even for tasks that LLMs cannot solve directly. It operates by prompting the LLM to generate text for a given structured output, exploiting task asymmetry to create high-quality, large-scale data. This methodology is demonstrated in the challenging domain of closed information extraction, where ground-truth data is scarce. SynthIE produces a dataset of 1.8 million data points, which a human evaluation finds to be of higher quality than existing datasets. The resulting SynthIE models, fine-tuned on this data, outperform comparable models by significant margins, achieving a 57-point improvement in micro F1 and a 79-point improvement in macro F1. All associated resources are publicly available.<ref>{{cite journal |last1=Josifoski |first1=Martin |last2=Sakota |first2=Marija |last3=Peyrard |first3=Maxime |last4=West |first4=Robert |title=Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction |journal=arXiv:2303.04132 [cs] |date=7 March 2023 |doi=10.48550/arXiv.2303.04132 |url=https://arxiv.org/abs/2303.04132}}</ref> | ||
+ | |- | ||
+ | | 2023 || March 14 || || || || Google shares health AI updates, including progress on Med-PaLM 2, its medical large language model (LLM), which demonstrates consistently expert-level performance on medical exam questions, scoring 85%. The company has partnered with Jacaranda Health and Chang Gung Memorial Hospital to build AI models that simplify acquiring and interpreting ultrasound images, for tasks such as estimating gestational age in expectant mothers and detecting breast cancer early. It also partners with Mayo Clinic to extend the reach of its AI model, with the goal of helping more patients receive radiotherapy treatment sooner. Additionally, Google works with partners on the ground to bring its research on AI-powered chest X-ray screening for tuberculosis (TB) into the care setting.<ref>{{cite web |title=Our latest health AI research updates |url=https://blog.google/technology/health/ai-llm-medpalm-research-thecheckup/ |website=Google |access-date=21 March 2023 |language=en-us |date=14 March 2023}}</ref> | ||
+ | |- | ||
+ | |} |
Latest revision as of 12:25, 12 October 2023
Extended Timeline
These events were removed from the main timeline.
Year | Month and date | Model name | Number of parameters | Event type | Details |
---|---|---|---|---|---|
2018 | April 1 | Marian | | Early development | A paper introduces Marian, a highly efficient Neural Machine Translation (NMT) framework written entirely in C++. The framework includes an integrated automatic differentiation engine based on dynamic computation graphs. The authors discuss the design of the encoder-decoder framework and demonstrate that Marian, as a research-friendly toolkit, achieves fast training and translation speeds, making it a valuable tool for NMT research and development.[1] NMT models, like those used in Marian, form a significant component of large language models. |
2022 | June 2 | | | | OpenAI publishes a blog post on the development of best practices for organizations developing or deploying large language models. The principles include prohibiting misuse of language models, mitigating unintentional harm by evaluating models, minimizing sources of bias, and collaborating with stakeholders. These practices are meant to mitigate the risks of language models and achieve their full potential to augment human capabilities. The authors express hope that other organizations will adopt these principles and advance public discussion on language model development and deployment. The support from other organizations shows the growing social concern over the safety of LLMs.[2] |
2022 | September | | | Competition | Nvidia announces the launch of its BioNeMo LLM service to help researchers build new artificial intelligence models for biology.[3] |
2023 | February 9 | | | | A paper presents a collaborative design framework that combines interactive evolution and LLMs to simulate the human design process. The framework uses interactive evolution to exploit user feedback and LLMs for a complex creative task of recombining and varying ideas. The process begins with a brief and a set of candidate designs, generated by a language model or proposed by users. Users provide feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs. The framework was evaluated on three game design tasks with human designers collaborating remotely.[4] |
2023 | February 14 | | | Research | A paper presents a framework called ChatCAD, which integrates LLMs with computer-aided diagnosis (CAD) networks for medical images. ChatCAD uses LLMs to enhance the output of multiple CAD networks by summarizing and reorganizing the information presented in natural language text format. This approach merges the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models. The goal is to create a more user-friendly and understandable system for patients compared to conventional CAD systems. The paper suggests that LLMs can also be used to improve the performance of vision-based medical-image CAD models in the future.[5] |
2023 | February 17 | | | Research | A paper surveys the state of the art in hybrid language model architectures and strategies for complex question-answering (QA, CQA, CPS). While very large language models are good at leveraging public data on standard problems, they may require specific architecture, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback to tackle more specific complex questions or problems. The paper identifies the key elements used with LLMs to solve complex questions or problems and discusses challenges associated with complex QA. The paper also reviews current solutions and promising strategies, using elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, and others.[6] |
2023 | February 28 | | | | GEMBA (GPT Estimation Metric Based Assessment) is presented as a GPT-based metric for evaluating translation quality both with and without a reference translation. The authors evaluate four prompt variants in two modes and investigate seven versions of GPT models, including ChatGPT. Their method achieves state-of-the-art accuracy in both modes compared to human labels and provides insight into the usefulness of pre-trained, generative large language models for translation quality assessment.[7][8] |
2023 | March 3 | Two stage framework[9] | | Research | A paper proposes a framework called Prophet that uses answer heuristics to prompt LLMs for knowledge-based visual question answering (VQA). Previous methods used LLMs to acquire necessary knowledge for answering, but these methods did not fully activate the capacity of LLMs due to insufficient input information. Prophet trains a vanilla VQA model on a knowledge-based VQA dataset without external knowledge and extracts two types of answer heuristics: answer candidates and answer-aware examples. These answer heuristics are encoded into prompts to enhance the capacity of LLMs. Prophet outperforms existing state-of-the-art methods on two challenging knowledge-based VQA datasets, OK-VQA and A-OKVQA, delivering 61.1% and 55.7% accuracies on their testing sets, respectively.[10] |
2023 | March 7 | SynthIE | | | A paper presents SynthIE as a novel approach that leverages LLMs for synthetic data generation, even for tasks that LLMs cannot solve directly. It operates by prompting the LLM to generate text for a given structured output, exploiting task asymmetry to create high-quality, large-scale data. This methodology is demonstrated in the challenging domain of closed information extraction, where ground-truth data is scarce. SynthIE produces a dataset of 1.8 million data points, which a human evaluation finds to be of higher quality than existing datasets. The resulting SynthIE models, fine-tuned on this data, outperform comparable models by significant margins, achieving a 57-point improvement in micro F1 and a 79-point improvement in macro F1. All associated resources are publicly available.[11] |
2023 | March 14 | | | | Google shares health AI updates, including progress on Med-PaLM 2, its medical large language model (LLM), which demonstrates consistently expert-level performance on medical exam questions, scoring 85%. The company has partnered with Jacaranda Health and Chang Gung Memorial Hospital to build AI models that simplify acquiring and interpreting ultrasound images, for tasks such as estimating gestational age in expectant mothers and detecting breast cancer early. It also partners with Mayo Clinic to extend the reach of its AI model, with the goal of helping more patients receive radiotherapy treatment sooner. Additionally, Google works with partners on the ground to bring its research on AI-powered chest X-ray screening for tuberculosis (TB) into the care setting.[12] |
1. Junczys-Dowmunt, Marcin; Grundkiewicz, Roman; Dwojak, Tomasz; Hoang, Hieu; Heafield, Kenneth; Neckermann, Tom; Seide, Frank; Germann, Ulrich; Aji, Alham Fikri; Bogoychev, Nikolay; Martins, André F. T.; Birch, Alexandra (2018). "Marian: Fast Neural Machine Translation in C++". doi:10.48550/arXiv.1804.00344.
2. "Best practices for deploying language models". openai.com. Retrieved 17 March 2023.
3. "Nvidia boosts generative AI for biology with BioNeMo". VentureBeat. 12 January 2023. Retrieved 11 March 2023.
4. Lanzi, Pier Luca; Loiacono, Daniele (9 February 2023). "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design". arXiv:2303.02155 [cs]. doi:10.48550/arXiv.2303.02155.
5. Wang, Sheng; Zhao, Zihao; Ouyang, Xi; Wang, Qian; Shen, Dinggang (2023). "ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models". doi:10.48550/arXiv.2302.07257.
6. Daull, Xavier; Bellot, Patrice; Bruno, Emmanuel; Martin, Vincent; Murisasco, Elisabeth (17 February 2023). "Complex QA and language models hybrid architectures, Survey". arXiv:2302.09051 [cs]. doi:10.48550/arXiv.2302.09051.
7. Kocmi, Tom; Federmann, Christian (28 February 2023). "Large Language Models Are State-of-the-Art Evaluators of Translation Quality". arXiv:2302.14520 [cs]. doi:10.48550/arXiv.2302.14520.
8. "Large Language Models Are State-of-the-Art Evaluators of Translation Quality". arxiv-vanity.com. Retrieved 16 May 2023.
9. "Prophet". github.com. Vision and Language Group@ MIL. 16 May 2023. Retrieved 16 May 2023.
10. Shao, Zhenwei; Yu, Zhou; Wang, Meng; Yu, Jun (3 March 2023). "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering". arXiv:2303.01903 [cs]. doi:10.48550/arXiv.2303.01903.
11. Josifoski, Martin; Sakota, Marija; Peyrard, Maxime; West, Robert (7 March 2023). "Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction". arXiv:2303.04132 [cs]. doi:10.48550/arXiv.2303.04132.
12. "Our latest health AI research updates". Google. 14 March 2023. Retrieved 21 March 2023.