Language Models are Few-Shot Learners


Resource history | v1 (current) | created by janarez

Details


created by janarez | Add topic "GPT-3"
Title
Language Models are Few-Shot Learners
Type
Paper
Created
2020-01-01
Description
Humans generally perform a new language task from only a few examples - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Link
http://arxiv.org/abs/2005.14165
Identifier
https://github.com/openai/gpt-3
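The description above notes that GPT-3 is applied "without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model." A minimal sketch of what such a few-shot prompt can look like, using the 3-digit arithmetic task mentioned in the abstract (the helper name and Q/A formatting are illustrative, not taken from the paper):

```python
def build_few_shot_prompt(examples, query):
    """Format (question, answer) demonstrations followed by the query.

    The model would be asked to continue this text; the demonstrations
    alone specify the task, with no parameter updates.
    """
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final query is left unanswered for the model to complete.
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

demonstrations = [
    ("What is 123 + 456?", "579"),
    ("What is 250 + 250?", "500"),
]
prompt = build_few_shot_prompt(demonstrations, "What is 312 + 144?")
print(prompt)
```

In the few-shot setting the paper evaluates, typically 10 to 100 such demonstrations fit in the model's context window; the zero-shot and one-shot settings use the same format with zero or one demonstration.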

authors

This resource has no history of related authors.

topics

official for GPT-3
v1 | attached by janarez | Add topic "GPT-3"

resources

This resource has no history of related resources.