^ Author ^ Title ^ Link to code ^ Abstract (short) ^
| Vaswani et al. (2017) | [[https://arxiv.org/pdf/1706.03762.pdf|Attention Is All You Need]] | Code used for training and evaluation: https://github.com/tensorflow/tensor2tensor | Introduction of a new simple network architecture, the **Transformer**, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely (see the attention sketch below the table). |
| Kim et al. (2017) | [[https://arxiv.org/pdf/1702.00887.pdf|Structured Attention Networks]] | https://github.com/harvardnlp/struct-attn | In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured **attention** networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. |
| Radford et al. (2018) | [[https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf|Improving Language Understanding by Generative Pre-Training]] | https://github.com/openai/finetune-transformer-lm | Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. **//GPT-1//** |
| Devlin et al. (2018) | [[https://arxiv.org/pdf/1810.04805.pdf|BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]] | https://github.com/google-research/bert | Introduction of a new language representation model called **BERT**, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers (see the mask sketch below the table). |
| Radford et al. (2019) | [[https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf|Language Models are Unsupervised Multitask Learners]] | https://github.com/openai/gpt-2 | Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. (...) Our largest model, **GPT-2**, is a 1.5B parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. |
| Ruder (2019) | [[https://ruder.io/thesis/neural_transfer_learning_for_nlp.pdf|Neural Transfer Learning for Natural Language Processing]] | https://github.com/sebastianruder | Multiple novel methods for different **transfer learning** scenarios were presented and evaluated across a diversity of settings where they outperformed single-task learning as well as competing transfer learning methods. |
| Kovaleva et al. (2019) | [[https://arxiv.org/pdf/1908.08593.pdf|Revealing the Dark Secrets of BERT]] | - | BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the **interpretation of self-attention**, which is one of the fundamental underlying components of BERT. |
| Rogers et al. (2020) | [[https://arxiv.org/pdf/2002.12327.pdf|A Primer in BERTology: What We Know About How BERT Works]] | - | This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about **how BERT works**, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. |
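
The Transformer paper above (Vaswani et al. 2017), and the BERT and GPT models built on it, all rely on the same scaled dot-product attention operation, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The snippet below is a minimal NumPy sketch of that formula for reference only; the function name, toy shapes and random inputs are illustrative and not taken from any of the linked repositories.

<code python>
# Minimal sketch of scaled dot-product attention (Vaswani et al., 2017).
# NumPy only; names and toy shapes are illustrative, not from the linked repos.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query/key similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of the values

# Toy self-attention: 3 tokens with d_k = d_v = 4
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
</code>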
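
The GPT rows (Radford et al. 2018, 2019) and the BERT row (Devlin et al. 2018) differ mainly in how much context each token may attend to: GPT conditions only on the left context (causal mask), while BERT jointly conditions on left and right context. A purely didactic sketch of the two masks, continuing the NumPy setup above and not taken from the linked repositories:

<code python>
# Didactic contrast of attention masks; illustrative only.
import numpy as np

seq_len = 4
# GPT-style causal mask: token i may attend only to tokens j <= i
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# BERT-style bidirectional mask: every token may attend to every token
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

print(causal_mask.astype(int))
print(bidirectional_mask.astype(int))
</code>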