BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Lewis et al. [FAIR]
arxiv.org/abs/1910.13461

Ok, people. We all know that seq2seq is kinda the most general task in NLP, so every pre-training task we have can be cast as seq2seq. Let's add some new ones and see what happens.
List of tasks (a rough code sketch of these corruptions follows the list):
1. Masked Language Modelling
1. Random Token Deletion
1. Text Infilling - replace a span of tokens with a single MASK token; the model restores the whole span without knowing how many tokens were deleted
1. Sentence Shuffling - permute the sentences, restore them in the original order
1. Document Rotation - select a random token, start the document from that token and, when it ends, wrap around to the first token; the model should restore the document so it starts with the original token
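None of this is code from the paper; it's a minimal sketch of what the five corruptions could look like, assuming whitespace tokenization, a `[MASK]` placeholder string, a per-token corruption probability `p`, and the Poisson(3) span lengths mentioned in the paper:

```python
import random

import numpy as np

MASK = "[MASK]"  # placeholder; the real model uses a special vocabulary token


def masked_lm(tokens, p=0.15):
    # BERT-style: mask individual tokens, the model predicts each one in place.
    return [MASK if random.random() < p else t for t in tokens]


def token_deletion(tokens, p=0.15):
    # Drop tokens entirely: the model must also decide *where* something is missing.
    return [t for t in tokens if random.random() >= p]


def text_infilling(tokens, p=0.15, lam=3.0):
    # Replace a whole span with a single MASK; span length ~ Poisson(lam), so the
    # model cannot tell how many tokens were removed (a 0-length span just inserts
    # an extra MASK).
    out, i = [], 0
    while i < len(tokens):
        if random.random() < p:
            out.append(MASK)
            i += np.random.poisson(lam)
        else:
            out.append(tokens[i])
            i += 1
    return out


def sentence_shuffling(sentences):
    # Permute whole sentences; the target is the original order.
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled


def document_rotation(tokens):
    # Start the document at a random token and wrap around; the target starts
    # at the true first token.
    k = random.randrange(len(tokens))
    return tokens[k:] + tokens[:k]


if __name__ == "__main__":
    doc = "BART maps corrupted text back to the original text .".split()
    print(text_infilling(doc))
    print(document_rotation(doc))
```

In every case the seq2seq target is the uncorrupted document: the encoder sees the noisy input, the decoder reconstructs the clean text.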
After some experiments the authors chose two tasks, Text Infilling and Sentence Shuffling, and trained the seq2seq model on these two tasks in the RoBERTa setup.
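The training signal itself is just reconstruction: feed the corrupted document to the encoder and use the clean document as the decoder labels. A toy illustration with the released HuggingFace checkpoint (the hand-corrupted sentence and the `facebook/bart-base` choice are mine, and this runs one gradient step on already-pretrained weights rather than reproducing the from-scratch pre-training):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

original = "The cat sat on the mat. It was warm. The dog watched."
# Hand-made corruption: one infilled span (<mask>) plus shuffled sentence order.
corrupted = "The dog watched. The cat sat <mask>. It was warm."

batch = tok(corrupted, return_tensors="pt")
labels = tok(original, return_tensors="pt").input_ids

# Cross-entropy of reconstructing the original document from the corrupted input.
loss = model(**batch, labels=labels).loss
loss.backward()
print(float(loss))
```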
Results: comparable to RoBERTa on NLU (GLUE, SQuAD, …) and SOTA on NLG (ELI5, XSum, CNN/DailyMail, PersonaChat), but not-so-impressive numbers on (Romanian-English) translation.