Transformer 代码实现及应用

栏目: 数据库 · 发布时间: 6年前

内容简介:Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin(Submitted on 12 Jun 2017 (v1), last revised 6 Dec 2017 (this version, v5))

Transformer_implementation_and_application 该资源仅仅用300行代码(Tensorflow 2)完整复现了Transformer模型,并且应用在神经机器翻译任务和聊天机器人上。

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

(Submitted on 12 Jun 2017 (v1), last revised 6 Dec 2017 (this version, v5))

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Comments: 15 pages, 5 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:1706.03762 [cs.CL]

(or arXiv:1706.03762v5 [cs.CL] for this version)


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

长尾理论2.0

长尾理论2.0

安德森 / 乔江涛、石晓燕 / 中信出版社 / 2009-5 / 42.00元

《长尾理论2.0》是克里斯·安德森对所有问题最明确的回答。在此书中,他详细阐释了长尾的精华所在,揭示了长尾现象是如何从工业资本主义原动力——规模经济与范围经济——的矛盾中产生出来的。长尾现象虽然是明显的互联网现象,但其商务逻辑本身,却是从工业经济中自然而然“长”出来的,网络只是把酝酿了几十年的供应链革命的诸多要素简单地结合在一起了。同时,长尾理论转化为行动,最有力、最可操作的就是营销长尾,通过口碑......一起来看看 《长尾理论2.0》 这本书的介绍吧!

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具

URL 编码/解码
URL 编码/解码

URL 编码/解码

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具