Gender Bias In Machine Translation



Machine translation models are trained on huge corpora of text, made up of sentence pairs in which one sentence is a translation of the other. However, there are nuances in language that often make it difficult to produce an accurate, direct translation from one language to another.

When translating from English to languages such as French or Spanish, some gender neutral nouns will be translated into gender specific nouns. For example, the word “friend” in “his friend is kind” is gender neutral in English. However, in Spanish it is gender specific, either “amiga” (feminine) or “amigo” (masculine).


Another example is translation from Turkish to English. Turkish is almost entirely gender neutral. The pronoun “o” in Turkish can be translated into English as any of “he”, “she” or “it”. Google claims that 10% of its Turkish Translate queries are ambiguous and could be correctly translated into either gender.

In both these examples, we can see how a phrase in one language can be correctly translated into another language with different variations based on gender. Neither is more correct than the other, and a human with the same translation task would be faced with the same ambiguity without being provided with further context. (The only difference is that perhaps the human would know to ask for further context, or else provide both translations.) This means that it is incorrect to assume that there is always a single correct translation for any given word, phrase or sentence when translating from one language to another.

It is now easy to understand why Google Translate was having issues with gender bias. If societal biases meant more men than women had historically become doctors, there would be more examples of male doctors than female doctors in the training data, which is simply an accurate historical record of that gender imbalance. The model would learn from this data, acquiring a bias: that doctors are more likely to be male.

Now, when forced to produce a single translation of “o bir doktor” (“he/she is a doctor”) from Turkish to English, the model will assume “o” should be translated as “he”, because doctors in its training data are more likely to be male.

One might see how the opposite could occur for nurses.
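The effect described above can be sketched as a toy frequency argument. This is a deliberately simplified illustration with invented co-occurrence counts, not Google's actual model: a system forced to emit one translation for the gender-neutral “o” effectively picks whichever pronoun co-occurred with the noun most often in training.

```python
from collections import Counter

# Hypothetical noun/pronoun co-occurrence counts a model might observe
# in its training corpus (invented numbers for illustration only).
corpus_counts = Counter({
    ("doctor", "he"): 900,
    ("doctor", "she"): 100,
    ("nurse", "he"): 80,
    ("nurse", "she"): 920,
})

def most_likely_pronoun(noun: str) -> str:
    """Pick the pronoun that co-occurred with `noun` most often.

    A model required to output a single translation for the
    gender-neutral Turkish "o" effectively makes a choice like this.
    """
    candidates = {p: corpus_counts[(noun, p)] for p in ("he", "she")}
    return max(candidates, key=candidates.get)

print(most_likely_pronoun("doctor"))  # skew in the counts favours "he"
print(most_likely_pronoun("nurse"))   # and "she" for nurses
```

Under these assumed counts, the “correct” single answer simply mirrors the historical imbalance in the data, which is why Google's fix was to surface both translations rather than pick one.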
