Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia

Category: IT Technology · Published: 6 years ago


The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.

Question answering is a natural human cognitive mechanism that plays a key role in the acquisition of knowledge. We are constantly evaluating information to develop answers to specific questions. For years, artificial intelligence (AI) systems have tried to simulate that cognitive ability in a discipline known as Open Domain Question Answering (ODQA). Recently, Salesforce Research open sourced a framework for ODQA based on the Wikipedia graph.

ODQA has been one of the most active fields in natural language processing (NLP) research in recent years. However, most ODQA systems operate within a highly constrained setting: they first select a few paragraphs for each query using a computationally efficient term-based retriever, and then read the top-ranked paragraphs to extract an answer. This approach is known as single-hop QA and, even though it is a difficult challenge, it does not quite resemble the human cognitive process. We are regularly faced with questions that require examining several documents to construct a single answer. Handling such questions has become one of the new frontiers for ODQA systems.

Multi-Hop QA

Consider trying to answer the question “When was the football club founded in which Walter Otto Davis played at center-forward?” using two Wikipedia paragraphs. The example raises a few challenges for ODQA systems. To answer it, a system has to effectively “hop” to paragraph 2 (Millwall F.C.), which contains the answer (1885). However, widely used term-based retrieval may fail to find that paragraph, because the key entity, “Millwall Football Club,” is not explicitly stated in the question. To answer the target question, an ODQA system needs to make “multiple hops” across different documents. AI theory typically refers to this type of system as multi-hop QA.
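The weakness of term-based retrieval on this example can be illustrated with a minimal sketch. The scoring function below is a plain word-overlap ratio, standing in for a real term-based retriever such as TF-IDF, and the two paragraph texts are short paraphrases written for illustration, not the actual Wikipedia passages. The names tokenize and overlap_score are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(question, paragraph):
    """Fraction of question tokens that also appear in the paragraph."""
    question_tokens = tokenize(question)
    return len(question_tokens & tokenize(paragraph)) / len(question_tokens)

question = ("When was the football club founded in which "
            "Walter Otto Davis played at center-forward?")

# Illustrative paraphrases (assumption), not the real Wikipedia text.
paragraphs = {
    "Walter Davis (footballer)":
        "Walter Otto Davis was a Welsh footballer who played at "
        "center-forward for Millwall.",
    "Millwall F.C.":
        "Millwall Football Club is a professional team based in London, "
        "founded in 1885.",
}

scores = {title: overlap_score(question, text)
          for title, text in paragraphs.items()}

for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.2f}  {title}")
```

With these toy texts, the paragraph that names Walter Otto Davis shares far more words with the question than the Millwall F.C. paragraph that actually holds the answer, which is why a retriever relying only on lexical overlap tends to rank the answer paragraph low.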

Multi-hop QA usually requires finding more than one evidence document, one of which often has little lexical overlap or semantic relationship with the original question. The key to solving multi-hop QA problems is to build a cognitive graph of the documents that contain the evidence required to answer the target question.

Wikipedia Reasoning Graph

Building a cognitive graph of documents seems like something we’ve already done. Wikipedia provides a rich, hierarchical dataset of linked documents that offers evidence for almost any question we can think of. Following that line of thinking, Salesforce’s framework leverages Wikipedia to learn to retrieve reasoning paths and build a cognitive graph to answer complex open-domain questions.

The construction of the reasoning graph mirrors how a person searches Wikipedia for the answer to a complex question. We may first look at a Wikipedia article we can easily find based on partial information in the question. If we cannot find enough information there, we might click a hyperlink to another highly related article, and stop searching once we have collected enough evidence to answer the question. The reasoning graph is therefore constructed using the links in Wikipedia articles. Specifically, the framework uses the hyperlinks to construct the directed edges of the graph. It also considers symmetric within-document links, allowing a paragraph to hop to other paragraphs in the same article. The resulting Wikipedia graph is densely connected and covers a wide range of topics that provide useful evidence for open-domain questions. This graph is constructed offline and is reused throughout training and inference for any question.
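The two kinds of edges described above can be sketched as a small adjacency structure. This is only an illustration under stated assumptions: the paragraph identifiers, the input format, and the function name build_reasoning_graph are hypothetical and are not taken from the Salesforce code.

```python
from collections import defaultdict

def build_reasoning_graph(paragraph_article, hyperlinks):
    """Build an adjacency map over paragraphs.

    paragraph_article: {paragraph_id: article_title} (assumed input format)
    hyperlinks: list of (source_paragraph_id, target_paragraph_id) pairs
    """
    edges = {pid: set() for pid in paragraph_article}

    # Hyperlinks become directed edges: source -> target only.
    for source, target in hyperlinks:
        edges[source].add(target)

    # Paragraphs of the same article are linked in both directions.
    by_article = defaultdict(list)
    for pid, article in paragraph_article.items():
        by_article[article].append(pid)
    for siblings in by_article.values():
        for pid in siblings:
            edges[pid].update(other for other in siblings if other != pid)

    return edges

# Toy data (assumption): one Davis paragraph and two Millwall paragraphs.
paragraph_article = {
    "davis#0": "Walter Davis (footballer)",
    "millwall#0": "Millwall F.C.",
    "millwall#1": "Millwall F.C.",
}
hyperlinks = [("davis#0", "millwall#0")]

graph = build_reasoning_graph(paragraph_article, hyperlinks)
for pid, neighbours in graph.items():
    print(pid, "->", sorted(neighbours))
```

Because the graph depends only on the articles and not on any question, a structure like this can be built once and reused, which matches the offline construction described above.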

Using the reasoning graph as a starting point, the Salesforce framework relies on a recurrent neural network (RNN) to model the reasoning path for a given question. At each step, the model selects one paragraph from a set of candidate paragraphs, given the current hidden state of the RNN. The initial hidden state is independent of any question or paragraph and is based on a parameterized vector. The model relies on BERT’s token representation to encode each candidate paragraph independently. The RNN then computes the probability that each candidate paragraph is selected. This selection procedure captures relationships between paragraphs in the reasoning path by conditioning on the selection history. The process terminates when [EOE], the end-of-evidence symbol, is selected, which allows the model to capture reasoning paths of arbitrary length for each question.
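The selection loop can be sketched in a few lines of plain Python. Everything numeric here is an assumption made for illustration: the three-dimensional vectors stand in for BERT encodings, the selection probability is taken to be a sigmoid of the dot product between a candidate vector and the hidden state, and the hidden-state update is a toy tanh rule rather than a trained RNN cell. The function names are hypothetical.

```python
import math

EOE = "[EOE]"  # end-of-evidence symbol

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def select_step(hidden, candidates):
    """Score every candidate against the hidden state; return the best one."""
    probs = {name: sigmoid(dot(vec, hidden)) for name, vec in candidates.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]

def update_hidden(hidden, chosen_vec):
    """Toy recurrent update (assumption): a real RNN uses learned weights."""
    return [math.tanh(h + w) for h, w in zip(hidden, chosen_vec)]

def retrieve_path(initial_hidden, encodings, max_hops=4):
    """Greedily pick paragraphs until [EOE] is selected."""
    hidden, path = list(initial_hidden), []
    for _ in range(max_hops):
        candidates = {n: v for n, v in encodings.items() if n not in path}
        choice, _ = select_step(hidden, candidates)
        if choice == EOE:
            break
        path.append(choice)
        hidden = update_hidden(hidden, encodings[choice])
    return path

# Hand-picked toy vectors (assumption), standing in for BERT encodings.
encodings = {
    "Walter Davis": [2.0, 1.0, 0.0],
    "Millwall F.C.": [-1.0, 3.0, 2.0],
    "Distractor": [0.5, -1.0, 0.0],
    EOE: [-2.0, -1.0, 3.0],
}
initial_hidden = [1.0, 0.0, 0.0]  # fixed, question-independent start state

path = retrieve_path(initial_hidden, encodings)
print(path)
```

The point of the sketch is the conditioning on history: "Millwall F.C." scores poorly against the initial state but becomes the best candidate once the hidden state has absorbed "Walter Davis", and [EOE] only wins after both paragraphs have been selected.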

The Salesforce framework leverages efficient search strategies such as beam search, which reduce the computational complexity of the model. That complexity has traditionally been a challenge for multi-hop QA approaches.
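A generic beam search over a paragraph graph can be sketched as follows. It is not the framework's implementation: the step probabilities come from a hand-written lookup table instead of a trained model, and the names beam_search, toy_step_prob, and TOY_PROBS are hypothetical.

```python
import math

EOE = "[EOE]"  # end-of-evidence symbol

def beam_search(first_candidates, graph, step_prob, beam_size=2, max_steps=4):
    """Return finished paths as (path, log_prob), best first.

    Only the beam_size highest-scoring partial paths survive each step,
    so the number of scored paths grows linearly with the step count.
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_steps):
        expanded = []
        for path, log_prob in beams:
            if path:
                # Later hops follow graph edges, or stop with [EOE].
                options = list(graph.get(path[-1], [])) + [EOE]
            else:
                options = list(first_candidates)
            for option in options:
                if option in path:
                    continue
                score = log_prob + math.log(step_prob(path, option))
                if option == EOE:
                    finished.append((path, score))
                else:
                    expanded.append((path + (option,), score))
        beams = sorted(expanded, key=lambda item: item[1], reverse=True)[:beam_size]
        if not beams:
            break
    return sorted(finished, key=lambda item: item[1], reverse=True)

# Toy selection probabilities (assumption), keyed by (path so far, next item).
TOY_PROBS = {
    ((), "Walter Davis"): 0.9,
    ((), "Distractor"): 0.6,
    (("Walter Davis",), "Millwall F.C."): 0.8,
    (("Walter Davis",), EOE): 0.1,
    (("Distractor",), EOE): 0.5,
    (("Walter Davis", "Millwall F.C."), EOE): 0.9,
}

def toy_step_prob(path, option):
    return TOY_PROBS.get((path, option), 0.01)

graph = {"Walter Davis": ["Millwall F.C."], "Millwall F.C.": [], "Distractor": []}
ranked = beam_search(["Walter Davis", "Distractor"], graph, toy_step_prob)
for path, log_prob in ranked:
    print(f"{math.exp(log_prob):.3f}", " -> ".join(path))
```

Scoring paths by summed log-probabilities keeps the comparison numerically stable and lets paths of different lengths compete in the same ranked list.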

Salesforce evaluated its graph retriever framework on several Wikipedia-based datasets: HotpotQA, SQuAD Open and Natural Questions Open. The model achieved remarkable performance across all datasets but excelled on HotpotQA, where multi-hop retrieval is essential. An impressive detail is that the complete model was trained using a single GPU with 11GB of memory. The following matrix shows the results.

Arguably the most notable achievement of the Salesforce framework is its ability to build reasoning paths in scenarios that were impossible for other multi-hop models. For instance, the following figure shows a scenario in which the Salesforce graph model retrieves the correct reasoning path and answers correctly, while other models such as Re-rank fail. The top two paragraphs next to the graph are the introductory paragraphs of the two entities on the reasoning path, and the paragraph at the bottom shows the wrong paragraph selected by Re-rank. The “Millwall F.C.” paragraph has less lexical overlap with the question, and the bridge entity “Millwall” is not stated in the question. Thus, Re-rank chooses a wrong paragraph that has high lexical overlap with the question.

The Salesforce graph-based retrieval framework represents an interesting approach to ODQA systems. The model learns to sequentially retrieve evidence paragraphs to form reasoning paths, then re-ranks those paths and takes the final answer from the best one. The current implementation of the framework is available on GitHub and is accompanied by a research paper detailing the underlying techniques.

