Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia

Category: IT Technology · Published: 6 years ago


The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.

Question answering is a natural human cognitive mechanism that plays a key role in the acquisition of knowledge. We are constantly evaluating information to develop answers to specific questions. For years, artificial intelligence (AI) systems have tried to simulate that cognitive ability in the form of a discipline known as Open Domain Question Answering (ODQA). Recently, Salesforce Research open sourced a framework for ODQA based on the Wikipedia graph.

ODQA has been one of the most active fields in natural language processing (NLP) research in recent years. However, most ODQA systems operate in a highly constrained setting: they first select a few paragraphs for each query using a computationally efficient term-based retriever, and then read the top-ranked paragraphs to extract an answer. This approach is known as single-hop QA, and even though it is a difficult challenge, it does not quite resemble the human cognitive process. We regularly face questions that require examining many documents to construct a single answer. This has become one of the new frontiers for ODQA systems.

Multi-Hop QA

Consider trying to answer the question “When was the football club founded in which Walter Otto Davis played at center-forward?” using two paragraphs, one about Walter Otto Davis and one about Millwall F.C. There are a few challenges that should be considered in the context of ODQA systems. For the example question, it is necessary to effectively “hop” to paragraph 2 (Millwall F.C.), which contains the answer (1885). However, widely used term-based retrieval may fail to find it, because the key entity, “Millwall Football Club,” is not explicitly stated in the question. To answer the target question, an ODQA system needs to do “multiple hops” across different documents. AI theory typically refers to this type of system as multi-hop QA.
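The retrieval failure described above can be illustrated with a small sketch. It is not the retriever used by any particular system: plain word overlap stands in for a term-based score such as TF-IDF, and the three paragraph texts (including the "decoy") are shortened, made-up stand-ins written only for this example.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(question, paragraph):
    """Fraction of question words that also appear in the paragraph."""
    q = tokens(question)
    return len(q & tokens(paragraph)) / len(q)

QUESTION = ("When was the football club founded in which "
            "Walter Otto Davis played at center-forward?")

# Hypothetical, abbreviated paragraphs used only for illustration.
PARAGRAPHS = {
    "davis": "Walter Otto Davis played at center-forward for Millwall.",
    "millwall": ("Millwall F.C. is a professional team in London, "
                 "established as Millwall Rovers in 1885."),
    "decoy": ("The football club was founded in 1900 and played at a "
              "ground in the town center."),
}

# Rank paragraphs by lexical overlap with the question, best first.
ranked = sorted(PARAGRAPHS,
                key=lambda name: overlap_score(QUESTION, PARAGRAPHS[name]),
                reverse=True)
print(ranked)
```

In this toy setup the paragraph that actually holds the answer ranks last, because it shares almost no words with the question, while the unrelated decoy ranks first. A retriever that keeps only the top one or two paragraphs would never pass the Millwall paragraph to the reader.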

Multi-hop QA usually requires finding more than one evidence document, one of which often has little lexical overlap or semantic relationship with the original question. The key element in solving multi-hop QA problems is to build a cognitive graph of documents that contain the evidence required to answer the target question.

Wikipedia Reasoning Graph

Building a cognitive graph of documents seems like something we’ve already done. Wikipedia provides a rich, hierarchical dataset of linked documents that offers evidence for almost any question we can think of. Following that line of thinking, Salesforce’s framework leverages Wikipedia to learn to retrieve reasoning paths and build a cognitive graph to answer complex open-domain questions.

The construction of the reasoning graph mirrors how a person searches. To answer a complex question on Wikipedia, we may first look at an article we can easily find based on partial information in the question. If we cannot find enough information there, we might click a hyperlink to another highly related article, and stop searching once we have collected enough evidence to answer the question. The reasoning graph is constructed using links in Wikipedia articles. Specifically, the framework uses the hyperlinks to construct the direct edges of the graph. It also considers symmetric within-document links, allowing a paragraph to hop to other paragraphs in the same article. The resulting Wikipedia graph is densely connected and covers a wide range of topics that provide useful evidence for open-domain questions. This graph is constructed offline and is reused throughout training and inference for any question.
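A minimal sketch of that graph construction is shown here. The function name, the input layout, and the choice to point each hyperlink at the first paragraph of the linked article are all assumptions made for the example and are not taken from the Salesforce code.

```python
from collections import defaultdict

def build_paragraph_graph(paragraphs, hyperlinks):
    """Build an adjacency map over paragraphs.

    paragraphs: dict mapping paragraph id -> title of the article it belongs to.
    hyperlinks: iterable of (source paragraph id, target article title).
    """
    # Group paragraph ids by article, preserving insertion order.
    by_article = defaultdict(list)
    for pid, article in paragraphs.items():
        by_article[article].append(pid)

    graph = {pid: set() for pid in paragraphs}

    # Hyperlink edges: source paragraph -> first paragraph of the linked
    # article (assumption: the first paragraph is the introduction).
    for src, target_article in hyperlinks:
        targets = by_article.get(target_article, [])
        if src in graph and targets:
            graph[src].add(targets[0])

    # Symmetric within-document links: every paragraph can hop to the
    # other paragraphs of the same article.
    for pids in by_article.values():
        for a in pids:
            for b in pids:
                if a != b:
                    graph[a].add(b)
    return graph

# Toy data with hypothetical paragraph ids.
toy_paragraphs = {
    "davis_0": "Walter Davis",
    "millwall_0": "Millwall F.C.",
    "millwall_1": "Millwall F.C.",
}
toy_links = [("davis_0", "Millwall F.C.")]
toy_graph = build_paragraph_graph(toy_paragraphs, toy_links)
print(toy_graph)
```

Because the graph depends only on the articles and their links, it can be built once and stored, which matches the offline construction described above.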

Using the reasoning graph as a starting point, the Salesforce framework relies on a recurrent neural network (RNN) to model the reasoning path for a given question. At each step, the model selects a paragraph from a set of candidate paragraphs given the current hidden state of the RNN. The initial hidden state is independent of any question or paragraph and is based on a parameterized vector. The model relies on BERT’s token representation to independently encode each candidate paragraph. The RNN then computes the probability that each candidate paragraph is selected. This selection procedure captures relationships between paragraphs in the reasoning path by conditioning on the selection history. The process terminates when [EOE], the end-of-evidence symbol, is selected, which allows the model to capture reasoning paths of arbitrary length for each question.
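The selection loop can be sketched in a few lines of plain Python. This is a toy stand-in rather than the actual model: the two-dimensional vectors are hand-picked in place of BERT encodings, the hidden-state update is a simple tanh rule instead of a trained RNN, the candidate set at each step is reduced to the neighbours of the last selected paragraph, and the names `select_path`, `toy_vectors` and `toy_graph` are hypothetical.

```python
import math

EOE = "[EOE]"  # end-of-evidence symbol

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_path(start_candidates, graph, vectors, h0, max_hops=4):
    """Greedily pick one paragraph per step until [EOE] is selected."""
    h = list(h0)
    path = []
    candidates = list(start_candidates) + [EOE]
    for _ in range(max_hops):
        # Selection probability of each candidate given the hidden state.
        scored = [(1.0 / (1.0 + math.exp(-dot(h, vectors[c]))), c)
                  for c in candidates]
        _, best = max(scored)
        if best == EOE:
            break
        path.append(best)
        # Toy recurrent update: the new state depends on the old state and
        # on the selected paragraph, so later choices see the history.
        h = [math.tanh(hi + wi) for hi, wi in zip(h, vectors[best])]
        # Next candidates: paragraphs linked from the selected one.
        candidates = sorted(graph.get(best, ())) + [EOE]
    return path

# Hand-picked toy vectors (assumption: a real system would use BERT here).
toy_vectors = {
    "davis": [2.0, 1.0],
    "millwall": [0.5, 2.0],
    "other": [-1.0, 0.0],
    EOE: [-2.0, 2.0],
}
toy_graph = {"davis": ["millwall"], "millwall": ["other"]}

print(select_path(["davis", "other"], toy_graph, toy_vectors, h0=[1.0, 0.0]))
# prints ['davis', 'millwall']
```

In this example [EOE] only becomes the best choice after both evidence paragraphs have been selected, because its score depends on the hidden state accumulated along the path.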

The Salesforce framework also leverages efficient search strategies such as beam search, which reduce the computational cost that has traditionally been a challenge for multi-hop QA approaches.
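As a rough illustration of beam search over reasoning paths, the sketch below keeps only the best few partial paths at each hop instead of enumerating all of them. The function `beam_search`, the `toy_step_probs` table and its probabilities are invented for the example; in a real system the per-step probabilities would come from the trained retriever.

```python
import math

EOE = "[EOE]"  # end-of-evidence symbol

def beam_search(step_probs, beam_size=2, max_hops=3):
    """Return the best finished paths as (path, log-probability) pairs.

    step_probs(path) must return a dict mapping each candidate
    (a paragraph id or EOE) to a probability greater than zero.
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_hops):
        expanded = []
        for path, score in beams:
            for cand, p in step_probs(path).items():
                new_score = score + math.log(p)
                if cand == EOE:
                    finished.append((path, new_score))
                else:
                    expanded.append((path + (cand,), new_score))
        # Keep only the highest-scoring partial paths.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_size]
        if not beams:
            break
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:beam_size]

# Hypothetical per-step probabilities for a tiny example.
TOY_TABLE = {
    (): {"davis": 0.6, "decoy": 0.4},
    ("davis",): {"millwall": 0.9, EOE: 0.1},
    ("decoy",): {EOE: 0.95, "millwall": 0.05},
    ("davis", "millwall"): {EOE: 0.99, "decoy": 0.01},
}

def toy_step_probs(path):
    # Any path not listed in the table simply terminates.
    return TOY_TABLE.get(path, {EOE: 1.0})

for path, log_prob in beam_search(toy_step_probs):
    print(path, round(math.exp(log_prob), 4))
```

With a beam of size 2, the number of paths scored per hop stays bounded by the beam size times the number of candidates, rather than growing with every possible combination of paragraphs.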

Salesforce evaluated its graph retriever framework on several Wikipedia datasets: HotpotQA, SQuAD Open and Natural Questions Open. The model achieved remarkable performance across all datasets but excelled on HotpotQA, where multi-hop retrieval is essential. A pretty impressive fact is that the complete model was trained using a single GPU with 11 GB of RAM. The following matrix shows the results.

Arguably the most notable achievement of the Salesforce framework is its ability to build reasoning paths in scenarios that were impossible for other multi-hop models. For instance, the following figure shows a scenario in which the Salesforce graph model retrieves the correct reasoning path and answers correctly, while other models such as Re-rank fail. The top two paragraphs next to the graph are the introductory paragraphs of the two entities on the reasoning path, and the paragraph at the bottom shows the wrong paragraph selected by Re-rank. The “Millwall F.C.” paragraph has little lexical overlap with the question, and the bridge entity “Millwall” is not stated in the question. As a result, Re-rank chooses a wrong paragraph that has high lexical overlap with the question.

The Salesforce graph-based retrieval framework represents an interesting approach to ODQA systems. The model learns to sequentially retrieve evidence paragraphs to form reasoning paths and then re-ranks those paths, taking the final answer to be the one extracted from the best reasoning path. The current implementation of the framework is available on GitHub and is accompanied by a research paper detailing the underlying techniques.

