Vespa.ai and the CORD-19 public API

Category: IT Technology · Published: 4 years ago

A taste of what you can do with Vespa

The Vespa team has been working non-stop to put together the cord19.vespa.ai search app, based on the COVID-19 Open Research Dataset (CORD-19) released by the Allen Institute for AI. Both the frontend and the backend are 100% open source. The backend is built on vespa.ai, a powerful open-source computation engine. Since everything is open source, you can contribute to the project in multiple ways.

As a user, you can either search for articles using the frontend or perform advanced searches using the public search API. As a developer, you can contribute by improving the existing application through pull requests to the backend and frontend, or you can fork it and create your own application, either locally or through Vespa Cloud, to experiment with different ways to match and rank the CORD-19 articles. My goal with this piece is to give you an overview of what can be accomplished with Vespa by using the cord19 search app's public API. This only scratches the surface, but I hope it points you to the right places to learn more about what is possible.

Simple query language

The cord19.vespa.ai query interface supports the Vespa simple query language, which allows you to run simple queries quickly. Examples:
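As a hedged sketch of the syntax, assuming the usual Vespa simple query conventions (a leading `+` marks a term that must match, a leading `-` a term that must not match, and double quotes enclose a phrase), the hypothetical helper below assembles such a query string. The resulting string is what goes into the 'query' parameter of a search request, such as the requests.post call in the next section.

```python
def simple_query(required=(), excluded=(), phrases=()):
    """Assemble a query string in Vespa's simple query language.

    Hypothetical helper: '+term' must match, '-term' must not match,
    and '"some words"' is matched as a phrase.
    """
    parts = [f'+{term}' for term in required]
    parts += [f'-{term}' for term in excluded]
    parts += [f'"{phrase}"' for phrase in phrases]
    return ' '.join(parts)


query = simple_query(required=['coronavirus'], excluded=['influenza'],
                     phrases=['temperature sensitivity'])
print(query)  # +coronavirus -influenza "temperature sensitivity"

# A request body relying only on the simple query language (no YQL);
# this assumes the endpoint accepts a bare 'query' parameter.
body = {'query': query, 'hits': 5}
```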

Additional resources:

Vespa Search API

In addition to the simple query language, Vespa also has a more powerful search API that gives full control over the search experience through the Vespa query language, YQL. We can send a wide range of queries by sending a POST request to the search endpoint of cord19.vespa.ai. The following Python code illustrates the API:

import requests  # Install via 'pip install requests'

endpoint = 'https://api.cord19.vespa.ai/search/'
response = requests.post(endpoint, json=body)  # body: a dict of query parameters, see the examples below
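The call above returns JSON. A minimal sketch for pulling out the hits, assuming the standard Vespa result layout in which hits sit under `root` → `children` and each hit carries `relevance` and `fields` (the helper name `top_hits` is hypothetical):

```python
def top_hits(result):
    """Return (relevance, fields) pairs from a parsed Vespa search result.

    Assumes the standard Vespa JSON layout: hits are listed under
    result['root']['children']; the list is absent when nothing matched.
    """
    root = result.get('root', {})
    return [(hit.get('relevance'), hit.get('fields', {}))
            for hit in root.get('children', [])]


# Usage with the response from the request above:
#   for relevance, fields in top_hits(response.json()):
#       print(round(relevance, 3), fields.get('title'))
```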

Search by query terms

Let’s break down one example to give you a hint of what is possible with the Vespa search API:

body = {
 'yql': 'select title, abstract from sources * where userQuery() and has_full_text=true and timestamp > 1577836800;',
 'hits': 5,
 'query': 'coronavirus temperature sensitivity',
 'type': 'any',
 'ranking': 'bm25'
}

The match phase: The body above selects the title and abstract fields of all articles that match any ('type': 'any') of the 'query' terms, have full text available (has_full_text=true), and have a timestamp greater than 1577836800 (the Unix time for 2020-01-01 00:00:00 UTC).

The ranking phase: After matching the articles by the criteria described above, Vespa ranks them by their BM25 scores ('ranking': 'bm25') and returns the top 5 articles ('hits': 5) according to this ranking criterion.
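The sketch below rebuilds the example body for an arbitrary start date instead of a hard-coded Unix timestamp. It assumes the `timestamp` field stores seconds since the epoch in UTC, and the `bm25_body` helper is hypothetical:

```python
from datetime import datetime, timezone


def bm25_body(query, since, hits=5):
    """Build the BM25 request body above, filtering on a start date.

    `since` is a naive datetime interpreted as UTC and converted to
    Unix seconds (assumed to match the unit of the `timestamp` field).
    """
    cutoff = int(since.replace(tzinfo=timezone.utc).timestamp())
    yql = ('select title, abstract from sources * where userQuery() '
           f'and has_full_text=true and timestamp > {cutoff};')
    return {'yql': yql, 'hits': hits, 'query': query,
            'type': 'any', 'ranking': 'bm25'}


body = bm25_body('coronavirus temperature sensitivity', datetime(2020, 1, 1))
print(body['yql'])  # ... and has_full_text=true and timestamp > 1577836800;
```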

The example above gives only a taste of what the search API can do. We can tailor both the match phase and the ranking phase to our needs. For example, we can use more complex match operators such as Vespa's weakAnd, or restrict matching to the abstract by adding 'default-index': 'abstract' to the body above. We can also experiment with different ranking functions at query time by setting the 'ranking' parameter to one of the rank profiles available in the search definition file.
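Because matching and ranking are controlled by request parameters, trying a variant only means changing the body. A small sketch using dictionary unpacking; 'default-index' comes from the text above, while 'my-profile' is a hypothetical placeholder for whichever rank profile the app actually defines:

```python
base = {
    'yql': 'select title, abstract from sources * where userQuery();',
    'hits': 5,
    'query': 'coronavirus temperature sensitivity',
    'type': 'any',
    'ranking': 'bm25',
}

# Match the query terms against the abstract field only.
abstract_only = {**base, 'default-index': 'abstract'}

# Rank with another profile at query time. 'my-profile' is hypothetical;
# substitute a profile defined in the app's search definition file.
other_ranking = {**base, 'ranking': 'my-profile'}

# Each variant can then be sent with requests.post(endpoint, json=variant).
```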

Additional resources:

The Vespa text search tutorial shows how to create a text search app step by step. Part 1 shows how to create a basic app from scratch. Part 2 shows how to collect training data from Vespa and improve the application with ML models. Part 3 shows how to get started with semantic search using pre-trained sentence embeddings.
More YQL examples specific to the cord19 app can be found in the cord19 API doc.

Search by semantic relevance

In addition to searching by query terms, Vespa supports semantic search.

# `embedding` is assumed to be a NumPy array holding the query embedding
body = {
 'yql': 'select * from sources * where ([{"targetNumHits":100}]nearestNeighbor(title_embedding, vector));',
 'hits': 5,
 'ranking.features.query(vector)': embedding.tolist(),
 'ranking.profile': 'semantic-search-title',
}

The match phase: In the query above, we use the nearestNeighbor operator to match a target of 100 articles ([{"targetNumHits":100}]) with the smallest (Euclidean) distance between their title_embedding and the query embedding vector.

The ranking phase: After matching, we can rank the documents in a variety of ways. In this case we use a predefined rank profile named 'semantic-search-title' that orders the matched articles by the distance between the title and query embeddings.
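To make the two phases concrete, here is a brute-force stand-in for what the query computes: exact k-nearest-neighbor search by Euclidean distance, with each hit scored as 1 / (1 + distance) so that closer titles rank higher. That transform mirrors Vespa's `closeness` rank feature for Euclidean distance, but whether 'semantic-search-title' uses exactly this score is an assumption, and Vespa itself may use approximate (HNSW) search rather than a full scan:

```python
import math


def nearest_neighbors(query_vec, doc_vecs, k):
    """Exact (brute-force) k-nearest-neighbor search by Euclidean distance.

    doc_vecs maps a document id to its title embedding. Returns the k
    closest ids with the score 1 / (1 + distance), highest first.
    """
    scored = []
    for doc_id, vec in doc_vecs.items():
        distance = math.dist(query_vec, vec)
        scored.append((1.0 / (1.0 + distance), doc_id))
    scored.sort(reverse=True)  # highest closeness first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy 2-d "embeddings"; real title embeddings have hundreds of dimensions.
docs = {'a': [0.0, 0.0], 'b': [3.0, 4.0], 'c': [1.0, 0.0]}
print(nearest_neighbors([0.0, 0.0], docs, k=2))  # [('a', 1.0), ('c', 0.5)]
```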

The title embeddings were created when the documents were fed to Vespa, while the query embedding is created at query time and sent to Vespa in the ranking.features.query(vector) parameter. This Kaggle notebook illustrates how to perform semantic search in the cord19 app using the SCIBERT-NLI model.
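A minimal sketch that rebuilds the semantic-search body from any sequence of floats. The helper `semantic_body` is hypothetical, and the vector must come from the same model that produced the title embeddings; computing it (for example with SCIBERT-NLI) is not shown here:

```python
def semantic_body(embedding, target_hits=100, hits=5,
                  profile='semantic-search-title'):
    """Build the semantic-search request body shown above.

    `embedding` is any sequence of numbers, e.g. a NumPy array; each
    value is converted to a plain float so the body serializes to JSON.
    """
    yql = ('select * from sources * where '
           f'([{{"targetNumHits":{target_hits}}}]'
           'nearestNeighbor(title_embedding, vector));')
    return {
        'yql': yql,
        'hits': hits,
        'ranking.features.query(vector)': [float(x) for x in embedding],
        'ranking.profile': profile,
    }


# Toy vector for illustration; a real query embedding is much longer.
body = semantic_body([0.1, 0.2, 0.3])
```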

Additional resources:

Part 3 of the text search tutorial shows how to get started with semantic search using pre-trained sentence embeddings.
Go to the Ranking page to learn more about ranking in general and how to deploy ML models in Vespa (including TensorFlow, XGBoost, etc.).