Recreating YikYak with Postgres

栏目: IT技术 · 发布时间: 4年前

内容简介:YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called anR-trees are

YikYak was an anonymous social network that used your location to show you posts 5km around you. Users of the app could create new posts and the people around them could view the posts and vote up or down.

YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called an R-Tree .

Recreating YikYak with Postgres
An example on an R-Tree in action

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates , rectangles or polygons.

Luckily the Postgres database enables us to make use of this data-structure via geospatial extensions. In this post I am going to;

  1. Show how we can enable those extensions.
  2. Seed a few posts into our database.
  3. Find the posts in a small around a specific latitude and longitude using a SQL query.

Let’s get started!

Creating tables.

Firstly you will need an instance of Postgres. It is easy to set up in Docker (I’ve detailed a posthere showing how).

I am going to be using DBeaver for this tutorial but you could use psql or any other Postgres connector. Let’s creating a new table for our posts.

Recreating YikYak with Postgres
Select the SQL Editor
Recreating YikYak with Postgres
Chose whatever Database you want. I am going with Postgres
Recreating YikYak with Postgres
Name your script

Ready to go – So below we have a simple example of table for storing new posts. I am using a split latitude and longitude to show how the extensions work, but you could also combine the two into a POINT datatype if you are planning to use a lot of columns.

CREATE TABLE post (
	id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
	post_content text NOT NULL,
	latitude float8 NOT NULL,
	longitude float8 NOT NULL
);

On executing that you should have a table you can start insert values into.

Inserting posts.

So let’s start out by inserting two posts, the first posted from 10 Downing Street, and the second from Buckingham Palace.

INSERT INTO post VALUES (
	default,
	'I absolutely love the Queen. I hope she thinks I am doing a good job.',
	51.5034,
	0.1276
);
INSERT INTO post VALUES (
	default,
	'The new Prime Minister is a prat! I do hope he doesnt come over often',
	51.5014,
	0.1419
);

Now let’s put another post in from an aspiring politics student who is located in Cambridge University (65 miles away). Now we have an outlier that won’t show up once we do location bound queries later in this tutorial.

INSERT INTO post VALUES (
	default,
	'Day one of my politics degree. Shall be most fun to stalk the halls of Westminister in 4 years.',
	52.2053,
	0.1218
);

Installing Postgres extensions

We would like to be able to stand in St. James park (a large park between 10 Downing Street and Buckingham Palace) and see the two posts close by, but not the one from Cambridge.

So how do we do that? Through extensions! Postgres enables users to incrementally add features that help us do new things with our data.

Once they are installed we can use the latitude and longitude of 51.5032, -0.1349 to create a new select query on our posts table.

Recreating YikYak with Postgres

You can install extensions in Postgres simply by running a query. The two extensions we need are cube and earthdistance .

CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

After executing those two queries, you should see them under the ‘Extensions’ tab in DBeaver.

Recreating YikYak with Postgres

Finding nearby posts.

We can now use these built in functions from those extensions to show us the two nearby posts.

SELECT * FROM post
WHERE 
	earth_box(ll_to_earth(51.5032,-0.1349), 50000) 
	@> ll_to_earth(latitude, longitude);

The earth_box function takes two parameters, a point (which is returned by the ll_to_earth function) and a value for the size of the bounding box we want which is in metres.

By using the contains? operator (@>) we are saying we only want values in the table in the bounding box generated by the earth_box function.

When executing that query we will see the two posts we were expecting! Try increasing the bounding box range out and you will be able to see the Cambridge post.

Recreating YikYak with Postgres

So now we have a working example of how to recreate the YikYak location-based functionality.

So…how?

Okay so why did we need those extensions? Can’t we just take the world, split it into squares and determine which box a latitude and longitude falls into?

Thats what we would like to do – but there are complications caused by the fact that the world is a sphere. To find posts “in your area” you are querying to find straight line distances between two points, your lat-long and for each row in the database. In a sphere there are no straight lines.

There is a way to determine the distance between two points known as the Great-Circle distance . Instead of using straight lines we use circles or curves known as geodesics. Through any two points on a sphere that are not  directly opposite each other , there is a unique great circle.

The earthdistance extension allows us to generate queries using the contains? operator from the cube extension to generate efficient distance lookups between points.

Conclusion

One thing to note is that this query will do a sequential scan of the entire table, which can be slow once you get up to thousands of posts.

If you do decide to use this setup in your application you should create an index on the latitude, longitude to dramatically speed up queries. That would look like this.

CREATE INDEX loc_index ON post USING gist (ll_to_earth(latitude, longitude));

Postgres will then determine whether it needs to use this index to speed up queries. You can check if the index is being used by using a tool to view the execution plan when you run the query detailed above. If it says SEQ_SCAN it is not using the index.

And we’re done! If you’ve noticed any mistakes or improvements I can make please drop me an email at adam@adamfallon.com


以上所述就是小编给大家介绍的《Recreating YikYak with Postgres》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Modern PHP(中文版)

Modern PHP(中文版)

Josh Lockhart / 安道 / 中国电力出版社 / 2015-9 / 39

PHP正在重生,不过所有PHP在线教程都过时了,很难体现这一点。通过这本实用的指南,你会发现,借助面向对象、命名空间和不断增多的可重用的组件库,PHP已经成为一门功能完善的成熟语言。 本书作者Josh Lockhart是“PHP之道”的发起人,这是个受欢迎的新方案,鼓励开发者使用PHP最佳实践。Josh通过实践揭示了PHP语言的这些新特性。你会学到关于应用架构、规划、数据库、安全、测试、调试......一起来看看 《Modern PHP(中文版)》 这本书的介绍吧!

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具