Recreating YikYak with Postgres

栏目: IT技术 · 发布时间: 4年前

内容简介:YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called anR-trees are

YikYak was an anonymous social network that used your location to show you posts 5km around you. Users of the app could create new posts and the people around them could view the posts and vote up or down.

YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called an R-Tree .

Recreating YikYak with Postgres
An example on an R-Tree in action

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates , rectangles or polygons.

Luckily the Postgres database enables us to make use of this data-structure via geospatial extensions. In this post I am going to;

  1. Show how we can enable those extensions.
  2. Seed a few posts into our database.
  3. Find the posts in a small around a specific latitude and longitude using a SQL query.

Let’s get started!

Creating tables.

Firstly you will need an instance of Postgres. It is easy to set up in Docker (I’ve detailed a posthere showing how).

I am going to be using DBeaver for this tutorial but you could use psql or any other Postgres connector. Let’s creating a new table for our posts.

Recreating YikYak with Postgres
Select the SQL Editor
Recreating YikYak with Postgres
Chose whatever Database you want. I am going with Postgres
Recreating YikYak with Postgres
Name your script

Ready to go – So below we have a simple example of table for storing new posts. I am using a split latitude and longitude to show how the extensions work, but you could also combine the two into a POINT datatype if you are planning to use a lot of columns.

CREATE TABLE post (
	id int8 NOT NULL GENERATED ALWAYS AS IDENTITY,
	post_content text NOT NULL,
	latitude float8 NOT NULL,
	longitude float8 NOT NULL
);

On executing that you should have a table you can start insert values into.

Inserting posts.

So let’s start out by inserting two posts, the first posted from 10 Downing Street, and the second from Buckingham Palace.

INSERT INTO post VALUES (
	default,
	'I absolutely love the Queen. I hope she thinks I am doing a good job.',
	51.5034,
	0.1276
);
INSERT INTO post VALUES (
	default,
	'The new Prime Minister is a prat! I do hope he doesnt come over often',
	51.5014,
	0.1419
);

Now let’s put another post in from an aspiring politics student who is located in Cambridge University (65 miles away). Now we have an outlier that won’t show up once we do location bound queries later in this tutorial.

INSERT INTO post VALUES (
	default,
	'Day one of my politics degree. Shall be most fun to stalk the halls of Westminister in 4 years.',
	52.2053,
	0.1218
);

Installing Postgres extensions

We would like to be able to stand in St. James park (a large park between 10 Downing Street and Buckingham Palace) and see the two posts close by, but not the one from Cambridge.

So how do we do that? Through extensions! Postgres enables users to incrementally add features that help us do new things with our data.

Once they are installed we can use the latitude and longitude of 51.5032, -0.1349 to create a new select query on our posts table.

Recreating YikYak with Postgres

You can install extensions in Postgres simply by running a query. The two extensions we need are cube and earthdistance .

CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

After executing those two queries, you should see them under the ‘Extensions’ tab in DBeaver.

Recreating YikYak with Postgres

Finding nearby posts.

We can now use these built in functions from those extensions to show us the two nearby posts.

SELECT * FROM post
WHERE 
	earth_box(ll_to_earth(51.5032,-0.1349), 50000) 
	@> ll_to_earth(latitude, longitude);

The earth_box function takes two parameters, a point (which is returned by the ll_to_earth function) and a value for the size of the bounding box we want which is in metres.

By using the contains? operator (@>) we are saying we only want values in the table in the bounding box generated by the earth_box function.

When executing that query we will see the two posts we were expecting! Try increasing the bounding box range out and you will be able to see the Cambridge post.

Recreating YikYak with Postgres

So now we have a working example of how to recreate the YikYak location-based functionality.

So…how?

Okay so why did we need those extensions? Can’t we just take the world, split it into squares and determine which box a latitude and longitude falls into?

Thats what we would like to do – but there are complications caused by the fact that the world is a sphere. To find posts “in your area” you are querying to find straight line distances between two points, your lat-long and for each row in the database. In a sphere there are no straight lines.

There is a way to determine the distance between two points known as the Great-Circle distance . Instead of using straight lines we use circles or curves known as geodesics. Through any two points on a sphere that are not  directly opposite each other , there is a unique great circle.

The earthdistance extension allows us to generate queries using the contains? operator from the cube extension to generate efficient distance lookups between points.

Conclusion

One thing to note is that this query will do a sequential scan of the entire table, which can be slow once you get up to thousands of posts.

If you do decide to use this setup in your application you should create an index on the latitude, longitude to dramatically speed up queries. That would look like this.

CREATE INDEX loc_index ON post USING gist (ll_to_earth(latitude, longitude));

Postgres will then determine whether it needs to use this index to speed up queries. You can check if the index is being used by using a tool to view the execution plan when you run the query detailed above. If it says SEQ_SCAN it is not using the index.

And we’re done! If you’ve noticed any mistakes or improvements I can make please drop me an email at adam@adamfallon.com


以上所述就是小编给大家介绍的《Recreating YikYak with Postgres》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

智能优化算法及其应用

智能优化算法及其应用

王凌 / 清华大学出版社 / 2001-10 / 22.00元

智能优化算法及其应用,ISBN:9787302044994,作者:王凌著一起来看看 《智能优化算法及其应用》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

随机密码生成器
随机密码生成器

多种字符组合密码

URL 编码/解码
URL 编码/解码

URL 编码/解码