High-Performance SQL Cluster Engine for Scalable Data Virtualization

栏目: IT技术 · 发布时间: 5年前

内容简介:PGSpider is High-Performance SQL Cluster Engine for distributed big data. PGSpider can access a number of data sources using Foreign Data Wrapper(FDW) and retrieves the distributed data source vertically.PGSpider source code is patch for PostgreSQL. You sh

PGSpider

PGSpider is High-Performance SQL Cluster Engine for distributed big data. PGSpider can access a number of data sources using Foreign Data Wrapper(FDW) and retrieves the distributed data source vertically.

PGSpider source code is patch for PostgreSQL. You should apply this to PostgreSQL source code if you want to use PGSpider. Usage of PGSpider is the same as PostgreSQL. You can use any client applications such as libpq and psql.

Features

  • Multi-Tenant User can get records in multi tables by one SQL easily. If there are tables with similar schema in each data source, PGSpider can view them as a single virtual table: We call it as Multi-Tenant table.

  • Parallel processing

    PGSpider executes queries and fetches results from child nodes in parallel.

    PGSpider expands Multi-Tenant table to child tables, creates new threads for each child table to access corresponding data source.

  • Pushdown

    WHERE clause and aggregation functions are pushed down to child nodes. Pushdown to Multi-tenant tables occur error when AVG, STDDEV and VARIANCE are used. PGSPider improves this error, PGSpider can execute them.

How to build PGSpider

The current version can work with PostgreSQL 11.6.

Download PostgreSQL source code.

https://www.postgresql.org/ftp/source/v11.6/postgresql-11.6.tar.gz

Decompress PostgreSQL source code.

tar xvf postgresql-11.6.tar.gz

Apply patch.

patch -p1 -d postgresql-11.6 < pgspider.patch

Build and install PGSpider and extensions.

cd postgresql-11.6
./configure
make
sudo make install
cd contrib/pgspider_core_fdw
make 
sudo make install
cd ../pgspider_fdw
make 
sudo make install

Default install directory is /usr/local/pgspider .

Usage

For example, we will create 2 different child nodes, SQLite and PostgreSQL. They are accessed by PGSpider as root node. Please install SQLite and PostgreSQL for child nodes.

After that, we install PostgreSQL FDW and SQLite FDW into PGSpider.

Install SQLite FDW

cd ../
git clone https://github.com/pgspider/sqlite_fdw.git
cd sqlite_fdw
make
sudo make install

Install PostgreSQL FDW

cd ../postgres_fdw
make 
sudo make install

Start PGSpider

PGSpider binary name is same as PostgreSQL. Default install direcotry is changed.

/usr/local/pgspider

Create database cluster and start server.

cd /usr/local/pgspider/bin
./initdb -D ~/pgspider_db
./pg_ctl -D ~/pgspider_db start

Connect to PGSpider.

./psql postgres

Load extension

PGSpider(Parent node)

CREATE EXTENSION pgspider_core_fdw;

PostgreSQL,SQLite(Child node)

CREATE EXTENSION postgres_fdw;
CREATE EXTENSION sqlite_fdw;

Create server

PGSpider(Parent node)

CREATE SERVER parent FOREIGN DATA WRAPPER pgspider_core_fdw OPTIONS (host '127.0.0.1', port '5432');

PostgreSQL,SQLite (Child node)

In this example, child PostgreSQL node is localhost and port is 15432.

SQLite node's database is /tmp/temp.db.

CREATE SERVER postgres_svr FOREIGN DATA WRAPPER postgres_fdw OPTIONS(host '127.0.0.1', port '15432', dbname 'postgres');
CREATE SERVER sqlite_svr FOREIGN DATA WRAPPER sqlite_fdw OPTIONS(database '/tmp/temp.db');

Create user mapping

PGSpider(Parent node)

Create user mapping for PGSpider. User and password are for current psql user.

CREATE USER MAPPING FOR CURRENT_USER SERVER parent OPTIONS(user 'user', password 'pass');

PostgreSQL(Child node)

CREATE USER MAPPING FOR CURRENT_USER SERVER postgres_svr OPTIONS(user 'user', password 'pass');

SQLite(Child node)

No need to create user mapping.

Create Multi-Tenant table

PGSpider(Parent node)

You need to declare a column named "__spd_url" on parent table. This column is node location in PGSpider. It allows you to know where the data is comming from node.

In this example, we define 't1' table to get data from PostgreSQL node and SQLite node.

CREATE FOREIGN TABLE t1(i int, t text, __spd_url text) SERVER parent;

When expanding Multi-Tenant table to data source tables, PGSpider searches child node tables by name having [Multi-Tenant table name]__[data source name]__0.

PostgreSQL,SQLite(Child node)

CREATE FOREIGN TABLE t1__postgres_svr__0(i int, t text) SERVER postgres_svr OPTIONS (table_name 't1');
CREATE FOREIGN TABLE t1__sqlite_svr__0(i int, t text) SERVER sqlite_svr OPTIONS (table 't1');

Access Multi-Tenant table

SELECT * FROM t1;

  i |  t  | __spd_url 
----+-----+----------------
  1 | aaa | /sqlite_svr/
  2 | bbb | /sqlite_svr/
 10 | a   | /postgres_svr/
 11 | b   | /postgres_svr/
(4 rows)

Access Multi-Tenant table using node filter

You can choose getting node with 'IN' clause after FROM items(Table name).

SELECT * FROM t1 IN ('/postgres_svr/');

  i | t | __spd_url 
----+---+----------------
 10 | a | /postgres_svr/
 11 | b | /postgres_svr/
(2 rows)

Tree Structure

PGSpider can get data from child PGSpider, it means PGSpider can create tree structure.

For example, we will create a new PGSpider as root node which connects to PGSpider of previous example. The new root node is parent of previous PGSpider node.

Start new root PGSpider

Create new database cluster with initdb and change port number.

After that, start and connect to new root node.

Load extension

PGSpider(new root node)

If child node is PGSpider, PGSpider use pgspider_fdw.

CREATE EXTENSION pgspider_core_fdw;
CREATE EXTENSION pgspider_fdw;

Create server

PGSpider(new root node)

CREATE SERVER new_root FOREIGN DATA WRAPPER pgspider_core_fdw OPTIONS (host '127.0.0.1', port '54321') ;

PGSpider(Parent node)

CREATE SERVER parent FOREIGN DATA WRAPPER pgspider_svr OPTIONS
(host '127.0.0.1', port '5432') ;

Create user mapping

PGSpider(new root node)

CREATE USER MAPPING FOR CURRENT_USER SERVER new_root OPTIONS(user 'user', password 'pass');

PostgreSQL(Parent node)

CREATE USER MAPPING FOR CURRENT_USER SERVER parent OPTIONS(user 'user', password 'pass');

Create Multi-Tenant table

PGSpider(new root node)

CREATE FOREIGN TABLE t1(i int, t text, __spd_url text) SERVER new_root;

PGSpider(Parent node)

CREATE FOREIGN TABLE t1__parent__0(i int, t text, __spd_url text) SERVER parent;

Access Multi-Tenant table

SELECT * FROM t1;

  i |  t  |      __spd_url 
----+-----+-----------------------
  1 | aaa | /parent/sqlite_svr/
  2 | bbb | /parent/sqlite_svr/
 10 | a   | /parent/postgres_svr/
 11 | b   | /parent/postgres_svr/
(4 rows)

Note

When a query to foreign tables fails, you can find why it fails by seeing a query executed in PGSpider with EXPLAIN (VERBOSE) .

Contributing

Opening issues and pull requests on GitHub are welcome.

License

Copyright and license information can be found in the file License .


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Python神经网络编程

Python神经网络编程

[英]塔里克·拉希德(Tariq Rashid) / 林赐 / 人民邮电出版社 / 2018-4 / 69.00元

神经网络是一种模拟人脑的神经网络,以期能够实现类人工智能的机器学习 技术。 本书揭示神经网络背后的概念,并介绍如何通过Python实现神经网络。全书 分为3章和两个附录。第1章介绍了神经网络中所用到的数学思想。第2章介绍使 用Python实现神经网络,识别手写数字,并测试神经网络的性能。第3章带领读 者进一步了解简单的神经网络,观察已受训练的神经网络内部,尝试进一步改......一起来看看 《Python神经网络编程》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

MD5 加密
MD5 加密

MD5 加密工具

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具