PostgreSQL数据加载工具之pg_bulkload

栏目: 数据库 · PostgreSQL · 发布时间: 7年前

内容简介:1. 介绍PostgreSQL提供了一个copy命令的便利数据加载工具,copy命令源于PostgreSQL数据库,copy命令支持文件与表之间的数据加载和表对文件的数据卸载。pg_bulkload是一种用于PostgreSQL的高速数据加载工具,相比copy命令。最大的优势就是速度。优势在让我们跳过shared buffer,wal buffer。直接写文件。pg_bulkload的direct模式就是这种思路来实现的,它还包含了数据恢复功能,即导入失败的话,需要恢复。2. pg_bulkload架构图

1. 介绍

PostgreSQL提供了一个copy命令的便利数据加载工具,copy命令源于PostgreSQL数据库,copy命令支持文件与表之间的数据加载和表对文件的数据卸载。pg_bulkload是一种用于PostgreSQL的高速数据加载工具,相比copy命令。最大的优势就是速度。优势在让我们跳过shared buffer,wal buffer。直接写文件。pg_bulkload的direct模式就是这种思路来实现的,它还包含了数据恢复功能,即导入失败的话,需要恢复。

2. pg_bulkload架构图

pg_bulkload主要包括两个模块:reader和writer。reader负责读取文件、解析tuple,writer负责把解析出的tuple写入输出源中。pg_bulkload最初的版本功能很简单,只是加载数据。3.1版本增加了数据过滤的功能。

PostgreSQL数据加载 <a href='https://www.codercto.com/tool.html'>工具</a> 之pg_bulkload

3. pg_bulkload安装

[root@Postgres201 ~]# unzip pg_bulkload-VERSION3_1_10.zip

[root@Postgres201 ~]# cd pg_bulkload-VERSION3_1_10

[root@Postgres201 pg_bulkload-VERSION3_1_10]# make

[root@Postgres201 pg_bulkload-VERSION3_1_10]# make install

安装完成;要使用它需要建extension

[postgres@Postgres201 ~]$ psql lottu lottu

psql (9.6.0)

Type "help" for help.

lottu=# create extension pg_bulkload;

CREATE EXTENSION

4. pg_bulkload参数

[postgres@Postgres201 ~]$ pg_bulkload --help

pg_bulkload is a bulk data loading tool for PostgreSQL

Usage:

Dataload: pg_bulkload [dataload options] control_file_path

Recovery: pg_bulkload -r [-D DATADIR]

Dataload options:

-i, --input=INPUT        INPUT path or function

-O, --output=OUTPUT      OUTPUT path or table

-l, --logfile=LOGFILE    LOGFILE path

-P, --parse-badfile=*    PARSE_BADFILE path

-u, --duplicate-badfile=* DUPLICATE_BADFILE path

-o, --option="key=val"    additional option

Recovery options:

-r, --recovery            execute recovery

-D, --pgdata=DATADIR      database directory

Connection options:

-d, --dbname=DBNAME      database to connect

-h, --host=HOSTNAME      database server host or socket directory

-p, --port=PORT          database server port

-U, --username=USERNAME  user name to connect as

-w, --no-password        never prompt for password

-W, --password            force password prompt

Generic options:

-e, --echo                echo queries

-E, --elevel=LEVEL        set output message level

--help                    show this help, then exit

--version                output version information, then exit

5. pg_bulkload的使用

创建测试表tbl_lottu和测试文件tbl_lottu_output.txt

[postgres@Postgres201 ~]$ psql lottu lottu

psql (9.6.0)

Type "help" for help.

lottu=# create table tbl_lottu(id int,name text);

CREATE TABLE

[postgres@Postgres201 ~]$  seq 100000| awk '{print $0"|lottu"}' > tbl_lottu_output.txt

1.不使用控制文件使用参数

[postgres@Postgres201 ~]$ pg_bulkload -i /home/postgres/tbl_lottu_output.txt -O tbl_lottu -l /home/postgres/tbl_lottu_output.log -P /home/postgres/tbl_lottu_bad.txt  -o "TYPE=CSV" -o "DELIMITER=|" -d lottu -U lottu

NOTICE: BULK LOAD START

NOTICE: BULK LOAD END

0 Rows skipped.

100000 Rows successfully loaded.

0 Rows not loaded due to parse errors.

0 Rows not loaded due to duplicate errors.

0 Rows replaced with new rows.

[postgres@Postgres201 ~]$ cat tbl_lottu_output.log

pg_bulkload 3.1.9 on 2018-07-12 13:37:18.326685+08

INPUT = /home/postgres/tbl_lottu_output.txt

PARSE_BADFILE = /home/postgres/tbl_lottu_bad.txt

LOGFILE = /home/postgres/tbl_lottu_output.log

LIMIT = INFINITE

PARSE_ERRORS = 0

CHECK_CONSTRAINTS = NO

TYPE = CSV

SKIP = 0

DELIMITER = |

QUOTE = "\""

ESCAPE = "\""

NULL =

OUTPUT = lottu.tbl_lottu

MULTI_PROCESS = NO

VERBOSE = NO

WRITER = DIRECT

DUPLICATE_BADFILE = /data/postgres/data/pg_bulkload/20180712133718_lottu_lottu_tbl_lottu.dup.csv

DUPLICATE_ERRORS = 0

ON_DUPLICATE_KEEP = NEW

TRUNCATE = NO

0 Rows skipped.

100000 Rows successfully loaded.

0 Rows not loaded due to parse errors.

0 Rows not loaded due to duplicate errors.

0 Rows replaced with new rows.

Run began on 2018-07-12 13:37:18.326685+08

Run ended on 2018-07-12 13:37:18.594494+08

CPU 0.14s/0.07u sec elapsed 0.27 sec

2. 导入之前先清理表数据

[postgres@Postgres201 ~]$ pg_bulkload -i /home/postgres/tbl_lottu_output.txt -O tbl_lottu -l /home/postgres/tbl_lottu_output.log -P /home/postgres/tbl_lottu_bad.txt  -o "TYPE=CSV" -o "DELIMITER=|" -o "TRUNCATE=YES" -d lottu -U lottu

NOTICE: BULK LOAD START

NOTICE: BULK LOAD END

0 Rows skipped.

100000 Rows successfully loaded.

0 Rows not loaded due to parse errors.

0 Rows not loaded due to duplicate errors.

0 Rows replaced with new rows.

[postgres@Postgres201 ~]$ psql lottu lottu -c "select count(1) from tbl_lottu;"

count 

--------

100000

(1 row)

3. 使用控制文件

新建控制文件lottu.ctl

INPUT = /home/postgres/lotu01

PARSE_BADFILE = /home/postgres/tbl_lottu_bad.txt

LOGFILE = /home/postgres/tbl_lottu_output.log

LIMIT = INFINITE

PARSE_ERRORS = 0

CHECK_CONSTRAINTS = NO

TYPE = CSV

SKIP = 5

DELIMITER = |

QUOTE = "\""

ESCAPE = "\""

OUTPUT = lottu.tbl_lottu

MULTI_PROCESS = NO

WRITER = DIRECT

DUPLICATE_BADFILE = /home/postgres/tbl_lottu.dup.csv

DUPLICATE_ERRORS = 0

ON_DUPLICATE_KEEP = NEW

TRUNCATE = YES

使用控制文件进行加载操作

pg_bulkload  /home/postgres/lottu.ctl -d lottu -U lottu

[postgres@Postgres201 ~]$ pg_bulkload  /home/postgres/lottu.ctl -d lottu -U lottu

NOTICE: BULK LOAD START

NOTICE: BULK LOAD END

5 Rows skipped.

95 Rows successfully loaded.

0 Rows not loaded due to parse errors.

0 Rows not loaded due to duplicate errors.

0 Rows replaced with new rows.

6. 总结

pg_bulkload是一种用于PostgreSQL的高速数据加载工具,相比copy命令。最大的优势就是速度。优势在让我们跳过shared buffer,wal buffer。直接写文件。pg_bulkload的direct模式就是这种思路来实现的。不足的是;表字段的顺序要跟导入的文件报错一致。希望后续版本能开发。

Linux下RPM包方式安装PostgreSQL https://www.linuxidc.com/Linux/2016-03/128906.htm

Linux下安装PostgreSQL  https://www.linuxidc.com/Linux/2016-12/138765.htm

Linux下PostgreSQL安装部署指南  https://www.linuxidc.com/Linux/2016-11/137603.htm

Linux下安装 PostgreSQL 并设置基本参数  https://www.linuxidc.com/Linux/2016-11/137324.htm

Ubuntu 16.04 下 PostgreSQL 主从复制配置  https://www.linuxidc.com/Linux/2017-08/146190.htm

Fedota 24 将数据库升级到 PostgreSQL 9.5  https://www.linuxidc.com/Linux/2016-11/137374.htm

CentOS7安装配置PostgreSQL9.6  https://www.linuxidc.com/Linux/2017-10/147536.htm

CentOS5.8_x64下离线安装PostgreSQL 9.1  https://www.linuxidc.com/Linux/2017-10/147822.htm

CentOS 6.5下PostgreSQL服务部署  https://www.linuxidc.com/Linux/2017-01/139144.htm

Linux公社的RSS地址: https://www.linuxidc.com/rssFeed.aspx

本文永久更新链接地址: https://www.linuxidc.com/Linux/2018-07/153166.htm


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

JavaScript Patterns

JavaScript Patterns

Stoyan Stefanov / O'Reilly Media, Inc. / 2010-09-21 / USD 29.99

What's the best approach for developing an application with JavaScript? This book helps you answer that question with numerous JavaScript coding patterns and best practices. If you're an experienced d......一起来看看 《JavaScript Patterns》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

SHA 加密
SHA 加密

SHA 加密工具

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具