Analysis of a Greenplum cluster startup failure

Category: Databases · Published: 8 years ago


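A developer told me that the Greenplum instance in the test environment had suddenly become unreachable. I logged in to the server and found no Greenplum processes running. I asked the developers whether anyone had changed anything in Greenplum, and they said nobody had touched it. Strange. What had happened?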

I tried starting it manually with gpstart, which failed with an error:

[gpadmin@00_mdw ~]$ gpstart
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/home/gpadmin/gpdata/gpmaster/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
[gpadmin@00_mdw ~]$

The log output is brief and I could not see anything useful in it. What now?

2017-05-16 11:18:20.666964 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 11:18:20.692596 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 11:18:20.693209 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-16 13:27:17.059691 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 13:27:17.062897 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 13:27:17.063528 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-17 10:53:59.610428 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-17 10:53:59.643630 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-17 10:53:59.644220 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,

I went to the log directory to look through all the log files and noticed that the newest one was a .csv file, gpdb-2017-05-17_112454.csv.

Original blog post: http://blog.csdn.net/mchdba/article/details/72383684 , by mchdba (黄杉). Reposting is not permitted.

[gpadmin@00_mdw pg_log]$ ll -t
total 740
-rw-------. 1 gpadmin gpadmin    386 May 17 11:24 gpdb-2017-05-17_112454.csv
-rw-------. 1 gpadmin gpadmin   3951 May 17 11:24 startup.log
-rw-------. 1 gpadmin gpadmin    384 May 17 10:53 gpdb-2017-05-17_105359.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 13:27 gpdb-2017-05-16_132717.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 11:18 gpdb-2017-05-16_111820.csv
-rw-------. 1 gpadmin gpadmin  30004 May 16 11:17 gpdb-2017-05-16_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 15 00:00 gpdb-2017-05-15_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 14 00:00 gpdb-2017-05-14_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 13 00:00 gpdb-2017-05-13_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 12 00:00 gpdb-2017-05-12_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 11 00:00 gpdb-2017-05-11_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 10 00:00 gpdb-2017-05-10_000000.csv
-rw-------. 1 gpadmin gpadmin  13073 May  9 21:14 gpdb-2017-05-09_000000.csv
-rw-------. 1 gpadmin gpadmin  18458 May  8 11:38 gpdb-2017-05-08_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May  7 00:00 gpdb-2017-05-07_000000.csv
[gpadmin@00_mdw pg_log]$ more gpdb-2017-05-17_112454.csv
2017-05-17 11:24:54.936656 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"LOG","F0000","invalid authentication method ""127.0.0.1/28""",,,,,"line 87 of configuration file ""/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf""",,0,,"hba.c",1095,
2017-05-17 11:24:54.936871 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"FATAL","XX000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1529,
[gpadmin@00_mdw pg_log]$

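The file gpdb-2017-05-17_112454.csv states the problem clearly: the pg_hba.conf configuration file is invalid. I opened /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf and commented out the offending line, which the log identifies as line 87 of the configuration file, the one containing "127.0.0.1/28":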

#local all all 127.0.0.1/28 trust
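The same edit can be sketched in Python for cases where the line number comes straight from the log message. This is only an illustration: comment_out_line is a hypothetical helper, not a Greenplum utility, and it works on the file contents as a string so that nothing is written until you decide to.

```python
def comment_out_line(text, line_no):
    """Return text with the 1-based line line_no prefixed by '#'."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise ValueError("line %d is out of range" % line_no)
    # Leave the line alone if it is already a comment.
    if not lines[line_no - 1].lstrip().startswith("#"):
        lines[line_no - 1] = "#" + lines[line_no - 1]
    return "".join(lines)


# Two-line stand-in for the real file; in the post the bad entry is line 87.
sample = "local all gpadmin ident\nlocal all all 127.0.0.1/28 trust\n"
print(comment_out_line(sample, 2), end="")
```

Before writing the result back to the real pg_hba.conf, it is worth keeping a copy of the original file so the change can be reverted.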

Then I started the Greenplum cluster again, and this time it came up:

[gpadmin@00_mdw pg_log]$ gpstart
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Setting new master era
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Started...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Shutting down master
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg0 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg1 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg4 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg5 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master instance parameters
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Database                 = template1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Port              = 5432
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master directory         = /home/gpadmin/gpdata/gpmaster/gpseg-1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Timeout                  = 600 seconds
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master standby           = Off 
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Segment instances that will be started
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Host      Datadir                                Port    Role
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg0   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg1   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg2   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam1/gpseg2   50000   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg3   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam2/gpseg3   50001   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap1/gpseg4   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap2/gpseg5   40001   Primary

Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20170517:11:28:25:017745 gpstart:00_mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
... 
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Process results...
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Successful segment starts                                            = 8
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 4   <<<<<<<<
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances, skipped 4 other segments 
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-There are 4 segment(s) marked down in the database
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance 00_mdw directory /home/gpadmin/gpdata/gpmaster/gpseg-1 
20170517:11:28:29:017745 gpstart:00_mdw:gpadmin-[INFO]:-Command pg_ctl reports Master 00_mdw instance active
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-No standby master configured.  skipping...
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 4
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
[gpadmin@00_mdw pg_log]$

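By the way, the interesting part is that Greenplum's key error message was not in the .log file at all, but in a CSV file in the same directory. That really surprised me.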
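Because the decisive message sat in a CSV file, a small script can help pick it out of the routine entries. The following is a minimal sketch, not a Greenplum tool: the column positions (severity in column 16, SQLSTATE in column 17, message in column 18, counting from 0) are an assumption inferred from the sample rows in this post, and find_startup_errors is a hypothetical name.

```python
import csv

# Column positions inferred from the sample rows in this post (assumption).
SEVERITY_COL = 16
SQLSTATE_COL = 17
MESSAGE_COL = 18


def find_startup_errors(lines):
    """Return (severity, sqlstate, message) for rows that look like failures."""
    hits = []
    for row in csv.reader(lines):
        if len(row) <= MESSAGE_COL:
            continue  # skip blank or truncated rows
        severity = row[SEVERITY_COL]
        sqlstate = row[SQLSTATE_COL]
        message = row[MESSAGE_COL]
        # Keep high-severity rows, plus any row whose SQLSTATE is not success.
        if severity in ("ERROR", "FATAL", "PANIC") or sqlstate != "00000":
            hits.append((severity, sqlstate, message))
    return hits


# Rows copied from the log excerpts in this post.
SAMPLE = [
    '2017-05-17 10:53:59.610428 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,',
    '2017-05-17 11:24:54.936656 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"LOG","F0000","invalid authentication method ""127.0.0.1/28""",,,,,"line 87 of configuration file ""/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf""",,0,,"hba.c",1095,',
    '2017-05-17 11:24:54.936871 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"FATAL","XX000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1529,',
]

for hit in find_startup_errors(SAMPLE):
    print(hit)
```

To run it against a real file, pass an open file object instead of SAMPLE, opened with newline="" so that the csv module can handle quoted messages spanning several lines. The SQLSTATE check matters here because the most informative row in this incident was logged at LOG severity with SQLSTATE F0000, not at FATAL.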

Finally, the root cause: why would this 127.0.0.1/28 entry keep Greenplum from starting? After looking at pg_hba.conf, my guesses are as follows:

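(1) A 127.0.0.1/28 entry already exists, so the two entries conflict. (This is the less likely explanation: pg_hba.conf records are checked in order and the first match is used, so an overlapping entry is not by itself a load error.)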

[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep 127
host     all         gpadmin         127.0.0.1/28    trust
#local    all         all             127.0.0.1/28      trust
[gpadmin@00_mdw ~]$ 

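(2) A local entry can only be followed by something like ident, not by 127.0.0.1/28 and then trust. More precisely, local records describe Unix-domain socket connections and have no address field: the format is local, database, user, authentication method. The parser therefore read 127.0.0.1/28 as the authentication method, which matches the logged error, invalid authentication method "127.0.0.1/28".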

[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep local |grep -v "#"
local    all         gpadmin         ident
local    replication gpadmin         ident
#local    all         all             127.0.0.1/28      trust
[gpadmin@00_mdw ~]$
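The second guess can be checked mechanically. Below is a rough sketch of a field-layout check for pg_hba.conf lines; it is not the server's parser. check_hba_line is a hypothetical helper, AUTH_METHODS is an illustrative subset (the methods a given Greenplum or PostgreSQL version accepts may differ), and comment stripping is naive.

```python
# Illustrative subset of authentication methods (assumption; the set
# accepted by a given Greenplum/PostgreSQL version may differ).
AUTH_METHODS = {"trust", "reject", "md5", "password", "ident",
                "pam", "ldap", "gss", "cert"}


def check_hba_line(line):
    """Return a problem description for one pg_hba.conf line, or None."""
    # Naive comment handling: everything after '#' is dropped.
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None  # blank line or comment
    conn_type = fields[0]
    if conn_type == "local":
        # local  database  user  method
        method_index = 3
    elif conn_type in ("host", "hostssl", "hostnossl"):
        # host  database  user  address/len  method
        # host  database  user  address  netmask  method
        method_index = 4 if len(fields) > 3 and "/" in fields[3] else 5
    else:
        return 'unknown connection type "%s"' % conn_type
    if len(fields) <= method_index:
        return "missing authentication method"
    method = fields[method_index]
    if method not in AUTH_METHODS:
        return 'invalid authentication method "%s"' % method
    return None


for entry in [
    "host     all         gpadmin         127.0.0.1/28    trust",
    "local    all         gpadmin         ident",
    "local    all         all             127.0.0.1/28      trust",
]:
    print(entry, "->", check_hba_line(entry))
```

Under these assumptions the host entry and the ident entry pass, while the local entry with an address is rejected with the same wording the server logged. That supports guess (2) over guess (1).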
