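Summary: analyzing a Greenplum cluster startup failure
A developer told me that the Greenplum instance in the test environment had suddenly become unreachable. I logged in to the server and found no Greenplum processes running. I asked the developers whether anyone had changed anything in Greenplum, and they said nobody had touched it. Strange. So what happened?
I tried starting it by hand with gpstart, which failed with an error: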
[gpadmin@00_mdw ~]$ gpstart
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args:
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/home/gpadmin/gpdata/gpmaster/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
[gpadmin@00_mdw ~]$
The startup log is terse and shows nothing useful. So how do I get past this?
2017-05-16 11:18:20.666964 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 11:18:20.692596 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 11:18:20.693209 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-16 13:27:17.059691 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 13:27:17.062897 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 13:27:17.063528 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-17 10:53:59.610428 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-17 10:53:59.643630 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-17 10:53:59.644220 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
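I went to the log directory to look through all the log files and saw that the newest one was a .csv file, gpdb-2017-05-17_112454.csv.
Source blog: http://blog.csdn.net/mchdba/article/details/72383684 , written by mchdba (黄杉). Please do not repost.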
[gpadmin@00_mdw pg_log]$ ll -t
total 740
-rw-------. 1 gpadmin gpadmin 386 May 17 11:24 gpdb-2017-05-17_112454.csv
-rw-------. 1 gpadmin gpadmin 3951 May 17 11:24 startup.log
-rw-------. 1 gpadmin gpadmin 384 May 17 10:53 gpdb-2017-05-17_105359.csv
-rw-------. 1 gpadmin gpadmin 384 May 16 13:27 gpdb-2017-05-16_132717.csv
-rw-------. 1 gpadmin gpadmin 384 May 16 11:18 gpdb-2017-05-16_111820.csv
-rw-------. 1 gpadmin gpadmin 30004 May 16 11:17 gpdb-2017-05-16_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 15 00:00 gpdb-2017-05-15_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 14 00:00 gpdb-2017-05-14_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 13 00:00 gpdb-2017-05-13_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 12 00:00 gpdb-2017-05-12_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 11 00:00 gpdb-2017-05-11_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 10 00:00 gpdb-2017-05-10_000000.csv
-rw-------. 1 gpadmin gpadmin 13073 May 9 21:14 gpdb-2017-05-09_000000.csv
-rw-------. 1 gpadmin gpadmin 18458 May 8 11:38 gpdb-2017-05-08_000000.csv
-rw-------. 1 gpadmin gpadmin 0 May 7 00:00 gpdb-2017-05-07_000000.csv
[gpadmin@00_mdw pg_log]$ more gpdb-2017-05-17_112454.csv
2017-05-17 11:24:54.936656 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"LOG","F0000","invalid authentication method ""127.0.0.1/28""",,,,,"line 87 of configuration file ""/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf""",,0,,"hba.c",1095,
2017-05-17 11:24:54.936871 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"FATAL","XX000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1529,
[gpadmin@00_mdw pg_log]$
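The file gpdb-2017-05-17_112454.csv states the problem clearly: the pg_hba.conf configuration file is invalid. I opened /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf and commented out the offending line, which the log reports as line 87 of the configuration file, with the value "127.0.0.1/28":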
#local all all 127.0.0.1/28 trust
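The same edit can be scripted. Below is a minimal sketch, assuming you already know the line number from the log (87 in this case); `comment_out_line` is a made-up helper name, not a Greenplum tool. It keeps a `.bak` copy first, since a bad pg_hba.conf is exactly what stopped the master from starting.

```shell
#!/bin/sh
# comment_out_line FILE LINENO
# Prefix line LINENO of FILE with '#', keeping a backup in FILE.bak.
# Hypothetical helper for illustration only.
comment_out_line() {
    file=$1
    lineno=$2
    # Accept only a positive integer as the line number.
    case $lineno in
        ''|0|*[!0-9]*)
            echo "line number must be a positive integer" >&2
            return 1
            ;;
    esac
    # Keep a copy of the original before touching it.
    cp -p "$file" "$file.bak" || return 1
    # Write to a scratch file first, so a sed failure never truncates FILE.
    sed "${lineno}s/^/#/" "$file.bak" > "$file.new" || {
        rm -f "$file.new"
        return 1
    }
    # Overwrite the contents in place, which keeps the file's permissions.
    cat "$file.new" > "$file" && rm -f "$file.new"
}

# Usage (path and line number taken from the log above):
# comment_out_line /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf 87
```

The helper avoids `sed -i` because its syntax differs between GNU and BSD sed; copying through a scratch file is a little clumsier but behaves the same everywhere.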
Then I started the Greenplum cluster again, and this time it came up:
[gpadmin@00_mdw pg_log]$ gpstart
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args:
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Setting new master era
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Started...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Shutting down master
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg0 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg1 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg4 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg5 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master instance parameters
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Database = template1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Port = 5432
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master directory = /home/gpadmin/gpdata/gpmaster/gpseg-1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Timeout = 600 seconds
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master standby = Off
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Segment instances that will be started
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- Host Datadir Port Role
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 01_sdw /home/gpadmin/gpdata/gpdatap1/gpseg0 40000 Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 01_sdw /home/gpadmin/gpdata/gpdatap2/gpseg1 40001 Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 02_sdw /home/gpadmin/gpdata/gpdatap1/gpseg2 40000 Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 03_sdwm /home/gpadmin/gpdata/gpdatam1/gpseg2 50000 Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 02_sdw /home/gpadmin/gpdata/gpdatap2/gpseg3 40001 Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 03_sdwm /home/gpadmin/gpdata/gpdatam2/gpseg3 50001 Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 03_sdwm /home/gpadmin/gpdata/gpdatap1/gpseg4 40000 Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:- 03_sdwm /home/gpadmin/gpdata/gpdatap2/gpseg5 40001 Primary

Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20170517:11:28:25:017745 gpstart:00_mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Process results...
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:- Successful segment starts = 8
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:- Failed segment starts = 0
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 4 <<<<<<<<
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances, skipped 4 other segments
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-There are 4 segment(s) marked down in the database
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance 00_mdw directory /home/gpadmin/gpdata/gpmaster/gpseg-1
20170517:11:28:29:017745 gpstart:00_mdw:gpadmin-[INFO]:-Command pg_ctl reports Master 00_mdw instance active
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-No standby master configured. skipping...
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 4
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
[gpadmin@00_mdw pg_log]$
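By the way, the interesting part is that Greenplum's key error message was not in the .log file at all but in a .csv file in the same directory, which really surprised me, haha.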
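Since the useful message ends up in the newest CSV file, that lookup can be wrapped in a small function. This is only a sketch under a few assumptions: the files are named gpdb-*.csv as in the listing above, the newest file by modification time is the one from the failed start, and the severity appears as a quoted field such as "FATAL". `show_latest_csv_errors` is a made-up name. It prints each FATAL or PANIC entry together with the line before it, because in this incident the real detail (line 87 of pg_hba.conf) was logged at LOG level just ahead of the FATAL entry.

```shell
#!/bin/sh
# show_latest_csv_errors LOG_DIR
# Print FATAL/PANIC entries, each with its preceding line, from the most
# recently modified gpdb-*.csv file in LOG_DIR.
# Hypothetical helper for illustration only.
show_latest_csv_errors() {
    log_dir=$1
    # ls -t sorts by modification time, newest first.
    latest=$(ls -t "$log_dir"/gpdb-*.csv 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no gpdb-*.csv files found in $log_dir" >&2
        return 1
    fi
    echo "== $latest"
    awk '
        # On a FATAL or PANIC entry, print the previous line for context.
        /"(FATAL|PANIC)"/ {
            if (prev != "") print prev
            print
            prev = ""
            next
        }
        { prev = $0 }
    ' "$latest"
}

# Usage (directory taken from the session above):
# show_latest_csv_errors /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log
```

This is plain text matching rather than real CSV parsing, so a message that spans several lines would only be partly shown; for a first look at a failed startup that is usually enough.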
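Finally, the analysis: why did this one 127 entry keep Greenplum from starting? Looking at pg_hba.conf, I can think of the following possible causes:
(1) There was already an entry for 127.0.0.1/28, so the two conflicted: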
[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep 127
host all gpadmin 127.0.0.1/28 trust
#local all all 127.0.0.1/28 trust
[gpadmin@00_mdw ~]$
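(2) A local record can only be followed by an authentication method such as ident; it cannot take an address followed by trust, as in 127.0.0.1/28 trust. This one fits the logged error best: a local record has the form "local database user auth-method" with no address field, so 127.0.0.1/28 gets read as the authentication method, which is exactly what "invalid authentication method" complains about.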
[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep local |grep -v "#"
local all gpadmin ident
local replication gpadmin ident
#local all all 127.0.0.1/28 trust
[gpadmin@00_mdw ~]$
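To catch this kind of mistake before restarting, a rough check can be run over pg_hba.conf. The sketch below assumes guess (2) is the right one: it flags any uncommented local record whose fourth field looks like an IPv4 address or CIDR block, since that position is where the authentication method belongs. `check_local_records` is a made-up name, and the check is deliberately narrow; it does not validate the file the way the server does.

```shell
#!/bin/sh
# check_local_records FILE
# Report uncommented "local" records in a pg_hba.conf-style FILE whose
# fourth field looks like an IPv4 address or CIDR block.
# Returns non-zero if any such record is found.
# Hypothetical helper for illustration only.
check_local_records() {
    awk '
        BEGIN { bad = 0 }
        # Skip comments and blank lines.
        /^[ \t]*#/ || NF == 0 { next }
        # A local record is "local database user auth-method", so an
        # address in field 4 would be read as the auth method.
        # Note: quoted names containing spaces would confuse this
        # whitespace-based field splitting.
        $1 == "local" && $4 ~ /^[0-9]+\.[0-9.]+(\/[0-9]+)?$/ {
            printf "line %d: local record has an address (%s) where the auth method belongs\n", NR, $4
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (path taken from the session above):
# check_local_records /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf
```

Run against the file as it was before the fix, this would point at the same line the CSV log reported, without needing a failed start to find it.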