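At the end of every year, the system administrators organize a test and drill of the disaster recovery plan. The "production servers" are started in a DR environment that is network-isolated from production, and the various teams then test whether the disaster recovery plan is reliable. During this year's drill, one Oracle database server ran into a problem at startup. As shown below, startup failed with ORA-03113: end-of-file on communication channel.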
[oracle@mylnx6 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Dec 21 09:42:11 2018
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup
ORA-03113: end-of-file on communication channel
SQL>
The alert log shows that the instance reported "ORA-00471: DBWR process terminated with error" during startup:
PMON started with pid=2, OS id=25005
PSP0 started with pid=3, OS id=25007
MMAN started with pid=4, OS id=25009
DBW0 started with pid=5, OS id=25011
LGWR started with pid=6, OS id=25013
CKPT started with pid=7, OS id=25016
SMON started with pid=8, OS id=25018
RECO started with pid=9, OS id=25020
CJQ0 started with pid=10, OS id=25022
MMON started with pid=11, OS id=25024
Fri Dec 21 09:44:36 CST 2018
starting up 8 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=25026
Fri Dec 21 09:45:12 CST 2018
starting up 24 shared server(s) ...
Fri Dec 21 09:46:43 CST 2018
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_25005.trc:
ORA-00471: DBWR process terminated with error
Fri Dec 21 09:46:43 CST 2018
PMON: terminating instance due to error 471
Instance terminated by PMON, pid = 25005
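Getting "ORA-00471: DBWR process terminated with error" while starting an instance is unusual, and it strongly suggests that the process was killed by the operating system. The OS log does contain an oom_kill_process call trace: memory was so tight while the instance was starting that the kernel's OOM killer picked the DBWR process as its victim. The killed PID, 25011, is the OS id of DBW0 in the alert log above. The relevant log entries are: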
Dec 21 09:46:39 mylnx6 kernel: oracle invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Dec 21 09:46:39 mylnx6 kernel: oracle cpuset=/ mems_allowed=0
Dec 21 09:46:39 mylnx6 kernel: Pid: 25026, comm: oracle Not tainted 2.6.32-200.13.1.el5uek #1
Dec 21 09:46:39 mylnx6 kernel: Call Trace:
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810da03f>] __out_of_memory+0x3d/0x86
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810da30f>] out_of_memory+0xfc/0x195
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810d725a>] ? find_get_page+0x22/0x69
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff8106d7cd>] ? getrusage+0x2b1/0x2ce
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff8101270e>] ? common_interrupt+0xe/0x13
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff81043696>] ? should_resched+0xe/0x2f
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff81456006>] do_page_fault+0x210/0x299
Dec 21 09:46:39 mylnx6 kernel: [<ffffffff81453fd5>] page_fault+0x25/0x30
Dec 21 09:46:39 mylnx6 kernel: Mem-Info:
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 4: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 5: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 6: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 7: hi: 0, btch: 1 usd: 0
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA32 per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Dec 21 09:46:39 mylnx6 kernel: Node 0 Normal per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: CPU 4: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: CPU 5: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: CPU 6: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: CPU 7: hi: 186, btch: 31 usd: 0
Dec 21 09:46:41 mylnx6 kernel: active_anon:1764 inactive_anon:209 isolated_anon:64
Dec 21 09:46:41 mylnx6 kernel: active_file:349 inactive_file:1710 isolated_file:0
Dec 21 09:46:41 mylnx6 kernel: unevictable:5377 dirty:0 writeback:4 unstable:0
Dec 21 09:46:41 mylnx6 kernel: free:29838 slab_reclaimable:2400 slab_unreclaimable:119491
Dec 21 09:46:41 mylnx6 kernel: mapped:2703 shmem:830 pagetables:9849 bounce:0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA free:15652kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15172kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 3000 24210 24210
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32 free:86296kB min:2464kB low:3080kB high:3696kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 21210 21210
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal free:17404kB min:17440kB low:21800kB high:26160kB active_anon:7056kB inactive_anon:836kB active_file:1396kB inactive_file:6840kB unevictable:21508kB isolated(anon):256kB isolated(file):0kB present:21719040kB mlocked:21504kB dirty:0kB writeback:16kB mapped:10812kB shmem:3320kB slab_reclaimable:9600kB slab_unreclaimable:477964kB kernel_stack:2800kB pagetables:39396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:544 all_unreclaimable? no
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 0 0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA: 1*4kB 2*8kB 1*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15652kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32: 12*4kB 13*8kB 2*16kB 5*32kB 5*64kB 11*128kB 3*256kB 7*512kB 6*1024kB 4*2048kB 16*4096kB = 86296kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal: 420*4kB 1917*8kB 49*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17800kB
Dec 21 09:46:41 mylnx6 kernel: 4722 total pagecache pages
Dec 21 09:46:41 mylnx6 kernel: 694 pages in swap cache
Dec 21 09:46:41 mylnx6 kernel: Swap cache stats: add 589182, delete 588488, find 343370/443306
Dec 21 09:46:41 mylnx6 kernel: Free swap = 66723056kB
Dec 21 09:46:41 mylnx6 kernel: Total swap = 67108856kB
Dec 21 09:46:41 mylnx6 kernel: 6291440 pages RAM
Dec 21 09:46:41 mylnx6 kernel: 107316 pages reserved
Dec 21 09:46:41 mylnx6 kernel: 24060 pages shared
Dec 21 09:46:41 mylnx6 kernel: 77648 pages non-shared
Dec 21 09:46:41 mylnx6 kernel: Out of memory: kill process 25011 (oracle) score 8425150 or a child
Dec 21 09:46:41 mylnx6 kernel: Killed process 25011 (oracle)
Dec 21 09:47:20 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
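Matching the killed PID against the alert log by eye works for one incident, but the check can also be scripted. The following is a minimal sketch, not the procedure used in this case: the function names oom_killed and is_dbw0_pid are hypothetical, and the log file paths are passed in by the caller (on a system like this one the syslog would typically be /var/log/messages, which is an assumption).

```shell
#!/bin/sh
# Print "PID COMMAND" for every process the OOM killer terminated,
# based on syslog lines such as:
#   ... kernel: Killed process 25011 (oracle)
# usage: oom_killed SYSLOG_FILE   (hypothetical helper)
oom_killed() {
    sed -n 's/.*kernel: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p' "$1"
}

# Succeed if the given OS pid belongs to DBW0 according to an alert log,
# based on lines such as:
#   DBW0 started with pid=5, OS id=25011
# usage: is_dbw0_pid OS_PID ALERT_LOG   (hypothetical helper)
is_dbw0_pid() {
    grep -q "^DBW0 started with pid=[0-9][0-9]*, OS id=$1[[:space:]]*\$" "$2"
}
```

With the logs from this incident, oom_killed would report PID 25011 with command name oracle, and is_dbw0_pid 25011 run against the alert log would succeed, which is the same conclusion reached above.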
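Checking the system's memory showed that the DR server had been given only 24 GB of RAM, whereas the production server has 64 GB (with Linux standard HugePages configured and SGA_MAX_SIZE set to 32 GB). The DR machine is a clone of production; because of resource constraints, the system administrators allocated only 24 GB to it. As shown below: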
[root@mylnx6 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         24156      24033        123          0          0          6
-/+ buffers/cache:      24026        130
Swap:        65535         41      65494
[root@mylnx6 ~]# ps -ef | grep ora_
root     11759 11490  0 16:10 pts/1    00:00:00 grep ora_
[root@mylnx6 ~]# ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes        nattch     status
0x00000000 3080192    root       644        80           2
0x00000000 3112961    root       644        16384        2
0x00000000 3145730    root       644        280          2
0x00000000 4096003    gdm        600        393216       0
0x2cd12178 3866628    oracle     640        34361835520  0
0x00000000 5210117    gdm        600        393216       2          dest
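To put the oracle-owned segment in the ipcs output into perspective, the arithmetic can be worked through in a few lines of shell. This is a rough sketch under one stated assumption: the huge page size is 2048 kB, the usual default on x86_64, which the original output does not show. The function names are hypothetical.

```shell
#!/bin/sh
# Number of huge pages needed to back a shared memory segment, rounded up.
# usage: hugepages_needed SEGMENT_BYTES [HUGEPAGE_KB]
# The 2048 kB default page size is an assumption, not taken from the case.
hugepages_needed() {
    seg_bytes=$1
    page_bytes=$(( ${2:-2048} * 1024 ))
    # Round up so that a partially used page still counts as a whole page.
    echo $(( (seg_bytes + page_bytes - 1) / page_bytes ))
}

# Memory in MB pinned by a given number of huge pages.
# usage: hugepages_mb PAGES [HUGEPAGE_KB]
hugepages_mb() {
    echo $(( $1 * ${2:-2048} / 1024 ))
}
```

For the 34361835520-byte segment, hugepages_needed gives 16385 pages, and hugepages_mb 16385 gives 32770 MB, which is well above the 24156 MB total reported by free -m. On a live system, grep Huge /proc/meminfo shows the actual page size and the number of pages currently reserved.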
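As shown above, the shared memory segment owned by the oracle user is 34361835520 bytes, about 32 GB, which is more than the 24 GB of RAM on this server. The error was therefore caused by the standard HugePages configuration at the OS level: the available memory had changed, but the configuration cloned from production had not been adjusted to match. HugePages are reserved up front and cannot be swapped out, which would explain why the large amount of free swap did not help. To fix the problem quickly, we removed the HugePages-related settings, as follows: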
Edit limits.conf and comment out the soft memlock and hard memlock parameters.
vi /etc/security/limits.conf
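Then edit sysctl.conf and comment out vm.nr_hugepages, and reboot the server (this is a DR test environment, so it can be rebooted at any time). After that, the Oracle database instance started normally. The related parameters still need to be tuned before the remaining tests continue.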
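The two manual edits described above could also be scripted, which helps when several cloned servers need the same change. The following is only a sketch, assuming the memlock and vm.nr_hugepages entries are written in the usual one-setting-per-line form; the function names are hypothetical, and each function keeps a .bak copy of the file it modifies.

```shell
#!/bin/sh
# Comment out every active (not already commented) memlock entry in a
# limits.conf-style file. A backup is written to FILE.bak.
# usage: disable_memlock FILE   (hypothetical helper)
disable_memlock() {
    sed -i.bak 's/^\([[:space:]]*[^#[:space:]].*[[:space:]]memlock[[:space:]].*\)$/#\1/' "$1"
}

# Comment out an active vm.nr_hugepages setting in a sysctl.conf-style file.
# A backup is written to FILE.bak.
# usage: disable_hugepages FILE   (hypothetical helper)
disable_hugepages() {
    sed -i.bak 's/^\([[:space:]]*vm\.nr_hugepages[[:space:]]*=.*\)$/#\1/' "$1"
}
```

In this case the targets would be /etc/security/limits.conf and /etc/sysctl.conf, run as root and followed by the reboot mentioned above. Trying the functions on copies of the files first is a sensible precaution.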