Oracle12c R2注意事项:DB password file in ASM (DataGuard环境重建PWD)

栏目: 数据库 · 发布时间: 5年前

内容简介:之前在1, db alert log of primary sidetip:

之前在 Oracle 12c 关于密码(password)的几个新特性小结 曾经记录过12C 密码文件可以放到ASM DG的特性, 最近处理一个案例时就遇到了,简单记录一套修复他人搭建的问题RAC+ DG的过程, 环境Oracle 12cR2 2-nodes RAC+DG。

1, db alert log of primary side

2018-12-25 09:32:18.111000 +08:00
Error 1034 received logging on to the standby
2018-12-25 09:37:18.335000 +08:00
Error 1034 received logging on to the standby

tip:

提示目标实例未启动,后确认备库有一个节点1启动,节点2关闭,理论上Standby 端不应该因部分实例停而影响DG的传输。

2. 确认DG 配置 on primary side

SQL>show parameter  log_archive
NAME                   VALUES
-----------------      --------------------------------------------------------------------------------------------
log_archive_dest_2     SERVICE=STD_ANBOB2 LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stdANBOB

oracle@DB01:/home/oracle> tnsping STD_ANBOB2

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.43.38)(PORT = 1531))) (CONNECT_DATA = (SID = ANBOB2)))
OK (0 msec)

tip:

使用了非默认的1531 port, 但是配置的只是节点2的IP,而且还使用的SID而不是service_name, 后来发现standby side的实例监听中只配置了SERVICE_NAME=SID的服务,无同名服务说明静态监听配置也是不正确的,需要先修复standby side上增加共同的服务名。

3, 修正primary side 传输SERVICE配置

oracle@DB01:/home/oracle> tnsping STD_ANBOB
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.43.137)(PORT = 1531)) (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.43.138)(PORT = 1531))) (CONNECT_DATA = (service_name = ANBOB.com)))
OK (0 msec)

SQL> alter system set log_archive_dest_2='SERVICE=STD_ANBOB LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stdANBOB';

# Primary side db alert log 
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE, 
and that the SYS password is same in the password files.
      returning error ORA-16191
------------------------------------------------------------
TT04: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16191)
TT04: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /oracle/app/oracle/diag/rdbms/ANBOB/anbob/trace/anbob_tt04_47574.trc:
ORA-16191: Primary log shipping client not logged on standby
Error 16191 for archive log file 5 to 'STD_ANBOB'

Tip:

增加了实例1,使用了服务名,支持STANDBY上的RAC部分实例关闭后的正常传输, 但是主库日志里又出现了ora-1017 ora-16191错误

4, STANDBY side ORA-7445 [kzsrgavs()+995]

# db alert log
2018-12-25 10:48:21.996000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFD269AF8A7] [PC:0x2897843, kzsrgavs()+995] [flags: 0x0, count: 1]
Errors in file /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/trace/ANBOB1_pr00_36293.trc  (incident=64809):
ORA-07445: exception encountered: core dump [kzsrgavs()+995] [SIGSEGV] [ADDR:0x7FFD269AF8A7] [PC:0x2897843] [Address not mapped to object] []
Incident details in: /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/incident/incdir_64809/ANBOB1_pr00_36293_i64809.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20181225104822], requested by (instance=1, osid=36293 (PR00)), summary=[incident=64809].
2018-12-25 10:48:51.764000 +08:00
alter database recover managed standby database cancel
2018-12-25 10:48:53.175000 +08:00
Logmerger died, shutting down parallel recovery slaves
2018-12-25 10:49:26.184000 +08:00
CLMN: delete dead process - failed
2018-12-25 10:53:59.078000 +08:00
ORA-1013 signalled during: alter database recover managed standby database cancel...

# ora-7445 trace file 
        *** 2018-12-25T10:48:22.063388+08:00
        dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
        [TOC00004]
3>       ***** SQL Statement (None) *****
         Current SQL information unavailable - no cursor.
3< ***** current_sql_statement ***** [TOC00004-END] [TOC00005] 3>       ***** Call Stack Trace *****
         ksedst()+119        <         dbkedDefDump()+1200 <   ksedmp()+259        <         ssexhd()+3188       <
         sslssSynchHdlr()+39 <       sslsshandler()+118  <       __sighandler()      <
         kzsrgavs()+995      <         krsu_pre11_pwd_conn() <    krsu_oci_est_sec_conn()+1594     <   
		 krsu_wupi_log()+446 <         krsu_upi_logc()+366 <         krsu_connect()+104  <        
		 krsf_send_fal_request()+607    <     krsf_get_gap_sequence()+1266    <       krr_read_disk()+446 <    
         krr_read_buffer()  <      krr_parse_redo()  <         krr_do_media_recovery()+6603     <        
		 krddmr()+1356       <         krd_do_mrp()+1078   <      krd_logmerger_driver()+6425    <     krp_slave_main() <
         ksvrdp_int()+2010   <         opirip()+602        <       opidrv()+602    <     sou2o()+145      <
         opimai_real()+202   <         ssthrdmain()+417    <        main()+262

tips:

Standby 实例具然出现了ora-7445 [kzsrgavs()+995].

KZSRGAVS >>>>> (Kzsr)Gavs – Kernel Security Privileges Roles/Remote Password File ??

KRSU_PRE11_PWD_CONN >>>>>(Krs)U – Kernel Recovery Standby/Dataguard

猜测DG 的db password file不一致,说明当时备库的节点1可能密码文件和主库和备库2节点不一致,这点可以使用md5sum验证文件。于是习惯性的从Primary节点1的$ORACLE_HOME/dbs下, scp过去一份PWD file到standby 下,此时执行alter database recover managed standby database cancel操作可能会hang, 可以shutdown abort重新启动. 再次启动log应用。

# standby side db alert log

Managed Standby Recovery starting Real Time Apply
2018-12-25 11:07:57.610000 +08:00
Parallel Media Recovery started with 80 slaves
Media Recovery Log +ARCHDG/STDANBOB/ARCHIVELOG/2018_12_25/thread_2_seq_1906.301.995795603
Media Recovery Log +ARCHDG/STDANBOB/ARCHIVELOG/2018_12_24/thread_1_seq_5110.280.995721757
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF32BC5EC7] [PC:0x2897843, kzsrgavs()+995] [flags: 0x0, count: 1]
Errors in file /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/trace/ANBOB1_arc3_76413.trc  (incident=128681):
ORA-07445: exception encountered: core dump [kzsrgavs()+995] [SIGSEGV] [ADDR:0x7FFF32BC5EC7] [PC:0x2897843] [Address not mapped to object] []
Incident details in: /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/incident/incdir_128681/ANBOB1_arc3_76413_i128681.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2018-12-25 11:07:59.047000 +08:00
Dumping diagnostic data in directory=[cdmp_20181225110759], requested by (instance=1, osid=76413 (ARC3)), summary=[incident=128681].
2018-12-25 11:08:00.972000 +08:00
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
Starting background process ARC3
ARC3 started with pid=167, OS id=29684
ARC3: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Media Recovery of Online Log [Thread=1, Seq=5111]
Recovery of Online Redo Log: Thread 1 Group 52 Seq 5111 Reading mem 0
  Mem# 0: +DATADG/STDANBOB/ONLINELOG/group_52.260.995277121
2018-12-25 11:09:01.424000 +08:00
Standby crash recovery failed to bring standby database to a consistent
point because needed redo hasn't arrived yet.
MRP: Wait timeout: thread 1 sequence# 5111  
Media Recovery user canceled with status 16016
Managed Standby Recovery not using Real Time Apply
Standby Crash Recovery aborted due to error 10877.
Errors in file /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/trace/ANBOB1_ora_29171.trc:
ORA-10877: error signaled in parallel recovery slave
ORA-10877: error signaled in parallel recovery slave
Recovery interrupted!
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Errors in file /oracle/app/oracle/diag/rdbms/stdANBOB/ANBOB1/trace/ANBOB1_pr00_29289.trc:
ORA-16043: Redo apply has been canceled.
ORA-16016: archived log for thread 1 sequence# 5111 unavailable
Completed Standby Crash Recovery.

Tip:

当时还以为redo archived log sequence# 5111有问题,手动传输应用一下。

5. 手动传输GAP ARCHIVED LOG file

RMAN> backup archivelog sequence between  5111 and 5122;

# scp backuppiece file to standby side

RMAN> catalog backuppiece '/home/oracle/bftllg3h_1_1';

RMAN> restore archivelog sequence  between  5111 and 5122;

tip:

Standby 重启应用发现可以往下继续应用,但是很快把手动传输的日志应用完后,又ORA-7445挂起了, 此时判断还应该是密码文件问题。 难道密码文件不对,看来是忽略了12C 的password file in ASM的特性。

6, password file in ASM

oracle@DB01:/home/oracle> srvctl config database -d ANBOB
Database unique name: ANBOB
Database name: ANBOB
Oracle home: /oracle/app/oracle/product/12.2.0/db_1
Oracle user: oracle
Spfile: +DATADG/ANBOB/PARAMETERFILE/spfile.281.992037707
Password file: +DATADG/ANBOB/PASSWORD/pwdANBOB.269.992036729
Domain: com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATADG,ARCHDG
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group:
Database instances: ANBOB1,ANBOB2
...

Note:

Password file记录确认了当前的实例密码文件是在ASM DiskGroup中。如果该行记录值为空表明使用使用之前的本地ORACLE_HOME/dbs下的目录文件,现在解决进来就简单了。

grid@DB01:/home/grid>asmcmd
ASMCMD> cp +DATADG/ANBOB/PASSWORD/pwdANBOB.269.992036729 /tmp/orapwd.file
copying +DATADG/ANBOB/PASSWORD/pwdANBOB.269.992036729 -> /tmp/orapwd.file

grid@DB01:/home/grid>ls -l /tmp/orapwd.file
-rw-r----- 1 grid oinstall 3584 Dec 25 14:15 /tmp/orapwd.file

# scp the orapwd.file to standby instance both.

7. REGISTER Standby DB in CRS

oracle@qdord01:/home/oracle> srvctl add database -db stdANBOB -oraclehome $ORACLE_HOME -spfile +DATADG/STDANBOB/PARAMETERFILE/spfile.435.995277297 -role PHYSICAL_STANDBY -startoption MOUNT -dbtype RAC -dbname ANBOB -diskgroup "datadg,archdg" 
oracle@qdord01:/home/oracle> srvctl add instance -db stdANBOB -instance ANBOB1 -node qdord01
oracle@qdord01:/home/oracle> srvctl add instance -db stdANBOB -instance ANBOB2 -node qdord02
grid@qdord01:/home/grid> srvctl start instance -d stdANBOB  -n qdord01
grid@qdord01:/home/grid> srvctl start instance -d stdANBOB  -n qdord02

Conclusion:

12C中为了在不同实例间统一的密码管理, 支持把DB密码(ASM password same)存储到ASM DISKGROUP中,这样在维护DG环境时,当同步密码文件时就要先确认一下密码的位置, 同样DG端也可以把密码存储到ASM中,然后使用srvctl modify database修改pwd路径. 这个案例通过在标准化DG配置中因密码不一致产生了各种错误,后重新同步密码文件解决,提醒有相同问题的同学可以少走弯路。

— enjoy —


以上所述就是小编给大家介绍的《Oracle12c R2注意事项:DB password file in ASM (DataGuard环境重建PWD)》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Design for Hackers

Design for Hackers

David Kadavy / Wiley / 2011-10-18 / USD 39.99

Discover the techniques behind beautiful design?by deconstructing designs to understand them The term ?hacker? has been redefined to consist of anyone who has an insatiable curiosity as to how thin......一起来看看 《Design for Hackers》 这本书的介绍吧!

URL 编码/解码
URL 编码/解码

URL 编码/解码

SHA 加密
SHA 加密

SHA 加密工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换