ORA-15077 PROC-26 CRSD Fails During CRS Startup on 11gR2 [ID 1152583.1]
Modified: 04-AUG-2010   Type: PROBLEM   Status: PUBLISHED
In this Document
  Symptoms
  Changes
  Cause
  Solution
  References
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
In a 2-node RAC, node 2 was rebooted manually. After the node restarted and CRS was brought back up, CRSD crashed with:
The OCR location +DG_DATA_01 is inaccessible
2010-06-27 09:58:56.869: [  OCRASM][4156924400]proprasmo: Error in open/create file in dg [DG_DATA_01]
[  OCRASM][4156924400]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2010-06-27 09:58:56.871: [  CRSOCR][4156924400] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
2010-06-27 09:58:56.871: [    CRSD][4156924400][PANIC] CRSD exiting: Could not init OCR, code: 26
alertracnode2.log shows:
2010-06-27 09:45:04.759 [cssd(13087)]CRS-1713:CSSD daemon is started in clustered mode
2010-06-27 09:45:24.911 [cssd(13087)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2010-06-27 09:45:43.399 [crsd(13556)]CRS-1201:CRSD started on node racnode2.
2010-06-27 09:58:43.026 [crsd(13556)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/oracle/11.2.0/grid/log/racnode2/crsd/crsd.log.
2010-06-27 09:58:43.207 [/opt/oracle/11.2.0/grid/bin/oraagent.bin(14944)]CRS-5822:Agent '/opt/oracle/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) in /opt/oracle/11.2.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2010-06-27 09:58:43.465 [ohasd(12493)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.
...
2010-06-27 09:59:02.943 [crsd(15055)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /opt/oracle/11.2.0/grid/log/racnode2/crsd/crsd.log.
2010-06-27 09:59:03.713 [ohasd(12493)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.
2010-06-27 09:59:03.713 [ohasd(12493)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
Changes
The node was rebooted.
Cause
This issue is caused by the VIP address already being assigned elsewhere in the network due to an incorrect system configuration.
In the crsd.log, we can see:
2010-06-27 09:49:15.743: [UiServer][1519442240] Container [ Name: ORDERMESSAGE:TextMessage[CRS-2672: Attempting to start 'ora.racnode2.vip' on 'racnode2']
2010-06-27 09:49:35.827: [UiServer][1519442240] Container [ Name: ORDERMESSAGE:TextMessage[CRS-5005: IP Address: 10.18.14.16 is already in use in the network]
2010-06-27 09:49:35.829: [UiServer][1519442240] Container [ Name: ORDERMESSAGE:TextMessage[CRS-2674: Start of 'ora.racnode2.vip' on 'racnode2' failed]
2010-06-27 09:51:32.746: [UiServer][1519442240] Container [ Name: ORDERMESSAGE:TextMessage[Attempting to stop 'ora.asm' on member 'racnode2']
2010-06-27 09:58:44.543: [  CRSOCR][1147494896] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup] [7]
2010-06-27 09:58:44.543: [    CRSD][1147494896][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-06-27 09:58:44.543: [    CRSD][1147494896] Done.
So ASM and the OCR disk group were ONLINE and CRSD was starting its resources. When CRSD tried to start the VIP, the start of ora.racnode2.vip failed because the VIP address was already in use on the network. CRSD then shut down ASM, which made the OCR device inaccessible and caused CRSD to abort.
Checking network, we see:
/etc/hosts

# public node names
10.12.14.13 racnode1
10.12.14.14 racnode2
# Oracle RAC VIP
10.12.14.15 racnode1-vip
10.12.14.16 racnode2-vip
The ifconfig output from node 2 shows that the VIP address for racnode2 is permanently assigned to eth1:
eth1      Link encap:Ethernet  HWaddr 00:22:64:F7:0C:E8
          inet addr:10.12.14.16  Bcast:10.12.14.255  Mask:255.255.248.0
          inet6 addr: fe80::222:64ff:fef7:ce8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2772 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:203472 (198.7 KiB)  TX bytes:22689 (22.1 KiB)
          Interrupt:177 Memory:f4000000-f4012100
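Because the failure surfaces far from its cause, a quick local check can save time. The sketch below is a hypothetical helper (not part of this note's official procedure) that parses ifconfig-style output and reports whether the VIP sits on a base interface rather than on a CRS-managed alias such as eth1:1. The VIP and the sample output are taken from this note.

```shell
# Hypothetical check: is the VIP statically bound to a base interface?
VIP=10.12.14.16
# Sample ifconfig output from the misconfigured node (normally: ifconfig -a)
IFCONFIG_OUT='eth1      Link encap:Ethernet  HWaddr 00:22:64:F7:0C:E8
          inet addr:10.12.14.16  Bcast:10.12.14.255  Mask:255.255.248.0'
# Walk the output: interface-header lines start in column 1, address lines
# are indented; remember the current interface and print it when the VIP hits.
IFACE=$(printf '%s\n' "$IFCONFIG_OUT" | awk -v vip="inet addr:$VIP" '
  /^[^ ]/ { iface = $1 }
  index($0, vip) { print iface }')
case "$IFACE" in
  *:*) echo "VIP on alias $IFACE (normal, CRS-managed)" ;;
  "")  echo "VIP not assigned locally" ;;
  *)   echo "VIP statically bound to $IFACE - remove it from ifcfg-$IFACE" ;;
esac
```

With the sample output above, this prints the warning for eth1, which matches the misconfiguration described in the Cause section.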
While CRS was down on node 2, the racnode2 VIP should instead have failed over to node 1, bound as a virtual alias (eth0:<n>) on node 1's public interface, just as the racnode1 VIP is:
eth0      Link encap:Ethernet  HWaddr 00:22:64:F7:0B:22
          inet addr:10.12.14.13  Bcast:10.12.14.255  Mask:255.255.248.0
          inet6 addr: fe80::222:64ff:fef7:b22/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:1    Link encap:Ethernet  HWaddr 00:22:64:F7:0B:22
          inet addr:10.12.14.15  Bcast:10.12.14.255  Mask:255.255.248.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:f2000000-f2012100
Solution
Modify the network configuration at the OS layer, e.g. in the /etc/sysconfig/network-scripts/ifcfg-eth* scripts: remove the VIP address from the ifcfg-eth1 definition.
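For illustration, a corrected ifcfg-eth1 might look like the sketch below, assuming eth1 is racnode2's public interface; the addresses are the ones from the /etc/hosts excerpt above, and the exact parameters present will vary by system.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 -- corrected (illustrative)
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.12.14.14       # racnode2 public IP -- correct
# IPADDR=10.12.14.16     # racnode2-vip -- must NOT be set here; CRS manages it
NETMASK=255.255.248.0
ONBOOT=yes
```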
Restart the network service and check the ifconfig -a output to ensure the VIP is not assigned to any network interface before CRSD is started (unless it has failed over to the other node).
Restart CRSD on node 2.
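The steps above can be sketched as a root shell sequence. The paths and resource names are the ones given in this note; `crsctl start res ora.crsd -init` is the 11.2 way to restart the CRSD init resource. This is a sketch for a RHEL-style system, not a verbatim procedure.

```shell
# Run as root on racnode2, after removing the VIP from ifcfg-eth1.
service network restart        # re-read the corrected interface definition
# Confirm the VIP is no longer bound to a local interface:
ifconfig -a | grep 'inet addr:10.12.14.16' \
  && echo 'VIP still bound locally - do not start CRSD yet' \
  || echo 'VIP released'
# Restart CRSD via its init resource:
/opt/oracle/11.2.0/grid/bin/crsctl start res ora.crsd -init
```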
References
NOTE:1050908.1 - How to Troubleshoot Grid Infrastructure Startup Issues