今天正在东莞蜜月的时候,一个学生说他管理的测试库出问题了,无法open,我们先来看看是什么问题:
Recovery of Online Redo Log: Thread 1 Group 4 Seq 4 Reading mem 0
  Mem# 0: /onlinelog/shr/redo04.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 4, block 3, scn 7755957
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Thread 1 advanced to log sequence 5 (thread open)
Thread 1 opened at log sequence 5
  Current log# 5 seq# 5 mem# 0: /onlinelog/shr/redo05.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jun 19 13:31:35 2014
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.007658ba):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_ora_5262.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6_1263032392$" too small
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_ora_5262.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6_1263032392$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 5262): terminating the instance due to error 704
Instance terminated by USER, pid = 5262
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (5262) as a result of ORA-1092
Thu Jun 19 13:31:37 2014
ORA-1092 : opitsk aborting process
 
从上面的错误来看,该数据库之所以open失败,是由于Oracle在bootstrap阶段执行递归SQL时出现ora-01555错误,
 这样bootstrap过程无法继续下去,也就导致数据库无法open。我们可以看到报错的SQL语句如下:
 select ctime, mtime, stime from obj$ where obj# = :1
这是很熟悉的一个SQL,通过10046 trace跟踪Oracle open的过程你会发现该SQL。
针对该错误,或许有人以为是回滚段的问题,实际上并不是,这种情况下推进下SCN 就可以很顺利的把数据库open。
但是这里有个问题:该兄弟的数据库是Oracle 11.2.0.4,已经不支持传统的10015 event的方式了。
下面我们通过oradebug 来解决该问题:
SQL> conn /as sysdba
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
 
Total System Global Area 4275781632 bytes
Fixed Size                  2260088 bytes
Variable Size             989856648 bytes
Database Buffers         3271557120 bytes
Redo Buffers               12107776 bytes
Database mounted.
SQL>
SQL> oradebug poke 0x06001AE70 4 0x859AFA
ORA-00074: no process has been specified
SQL> oradebug setmypid
Statement processed.
SQL>  oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [06001AE70, 06001AEA0) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 6001AB50 00000000
SQL>  oradebug poke 0x06001AE70 4 0x859AFA
BEFORE: [06001AE70, 06001AE74) = 00000000
AFTER:  [06001AE70, 06001AE74) = 00859AFA
SQL> alter database open;
 
Database altered.
 
SQL>
 
这里简单解释一下,4 为长度,0x859AFA是16进制,我在原来的v$datafile_header.checkpoint_change#的基础之上
 加上上1000000得到该值。
我们可以看到,顺利打开了数据库。最后再出观察下alert log发现居然有ora-00600 4194错误。
 
Thu Jun 19 14:48:43 2014
Dumping diagnostic data in directory=[cdmp_20140619144843], requested by (instance=1, osid=9140 (MMON)), summary=[incident=132122].
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x3D6C3836] [PC:0x97F4DF6, kgegpa()+40] [flags: 0x0, count: 1]
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x3D6C3836] [PC:0x97F386A, kgebse()+776] [flags: 0x2, count: 2]
Thu Jun 19 14:48:46 2014
Flush retried for xcb 0x159c668c8, pmd 0x15870d270
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_pmon_9112.trc  (incident=132017):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/shr/shr/incident/incdir_132017/shr_pmon_9112_i132017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 19 14:48:47 2014
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_ora_9268.trc  (incident=132209):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00001: unique constraint (SYSTEM.UNQ_PAIRS) violated
Incident details in: /oracle/diag/rdbms/shr/shr/incident/incdir_132209/shr_ora_9268_i132209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_pmon_9112.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 9112): terminating the instance due to error 472
System state dump requested by (instance=1, osid=9112 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/diag/rdbms/shr/shr/trace/shr_diag_9122_20140619144848.trc
Dumping diagnostic data in directory=[cdmp_20140619144848], requested by (instance=1, osid=9112 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 9112
 
这是一个非常常见的错误了,最简单的方式就是通过重建undo或处理回滚段来解决,这里我们用最简单的方式:
修改undo_management=manual
    undo_tablespace='system'
SQL> l
  1* create pfile='/tmp/1.ora' from spfile
SQL> startup mount pfile='/tmp/1.ora';
ORACLE instance started.
 
Total System Global Area 4275781632 bytes
Fixed Size                  2260088 bytes
Variable Size             989856648 bytes
Database Buffers         3271557120 bytes
Redo Buffers               12107776 bytes
Database mounted.
SQL> alter database open;
 
Database altered.
 
SQL> select name from v$datafile;
 
NAME
--------------------------------------------------------------------------------
/oracle/oradata/shr/system01.dbf
/oracle/oradata/shr/sysaux01.dbf
/oracle/oradata/shr/undotbs01.dbf
/oracle/oradata/shr/users01.dbf
/oradata/shr/jy_shr01.dbf
/oradata/shr/jy_shr02.dbf
/oradata/shr/jy_shr03.dbf
/oradata/shr/eas_d_stand01.dbf
/oradata/shr/eas_d_stand02.dbf
/oradata/shr/eas_d_stand03.dbf
/backup/eas/eas_d_stand04.dbf
 
NAME
--------------------------------------------------------------------------------
/backup/eas/eas_d_stand05.dbf
/backup/eas/eas_d_stand06.dbf
 
13 rows selected.
 
SQL> show parameter undo
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string      SYSTEM
SQL> create undo tablespace undotbs2 datafile '/oracle/oradata/shr/undotbs2_01.dbf' size 200m;
 
Tablespace created.
 
SQL> shutdown immediate
 
重建undo之后,再停库,修改undo参数即可顺利打开数据库,如下:
SQL> create spfile from pfile='/tmp/1.ora';
 
File created.
 
SQL> startup mount
ORACLE instance started.
 
Total System Global Area 4275781632 bytes
Fixed Size                  2260088 bytes
Variable Size             989856648 bytes
Database Buffers         3271557120 bytes
Redo Buffers               12107776 bytes
Database mounted.
SQL> alter database open;
 
Database altered.
 
SQL>
处理完成之后alert log不再抛出任何错误。









