Problem description
Basic information about the issue:
System name | Cluster |
IP address | |
Operating system | Linux |
Database | Oracle RAC 11.2.0.4 |
Time discovered | |
Discovery method | |
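Symptoms
The symptoms are as follows:
The database instances on both clusters restart frequently, and the alert logs of both instances contain the following message: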
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
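To confirm how often each instance actually hits this error, the alert logs can be scanned for the message. The Python sketch below assumes the 11g text alert log layout seen in the MOS excerpt, where each entry is preceded by a timestamp line such as "Wed Sep 11 23:26:39 2013"; the function name find_stale_header_events is illustrative, and releases that write ISO 8601 timestamps would need a different pattern.

```python
import re
from datetime import datetime

# Assumption: 11g-style text alert log, where each entry is preceded by a
# timestamp line such as "Wed Sep 11 23:26:39 2013".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# A phrase that appears once in each occurrence of the error message.
MARKER = "has a sequence number that is too old"


def find_stale_header_events(log_text: str) -> list[datetime]:
    """Return the timestamp of every alert log entry that reports the
    stale control file header message."""
    events = []
    current_ts = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the most recent timestamp line; later messages belong to it.
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif MARKER in line and current_ts is not None:
            events.append(current_ts)
    return events
```

Running it over the alert_<SID>.log of each instance yields a list of timestamps that can be lined up against storage-side events when the storage team investigates.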
Troubleshooting process
Searching My Oracle Support (MOS) for the key phrase returned the following matching note:
The controlfile header block returned by the OS has a sequence number that is too old. (Doc ID 1589355.1)
APPLIES TO:
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.
SYMPTOMS
Database instance went down with following error message in alert log:
---
Wed Sep 11 23:26:39 2013
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
USER (ospid: 24051722): terminating the instance
---
CAUSE
BUG 14281768 - CONTROL FILE GETS CORRUPTED, which was closed as "Vendor OS/Software/Framework Problem".
SOLUTION
The error is typically raised when the controlfile is overwritten by an older copy of the controlfile. Most likely this happened due to a storage or I/O error.
All copies of the control file must have the same internal sequence number for Oracle to start up the database or shut it down in normal or immediate mode. The solution is actually given in the accompanying message:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
To add a sanity check in the future, set the following parameter:
SQL> alter system set "_controlfile_update_check"='HIGH' scope=spfile;
-- then bounce the database.
Please check with your OS/storage administrator regarding the issue. As a precaution, relocate the control file to a fast disk with direct I/O enabled; the main goal is to prevent the OS from writing an old (cached) copy of the control file to it.
To reverse the parameter setting:
SQL> alter system set "_controlfile_update_check"='OFF' scope=spfile;
-- then bounce the database.
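The mechanism described in the note can be illustrated with a deliberately simplified Python model: every control file update advances a sequence number on all copies, and a storage or OS layer that writes back an older cached image leaves one copy behind the sequence the instance expects. This is a conceptual sketch only; it does not reflect Oracle's on-disk control file format, and the class ControlFileSet, its methods, and the paths used with it are hypothetical.

```python
class ControlFileSet:
    """Toy model of multiplexed control file copies (hypothetical, not
    Oracle's real on-disk format)."""

    def __init__(self, paths: list[str]) -> None:
        # Sequence number the instance expects every copy to carry.
        self.expected_seq = 0
        # Sequence number currently stored in each copy's header.
        self.headers = {path: 0 for path in paths}

    def write(self) -> None:
        """A normal control file update: all copies advance together."""
        self.expected_seq += 1
        for path in self.headers:
            self.headers[path] = self.expected_seq

    def replay_stale_image(self, path: str, old_seq: int) -> None:
        """Simulate the OS or storage writing an older cached copy back to disk."""
        self.headers[path] = old_seq

    def stale_copies(self) -> list[str]:
        """Copies whose header is older than expected, i.e. the condition the
        alert log reports as a sequence number that is too old."""
        return [p for p, seq in self.headers.items() if seq < self.expected_seq]
```

In this model, after three updates, replaying an image from sequence 1 onto one copy makes stale_copies() return exactly that copy, which is the kind of mismatch that saving every copy for later analysis helps to pin down.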
Root cause
The cause is as follows:
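We asked the customer to check the affected instances. They do have multiple control files, stored on different disk groups, but all of these disk groups reside on the same storage array.
Combined with the guidance in the MOS note, we suspect that the customer's storage is at fault.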
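The finding that every control file copy sits on a disk group served by one storage array can be checked mechanically. The Python sketch below assumes ASM paths of the form +DISKGROUP/... (as returned, for example, by SELECT name FROM v$controlfile) and a disk-group-to-array mapping supplied by the storage administrator; that mapping, the array names, and the helper names are hypothetical inputs, not something Oracle reports.

```python
def disk_group(path: str) -> str:
    """Return the ASM disk group name from a path such as '+DATA/orcl/...'."""
    if not path.startswith("+"):
        raise ValueError(f"not an ASM path: {path}")
    # ASM disk group names are case-insensitive; normalise to upper case.
    return path[1:].split("/", 1)[0].upper()


def backing_arrays(paths: list[str], dg_to_array: dict[str, str]) -> set[str]:
    """Map every control file copy to the storage array behind its disk group."""
    return {dg_to_array[disk_group(p)] for p in paths}


def single_array_risk(paths: list[str], dg_to_array: dict[str, str]) -> bool:
    """True when all copies depend on one array, so multiplexing across
    disk groups does not protect against a fault in that array."""
    return len(backing_arrays(paths, dg_to_array)) == 1
```

A True result matches the situation described here: the copies are multiplexed at the ASM level, but a single array-level fault can still reach all of them.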
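Resolution
The resolution is as follows:
Investigate the storage first. Once the storage investigation is complete, re-creating the control file is recommended, but whether to do so depends on the outcome of the storage repair: if the error no longer appears after the storage issue has been resolved, no further action is needed.
In the meantime, if the customer has no control file backup, we recommend first taking a trace backup of the control file. With the database open or mounted, run: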
alter database backup controlfile to trace;
Re-creating the control file involves several considerations and should be tested beforehand; see the separate document for details.