How a DBA Can Build a Custom Automated Inspection Tool (with Reference Scripts)


[Abstract] This is a practical technical article. It introduces the basic steps for gradually automating database inspections and the diagnostic information that needs to be collected, explains in detail how to integrate the automation scripts, and provides reference scripts.

[Author] Zhao Hai


I. Basic Steps for Gradually Achieving Automation

1. Manual inspection

In the initial stage, most DBAs simply keep a record of the SQL statements they use most often. During daily inspections they run these statements against the database, copy the results off the screen, and then review them one by one. This approach only works for a small number of databases; once the number grows, the workload becomes enormous.

2. Script-based inspection

In the next stage, DBAs with some scripting experience gradually turn the commonly used SQL statements into fixed SQL scripts and call them from shell scripts written in an OS scripting language (for example KSH or BASH), writing the results to log files that are reviewed and analyzed afterwards. This approach can handle more databases, but it still requires frequent switching between database servers, which is error-prone, especially when the operating system platforms are not uniform.

3. Automated inspection

In the final stage, DBAs who know languages such as Python or Expect consolidate the earlier deliverables: a Python script automatically calls the shell scripts on each system, completing the inspection of all databases in batch and transferring every inspection log to a central location. The logs are then filtered with a text-processing tool (for example AWK) or with VBA, and the filtered content is analyzed. With this approach a central management server can run batch operations against all database servers, and the problems caused by inconsistent operating system platforms disappear.
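As a minimal sketch of the final filtering step, the fragment below scans the collected log files for common error keywords. It assumes the logs have already been pulled back to a local directory; the ./inspection_logs path and the keyword list are illustrative and not part of the original tooling:

import glob
import re

# Hypothetical central directory holding the collected inspection logs.
LOG_DIR = "./inspection_logs"
# Illustrative alert patterns; extend to whatever the team actually watches for.
PATTERNS = [re.compile(p) for p in (r"ORA-\d+", r"ERROR", r"corrupt")]

def scan_logs(log_dir=LOG_DIR):
    """Print every log line that matches one of the alert patterns."""
    for path in sorted(glob.glob("%s/**/*.log" % log_dir, recursive=True)):
        with open(path, errors="ignore") as f:
            for lineno, line in enumerate(f, 1):
                if any(p.search(line) for p in PATTERNS):
                    print("%s:%d: %s" % (path, lineno, line.rstrip()))

if __name__ == "__main__":
    scan_logs()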

II. Database Diagnostic Information to Collect

1. Basic information

The basic information varies across database environments, but a minimum set is always required: database name, instance name, unique identifier, system version, address information, database creation time, database startup time, database version, patch information, character set, instance status, archiving mode, and so on. The following script (for Oracle) is provided for reference:

SELECT name AS db_name,
       dbid,
       created,
       platform_name,
       db_unique_name,
       log_mode,
       guard_status,
       force_logging,
       flashback_on
  FROM v$database;

SELECT instance_name,
       inst_id,
       host_name,
       TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time,
       status,
       SYSDATE - STARTUP_TIME AS Running_Time
  FROM gv$instance
 ORDER BY inst_id;

2. Storage information

Storage information covers both logical and physical storage, in three dimensions: tablespaces, data files, and disks. For tablespaces, collect the name, type, extent management mode, initial extent size, segment space management mode, total size, free size, and free percentage. For data files, collect the owning tablespace, whether autoextend is enabled, the current size, and the maximum size. For disks, collect the disk group number, name, redundancy type, total size, free size, hot-used size, space required for mirroring, space usable for files after mirroring, and the number of offline disks. The following scripts (for Oracle) are provided for reference:

SELECT c.tablespace_name,
       c.contents,
       c.extent_management ext_mgmt,
       c.allocation_type alloc_type,
       c.initial_extent / 1024 ext_kb,
       c.segment_space_management SSM,
       nvl(a.total_gb, 0) total_gb,
       decode(c.contents, 'TEMPORARY',
              nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824,
              nvl(b.free_gb, 0)) free_gb,
       100 * decode(c.contents, 'TEMPORARY',
              nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824,
              nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct
  FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_data_files
         GROUP BY tablespace_name
        UNION ALL
        SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_temp_files
         GROUP BY tablespace_name) a,
       (SELECT tablespace_name,
               nvl(sum(bytes) / 1073741824, 0) free_gb,
               0 used_blocks
          FROM dba_free_space
         GROUP BY tablespace_name
        UNION ALL
        SELECT tablespace_name,
               0 free_gb,
               nvl(sum(used_blocks), 0) used_blocks
          FROM gv$sort_segment
         GROUP BY tablespace_name) b,
       dba_tablespaces c
 WHERE c.tablespace_name = b.tablespace_name(+)
   AND c.tablespace_name = a.tablespace_name(+)
 ORDER BY c.contents, free_pct, c.tablespace_name;

SELECT file_id, file_name, tablespace_name, autoextensible,
       bytes / 1073741824 AS current_gb, maxbytes / 1073741824 AS max_gb
  FROM dba_data_files
UNION
SELECT file_id, file_name, tablespace_name AS ts_name, autoextensible,
       bytes / 1073741824 AS cur_gb, maxbytes / 1073741824 AS max_gb
  FROM dba_temp_files
 ORDER BY tablespace_name, file_id, file_name;

SELECT group_number, name, type, total_mb, free_mb, hot_used_mb,
       required_mirror_free_mb, usable_file_mb, offline_disks
  FROM v$asm_diskgroup_stat;

3. Log information

Log information is mainly a statistical analysis of dynamic information about the online redo logs and the archived logs in the fast recovery area. Specifically: for the online logs, the group number, thread number, sequence number, file path, size, status, archive status, and number of members per group; for the archived logs, the generation date, generated size, and file count; plus the daily volume and frequency of log switches. The following scripts (for Oracle) are provided for reference:

SELECT t2.member,
       t1.group#,
       t1.thread#,
       t1.sequence#,
       t1.bytes / 1024 / 1024 AS SIZE_MB,
       t1.status,
       t1.archived,
       t1.members
  FROM v$log t1, v$logfile t2
 WHERE t1.group# = t2.group#
 ORDER BY thread#, group#;

SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE",
       trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)",
       count(*)
  FROM v$archived_log
 WHERE first_time > sysdate - 6 AND creator = 'ARCH'
 GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12)
 ORDER BY 1;

SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5) Day,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '00', 1, 0)) H00,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '01', 1, 0)) H01,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '02', 1, 0)) H02,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '03', 1, 0)) H03,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '04', 1, 0)) H04,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '05', 1, 0)) H05,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '06', 1, 0)) H06,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '07', 1, 0)) H07,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '08', 1, 0)) H08,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '09', 1, 0)) H09,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '10', 1, 0)) H10,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '11', 1, 0)) H11,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '12', 1, 0)) H12,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '13', 1, 0)) H13,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '14', 1, 0)) H14,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '15', 1, 0)) H15,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '16', 1, 0)) H16,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '17', 1, 0)) H17,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '18', 1, 0)) H18,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '19', 1, 0)) H19,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '20', 1, 0)) H20,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '21', 1, 0)) H21,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '22', 1, 0)) H22,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '23', 1, 0)) H23,
       COUNT(*) TOTAL
  FROM v$log_history a
 WHERE first_time >= to_char(sysdate - 7)
 GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5)
 ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5) DESC;

SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 1 AND first_time > sysdate - 1
 ORDER BY first_time DESC;

SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 2 AND first_time > sysdate - 1
 ORDER BY first_time DESC;

4. Backup information

Backup information mainly shows how database backup jobs have executed: the backup session key, backup content, start time, end time, and elapsed time. The following script (for Oracle) is provided for reference:

SELECT session_key,
       start_time,
       end_time,
       status,
       time_taken_display tt
  FROM v$rman_backup_job_details
 WHERE start_time > sysdate - 1
 ORDER BY session_key;

5. Performance information

Performance diagnosis has two basic goals: first, checking whether the database's own performance-related resource usage is normal; second, catching abnormal sessions, events, SQL statements, and so on. This part therefore first collects database cache usage, then abnormal sessions and the resource limits of the related processes, the events with the longest CPU and wait time in the last 24 hours, and the SQL statements with the most executions and parses. The following scripts (for Oracle) are provided for reference:

SELECT free_space,
       avg_free_size,
       used_space,
       avg_used_size,
       request_failures,
       last_failure_size
  FROM v$shared_pool_reserved;

SELECT *
  FROM (SELECT t.sid, t.serial#, trunc(sysdate - logon_time) AS online_time,
               t.PROGRAM, t.status, t.LOGON_TIME, t.sql_id, t.prev_sql_id, t.event
          FROM gv$session t
         WHERE t.type <> 'BACKGROUND' AND program IS NOT NULL
         ORDER BY logon_time)
 WHERE rownum <= 30;

SELECT * FROM gv$resource_limit WHERE trim(limit_value) != 'UNLIMITED';

SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser
  FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) AS seconds, a.sql_id,
               dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext
          FROM (SELECT rownum rn, t.*
                  FROM (SELECT decode(s.session_state, 'WAITING', s.event,
                                      'Cpu + Wait For Cpu') Event,
                               s.sql_id, s.user_id, count(*) cnt
                          FROM v$active_session_history s
                         WHERE sample_time > sysdate - 1
                         GROUP BY s.user_id,
                                  decode(s.session_state, 'WAITING', s.event,
                                         'Cpu + Wait For Cpu'),
                                  s.sql_id
                         ORDER BY cnt DESC) t
                 WHERE rownum < 20) a,
               v$sqlarea b, dba_users c
         WHERE a.sql_id = b.sql_id
           AND a.user_id = c.user_id
         ORDER BY cnt DESC) t,
       v$session s
 WHERE t.sql_id = s.sql_id(+);

SELECT *
  FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time,
               s.disk_reads, s.buffer_gets, s.parse_calls
          FROM v$sql s
         ORDER BY s.executions DESC)
 WHERE rownum <= 10;

SELECT *
  FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME,
               s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS
          FROM v$sql s
         ORDER BY s.PARSE_CALLS DESC)
 WHERE rownum <= 10;

6. Disaster recovery information

Disaster recovery information applies to databases that have a DR configuration, such as Oracle Data Guard. Where such a configuration exists, check whether synchronization between the primary and standby databases is abnormal, mainly by analyzing the archived log shipping information. This requires static information such as the archive destination name, status, current database mode, and destination path, plus the highest applied and archived log sequence numbers on each node and the maximum and minimum SCN values. The following scripts (for Oracle) are provided for reference:

SELECT dest_name,
       status,
       database_mode,
       destination
  FROM v$archive_dest_status
 WHERE dest_id in ('1','2');

SELECT m.thread#, m.sequence#, first_change#, next_change#
  FROM v$log_history m,
       (SELECT thread#, max(sequence#) AS sequence#
          FROM v$log_history
         GROUP BY thread#) t
 WHERE m.thread# = t.thread#
   AND m.sequence# = t.sequence#;

SELECT UNIQUE thread# AS thread,
       MAX(sequence#) OVER (PARTITION BY thread#) AS last
  FROM v$archived_log;

III. Integrating the Automation Scripts

1. Turning the basic SQL statements into usable SQL scripts

When SQL commands are executed manually, the results simply appear on the screen. When the scripts run in background mode, however, the query results must be formatted and written to a result file, either a plain text file or an HTML file. With a text file, the output has to be formatted for readability; to upgrade to an HTML file, HTML markup must be added to the SQL script. For example, compare the formatted version below with the bare statement shown earlier:

PROMPT <p><h1 class="awr">XX Company Database Inspection Report</h1>
PROMPT <p><a class="awr">Database basic information</a>
COLUMN log_mode FOR a12
COLUMN guard_status FOR a10
COLUMN force_logging FOR a15
COLUMN flashback_on FOR a15
COLUMN db_unique_name FOR a10
COLUMN platform_name FOR a20
SELECT name AS db_name,
       dbid,
       created,
       platform_name,
       db_unique_name,
       log_mode,
       guard_status,
       force_logging,
       flashback_on
  FROM v$database;

2. Calling the SQL scripts from a shell script

Having the shell script do nothing but invoke the SQL script would be rather thin. The shell script should therefore serve two purposes: collecting OS-level diagnostic information, and defining the initialization parameters for the SQL script before invoking it. Collecting OS-level diagnostics simply means calling operating system commands such as uptime, vmstat, free, df, sar, and iostat from the shell. Invoking the SQL script requires preparing the initialization information: locations (script directory, output file directory, database installation directory, and so on), time information (execution date and time), and connection details (user, database name, identifier, and so on). The following fragment of the calling procedure is provided for reference:

function _dbinfo()
{
cd $PWDDIRECTORY
su - oracle -c "cd $LOGDIR/db;
export ORACLE_SID=$1;
sqlplus -s /nolog <<EOF
conn / as sysdba;
@$LOGDIR/../dailyhealthycheck_withadg.sql
exit;
EOF"
}

3. Centralized, automated invocation with a Python script

This step has two key goals: realizing the benefit of centralization, and automating the interaction. Centralization is achieved in a loop over the target servers, while automated interaction is exactly what scripting languages such as Python are good at; what needs care is defining and validating the initialization parameters when making the calls. The following fragment of the calling procedure is provided for reference (a sketch of the surrounding driver loop follows it):

import pexpect

def ssh_cmd(ip, user, passwd, cmd):
    ret = -1
    ssh = pexpect.spawn('ssh %s@%s "%s"' % (user, ip, cmd), timeout=120)
    try:
        i = ssh.expect(['password:', 'continue connecting (yes/no)?'], timeout=5)
        if i == 0:
            ssh.sendline(passwd)
        elif i == 1:
            ssh.sendline('yes\n')
            ssh.expect('password: ')
            ssh.sendline(passwd)
        ssh.sendline(cmd)
        r = ssh.read()
        print(r)
        ret = r
    except pexpect.EOF:
        print("EOF")
        ssh.close()
        ret = -1
    except pexpect.TIMEOUT:
        print("TIMEOUT")
        ssh.close()
        ret = -2
    return ret
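For completeness, here is a minimal sketch of the driver loop that delivers the centralization benefit described above. The host-list file name, its line format, and the remote script path are assumptions made for illustration only; ssh_cmd is the function shown above:

# Hypothetical host list: each non-comment line is "ip user password".
def run_all(host_file="dbservers.txt",
            remote_cmd="sh /tmp/ora/healthyCheck1_forLinux.sh"):
    for line in open(host_file):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ip, user, passwd = line.split()
        print("==== inspecting %s ====" % ip)
        # Run the inspection script on the remote server; fetching the resulting
        # tar.gz back to the central server (e.g. with a pexpect-driven scp) is omitted.
        ssh_cmd(ip, user, passwd, remote_cmd)

if __name__ == "__main__":
    run_all()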

4. Turning the scripts into a tool or product

Most DBAs design script tools to make their own daily work easier and give little thought to maintainability, robustness, standardization, or future extensibility. To turn these hard-won scripts into an automated operations tool, three more aspects have to be considered (a small sketch follows the list):

(1). Follow development conventions and push the script design toward standardization and modularity: standardized comments, function-based modular design, and consistent variable naming.

(2). For every execution step, include a confirmation step before execution, a feedback step after execution, and exception handling for failures during execution.

(3). Parameterize all inputs as variables, and isolate all constants from the script itself in a separate file.
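A minimal sketch of points (2) and (3), assuming a hypothetical inspect.conf file that holds the constants; every name here is illustrative rather than part of the original scripts:

import configparser
import subprocess

# Point (3): constants live outside the script, e.g. in a hypothetical inspect.conf:
#   [main]
#   log_dir = /tmp/ora
cfg = configparser.ConfigParser()
cfg.read("inspect.conf")
LOG_DIR = cfg.get("main", "log_dir", fallback="/tmp/ora") if cfg.has_section("main") else "/tmp/ora"

def run_step(name, cmd):
    """Point (2): announce before execution, report after, and handle failures."""
    print("STEP %s: running: %s" % (name, cmd))
    try:
        result = subprocess.run(cmd, shell=True, check=True,
                                capture_output=True, text=True)
        print("STEP %s: OK" % name)
        return result.stdout
    except subprocess.CalledProcessError as exc:
        print("STEP %s: FAILED (%s)" % (name, exc))
        return None

if __name__ == "__main__":
    run_step("os-info", "uptime >> %s/os_info.log" % LOG_DIR)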

IV. Reference Scripts

-- -----------------------------------------------------------------------------------
-- File Name    : dailyhealthycheck_withadg_pdb.sql
-- Author       : haizdl@126.com
-- Description  : Daily Healthy Checking for Oracle Rac Database.
-- Requirements : Access to the V$ views.
-- Call Syntax  : by script "healthyCheck_forLinux.sh" or "healthyCheck1_forAIX.sh"
-- Last Modified: 23/05/2017
-- -----------------------------------------------------------------------------------
SET MARKUP HTML ON SPOOL ON ENTMAP OFF PREFORMAT OFF
SET TERM OFF
SET HEADING ON
SET VERIFY OFF
SET FEEDBACK OFF
SET LINE 55555
SET PAGES 999999
SET LONG 999999999
SET LONGCHUNKSIZE 999999
-- -----------------------------------------------------------------------------------
-- SECTION: Inspection script initialization
-- -----------------------------------------------------------------------------------
COLUMN dbid new_value spool_dbid
COLUMN inst_num new_value spool_inst_num
SELECT dbid FROM v$database WHERE rownum = 1;
SELECT instance_number AS inst_num FROM v$instance WHERE rownum = 1;
COLUMN spoolfile_name new_value spoolfile
SELECT 'spool_'||(SELECT instance_name FROM v$instance WHERE rownum=1)||'_'||
       TO_CHAR(SYSDATE,'yy-mm-dd_hh24.mi')||'_daily' AS spoolfile_name
  FROM dual;
spool &&spoolfile..html

PROMPT <p><h1 class="awr">XXX Database Daily Inspection Report</h1>
-- -----------------------------------------------------------------------------------
-- SECTION: Database basic information
-- -----------------------------------------------------------------------------------
PROMPT <p><h3 class="awr">Database basic information summary</h3>
/* DBID, database name, creation date, platform name, unique name, archiving mode, etc. */
PROMPT <p><a class="awr">Database status</a>
COLUMN log_mode FOR a12
COLUMN guard_status FOR a10
COLUMN force_logging FOR a15
COLUMN flashback_on FOR a15
COLUMN db_unique_name FOR a10
COLUMN platform_name FOR a20
SELECT name AS db_name, dbid, created, platform_name, db_unique_name,
       log_mode, guard_status, force_logging, flashback_on
  FROM v$database;
/* Instance number, name, host name, startup time, status, running time, etc. */
PROMPT <p><a class="awr">Instance basic information</a>
CLEAR COLUMNS
SET LINE 200
COLUMN host_name FOR A50
SELECT instance_name, inst_id, host_name,
       TO_CHAR(startup_time, 'YYYY/MM/DD HH24:MI:SS') AS startup_time,
       status, SYSDATE - STARTUP_TIME AS Running_Time
  FROM gv$instance
 ORDER BY inst_id;
-- -----------------------------------------------------------------------------------
-- SECTION: Online redo log information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Redo information</h3>
/* Online log group number, thread number, sequence number, file path, size, status, archive status, members per group. */
PROMPT <p><a class="awr">Online redo logs</a>
CLEAR COLUMNS
SET LINE 200
SET PAGES 1000
COL status FOR a30
COL member FOR a45
SELECT t2.member, t1.group#, t1.thread#, t1.sequence#,
       t1.bytes / 1024 / 1024 AS SIZE_MB,
       t1.status, t1.archived, t1.members
  FROM v$log t1, v$logfile t2
 WHERE t1.group# = t2.group#
 ORDER BY thread#, group#;
/* Shows both the day-to-day fluctuation over the last 6 days and the 24-hour distribution, so anomalies are easy to spot. */
PROMPT <p><a class="awr">Log switches per day over the last 7 days</a>
SELECT SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5) Day,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '00', 1, 0)) H00,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '01', 1, 0)) H01,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '02', 1, 0)) H02,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '03', 1, 0)) H03,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '04', 1, 0)) H04,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '05', 1, 0)) H05,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '06', 1, 0)) H06,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '07', 1, 0)) H07,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '08', 1, 0)) H08,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '09', 1, 0)) H09,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '10', 1, 0)) H10,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '11', 1, 0)) H11,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '12', 1, 0)) H12,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '13', 1, 0)) H13,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '14', 1, 0)) H14,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '15', 1, 0)) H15,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '16', 1, 0)) H16,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '17', 1, 0)) H17,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '18', 1, 0)) H18,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '19', 1, 0)) H19,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '20', 1, 0)) H20,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '21', 1, 0)) H21,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '22', 1, 0)) H22,
       SUM(DECODE(SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH24:MI:SS'), 10, 2), '23', 1, 0)) H23,
       COUNT(*) TOTAL
  FROM v$log_history a
 WHERE first_time >= to_char(sysdate - 7)
 GROUP BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5)
 ORDER BY SUBSTR(TO_CHAR(first_time, 'MM/DD/RR HH:MI:SS'), 1, 5) DESC;
/* Log switch frequency analysis (watch how short the gaps between consecutive first_time values are; very short gaps mean switching is too frequent). */
PROMPT <p><a class="awr">Log switch frequency analysis</a>
SET LINE 200
SET PAGES 1000
SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 1 AND first_time > sysdate - 1
 ORDER BY first_time DESC;
SELECT thread#, sequence#, to_char(first_time, 'MM/DD/RR HH24:MI:SS')
  FROM v$log_history
 WHERE thread# = 2 AND first_time > sysdate - 1
 ORDER BY first_time DESC;
-- -----------------------------------------------------------------------------------
-- SECTION: Archived log information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Archived log information</h3>
/* Archive generation date, generated size, and file count. */
PROMPT <p><a class="awr">Archive log generation</a>
SELECT lpad(to_char(first_time, 'yyyymmdd'), 12) "DATE",
       trunc(sum(blocks * block_size) / 1024 / 1024) "SIZE(MB)",
       count(*)
  FROM v$archived_log
 WHERE first_time > sysdate - 6 AND creator = 'ARCH'
 GROUP BY lpad(to_char(first_time, 'yyyymmdd'), 12)
 ORDER BY 1;

-- -----------------------------------------------------------------------------------
-- SECTION: Space usage
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Space information</h3>
/* Tablespace name, type, extent management, initial extent size, segment space management, total size, free size, free percentage. */
PROMPT <p><a class="awr">Tablespace information</a>
CLEAR COLUMNS
COLUMN tablespace_name FOR a20
COLUMN contents FOR a9
COLUMN ext_mgmt FOR a12
COLUMN alloc_type FOR a9
COLUMN ext_kb FOR 9999999
COLUMN ssm FOR a10
COLUMN total_gb FOR 99999999.99
COLUMN free_gb FOR 99999999.99
COLUMN free_pct FOR 999.99
SET PAGES 100
SELECT c.tablespace_name,
       c.contents,
       c.extent_management ext_mgmt,
       c.allocation_type alloc_type,
       c.initial_extent / 1024 ext_kb,
       c.segment_space_management SSM,
       nvl(a.total_gb, 0) total_gb,
       decode(c.contents, 'TEMPORARY',
              nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824,
              nvl(b.free_gb, 0)) free_gb,
       100 * decode(c.contents, 'TEMPORARY',
              nvl(a.total_gb, 0) - c.block_size * b.used_blocks / 1073741824,
              nvl(b.free_gb, 0)) / nvl(a.total_gb, 1) free_pct
  FROM (SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_data_files
         GROUP BY tablespace_name
        UNION ALL
        SELECT tablespace_name, nvl(sum(bytes) / 1073741824, 0) total_gb
          FROM dba_temp_files
         GROUP BY tablespace_name) a,
       (SELECT tablespace_name,
               nvl(sum(bytes) / 1073741824, 0) free_gb,
               0 used_blocks
          FROM dba_free_space
         GROUP BY tablespace_name
        UNION ALL
        SELECT tablespace_name,
               0 free_gb,
               nvl(sum(used_blocks), 0) used_blocks
          FROM gv$sort_segment
         GROUP BY tablespace_name) b,
       dba_tablespaces c
 WHERE c.tablespace_name = b.tablespace_name(+)
   AND c.tablespace_name = a.tablespace_name(+)
 ORDER BY c.contents, free_pct, c.tablespace_name;

PROMPT <p><a class="awr">Tablespaces and data files</a>
/* For each file: owning tablespace, autoextend flag, current size, maximum size. */
SET LINE 200
SET PAGES 100
COLUMN file_name FOR A65
COLUMN tablespace_name FOR A30
SELECT file_id, file_name, tablespace_name, autoextensible,
       bytes / 1073741824 AS current_gb, maxbytes / 1073741824 AS max_gb
  FROM dba_data_files
UNION
SELECT file_id, file_name, tablespace_name AS ts_name, autoextensible,
       bytes / 1073741824 AS cur_gb, maxbytes / 1073741824 AS max_gb
  FROM dba_temp_files
 ORDER BY tablespace_name, file_id, file_name;
-- -----------------------------------------------------------------------------------
-- SECTION: ASM storage information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">ASM information</h3>
/* Disk group number, name, redundancy type, total size, free size, hot-used size, space required for mirroring, space usable for files, offline disks. */
PROMPT <p><a class="awr">ASM disk group information</a>
CLEAR COLUMNS
COLUMN group_number FOR 9999999999
COLUMN name FOR A20
COLUMN type FOR A20
COLUMN total_mb FOR 9999999999
COLUMN free_mb FOR 9999999999
COLUMN hot_used_mb FOR 9999999999
COLUMN required_mirror_free_mb FOR 9999999999
COLUMN usable_file_mb FOR 9999999999
COLUMN offline_disks FOR 9999999999
SELECT group_number, name, type, total_mb, free_mb, hot_used_mb,
       required_mirror_free_mb, usable_file_mb, offline_disks
  FROM v$asm_diskgroup_stat;
/* Details of the disks belonging to each disk group. */
PROMPT <p><a class="awr">ASM disk group storage details</a>
CLEAR COLUMNS
COLUMN g_number FOR 9999
COLUMN d_number FOR 9999
COLUMN g_name FOR A8
COLUMN d_name FOR A15
COLUMN g_total_mb FOR 99999999
COLUMN d_total_mb FOR 99999999
COLUMN path FOR A30
COLUMN h_status FOR A8
COLUMN g_free_mb FOR 99999999
COLUMN d_free_mb FOR 99999999
SELECT g.group_number g_number, g.name g_name, g.total_mb g_total_mb,
       g.free_mb g_free_mb, g.state g_state,
       d.disk_number d_number, d.name d_name, d.path,
       d.total_mb d_total_mb, d.free_mb d_free_mb,
       d.header_status h_status, d.mode_status m_status
  FROM v$asm_diskgroup_stat g, v$asm_disk_stat d
 WHERE g.group_number = d.group_number
 ORDER BY g.group_number, d.disk_number;
-- -----------------------------------------------------------------------------------
-- SECTION: Flash recovery area information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Flashback information</h3>
/* File type, percentage of space used, reclaimable percentage, number of files. */
PROMPT <p><a class="awr">Flash recovery area space usage</a>
CLEAR COLUMNS
SET LINE 100
SELECT * FROM v$flash_recovery_area_usage;

-- -----------------------------------------------------------------------------------
-- SECTION: Backup information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Database backup information</h3>
/* Session key, backup content, start time, end time, elapsed time. */
PROMPT <p><a class="awr">Backup job information</a>
CLEAR COLUMNS
SET PAGES 1000
COLUMN time_taken_display FOR a10
COLUMN start_time FOR a20
COLUMN end_time FOR a20
COLUMN status FOR a15
COLUMN tt FOR a10
SELECT session_key, start_time, end_time, status, time_taken_display tt
  FROM v$rman_backup_job_details
 WHERE start_time > sysdate - 1
 ORDER BY session_key;
-- -----------------------------------------------------------------------------------
-- SECTION: Session information
-- -----------------------------------------------------------------------------------
PROMPT <p><h3 class="awr">Session status</h3>
/* Check for sessions with abnormal logins. */
PROMPT <p><a class="awr">Sessions logged in the longest</a>
SET LINE 500
CLEAR COLUMNS
COLUMN spid for a15
COLUMN sql_id for a20
COLUMN program for a40
COLUMN event for a35
SELECT *
  FROM (SELECT t.sid, t.serial#, trunc(sysdate - logon_time) AS online_time,
               t.PROGRAM, t.status, t.LOGON_TIME, t.sql_id, t.prev_sql_id, t.event
          FROM gv$session t
         WHERE t.type <> 'BACKGROUND' AND program IS NOT NULL
         ORDER BY logon_time)
 WHERE rownum <= 30;
PROMPT <p><a class="awr">Process resource limits</a>
/* Instance, resource name, current utilization, maximum since last startup, initial allocation. */
CLEAR COLUMNS
SET PAGES 100
SET LINE 200
COLUMN resource_name FOR a30
COLUMN current_utilization FOR 9999999
COLUMN max_utilization FOR 9999999
COLUMN initial_allocation FOR a18
COLUMN limit_value FOR a15
SELECT * FROM gv$resource_limit WHERE trim(limit_value) != 'UNLIMITED';
-- -----------------------------------------------------------------------------------
-- SECTION: Wait events
-- -----------------------------------------------------------------------------------
prompt <a class="awr">Top CPU/wait events in the last 24 hours</a>
CLEAR COLUMNS
SET PAGES 100
SET LINE 800
COLUMN user_id for a45
COLUMN SQL_ID for a30
COLUMN Event for a45
COLUMN sql_id for a45
COLUMN user_id for a45
COLUMN cnt for a45
COLUMN machine for a45
COLUMN program for a45
SELECT t.*, s.sid, s.serial#, s.machine, s.program, s.osuser
  FROM (SELECT c.USERNAME, a.event, to_char(a.cnt) AS seconds, a.sql_id,
               dbms_lob.substr(b.sql_fulltext, 100, 1) sqltext
          FROM (SELECT rownum rn, t.*
                  FROM (SELECT decode(s.session_state, 'WAITING', s.event,
                                      'Cpu + Wait For Cpu') Event,
                               s.sql_id, s.user_id, count(*) cnt
                          FROM v$active_session_history s
                         WHERE sample_time > sysdate - 1
                         GROUP BY s.user_id,
                                  decode(s.session_state, 'WAITING', s.event,
                                         'Cpu + Wait For Cpu'),
                                  s.sql_id
                         ORDER BY cnt DESC) t
                 WHERE rownum < 20) a,
               v$sqlarea b, dba_users c
         WHERE a.sql_id = b.sql_id
           AND a.user_id = c.user_id
         ORDER BY cnt DESC) t,
       v$session s
 WHERE t.sql_id = s.sql_id(+);
-- -----------------------------------------------------------------------------------
-- SECTION: Memory information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">ORA-04031 errors</h3>
PROMPT <p><a class="awr">ORA-04031 errors (when they occur, check whether the shared_pool_reserved pool reports failures)</a>
SELECT free_space, avg_free_size, used_space, avg_used_size,
       request_failures, last_failure_size
  FROM v$shared_pool_reserved;
-- -----------------------------------------------------------------------------------
-- SECTION: SQL information
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">Current SQL</h3>
PROMPT <p><a class="awr">Most executions</a>
SET LINE 300
SELECT *
  FROM (SELECT sql_id, sql_text, s.executions, s.last_load_time, s.first_load_time,
               s.disk_reads, s.buffer_gets, s.parse_calls
          FROM v$sql s
         ORDER BY s.executions DESC)
 WHERE rownum <= 10;
PROMPT <p><a class="awr">Most parse calls</a>
SET LINE 300
SELECT *
  FROM (SELECT sql_id, sql_text, s.EXECUTIONS, s.LAST_LOAD_TIME, s.FIRST_LOAD_TIME,
               s.DISK_READS, s.BUFFER_GETS, s.PARSE_CALLS
          FROM v$sql s
         ORDER BY s.PARSE_CALLS DESC)
 WHERE rownum <= 10;

-- -----------------------------------------------------------------------------------
-- SECTION: ADG
-- -----------------------------------------------------------------------------------
PROMPT <h3 class="awr">ADG information</h3>
/* Archive destination name, status, current database mode, destination path. */
PROMPT <p><a class="awr">Archive destination status</a>
column dest_name FOR a20
column status FOR a10
column database_mode FOR a20
column destination FOR a20
SELECT dest_name, status, database_mode, destination
  FROM v$archive_dest_status
 WHERE dest_id in ('1','2');
/* For each node: the highest applied log sequence, with its lowest and highest SCN. */
PROMPT <p><a class="awr">Highest applied log sequence</a>
CLEAR COLUMNS
SET LINE 150
SELECT m.thread#, m.sequence#, first_change#, next_change#
  FROM v$log_history m,
       (SELECT thread#, max(sequence#) AS sequence#
          FROM v$log_history
         GROUP BY thread#) t
 WHERE m.thread# = t.thread#
   AND m.sequence# = t.sequence#;
/* The highest archived log sequence for each node. */
PROMPT <p><a class="awr">Highest archived log sequence</a>
SET LINE 100
SELECT UNIQUE thread# AS thread,
       MAX(sequence#) OVER (PARTITION BY thread#) AS last
  FROM v$archived_log;
/*
PROMPT <p><a class="awr">Archive gap check</a>
Starting and ending sequence numbers of the logs missing on the standby.
clear columns
set line 200
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;
*/
PROMPT <br><a class="awr" href="#top">Back to Top</a>
PROMPT <p>End of Report</p>
PROMPT <p></body></html>
SPOOL OFF
exit;

#!/bin/bash
#----------------------------------------------------------------------------------------------------+
#                      HAIZDL@126.COM                                                                 |
#----------------------------------------------------------------------------------------------------|
#      COPYRIGHT (C) 1998-2015 HAIZDL ALL RIGHTS RESERVED.                                             |
#----------------------------------------------------------------------------------------------------|
# DATABASE : ORACLE                                                                                    |
# FILE     : healthyCheck1_forLinux.sh                                                                 |
# CLASS    : DATABASE ADMINISTRATION                                                                   |
# PURPOSE  : COLLECTING LOGS FOR ORACLE DATABASE DAILY CHECKING.                                       |
# NOTE     : AS WITH ANY CODE, ENSURE TO TEST THIS SCRIPT IN A DEVELOPMENT                             |
#            ENVIRONMENT BEFORE ATTEMPTING TO RUN IT IN PRODUCTION.                                    |
#----------------------------------------------------------------------------------------------------+

#----------------------------------------------
# CHECK JOB OPERATOR USER INFO.
#----------------------------------------------
function _userVerify()
{
 USER=`whoami`
 if [ "$USER" != "root" ]; then
   echo "PLS USE ROOT USER FOR THIS JOB."
   echo "USAGE: $0"
   exit 1
 fi
}

#----------------------------------------------
# BASIC VARIABLE.
#----------------------------------------------
HOST=`hostname`
PWDDIRECTORY=/tmp/ora
DATESTRING=`date '+%Y%m%d'`
LOGDIR=$PWDDIRECTORY/$DATESTRING$HOST
ASMSID=+ASM1

# LOG DIRECTORY PARAMETER
OINSTALLDIRECTORY=/home/u01/app
RDBMSDIRECTORY=$OINSTALLDIRECTORY/oracle/diag/rdbms
GRIDLOGDIRECTORY=$OINSTALLDIRECTORY/11.2.0.4/grid/log
GRIDDIRECTORY=$OINSTALLDIRECTORY/grid

ALERTOFCLUSTER=$GRIDLOGDIRECTORY/$HOST/alert$HOST.log
ALERTOFLISTENER=$GRIDDIRECTORY/diag/tnslsnr/$HOST/listener/trace/listener.log
ALERTOFLISTENERSCAN=$GRIDLOGDIRECTORY/diag/tnslsnr/$HOST/listener_scan1/trace/listener_scan1.log
ALERTOFASM=$GRIDDIRECTORY/diag/asm/+asm/$ASMSID/trace/alert_$ASMSID.log

OPATCHORACLEDIR=$OINSTALLDIRECTORY/oracle/product/11.2.0.4/dbhome_1/OPatch
OPATCHGRIDDIR=$OINSTALLDIRECTORY/11.2.0.4/grid/OPatch

# DIAGNOSTIC FILES
LOGOFOSINFO=$LOGDIR/os/info_os_`date '+%y%m%d%H%M'`
LOGOFCLUSTER=$LOGDIR/cluster/info_cluster_`date '+%y%m%d%H%M'`
DBFILE=$LOGDIR/dbs.tmp

#----------------------------------------------
# CREATE LOG INFO DIRECTORY.
#----------------------------------------------
function _initialization()
{
 if [ -d $LOGDIR ]; then
   echo "INIT: LOG DIRECTORY EXISTS, DELETING."
   rm -rf $LOGDIR
 fi
 echo "INIT: LOG DIRECTORY CREATING."
 mkdir $LOGDIR
 mkdir $LOGDIR/os
 mkdir $LOGDIR/cluster
 mkdir $LOGDIR/db
 chmod -R 777 $LOGDIR
 if [ $? -ne 0 ]; then
   echo "INIT: LOG DIRECTORY CREATING ERROR!"
   exit 1
 fi
}

#----------------------------------------------------------------------------------------------------|
# 1. CLUSTER LOG.                                                                                      |
#----------------------------------------------------------------------------------------------------|
function _clusterlog()
{
cd $PWDDIRECTORY
if [ -e $ALERTOFCLUSTER ]; then
  ALERTOFCLUSTERNUM=$(($(awk '{print NR}' $ALERTOFCLUSTER |tail -n1) - $(cat -n $ALERTOFCLUSTER| grep "`date -d last-day +%Y-%m-%d|cut -c 1-10`" | awk 'BEGIN {min=99999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))
  if [ $ALERTOFCLUSTERNUM -gt 0 ]; then
    tail -$ALERTOFCLUSTERNUM $ALERTOFCLUSTER >>$LOGDIR/cluster/alert$HOST_`date '+%y%m%d'`.log
    if [ $? -ne 0 ]; then
      echo " JOB-1-1: CLUSTER ALERT LOG COPY ERROR!"
      exit 1
    fi
  fi
fi
if [ -e $ALERTOFLISTENER ]; then
  ALERTOFLISTENERNUM=$(($(awk '{print NR}' $ALERTOFLISTENER |tail -n1) - $(cat -n $ALERTOFLISTENER| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))
  if [ $ALERTOFLISTENERNUM -gt 0 ]; then
    tail -$ALERTOFLISTENERNUM $ALERTOFLISTENER >>$LOGDIR/cluster/alert_listener_$HOST_`date '+%y%m%d'`.log
    if [ $? -ne 0 ]; then
      echo " JOB-1-2: CLUSTER ALERT LOG COPY ERROR!"
      exit 1
    fi
  fi
fi
if [ -e $ALERTOFLISTENERSCAN ]; then
  ALERTOFLISTENERSCANNUM=$(($(awk '{print NR}' $ALERTOFLISTENERSCAN |tail -n1) - $(cat -n $ALERTOFLISTENERSCAN| grep "`date -d last-day "+%d-%^b-%Y"`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))
  if [ $ALERTOFLISTENERSCANNUM -gt 0 ]; then
    tail -$ALERTOFLISTENERSCANNUM $ALERTOFLISTENERSCAN >>$LOGDIR/cluster/alert_listener_scan1_`date '+%y%m%d'`.log
    if [ $? -ne 0 ]; then
      echo " JOB-1-3: CLUSTER ALERT LOG COPY ERROR!"
      exit 1
    fi
  fi
fi
if [ -e $ALERTOFASM ]; then
  ALERTOFASMNUM=$(($(awk '{print NR}' $ALERTOFASM |tail -n1) - $(cat -n $ALERTOFASM| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))
  if [ $ALERTOFASMNUM -gt 0 ]; then
    tail -$ALERTOFASMNUM $ALERTOFASM >>$LOGDIR/cluster/alert_$ASMSID_`date '+%y%m%d'`.log
    if [ $? -ne 0 ]; then
      echo " JOB-1-4: CLUSTER ALERT LOG COPY ERROR!"
      exit 1
    fi
  fi
fi
echo " " >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER
}
#----------------------------------------------------------------------------------------------------|
# 2. OS INFO.                                                                                          |
#----------------------------------------------------------------------------------------------------|
function _osinfo()
{
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- UPTIME:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
uptime >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- VMSTAT:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
vmstat 1 5 >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- POWERMT:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
powermt display dev=all >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- FREE:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
free -m >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- DF:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
df -h >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- SAR-PAGE:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
sar -B 1 5 >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- SAR-CPU:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
sar -u 1 5 >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- IOSTAT:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
iostat -mx 1 5 | grep power >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

echo "---------------------------------------------------------------" >>$LOGOFOSINFO
echo "- SAR-NETWORK:" >>$LOGOFOSINFO
echo "---------------------------------------------------------------" >>$LOGOFOSINFO
sar -n DEV 1 5 | grep bond >>$LOGOFOSINFO
echo " " >>$LOGOFOSINFO

if [ $? -ne 0 ]; then
  echo " JOB-2-1: OS INFO ERROR!"
  exit 1
fi

LOGOFSYS=/var/log/messages
if [ -e $LOGOFSYS ]; then
  cp $LOGOFSYS $LOGDIR/os
  if [ $? -ne 0 ]; then
    echo " JOB-2-2: OS INFO ERROR!"
    exit 1
  fi
fi
}
#----------------------------------------------------------------------------------------------------|
# 3. CLUSTER INFO.                                                                                     |
#----------------------------------------------------------------------------------------------------|
function _clusterinfo()
{
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- Cluster resource status:" >>$LOGOFCLUSTER
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER
su - grid -c "crsctl status res -t " >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- Cluster service status:" >>$LOGOFCLUSTER
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER
su - grid -c "crsctl check cluster -all" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- Cluster listener status:" >>$LOGOFCLUSTER
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER
echo "#1 Local listener " >>$LOGOFCLUSTER
su - grid -c "lsnrctl status" >>$LOGOFCLUSTER
echo "#2 SCAN listener " >>$LOGOFCLUSTER
su - grid -c "lsnrctl status LISTENER_SCAN1" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- Disk group information and status " >>$LOGOFCLUSTER
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- #1 " >>$LOGOFCLUSTER
su - grid -c "asmcmd lsdg" >>$LOGOFCLUSTER

echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- OCR information " >>$LOGOFCLUSTER
echo "---------------------------------------------------------------" >>$LOGOFCLUSTER
echo "- #1 Voting disks " >>$LOGOFCLUSTER
su - grid -c "crsctl query css votedisk" >>$LOGOFCLUSTER
echo "- #2 Consistency check " >>$LOGOFCLUSTER
su - grid -c "ocrcheck" >>$LOGOFCLUSTER
echo " " >>$LOGOFCLUSTER
}
#----------------------------------------------------------------------------------------------------|
# 4. DB INFO.                                                                                          |
#----------------------------------------------------------------------------------------------------|
function _dbinfo()
{
cd $PWDDIRECTORY
su - oracle -c "cd $LOGDIR/db;
export ORACLE_SID=$1;
sqlplus -s /nolog <<EOF
conn / as sysdba;
@$LOGDIR/../dailyhealthycheck_withadg.sql
exit;
EOF"
}

#----------------------------------------------------------------------------------------------------|
# 5. DB LOG.                                                                                           |
#----------------------------------------------------------------------------------------------------|
function _dblog()
{
cd $PWDDIRECTORY
DBNAME2=$1
DBNAME2=${DBNAME2%?}
ALERTOFDB=$RDBMSDIRECTORY/$DBNAME2/$1/trace/alert_$1.log
if [ -e $ALERTOFDB ]; then
  ALERTOFDBNUM=$(($(awk '{print NR}' $ALERTOFDB |tail -n1) - $(cat -n $ALERTOFDB| grep "`date -d last-day|cut -c 1-8``date -d last-day +%d`" | awk 'BEGIN {min=9999999; max=-1;} {if(min>$1) min=$1; if(max<$1) max=$1;} END {printf("%d", min);}') +1))
  if [ $ALERTOFDBNUM -gt 0 ]; then
    tail -$ALERTOFDBNUM $ALERTOFDB >>$LOGDIR/db/alert_$1_`date '+%y%m%d_%H:%M:%S'`.log
    if [ $? -ne 0 ]; then
      echo " JOB-5: DB LOG ERROR!"
      exit 1
    fi
  fi
fi
}
#----------------------------------------------------------------------------------------------------|
# 6. RMAN INFO.                                                                                        |
#----------------------------------------------------------------------------------------------------|
function _rmaninfo()
{
cd $PWDDIRECTORY
LOGOFRMAN=$LOGDIR/rman_$1-`date '+%y%m%d_%H:%M:%S'`.log
su - oracle -c "export ORACLE_SID=$1;
rman target / <<EOF >$LOGOFRMAN
show all;
list backup of database summary;
exit;
EOF"
if [ $? -ne 0 ]; then
  echo "JOB-6: RMAN INFO ERROR!"
  exit 1
fi
}

#----------------------------------------------------------------------------------------------------|
# 7. AWR.                                                                                              |
#----------------------------------------------------------------------------------------------------|
function _awrinfo()
{
cd $PWDDIRECTORY
su - oracle -c "cd $LOGDIR/db;
mkdir ./$1AWR;
cd $1AWR;
export ORACLE_SID=$1;
sqlplus -silent /nolog <<EOF
conn /as sysdba
@$LOGDIR/../makeawr.sql;
exit;
EOF"
sed -i "s/PL\/SQL procedure successfully completed.//g" $LOGDIR/$1AWR/awrsql.sql
su - oracle -c "cd $LOGDIR;
cd ./$1AWR;
export ORACLE_SID=$1;
sqlplus -silent /nolog <<EOF
conn /as sysdba
set echo off
@awrsql.sql;
exit;
EOF"
if [ $? -ne 0 ]; then
  echo "JOB7-5: AWR ERROR!"
  exit 1
fi
}
#----------------------------------------------------------------------------------------------------|
# 8. TAR.                                                                                              |
#----------------------------------------------------------------------------------------------------|
function _logtar()
{
cd $PWDDIRECTORY
ls $LOGDIR/db/*.html > $LOGDIR/html.tmp
cat $LOGDIR/html.tmp | while read FNAME
do
  sed -i 's/GBK/UTF8/g' $FNAME
done
rm -rf $LOGDIR/html.tmp
rm -f *$HOST.tar.gz
if [ -d $DATESTRING$HOST ]; then
  tar -czvf $DATESTRING-$HOST.tar.gz ./$DATESTRING$HOST
  rm -rf ./$DATESTRING$HOST
else
  echo " JOB-8: TAR ERROR."
fi
}
#----------------------------------------------------------------------------------------------------|
# MAIN().                                                                                              |
#----------------------------------------------------------------------------------------------------|
echo "========================================================="
echo "= JOB-NAME: $0"
echo "========================================================="
_userVerify
_initialization

echo "----------------------------------------------------"
echo "- JOB-1: CLUSTER LOG."
echo "----------------------------------------------------"
_clusterlog

echo "----------------------------------------------------"
echo "- JOB-2: OS INFO."
echo "----------------------------------------------------"
_osinfo

echo "----------------------------------------------------"
echo "- JOB-3: CLUSTER INFO."
echo "----------------------------------------------------"
_clusterinfo

echo "----------------------------------------------------"
echo "- JOB-(4-7): DB JOBs."
echo "----------------------------------------------------"
ps -ef | grep ora_pmon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}' >$DBFILE
if [ -e $DBFILE ]; then
  cat $DBFILE | while read DBSID
  do
    echo "- # JOB-4: DB INFO."
    _dbinfo $DBSID
    echo "- # JOB-5: DB LOG."
    _dblog $DBSID
  done
else
  echo " JOB-4-7: NO DBSID FILES ERROR."
fi
rm -rf $DBFILE

echo "----------------------------------------------------"
echo "- JOB-8: LOG TAR."
echo "----------------------------------------------------"
_logtar
Original title: How DBAs Can Customize an Automated Inspection Tool