I work as a database support engineer for a company with 20,000 clients, which means I encounter all kinds of unique issues. One of our clients faced a persistent problem where their database was generating a large number of trace files, frequently filling up the hard disk. Here’s how I resolved this issue.

Problem Analysis

After examining the trace files’ names and their creation times, I concluded that these trace files were generated by the RMAN process during the incremental backup at 3:30 AM every day. Oracle’s Metalink provided a solution for this issue, detailed in Document 29061016.8, which suggests applying a patch.

Bug 29061016 – Huge Tracefile Generated by RMAN Incremental Backups with KCBTSE Structure (Doc ID 29061016.8)

However, applying this patch to the production system was a lengthy process. As a temporary measure, the client needed to manually delete the trace files promptly. I recommended using ADRCI (Automatic Diagnostic Repository Command Interpreter) to delete the trace files. For instance, to delete all trace files from one day:

adrci> purge -age 3600 -type trace

Setting Automatic Deletion Policy

To prevent this issue in the future, it is advisable to set an automatic deletion policy for trace files. Firstly, let’s check the default settings:

adrci> show home
ADR Homes:
diag/rdbms/small/small
diag/rdbms/orcl1/orcl1
diag/rdbms/aurreum/aurreum
diag/rdbms/orcl/orcl
diag/clients/user_oracle/host_3498212516_110
diag/tnslsnr/dell/listener1

adrci> set home diag/rdbms/small/small
adrci> show control

The output shows:

ADR Home = /u01/app/oracle/diag/rdbms/small/small:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME                              SIZEP_POLICY         PURGE_PERIOD         FLAGS                PURGE_THRESHOLD
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- -------------------- -------------------- -------------------- --------------------
114559742            720                  8760                 2021-06-25 11:56:19.334671 +08:00        2022-08-05 12:21:51.844023 +08:00                                                 1                    2                  110                  1                    2021-06-25 11:56:19.334671 +08:00        18446744073709551615 0                    0                    95
1 row fetched
  • SHORTP_POLICY is set to 720 hours (one month) for incident and health monitor warnings.
  • LONGP_POLICY is set to 8760 hours (one year) for trace and core dump files.
  • LAST_AUTOPRG_TIME indicates the last automatic purge.
  • LAST_MANUPRG_TIME is null, meaning no manual deletion has occurred.

To set the delete policy to 168 hours (one week):

adrci> set control (SHORTP_POLICY=168)
adrci> set control (LONGP_POLICY=168)
adrci> show control

The updated settings:

ADR Home = /u01/app/oracle/diag/rdbms/small/small:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME                              SIZEP_POLICY         PURGE_PERIOD         FLAGS                PURGE_THRESHOLD
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- -------------------- -------------------- -------------------- --------------------
114559742            168                  168                  2022-08-05 15:47:32.029723 +08:00        2022-08-05 12:21:51.844023 +08:00                                                 1                    2                  110                  1                    2021-06-25 11:56:19.334671 +08:00        18446744073709551615 0                    0                    95
1 row fetched

Run the purge command:

adrci> purge
adrci> show control

The updated output shows LAST_MANUPRG_TIME has updated, indicating manual purge, and the trace files have been deleted.

Automation Script

To automate setting different ADR homes in batches, I provided the following script:

#!/bin/sh
for ADRHOME in `adrci exec="show home"`
do
    if [ "$ADRHOME" = "ADR" ] || [ "$ADRHOME" = "Homes:" ]
    then
        continue
    fi
    echo $ADRHOME
    adrci <<EOF
      set home $ADRHOME
      set control (SHORTP_POLICY=168)
      set control (LONGP_POLICY=168)
      purge
    exit
EOF
done

This script iterates through all ADR homes, sets the purge policies to 168 hours, and purges old trace files.

By implementing these measures, we managed to prevent the client’s hard disk from being overwhelmed by excessive trace files while awaiting the permanent patch solution.

Leave a comment

I’m Yuan

Welcome to the blog of a performance maven in Oracle & MySQL. Feel free to contact me to share and discuss database knowledge.

My 20 Oracle Certifications

A screenshot from certview.oracle.com

My Book

MySQL 8.0 Operations and Optimization

Let’s connect