Here is an example of an AWR report generated from a benchmark test of an OLTP system:

In the Top 10 Foreground Events by Total Wait Time section, it shows that log file sync accounts for 48.3% of the total database time. This event occurs when a user session commits a transaction and must wait for the corresponding redo data to be written to the redo log files by the Log Writer (LGWR) process. The average write time for this event is 5.69 milliseconds, which is on the high side, as it typically should not exceed 2 milliseconds.

The second-largest wait event, log file switch (checkpoint incomplete), accounts for 28.2% of the total database time. This event arises when Oracle needs to switch to a new redo log file, but the checkpoint process has not yet completed writing the dirty buffers from the current redo log file to the data files. The fact that this event accounts for such a significant portion of the total database time is a major concern. Typically, this event should not appear in the top 10 wait events. Additionally, the approximately 2 seconds of average wait time for this event is abnormally high.

In the Instance Activity Stats - Thread Activity section, it is shown that there are 456 log switches per hour. Typically, a healthy range for log file switches during peak hours is once every 15 to 20 minutes. The observed the high frequency of log switches is evidently the root cause of the log file-related wait events, this has significantly eaten into Oracle performance.

The Load Profile section indicates that this database generates 24,559,056.7 bytes per second. To bring the log file switch frequency into a healthy range. We increased the number of log file groups from 3 to 4, and increased the size of each log file from 200 MB to 22 GB, which can hold approximately 15 minutes of redo. After making these modifications, we reran the benchmark test.

The Top 10 Foreground Events by Total Wait Time section section of the AWR report for the rerun benchmark test shows that:

  • The log file switch (checkpoint incomplete) event disappeared from the top 10 wait events.
  • The average wait time for log file sync decreased significantly to approximately 0.5 milliseconds.
  • DB CPU accounted for 68.4% of the total database time, indicating that the database spent the most of its time doing productive work.

By increasing the size of log file, we successfully reduced the log file-related wait events, thereby enhancing the database’s efficiency and overall performance.

One response to “How Under-Sized Log Files Eat Into Oracle Performance”

  1. Nitesh avatar
    Nitesh

    Large Redo log files (like 22GB) can cause several issues.

    1. Slower Recovery: If a crash happens, recovery takes longer because larger redolog file mean more data to process for recovery .
    2. Data Loss Risk: Large logs can delay archiving, leading to gaps in archived logs and potential data loss during disaster recovery (DR).

    3.Recovery Requirements:

    Consider the criticality of your recovery time objectives (RTO) and recovery point objectives (RPO). If rapid recovery is important, you may need smaller log files to ensure more frequent switches and quicker recovery.

    Balancing Redo log size is key to avoiding these problems, ensuring efficient recovery, performance, and data protection.

    Like

Leave a reply to Nitesh Cancel reply

I’m Yuan

Welcome to the blog of a performance maven in Oracle & MySQL. Feel free to contact me to share and discuss database knowledge.

My 20 Oracle Certifications

A screenshot from certview.oracle.com

My Book

MySQL 8.0 Operations and Optimization

Let’s connect