In a high-traffic database, it is inevitable that concurrent sessions will block each other. A blocked session can in turn block other sessions, resulting in a chain of blocking sessions.
Depending on how recently the blocking occurred, information about blocking issues can be retrieved from three different views:
- GV$SESSION: Provides real-time information about all sessions, including inactive ones.
- GV$ACTIVE_SESSION_HISTORY: Captures recent session activity samples in memory.
- DBA_HIST_ACTIVE_SESS_HISTORY: Stores historical session activity samples persisted to disk.
These views share a set of columns for identifying blocking issues: when the BLOCKING_SESSION_STATUS column is VALID, the BLOCKING_INSTANCE (named BLOCKING_INST_ID in the ASH views), BLOCKING_SESSION, and BLOCKING_SESSION_SERIAL# columns contain the instance ID, session ID, and serial number of the blocking session, respectively.
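For a quick real-time check, these columns can be queried directly. Below is a minimal sketch against GV$SESSION that lists currently blocked sessions together with their blockers (the historical queries later in this post use the equivalent ASH columns):

-- List sessions that are currently blocked, and who is blocking them.
SELECT
    inst_id,
    sid,
    serial#,
    event,
    blocking_instance,
    blocking_session
FROM
    gv$session
WHERE
    blocking_session_status = 'VALID';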
By using these views and columns, blocking issues can be identified and resolved effectively. Here is an example of using ASH to identify a chain of blocking sessions.
One of my clients complained that the database became very slow between midnight and 2 AM, even though the workload in that window is usually low. I started by querying the distribution of wait events for that period from ASH:
SELECT
    NVL(event, 'ON CPU') AS event,
    NVL(wait_class, 'CPU') AS wait_class,
    COUNT(*) AS total_wait_time,
    TO_CHAR(100 * TRUNC(RATIO_TO_REPORT(COUNT(*)) OVER (), 6), 'FM990.9999') || '%' AS "%"
FROM
    dba_hist_active_sess_history
WHERE
    sample_time BETWEEN TRUNC(SYSDATE) + INTERVAL '0 00:00:00' DAY TO SECOND
                    AND TRUNC(SYSDATE) + INTERVAL '0 02:00:00' DAY TO SECOND
GROUP BY
    event,
    wait_class
ORDER BY
    total_wait_time DESC
FETCH FIRST 10 ROWS ONLY;
The output of the query revealed the following:
EVENT                      WAIT_CLASS    TOTAL_WAIT_TIME %
-------------------------- ------------- --------------- --------
log file switch completion Configuration           99213 56.9754%
log file sync              Commit                  25953 14.9041%
gc buffer busy release     Cluster                 21965 12.6139%
gc buffer busy acquire     Cluster                 10520 6.0413%
buffer busy waits          Concurrency              5309 3.0488%
log buffer space           Configuration            3371 1.9358%
ON CPU                     CPU                      1544 0.8866%
db file sequential read    User I/O                 1215 0.6977%
gcs log flush sync         Other                     916 0.526%
gc cr request              Cluster                   625 0.3589%
10 rows selected.
From the output, we can see that log file switch completion and log file sync were the dominant wait events, accounting for 56.9754% and 14.9041% of the samples, respectively. The events that usually top this list, such as ON CPU and db file sequential read, accounted for only a small fraction during this period. I suspected a blockage around access to the log files, so I used the following query to check for blocked sessions:
SELECT
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id AS blocking_sess,
    blocking_session_status,
    COUNT(*)
FROM
    dba_hist_active_sess_history
WHERE
    blocking_session_status = 'VALID' AND
    blocking_session IS NOT NULL AND
    sample_time BETWEEN TRUNC(SYSDATE) + INTERVAL '0 00:00:00' DAY TO SECOND
                    AND TRUNC(SYSDATE) + INTERVAL '0 02:00:00' DAY TO SECOND AND
    event IN ('log file switch completion', 'log file sync')
GROUP BY
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id,
    blocking_session_status
ORDER BY
    COUNT(*);
The output of the blocked sessions query is as follows:
BLOCKING_SESS BLOCKING_SESSION_STATUS   COUNT(*)
------------- ----------------------- ----------
3367:1@1      VALID                        19511
3367:1@2      VALID                        95938
It shows that on both the first and second nodes, a session with session ID 3367 caused a significant amount of blocking on log file access. To gain more insight, I then queried the details of these blocking sessions:
SELECT
    instance_number,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id AS blocking_sess,
    blocking_session_status,
    COUNT(*)
FROM
    dba_hist_active_sess_history
WHERE
    session_id = 3367 AND
    session_serial# = 1 AND
    sample_time BETWEEN TRUNC(SYSDATE) + INTERVAL '0 00:00:00' DAY TO SECOND
                    AND TRUNC(SYSDATE) + INTERVAL '0 02:00:00' DAY TO SECOND
GROUP BY
    instance_number,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id,
    blocking_session_status
ORDER BY
    COUNT(*);
The details of these blocking sessions are as follows:
INSTANCE_NUMBER PROGRAM                EVENT                   BLOCKING_SESS BLOCKING_SESSION_STATUS   COUNT(*)
--------------- ---------------------- ----------------------- ------------- ----------------------- ----------
              2 oracle@yxgldb02 (LGWR) enq: CF - contention    6181:29725@2  VALID                           13
              1 oracle@yxgldb01 (LGWR) enq: CF - contention    6181:29725@2  VALID                           16
              2 oracle@yxgldb02 (LGWR) log file parallel write :@            NO HOLDER                       28
              2 oracle@yxgldb02 (LGWR) enq: CF - contention    17754:31107@1 VALID                           61
We can see that both of these sessions were LGWR background processes. Most of their time was spent waiting on enq: CF - contention, which indicates contention for the control file enqueue. I continued to investigate the blocker sessions with the following query:
SELECT
    session_id || ':' || session_serial# || '@' || instance_number AS sess,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id AS blocking_sess,
    blocking_session_status,
    COUNT(*)
FROM
    dba_hist_active_sess_history
WHERE
    (
        (instance_number = 2 AND session_id = 6181 AND session_serial# = 29725)
        OR (instance_number = 1 AND session_id = 17754 AND session_serial# = 31107)
    )
    AND sample_time BETWEEN TRUNC(SYSDATE) + INTERVAL '0 00:00:00' DAY TO SECOND
                        AND TRUNC(SYSDATE) + INTERVAL '0 02:00:00' DAY TO SECOND
GROUP BY
    session_id || ':' || session_serial# || '@' || instance_number,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id,
    blocking_session_status
ORDER BY
    COUNT(*);
The output of this further investigation is as follows:
SESS         PROGRAM                   EVENT                           BLOCKING_SESS BLOCKING_SESSION_STATUS   COUNT(*)
------------ ------------------------- ------------------------------- ------------- ----------------------- ----------
6181:29725@2 rman@yxgldb02 (TNS V1-V3) enq: CF - contention            5509:32237@1  VALID                            1
6181:29725@2 rman@yxgldb02 (TNS V1-V3) ges inquiry response            613:1@2       VALID                            1
6181:29725@2 rman@yxgldb02 (TNS V1-V3) Backup: MML delete backup piece :@            NO HOLDER                        1
6181:29725@2 rman@yxgldb02 (TNS V1-V3) Backup: MML query backup piece  :@            NO HOLDER                        1
6181:29725@2 rman@yxgldb02 (TNS V1-V3) enq: CF - contention            4608:13189@1  VALID                            2
6181:29725@2 rman@yxgldb02 (TNS V1-V3) Backup: MML command to channel  :@            NO HOLDER                        3
6181:29725@2 rman@yxgldb02 (TNS V1-V3) enq: CF - contention            :@            GLOBAL                           5
6181:29725@2 rman@yxgldb02 (TNS V1-V3) control file parallel write     :@            UNKNOWN                          5
6181:29725@2 rman@yxgldb02 (TNS V1-V3) control file sequential read    :@            UNKNOWN                         33
9 rows selected.
We do not see the blocker session from the first node (17754:31107@1). Because DBA_HIST_ACTIVE_SESS_HISTORY keeps only one sample every ten seconds, a session that holds an enqueue for a short time may never be captured there, yet it can still appear as a blocker. The blocking session on the second node was an RMAN backup job. Typically, the majority of wait events for a backup job relate to I/O on the datafiles; this particular backup job, however, spent a significant amount of time accessing the control file.
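As an aside on the missing first-node blocker: if the investigation happens while the samples are still held in memory, the same drill-down can be run against GV$ACTIVE_SESSION_HISTORY, which samples roughly once per second and is therefore more likely to catch short-lived enqueue holders. A sketch of that query, reusing the session identifiers above (it returns rows only while the in-memory ASH retention still covers the incident window):

SELECT
    session_id || ':' || session_serial# || '@' || inst_id AS sess,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id AS blocking_sess,
    blocking_session_status,
    COUNT(*)
FROM
    gv$active_session_history   -- in-memory ASH, roughly one sample per second
WHERE
    inst_id = 1 AND
    session_id = 17754 AND
    session_serial# = 31107
GROUP BY
    session_id || ':' || session_serial# || '@' || inst_id,
    program,
    event,
    blocking_session || ':' || blocking_session_serial# || '@' || blocking_inst_id,
    blocking_session_status
ORDER BY
    COUNT(*);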
Further investigation revealed that the control file, which is typically a few tens of megabytes, had grown to a daunting 300 GB. This was because the Oracle parameter control_file_record_keep_time had been accidentally set to 365 days.
SQL> show parameter control_file_reco
NAME                          TYPE    VALUE
----------------------------- ------- -----
control_file_record_keep_time integer 365
This excessive setting caused the control file to grow too large, slowing down RMAN’s access to it and causing a chain of blockages.
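To confirm the diagnosis, the control file size and the record sections responsible for the growth can be checked directly. Here is a minimal sketch using the standard V$CONTROLFILE and V$CONTROLFILE_RECORD_SECTION views (the figures will of course differ from system to system):

-- Size of each control file copy, in MB.
SELECT
    name,
    block_size * file_size_blks / 1024 / 1024 AS size_mb
FROM
    v$controlfile;

-- Record sections ordered by the space they occupy; with a very long
-- keep time, the RMAN backup and archived log sections tend to dominate.
SELECT
    type,
    record_size,
    records_total,
    records_used
FROM
    v$controlfile_record_section
ORDER BY
    record_size * records_total DESC;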
Here is the blocking chain breakdown:
- RMAN Process: Took too long to access the control file, blocking the LGWR process from writing to the control file (enq: CF - contention wait event).
- Blocked LGWR Process: Blocked numerous DML transactions that need LGWR to write to the log file during commit (log file switch completion and log file sync wait events).
- Uncommitted Transactions: Blocked other transactions, leading to gc buffer busy release, gc buffer busy acquire, and buffer busy waits events.
Below is a visual representation of the blocking chain:
+--------------------------------------------------------------+
| RMAN Process                                                 |
|                                                              |
| Wait Event: control file sequential read                     |
+--------------------------------------------------------------+
                               |
                               |
                               v
+--------------------------------------------------------------+
| LGWR Process                                                 |
|                                                              |
| Wait Event: enq: CF - contention                             |
+--------------------------------------------------------------+
                               |
                               |
                               v
+--------------------------------------------------------------+
| DML Transactions                                             |
|                                                              |
| Wait Events: log file switch completion, log file sync       |
+--------------------------------------------------------------+
                               |
                               |
                               v
+--------------------------------------------------------------+
| Uncommitted Transactions Blocking Others                     |
|                                                              |
| Wait Events: gc buffer busy release, gc buffer busy acquire, |
| buffer busy waits                                            |
+--------------------------------------------------------------+
In summary, the troubleshooting process revealed that the root cause of the performance issue was an excessively large control file, which led to significant wait times and a chain of blocked sessions.
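As a remediation sketch (the appropriate retention depends on the backup policy, so the value below is only an example): resetting control_file_record_keep_time stops further growth, but the control file does not shrink automatically; reclaiming the space generally requires recreating the control file.

-- Example only: bring the keep time back to the 7-day default.
-- The parameter is dynamic; SID='*' applies it to all RAC instances.
ALTER SYSTEM SET control_file_record_keep_time = 7 SCOPE = BOTH SID = '*';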




