Oracle error 1013 during filling up usn avail cache:
One of the Oracle
RAC database environment, I used support for Oracle 11gR2 was facing
performance issue. User were not able to insert record in a timely
manner. Update was also very slow across entire RAC environment. Customer
experience was not satisfactory and system wide performance was
impacted. Time to generate AWR, ASH for the Oracle RAC
environment.
Fun was started,
after I looked at the top timed events in AWR, I was able to see the
wait class for concurrency, cluster and other, such as Concurrency -
row cache lock, Cluster - gc buffer busy acquire, Cluster - gc buffer
busy acquire, Other - enq: US - contention, Cluster - gc current
block busy, Other - enq: TA - contention. Top timed Foreground Events
was also reflecting same wait event in the same class. Top Timed
Background Event was System I/O, control file sequential read.
SysStat and
Global messaging – RAC was reflecting high GCS received on one of
the node, near to double compare to other nodes.
Let us explore
the SQL ordered by Elapsed Time. Highest elapsed time was for update,
insert and select.
Update statement:
update /*+ rule */ undo$ set
name=:2, file#=:3, block#=:4, status$=:5, user#=:6, undosqn=:7,
xactsqn=:8, scnbas=:9, scnwrp=:10, inst#=:11, ts#=:12, spare1=:13
where us#=:1
was the statement with highest elapsed time.
What the hack is
going on? What is happening? Update statement - update undo$ was
consuming near to 50% cpu time. Update statement – update $undo was
facing highest cluster wait time. Segment statistics in global AWR
and Segment by Global Cache Buffer Busy for each node was reflecting
UNDO$ with highest GC Buffer Busy, and % of capture was more than
90%.
When we execute
DML (insert, update, delete) Oracle server process has to write
undo, so in case of rollback, original data is available in undo.
Nothing wrong with the update statement – update /*rule */ redo,
but the higher cluster wait event is the pain. Because of Bug in the
11gR2, require no. of online rollback segments are not available, so
there is a contention.
By setting _rollback_segment_count to 10000
you can force system to keep 10000 rollback segment online, so DML
activity against database can be perform efficiently. It's a feature of ACID
compliant RDBMS. You will be able to find several bugs in Oracle
support related to rollback segments also for different situation
there is different parameter, so make sure to work with Oracle
support while setting hidden parameter / underscore parameter for your Oracle
database environment. So undo / rollback segments are still pain in
the neck for the system with higher no. of DML activity / higher no.
of concurrent transaction.
Each instance
alert.log file was reflecting “Error 1013 during filling up usn
avail cache”, and “Error 3113 during filling up usn avail cache”.
I had enough
information to troubleshoot the performance of the database
environment. Refer Oracle support Notes:
Alert log file - Error 1013
during filling up usn avail cache" report in instance alert log
(Doc ID 1630507.1)IF: Undo Wait Event - Enq: US - contention (Doc
ID 1951680.1)
How to correct performance issues with enq: US - contention related to undo segments (Doc ID 1332738.1)
How to correct performance issues with enq: US - contention related to undo segments (Doc ID 1332738.1)
After spending some time, I was able to find out it’s Oracle Bug
and couple of bugs are associated with same issues.
Bug 17560426 :
ORA-1013 DURING FILLING UP USN AVAIL CACHE IN OTHER NODES AFTER
INSTANCE CRASH.
Bug 17905112 : HIGH US, TA CONTENTION AND ERROR 1013 DURING FILLING UP USN AVAIL CACHE ON RAC.
Bug 21818185 : DB HUNG WITH ERROR 1013 DURING FILLING UP USN AVAIL CACHE IN ALERT.LOG.
Bug 19545828 : ERROR 1013 DURING FILLING UP USN AVAIL CACHE.
Bug 20010589 : DATABASE HANGS WITH ENQ: TA CONTENTION AS THE TOP WAIT EVENT.
Bug 20440571 : FIX THE ERROR HANDLING CODE IN KTUSMFILLAVAILCACHE IN 11.2.
Bug 17905112 : HIGH US, TA CONTENTION AND ERROR 1013 DURING FILLING UP USN AVAIL CACHE ON RAC.
Bug 21818185 : DB HUNG WITH ERROR 1013 DURING FILLING UP USN AVAIL CACHE IN ALERT.LOG.
Bug 19545828 : ERROR 1013 DURING FILLING UP USN AVAIL CACHE.
Bug 20010589 : DATABASE HANGS WITH ENQ: TA CONTENTION AS THE TOP WAIT EVENT.
Bug 20440571 : FIX THE ERROR HANDLING CODE IN KTUSMFILLAVAILCACHE IN 11.2.
Temporary work
around might work by setting hidden / underscore parameter
_rollback_segement_count to 10000. The no. of rollback segments
depends on no. of transaction executed against database. Make sure
you derive no. near to real time otherwise same issue, you might need
to increase it again.
Because of _rollback_segement_count your
alert.log file will filled with the messages SMON: online AUM
segments … and SMON: number of AUM segments … Another option is
to apply the 11gR2 patchset.
Doc ID 1562142.1
– Document 11.2.0.4 Patch Set - List of Bug Fixes by Problem Type
(Doc ID 1562142.1), has listed “13508485 – JDBC Statement.cancel
can cause ORA-1013 from a Rollback”, under JDBC category. Same
issue is fixed in 12.1.0.1 base release also.
Wish you
wonderful time to get the control of your RAC environment, and
troubleshoot Oracle RAC performance issue.
Comments
Post a Comment