Detecting Deadlock Source Part 1
In the case of deadlock oracle will raise ORA-60 error.Trace file is writen by the Global Enqueue Service Daemon (LMD) background process.
Trace file will record sql involved but NOT the SQL that initialy took the locks.It will record last entries.
So how to resolve issue and get all data involved with deadlocks?
Before I get to that point let me create simple deadlock case:
create table test(a int primary key);
insert into test select rownum from dba_tables where rownum <9;
commit;
Session 1:
update test set a=9 where a=1;
update test set a=10 where a=2;
session 2:
update test set a=11 where a=3;
update test set a=12 where a=4;
session 1:
update test set a=25 where a=8;
update test set a=13 where a=3;
session 2:
update test set a=22 where a=6;
update test set a=14 where a=1;
To troubleshoot deadlock I have create after logon trigger on database.
Although I can place all these commands together ( new option with events ) just for easy read I have used execute immediate for every command.
I have used new events syntax in 11g so event 10046 is sql_trace, ORA-60 is deadlock.I have posted it on previous blogs.
For HANGANALYZE level 4 is used.This level dumps blockers.
Bellow is hanganalyze level description:
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
and trigger:
CREATE OR REPLACE TRIGGER SYS.TRACE_MILADIN_DEADLOCKS
after logon on database
begin
if user like ‘MILADIN’ then
execute immediate ‘alter session set timed_statistics=true’;
execute immediate ‘alter session set max_dump_file_size=unlimited’;
execute immediate ‘alter session set tracefile_identifier=”miladin_deadlock”’;
execute immediate ‘alter session set events ”sql_trace wait=true, bind=true,plan_stat=all_executions,level=12”’;
execute immediate ‘alter session set events ”deadlock trace name hanganalyze_global level 4,forever”’;
execute immediate ‘alter session set events ”deadlock trace name systemstate level 266,lifetime 1”’;
execute immediate ‘alter session set events ”deadlock trace name processstate level 10,forever”’;
end if;
end;
/
This trigger will generate trace files in trace dump destination named something like test1_ora_1575_miladin_deadlock.trc
I will start first with LMD trace entries for deadlock:
user session for deadlock lock 0x10fa4c010
sid: 120 ser: 64 audsid: 496794 user: 128/MILADIN flags: 0×100045
pid: 48 O/S info: user: oracle, term: UNKNOWN, ospid: 11683
image: oracle
client details:
O/S info: user: oracle, term: pts/0, ospid: 11594
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update test set a=13 where a=3
user session for deadlock lock 0x10fa43310
sid: 118 ser: 349 audsid: 496793 user: 128/MILADIN flags: 0×45
pid: 35 O/S info: user: oracle, term: UNKNOWN, ospid: 11349
image:
client details:
O/S info: user: oracle, term: pts/1, ospid: 11292
machine: stpqldb01 program: sqlplus
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update test set a=14 where a=1
user session for deadlock lock 0x10fbf5da0
so my sql of interest are
update test set a=14 where a=1 and update test set a=13 where a=3.
and Wait for graph:
Global Wait-For-Graph(WFG) at ddTS[0.3] :
BLOCKED 0x10fbf5da0 5 wq 2 cvtops x1 TX 0xa001e.0x3c22 [23000-0001-0000000F] 0
BLOCKER 0x10fbf28f0 5 wq 1 cvtops x8 TX 0xa001e.0x3c22 [30000-0001-00000010] 0
BLOCKED 0x10fa4c538 5 wq 2 cvtops x1 TX 0x4000e.0x2a16 [30000-0001-00000010] 0
BLOCKER 0x10fa4c388 5 wq 1 cvtops x8 TX 0x4000e.0x2a16 [23000-0001-0000000F] 0
These are TX locks on first instance in mode 5 .
Next I am going to check trace files written by trigger.
From SQL trace/hang/dumps trace for 1st session I will focus my search for update on table test where a=1 and a=3.
Trace file has only one entry for update test set a=9 where a=1 and had few entries ( from hanganalyze and dumps ) for update test set a=13 where a=3
In SQL trace/hang/dumps trace for 2nd session will again look for entries where( where a=1 and a=3).In SQL trace for 2nd session will again look for entries where( where a=1 and a=3).
These are the entries from trace for 2nd session :
update test set a=11 where a=3and update test set a=14 where a=1.
Since I already know last statements involved with deadlock ( in LMD trace files ) other two statements left are :
update test set a=9 where a=1 and update test set a=11 where a=3 ( these statements had initial locks )
Now I can focus on application code and find where is this coming from and fix it.
This is only simple case and I am sure normally things are more complicated than this but trigger I used simple test case will give you good start.
Just be careful setting this trigger.Narrow to specific time frame just to capture deadlock info and then disable.Otherwise you will be stack with zillions trace files.
Also in 11g there is event target called GES related to deadlock:
Most basic command would be:
alter session set events ‘trace[ges]‘; – deadlock detection
Of course this syntax can be expanded.
Comments
8 Responses to “Detecting Deadlock Source Part 1”Trackbacks
Check out what others are saying...-
[...] Modrakovic http://oraclue.com/2009/04/20/detecting-deadlock-source/ [...]
Well, your example assumes that you know which user will encounter a deadlock, and the approximate time frame (because of the 10046 filling up local disk).
I have been hunting for the source of a deadlock in our system for a while, but its occurrence is not predictable thus I can’t start a 10046.
Is there a way to find all the data related to a deadlock without the 10046?
Vlado,
I do not know how many users ( schema owners ) are causing deadlock in your system but you should be able to get at least user from LMD trace files from previous deadlocks.You should have it.So that would narrow you to user.If you have few different users than you do not have choice but to enable tracing on all of them.
You can compress trace files or clean trace files that are not of interest.
Also enable alert and paging on blocking locks and deadlocks.
I know that you have OEM so once you get paged you can quickly find which sessions were involved.
I have script for blocking locks that send me blocking sid,serial and instance because I have RAC.You can modify it to tailor your needs.
If you are talking about capturing binds and the rest event 10046 is your best choice.
You can trim captured information by setting different levels.
New syntax in 11g is very flexible.
If you know offending SQL you can trace only based on that sql.
Good luck.
Miladin
Yeah, we have a script for alerting on blocking locks too, and we set the event parameter to dump system statedumps on ora-60. With all that and OEM it still does not give all the data required, so there does not seem to be a way around using 10046 and the hassle with managing the trace files.
BTW, I was hoping that logminer might be the solution but as Doug pointed out on his blog (http://oracledoug.com/serendipity/index.php?/archives/1484-Diagnosing-Locking-Problems-using-ASH-Part-5.html) logminer does not capture select for update statements
There is always way around.
Great article!
Miladin please could you send me your script for blocking locks?
Regards
Dejan
HiDejan,
You can find it on my blog:
http://oraclue.com/2009/05/18/procedure-to-kill-blocking-session-in-rac-11g/
This procedure will find blocking locks and kill.It work for RAC too.
Regards,
Miladin
Hello:
I want to interpret and understand what these values mean in the graph?
0x10fbf5da0, wq, 2, cvtops, x1, x8, 0xa001e.0x3c22 [23000-0001-0000000F]
Can you help?
Thank you
Kumar