Home > Internals > Detecting Deadlock Source Part 1

Detecting Deadlock Source Part 1

In the case of deadlock oracle will raise ORA-60 error.Trace file is writen by the Global Enqueue Service Daemon (LMD) background process.

Trace file will record sql involved but NOT the SQL  that initialy took the locks.It will record last entries.

So how to resolve issue and get all data involved with deadlocks?

Before I get to that point let me create simple deadlock case:

create table test(a int primary key);

insert into test select rownum from dba_tables where rownum <9;
commit;
Session 1:

update test set a=9 where a=1;
update test set a=10 where a=2;

session 2:

update test set a=11 where a=3;
update test set a=12 where a=4;

session 1:

update test set a=25 where a=8;
update test set a=13 where a=3;
session 2:

update test set a=22 where a=6;
update test set a=14 where a=1;

To troubleshoot deadlock I have create after logon trigger on database.
Although I can place all these commands together ( new option with events ) just for easy read I have used execute immediate for every command.
I have used new events syntax in 11g so  event 10046 is sql_trace, ORA-60 is deadlock.I have posted it on previous blogs.

For HANGANALYZE level 4 is used.This level dumps blockers.

Bellow is hanganalyze level description:

10     Dump all processes (IGN state)
5      Level 4 + Dump all processes involved in wait chains (NLEAF state)
4      Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3      Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2    Only HANGANALYZE output, no process dump at all

and  trigger:

CREATE OR REPLACE TRIGGER SYS.TRACE_MILADIN_DEADLOCKS
after logon on database
begin
if user like ‘MILADIN’ then
execute immediate ‘alter session set timed_statistics=true’;
execute immediate ‘alter session set max_dump_file_size=unlimited’;
execute immediate ‘alter session set tracefile_identifier=”miladin_deadlock”’;
execute immediate ‘alter session set events ”sql_trace  wait=true, bind=true,plan_stat=all_executions,level=12”’;
execute immediate ‘alter session set events ”deadlock trace name hanganalyze_global level 4,forever”’;
execute immediate ‘alter session set events ”deadlock trace name systemstate level 266,lifetime 1”’;
execute immediate ‘alter session set events ”deadlock trace name processstate level 10,forever”’;
end if;
end;
/


This trigger will generate trace files in trace dump destination named something like  test1_ora_1575_miladin_deadlock.trc

I will start first with LMD trace entries for deadlock:

user session for deadlock lock 0x10fa4c010
sid: 120 ser: 64 audsid: 496794 user: 128/MILADIN  flags: 0×100045
pid: 48 O/S info: user: oracle, term: UNKNOWN, ospid: 11683
image: oracle
client details:
O/S info: user: oracle, term: pts/0, ospid: 11594

application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update test set a=13 where a=3

user session for deadlock lock 0x10fa43310
sid: 118 ser: 349 audsid: 496793 user: 128/MILADIN flags: 0×45
pid: 35 O/S info: user: oracle, term: UNKNOWN, ospid: 11349
image:

client details:
O/S info: user: oracle, term: pts/1, ospid: 11292
machine: stpqldb01 program: sqlplus
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update test set a=14 where a=1
user session for deadlock lock 0x10fbf5da0

so my sql of interest are

update test set a=14 where a=1 and update test set a=13 where a=3.
and Wait for graph:
Global Wait-For-Graph(WFG) at ddTS[0.3] :
BLOCKED 0x10fbf5da0 5 wq 2 cvtops x1 TX 0xa001e.0x3c22 [23000-0001-0000000F] 0
BLOCKER 0x10fbf28f0 5 wq 1 cvtops x8 TX 0xa001e.0x3c22 [30000-0001-00000010] 0
BLOCKED 0x10fa4c538 5 wq 2 cvtops x1 TX 0x4000e.0x2a16 [30000-0001-00000010] 0
BLOCKER 0x10fa4c388 5 wq 1 cvtops x8 TX 0x4000e.0x2a16 [23000-0001-0000000F] 0

These are TX locks on first instance in mode 5 .

Next I am going to check trace files written by trigger.
From SQL trace/hang/dumps trace  for 1st session I will focus my search for update on table test where a=1 and a=3.

Trace file has only one entry for update test set a=9 where a=1 and had few entries ( from hanganalyze and dumps ) for update test set a=13 where a=3

In SQL  trace/hang/dumps trace for 2nd session will again look for entries where( where a=1 and a=3).In SQL  trace for 2nd session will again look for entries where( where a=1 and a=3).

These are the entries from trace for 2nd session :

update test set a=11 where a=3and update test set a=14 where a=1.

Since I already know last statements involved with deadlock ( in LMD trace files ) other two statements  left are :

update test set a=9 where a=1 and update test set a=11 where a=3  ( these statements had  initial locks )

Now I can focus on application code and find where is this coming from and fix it.
This is only simple case  and I am sure normally things are more complicated than this but trigger I used simple test case will give you good start.
Just be careful setting this trigger.Narrow to specific time frame just to capture deadlock info and then disable.Otherwise you will be stack with zillions trace files.

Also in 11g there is event target called GES related to deadlock:

Most basic command would be:

alter session set events ‘trace[ges]‘; – deadlock detection

Of course this syntax can be expanded.

About these ads
Categories: Internals
  1. Vlado
    April 21, 2009 at 11:59 am | #1

    Well, your example assumes that you know which user will encounter a deadlock, and the approximate time frame (because of the 10046 filling up local disk).
    I have been hunting for the source of a deadlock in our system for a while, but its occurrence is not predictable thus I can’t start a 10046.
    Is there a way to find all the data related to a deadlock without the 10046?

  2. April 21, 2009 at 2:59 pm | #2

    Vlado,

    I do not know how many users ( schema owners ) are causing deadlock in your system but you should be able to get at least user from LMD trace files from previous deadlocks.You should have it.So that would narrow you to user.If you have few different users than you do not have choice but to enable tracing on all of them.

    You can compress trace files or clean trace files that are not of interest.

    Also enable alert and paging on blocking locks and deadlocks.

    I know that you have OEM so once you get paged you can quickly find which sessions were involved.

    I have script for blocking locks that send me blocking sid,serial and instance because I have RAC.You can modify it to tailor your needs.

    If you are talking about capturing binds and the rest event 10046 is your best choice.

    You can trim captured information by setting different levels.
    New syntax in 11g is very flexible.
    If you know offending SQL you can trace only based on that sql.

    Good luck.

    Miladin

  3. Vlado
    April 22, 2009 at 12:14 am | #3

    Yeah, we have a script for alerting on blocking locks too, and we set the event parameter to dump system statedumps on ora-60. With all that and OEM it still does not give all the data required, so there does not seem to be a way around using 10046 and the hassle with managing the trace files.

    BTW, I was hoping that logminer might be the solution but as Doug pointed out on his blog (http://oracledoug.com/serendipity/index.php?/archives/1484-Diagnosing-Locking-Problems-using-ASH-Part-5.html) logminer does not capture select for update statements :(

  4. April 22, 2009 at 10:27 pm | #4

    There is always way around.

  5. February 20, 2010 at 1:14 am | #5

    Great article!

    Miladin please could you send me your script for blocking locks?

    Regards

    Dejan

  6. February 20, 2010 at 2:40 am | #6

    HiDejan,

    You can find it on my blog:

    http://oraclue.com/2009/05/18/procedure-to-kill-blocking-session-in-rac-11g/

    This procedure will find blocking locks and kill.It work for RAC too.

    Regards,

    Miladin

  7. sarveswara
    September 8, 2011 at 5:53 am | #7

    Hello:
    I want to interpret and understand what these values mean in the graph?
    0x10fbf5da0, wq, 2, cvtops, x1, x8, 0xa001e.0x3c22 [23000-0001-0000000F]
    Can you help?

    Thank you
    Kumar

  8. yong
    December 17, 2013 at 9:33 am | #8

    could you tell If you know offending SQL you can trace only based on that sql. how to do it ?

  1. August 15, 2009 at 12:52 pm | #1

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 41 other followers

%d bloggers like this: