Nothing is more frustrating than trying to hunt down the underlying cause of a MS SQL Server deadlock. The problem is there are so many possible causes to a deadlock. It could bad queries, recursive triggers, cursor usage, missing indexes, duplicate indexes, unnecessary indexes, disk I/O. Where do you start looking? How do you handled it when it is only sporadically happening? This article will document some of the many tricks I have used over the years to help narrow down where to look.
A deadlock occurs when two processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process has. Resources that can deadlock can be tables, indexes, partitions, memory, pages, rows, metadata, worker threads, "Parallel query execution-related resources" and other things. Deadlocking is different from blocking. Blocking happens normally and often without consequence. But when one process continues to block a particular resource, it can escalate into a deadlock. Supposedly the SQL Server will at some point kill the process that is deadlocking usually the one that has been waiting the shortest or the one that is least expensive to roll back, but in my experience on a ColdFusion server that is talking to MS SQL Server those "bad threads" don't get killed until I kill it manually using the Kill command in my SQL tools or the thread ends naturally after having backed up the entire server. Googling how to fix deadlock issues get overwhelming quick, there is much info out there some of which is difficult to understand and some of it is down right misleading. Below are the techniques I use all the time to find the bad acter and get the server back on it's feet. These work for me, your mileage may vary.
SQL Editor (like Microsoft SQL Server Management Studio or even better Aquadata )
Fusion Reactor with the JDBC wrapper installed: Click here for instructions on to install JDBC wrapper
Download my Deadlock Script Kit: Here (a collection of scripts I have gathered here and there)
STEP 1: Find the page ColdFusion Page with Fusion Reactor.
Open up the Fusion reactor in your browser and set the running requests to refresh every 5 seconds or
Once you find a page at the bottom that is taking a long time and seems to cause other request to "pile up" behind it, click the Request Details button: and then click the JDBC tab, what you will see is a list of all the calls to the database this page has made, at the top will be the Slowest SQL call. This is quite possibly the query or stored SP that is cause of your woes. Or it could just be the beginning of the your journey down the rabbit hole.
STEP 2: Is it a Deadlock or just a slow page?
Just because a page takes a long time doesn't mean it is truly deadlocking. Some pages are long running by nature and you might not be bothered by that. Some pages take a long time depending on user input. To determine if we really have deadlocking we need talk to the SQL server. Open the script "Deadlock Statements.sql" that you downloaded above create the fn_GetProcessInfo function and the run the query:
SELECT SPID, Program = program_name, LastCommand = dbo.fn_GetProcessInfo(SPID), *
WHERE blocked = 0
AND SPID IN (SELECT DISTINCT blocked FROM master..sysprocesses)
This will show you all the SPIDs that are BLOCKING :
Notice above the "LastCommand" list the entire text of a trigger, SP, or Ad Hoc Query if possible. This is causing blocking. Run it the query a couple more times. If results clear up, it is just a normal block. If you run it several times and both the Fusion Reactor JDBC and the SQL Script above show similar results.. you have a true deadlock. NOTE: The JDBC and the SQL "LastCommand" may have different calls but still from the same page. In my example above the SP being called was firing off a trigger, the SQL script told me about the trigger, but JDBC was blind to it.
SPIDS that Suspend but don't deadlock.
Sometimes I see a long running thread in fusion reactor but my deadlock script doesn't show anything. I usually find that SQL had SUSPENDED that thread, run this:
SELECT 'kill', spid, dbo.fn_GetProcessInfo(SPID)
WHERE status = 'suspended'
You may as well go ahead an kill that SPID because I have yet to see ColdFusion come back from it. Don't bother trying to kill the SPID from Fusion Reactor. It won't be able to. You need to get the SPID # (lets say it is 62) and type in SQL Management Studio or Aquadata:
STEP 2: Curing the Deadlock
Once you have identified a query, stored procedure or trigger that seems to be the deadlock, now is time to try to find out WHY it is deadlocking. It is usually how the SQL Statement is written or something to do with table Indexing.
SQL STATEMENT CHECKLIST:
- Does it use a Cursor?
- Does it use IN statements?
- Does it use IN statements in conjunction with a subquery i.e. SELECT * FROM art WHERE artistsID IN (SELECT artistsID FROM artists WHERE artists.style = 'NUDES')) ?
- Does it use EXISTS statements?
- Does it use SUB querys in the select statement?
- Are you NOT using WITH (NOLOCK)
If your answer to any of these is "yes" go take a good hard look at each parts of the SQL statement and evaluate what it is doing and why. Ask yourself is there a different way to do it.
Cursors are generally evil. They use a global @@FETCH Status. Avoid cursors at all cost. Usually there is a better way to write a SQL statement without cursors, if not use a WHILE LOOP in conjunction with a table variable rather than cursor.
IN Statements. In statements are fine if you have a limited list and you always know it is going to be limited. SELECT * FROM POLICY where status IN (0,1,3,5) is fine. If the list is lengthy or worse yet, is the result of a subquery that could return hundreds of results DO NOT USE AN IN STATEMENT!! Use Exists instead.. EXISTS is orders of magnitude quicker.
Exists Statements. Exists are generally ok, but sometimes people just do them wrong, check the output of your statement maybe getting the entire table by accident rather than the one row you thought you were getting.
SUB Queries in the select statement. Generally subqueries should be avoided if possible a properly done join is much preferred.
WITH (NOLOCK) If the return is such a "dirty read" (and they happen much less than you think) would not be catastrophic consider using smartly placed WITH (NOLOCK) in your table joins.
No one truly understand indexing. No one. Even the engineering teams that write SQL Server contradict each other on the how indexing works and the best way to do it. After years of reading about them and taking classes on them, it is still so much voodoo to me. Here is my medicine man routine for indexes.
Look first at overall index fragmentation. Open the script named Fragmentation Levels.txt. Run it on your database. Check the fragmentation percent column, anything above 30% is bad. I like to see below 15% levels. Try to co-relate the tables that are being called in the procedures you caught earlier to the tables with high levels of fragmentation. A index that is fragged is more than useless. In fact sometimes it is better to delete the offending index all together. This is particularly true if you INSERT and UPDATE the table more than you SELECT from the table. Indexes speed up SELECTS. But slow down INSERTS. If you find that you generally have high l