TLDR: SQL Server captures deadlock graphs, severe errors, and long-wait events by default through a built-in Extended Events session. Most teams never query it. The DBA in this story managed a server for two years without looking — weeks of firefighting that could have started with a single diagnostic query.
The Blind Spot
A CTO called us in panic because their SQL Server was hitting deadlocks every few minutes. Application timeouts, angry users, revenue impact.
Their DBA had managed this server for two years. Never once retrieved the detailed deadlock information SQL Server was already capturing. Without anyone querying the diagnostic data, the team just knew deadlocks were happening — not which processes were fighting, not which queries were involved, not how to prevent them.
The DBA’s approach for two years: retry logic in the application. A reasonable safety net, but it treats the symptom. Most deadlocks can be prevented once you understand the root cause — query patterns, indexing, transaction scope, or isolation behavior.
SQL Server had been recording every deadlock graph the entire time. The data was sitting in the
system_healthExtended Events session — enabled by default since installation. Nobody queried it.
What SQL Server Is Already Capturing
The system_health session runs by default on every instance (2008+). It captures:
- Deadlock graphs — full XML reports showing every process, resource, and query involved
- Errors with severity >= 20 — serious engine-level errors, often connection-ending
- Memory-related errors — out-of-memory conditions, memory broker issues
- Long waits — locks held longer than 30 seconds, latches longer than 15 seconds
- Non-yielding scheduler conditions — scheduler starvation from runaway workers or external calls
This is already running. The question is whether anyone on your team is looking at it.
Why Most Teams Miss It
Extended Events data isn’t in most teams’ day-to-day workflow. SSMS provides a UI for viewing sessions and event files, but most DBAs don’t build it into their routine.
Querying it requires some familiarity with T-SQL and XML parsing.
The result: servers run for years with diagnostic gold untouched. Deadlocks happen and nobody investigates. Severity 20 errors fire and nobody notices until an application team reports symptoms.
The DBA in our story wasn’t obviously negligent. Server was patched, backups running, maintenance plans configured. But they never queried system_health, never reviewed the data, never built alerts around it.
How to Actually Use This Data
For deadlocks, query system_health for the xml_deadlock_report event.
The output gives you the full deadlock graph: sessions involved, queries running, resources contested, and which session SQL Server chose as the victim.
Retention is limited. Default event file targets vary by edition and version — older defaults use 5MB files with 4 rollovers, newer editions may use larger files. Check your actual target settings rather than assuming defaults. If you need historical data, harvest the events into a DBA database on a nightly schedule — just make sure your harvest table has its own retention policy.
For severe errors: query for error_reported events with severity >= 20. These are often the first things we check when we take over a new environment.
The Trace Flags That Still Matter
Most diagnostic roles previously filled by trace flags have been absorbed by Extended Events and engine defaults. In 2026, the list worth enabling globally is short:
TF 3226 — Suppress successful backup messages. Transaction log backups run every one to five minutes. Each one writes a success entry to the error log. TF 3226 suppresses these while still logging failures, keeping your error log readable. Our default on every server.
Caveat: don’t enable this if your monitoring reads the error log for backup confirmation. Backup history is still recorded in msdb regardless.
Backup checksum default — use sp_configure, not a trace flag. The old approach was TF 3023. On any currently supported version:
EXEC sp_configure 'backup checksum default'
,1;
RECONFIGURE;This ensures checksums on backups by default, improving corruption detection during backup and enabling checksum validation at restore. Checksums catch certain corruption during backup, but they don’t prove the backup restores cleanly. Restore testing is still required.
Legacy trace flags (TF 1118, TF 1117) — remove them. These addressed tempdb contention on SQL Server 2014 and earlier. Every supported version has this behavior built in for tempdb. For user databases, the equivalent is controlled via database and filegroup settings in 2016+. If you’re still carrying these, remove them during your next maintenance window.
If a trace flag is undocumented, treat it as lab-only unless Microsoft Support tells you otherwise.
Want the full list? → Trace Flags in SQL Server: Usage, Monitoring & Best Practices
The Old Way vs. The Current Way
DBAs used to enable trace flags 1204 and 1222 to write deadlock info to the error log. Still works, still supported, but in a high-deadlock environment these flags spam the error log and make it harder to find other signals. Extended Events captures the same data in structured XML with less overhead.
The pattern repeats: trace flags were the tool of the 2000s. Extended Events, database-scoped configurations, and sp_configure are the tools of 2026. The data is often already being captured — the gap is in querying it.
What to Set Up on Day One
When we take over a new SQL Server environment, our diagnostic baseline:
- Verify
system_healthis running — query it for recent deadlocks, severity 20 errors, and long-wait events. - Set up a harvest job — nightly SQL Agent job copying
system_healthevents into a DBA database for long-term retention. - Enable TF 3226 — as a startup parameter (-T3226).
- Enable backup checksum default — via
sp_configure. - Review the error log — look for patterns: repeated errors, failed logins, resource pressure.
- Check current trace flags — run
DBCC TRACESTATUS(-1)and question every flag. If it’s legacy or undocumented, remove it.
If managing trace flags across multiple servers, use configuration management (DSC, Ansible, or equivalent) to standardize startup parameters. Avoid startup stored procedures in master — they’re less discoverable during incidents and add complexity to the one database you least want it in.
The Real Issue
The DBA didn’t lack tools. SQL Server had been recording every deadlock for two years. The system_health session was running.
The gap was operational: nobody built querying diagnostics into their routine. Nobody harvested the data. Nobody set up alerts. The tools were there — the process wasn’t.
Your SQL Server is capturing more diagnostic data than most teams realize. The question is whether anyone is looking.
Speak with a SQL Expert
In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide operational peace of mind