TLDR: A client panicked when their database wouldn’t come online after a physical server move. They forced it into emergency mode, overwrote the MDF file, and discovered their backup chain was broken. Six weeks of production data, gone. Every step of the disaster was preventable.
What Happened
The client moved a physical SQL Server to a new location. When they brought it back up, the database was stuck in RESTORING/recovery mode.
This is normal. SQL Server replays transaction logs during recovery to bring the database to a consistent state. The correct response is to let it finish. Depending on the volume of uncommitted transactions, this can take a while — and in the absence of corruption or storage issues, it will complete on its own.
They didn’t wait. Instead, they:
- Forced the database into emergency mode
- Purchased third-party recovery tools
- Accidentally overwrote the original MDF file with a recovery attempt
At that point, the original data file was gone.
Backup Chain Was Already Broken
When we got the call, they had 12TB of backup files on hand. Plenty of transaction log backups. But no usable recent full backup.
Why? Two decisions made weeks earlier:
- They stopped running full backups because of “performance issues”
- Their cleanup job deleted anything older than two weeks — including the last full backup
Transaction log backups contain all log records generated since the previous log backup. They only work when you can restore them on top of a full backup baseline. Without that baseline, 12TB of log files is just disk space.
They eventually found an older full backup from November buried in a staging environment refresh. That got them back online — minus six weeks of production data.
The Panic Problem
The database would have come back on its own. SQL Server’s crash recovery process is designed exactly for this scenario — unclean shutdowns, power losses, server moves without a clean stop. The engine replays the write-ahead log, rolls back incomplete transactions, and brings the database to a consistent state.
Setting a database to EMERGENCY mode changes its state and bypasses normal recovery behavior, and should only be used in corruption scenarios. It’s a last resort for corruption scenarios, not an impatience shortcut. And once recovery tools overwrote the MDF, the original data was unrecoverable.
Lesson: when a database is in recovery, let it recover. Monitor the error log for progress. Don’t touch it.
“Stop Backups for Performance” Is a Governance Failure
Pressure to disable backups because they “slow things down” is one of the most common anti-patterns we encounter.
The performance impact of a backup is temporary. The impact of not having one is permanent.
If backups are causing measurable performance issues, investigate why. Compress them. Schedule them during low-activity windows. Move them to a different storage target. Reduce full backup frequency and lean on differentials. There are solutions that don’t involve disabling the one safety net that matters.
When non-technical leadership pressures a DBA to stop backups, the risk needs to be communicated in writing. If the decision stands, document it. When something goes wrong, there should be a paper trail showing who made that call.
Your Cleanup Job Might Be Deleting Your Last Restore Point
A retention policy that deletes “anything older than 14 days” without logic to preserve the last full backup is a ticking clock. The moment full backups stop — for any reason — the countdown begins. After 14 days, the cleanup job removes the last viable restore point and the entire backup chain collapses.
At minimum, cleanup logic should preserve at least one valid full backup and its associated differential and log chain required for restore. Date-based retention alone isn’t enough.
Schrödinger’s Backups
Backups exist in an uncertain state: they’re both valid and corrupt until you actually restore one. One DBA we spoke with described a company that kept tape cartridges in a cabinet bolted to a wall — on the other side of which sat a rack of industrial magnets. Those backups were worthless for years before anyone checked.
Testing restores is how you collapse that uncertainty.
For non-DBAs wondering what “test your restores” looks like in practice: take your most recent backups and restore them to a separate server on a regular schedule. Validate that the data is intact and free of corruption. The backup itself is just a file. Proving you can turn that file back into a working database is the part that matters.
Weekly restore tests for critical systems. Monthly at minimum for everything else. Automate where possible — tools like the dbatools PowerShell module can run restore validation on a schedule — but perform manual restores periodically so the process stays familiar.
Related: SQL Server Backup Best Practices Worth Bookmarking
Layers Worth Having
Backups are the foundation. They’re also the last line of defense.
- Always On Availability Groups give you a live copy of the database on a secondary replica. If the primary goes down, the secondary is already caught up. An AG could have significantly reduced the impact, depending on the nature of the storage failure.
- Multiple backup targets — don’t store backups only on the same server. Copy them to separate storage, a different site, or cloud. A single point of failure for your backups is a single point of failure for your business.
- Storage-level snapshots add another recovery layer on virtualized infrastructure, though they supplement native backups — they don’t replace them.
Takeaway
Three rules that would have prevented this entire incident:
- Let recovery finish. SQL Server’s crash recovery is built for exactly this scenario. Monitor the error log and wait.
- Monitor your backup chain. Know when your last full backup ran. Know where it is. Get alerted if one fails.
- Test restores regularly. The only backup you can trust is the one you’ve successfully restored.
Your backup strategy exists for the day everything goes wrong. Build it for that day.
Related: Another Backup Failure Story — This Time It Was a Bank
Speak with a SQL Expert
In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide operational peace of mind