A Client Lost 1.5 Months of Data — It Wasn’t a Cyberattack

TL;DR

Six weeks of production data vanished after a routine server move, and every step that destroyed it was a choice the team made in a panic.

The database was in normal recovery and would have come back on its own. Let crash recovery finish and watch the error log.
Full backups had been stopped for “performance,” and a 14-day cleanup deleted the last one, so 12TB of log backups had no baseline to restore against.
Reserve EMERGENCY mode for real corruption. Here it just bypassed recovery, and a recovery tool then overwrote the MDF for good.

What Happened

The client moved a physical SQL Server to a new location. When they brought it back up, the database was stuck in RESTORING/recovery mode.

This is normal. SQL Server replays transaction logs during recovery to bring the database to a consistent state. The correct response is to let it finish. Depending on the volume of uncommitted transactions, this can take a while — and in the absence of corruption or storage issues, it will complete on its own.

They didn’t wait. Instead, they:

Forced the database into emergency mode
Purchased third-party recovery tools
Accidentally overwrote the original MDF file with a recovery attempt

At that point, the original data file was gone.

Backup Chain Was Already Broken

When we got the call, they had 12TB of backup files on hand. Plenty of transaction log backups. But no usable recent full backup.

Why? Two decisions made weeks earlier:

They stopped running full backups because of “performance issues”
Their cleanup job deleted anything older than two weeks — including the last full backup

Transaction log backups contain all log records generated since the previous log backup. They only work when you can restore them on top of a full backup baseline. Without that baseline, 12TB of log files is just disk space.

They eventually found an older full backup from November buried in a staging environment refresh. That got them back online — minus six weeks of production data.

The Panic Problem

The database would have come back on its own. SQL Server’s crash recovery process is designed exactly for this scenario — unclean shutdowns, power losses, server moves without a clean stop. The engine replays the write-ahead log, rolls back incomplete transactions, and brings the database to a consistent state.

Setting a database to EMERGENCY mode changes its state and bypasses normal recovery behavior, and should only be used in corruption scenarios. It’s a last resort for corruption scenarios, not an impatience shortcut. And once recovery tools overwrote the MDF, the original data was unrecoverable.

Lesson: when a database is in recovery, let it recover. Monitor the error log for progress. Don’t touch it.

“Stop Backups for Performance” Is a Governance Failure

Pressure to disable backups because they “slow things down” is one of the most common anti-patterns we encounter.

The performance impact of a backup is temporary. The impact of not having one is permanent.

If backups are causing measurable performance issues, investigate why. Compress them. Schedule them during low-activity windows. Move them to a different storage target. Reduce full backup frequency and lean on differentials. There are solutions that don’t involve disabling the one safety net that matters.

When non-technical leadership pressures a DBA to stop backups, the risk needs to be communicated in writing. If the decision stands, document it. When something goes wrong, there should be a paper trail showing who made that call.

Your Cleanup Job Might Be Deleting Your Last Restore Point

A retention policy that deletes “anything older than 14 days” without logic to preserve the last full backup is a ticking clock. The moment full backups stop — for any reason — the countdown begins. After 14 days, the cleanup job removes the last viable restore point and the entire backup chain collapses.

At minimum, cleanup logic should preserve at least one valid full backup and its associated differential and log chain required for restore. Date-based retention alone isn’t enough.

Schrödinger’s Backups

Backups exist in an uncertain state: they’re both valid and corrupt until you actually restore one. One DBA we spoke with described a company that kept tape cartridges in a cabinet bolted to a wall — on the other side of which sat a rack of industrial magnets. Those backups were worthless for years before anyone checked.

Testing restores is how you collapse that uncertainty.

For non-DBAs wondering what “test your restores” looks like in practice: take your most recent backups and restore them to a separate server on a regular schedule. Validate that the data is intact and free of corruption. The backup itself is just a file. Proving you can turn that file back into a working database is the part that matters.

Weekly restore tests for critical systems. Monthly at minimum for everything else. Automate where possible — tools like the dbatools PowerShell module can run restore validation on a schedule — but perform manual restores periodically so the process stays familiar.

Layers Worth Having

Backups are the foundation. They’re also the last line of defense.

Always On Availability Groups give you a live copy of the database on a secondary replica. If the primary goes down, the secondary is already caught up. An AG could have significantly reduced the impact, depending on the nature of the storage failure.
Multiple backup targets — don’t store backups only on the same server. Copy them to separate storage, a different site, or cloud. A single point of failure for your backups is a single point of failure for your business.
Storage-level snapshots add another recovery layer on virtualized infrastructure, though they supplement native backups — they don’t replace them.

Takeaway

Three rules that would have prevented this entire incident:

Let recovery finish. SQL Server’s crash recovery is built for exactly this scenario. Monitor the error log and wait.
Monitor your backup chain. Know when your last full backup ran. Know where it is. Get alerted if one fails.
Test restores regularly. The only backup you can trust is the one you’ve successfully restored.

Your backup strategy exists for the day everything goes wrong. Build it for that day.

Speak with a SQL Expert

In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide  operational peace of mind

Schedule My Call Now ➜

Article by

Mark Varnas

Founder | CEO | SQL Veteran

Hey, I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable.