SQL Restarts Weekly? You're Ignoring the Real Problem

TL;DR

Restarting SQL Server to fix slowness is hitting snooze on a fire alarm. The cause stays, the evidence gets wiped, and the server runs slower until the caches rebuild.

A restart flushes the plan cache and buffer pool that SQL Server shaped around your workload. The first stretch after pays a cold-cache tax: more physical reads, higher CPU from recompilations, often worse response times than before.
Regular performance restarts mask a real cause. Stale statistics, missing indexes, parameter sniffing, blocking, tempdb contention, connection leaks, or a max server memory left at the unlimited default.
Check sys.dm_os_wait_stats first, then blocking, expensive queries, statistics, and index gaps. Restart for patching, CUs, and config changes that need it, but never as a performance fix.

The Restart Reflex

Stop me if this sounds familiar. SQL Server is crawling. Users are complaining. Someone suggests “just restart it.”

It works. Performance comes back. Everyone moves on.

Until it happens again three days later. Then two days. Then it’s a weekly scheduled reboot “just to keep things fresh.”

This pattern is far more common than it should be. And it’s one of the clearest signs that a real problem is being ignored.

What Actually Happens When You Restart

Restarting clears caches and internal state. A bad plan goes away, a blocking chain dies with the sessions, memory clerk bloat resets — but the underlying cause is still there.

SQL Server caches execution plans and keeps frequently accessed data pages in the buffer pool so they’re served from memory instead of disk. An instance running with a stable workload has a buffer pool and plan cache shaped to that workload. Restart it and all of that is gone.

Now your server rebuilds from scratch. Plans get recompiled. Data gets pulled from disk. For the first stretch after a restart, you’re paying a cold-cache penalty — more physical reads, higher CPU from recompilations, and potentially worse response times than before.

You haven’t fixed the problem. You’ve reset the timer and added a cold-cache tax on top.

Why People Do This

This pattern persists not because of stupidity. It’s a staffing problem.

Companies with small numbers of SQL Servers rarely have dedicated DBAs. They have smart sysadmins, network engineers, or developers who ended up responsible for SQL Server because there’s no one else.

As an accidental DBA, the restart makes sense. It visibly works — memory drops, CPU calms down, users stop calling. Without deeper SQL Server knowledge or basic diagnostics like wait stats and Query Store, finding the root cause is hard. So the restart becomes permanent.

It’s also a management problem. We’ve seen environments where application code was the root cause, but there was no organizational will to fix it. Restarting was easier than a difficult conversation about code quality.

One DBA described years of weekly restarts because management refused to address bad application queries. The server wasn’t the problem. The politics were.

The Bad Advice Hall of Fame

While we’re at it, here are some “solutions” that belong right next to the restart button:

AG (Availability Group) not syncing? Restart SQL Server because an Always On AG isn’t syncing.

AG sync issues typically come from network problems, endpoint configuration, redo thread pressure, log send queue backup, or disk throughput on the secondary. Restarting is rarely the right first move. Depending on your failover mode and configuration, it can also trigger an unplanned failover or an outage on that replica. Sometimes restarting a troubled secondary is a last resort to clear a wedged state — but it should never be the plan.

CPU pegged? Failover or reboot to fix high CPU.

The query driving CPU to 100% will still exist when SQL Server comes back up. That said, a failover can be a valid incident response move: restore service on the secondary while you preserve diagnostic state on the primary for investigation. The problem is when failover becomes the investigation — when nobody goes back to find the root cause.

Slow query? “Just add an index.”

Without understanding the query, the execution plan, and the data distribution, a new index might make things worse. It’s not a magic bullet.

Stats not helping? Update statistics every 10 minutes, then add another job every 5 minutes.

For some volatile, high-change tables, frequent targeted stats updates can be the right call. But blanketing every table with aggressive schedules causes plan churn, compilation overhead, and extra I/O. The real problem is cargo-cult scheduling without understanding which tables actually need it.

Optimizer wrong? Query hints will fix everything.

Query hints override the optimizer. Sometimes that’s necessary. But blanketing your codebase with hints means you’re telling SQL Server “I know better than you” across the board. You probably don’t. Hints also become long-term maintenance tax — they prevent the optimizer from adapting as data changes.

Every one of these is a symptom of the same underlying issue: not enough investment in root cause analysis. Toil is a real thing in tech. It creates tech debt, burns people out, and normalizes bad habits.

What’s Actually Wrong

If your SQL Server needs frequent restarts, one or more of these is likely the cause. Not exhaustive — but these are what we see most often.

Stale or missing statistics. SQL Server uses statistics to build execution plans. Outdated or poorly sampled stats lead to bad plans — bad enough to tank performance across the board. Auto-update statistics works for many workloads, but on large or volatile tables it lags behind real data changes. The fix is targeted stats maintenance on the tables that need it, not a blanket “update everything every N minutes” job. One payment processing system we saw had been restarting monthly for degraded response times. The fix was a scheduled Agent job to update stats on key tables. That was it.

Untuned or missing indexes. Without proper indexes, SQL Server scans entire tables to answer queries that should return in milliseconds. Not every scan is bad, but unnecessary scans causing resource pressure show up as escalating CPU, I/O contention, and growing response times — exactly the pattern that triggers the restart reflex.

Bad queries and parameter sniffing. Poorly written queries consume CPU, hold locks too long, and generate excessive I/O. But the biggest “it gets worse over time” culprit is parameter sniffing: SQL Server compiles a plan optimized for one set of values, then reuses it for different values where it performs terribly. A restart flushes the bad plan — until the same sniffing happens again.

Blocking and long-running transactions. Queries holding locks for extended periods cause other sessions to queue behind them. A restart kills all sessions and clears the chain, but the pattern repeats as soon as the same workload resumes.

Tempdb contention. Allocation page contention, excessive temp table usage, or version store bloat can degrade performance over time. A restart reinitializes tempdb, which can temporarily relieve certain allocation or version store pressure.

If tempdb contention is the culprit, start here: 4 TempDB best practices that prevent the problem from recurring.

Connection leaks. Applications that don’t properly close connections gradually accumulate sessions. SQL Server suffers when thousands of sessions pile up, consuming memory and worker threads. The restart clears the sessions. The leak in the application code is still there.

Memory configuration. High memory utilization on a dedicated SQL Server is expected — not a leak. But if max server memory is left at the default (2,147,483,647 MB — effectively unlimited), SQL Server consumes memory until the OS is starved, causing paging and degrading everything on the box. Set it to leave headroom for the OS and other services — typically total RAM minus 4–8 GB on a dedicated box. If you’re running Lock Pages in Memory, this is even more critical. Restarting because “memory is too high” without checking max server memory means you’re treating a misconfiguration as a mystery.

What to Check Instead of Restarting

Before you reach for the restart button, run these. None of them are silver bullets on their own, but together they’ll point you at the actual problem faster than any restart.

1. Check wait stats — what is SQL Server actually waiting on:

SELECT TOP 20 wait_type
	,wait_time_ms / 1000.0 AS wait_time_sec
	,signal_wait_time_ms / 1000.0 AS signal_wait_sec
	,waiting_tasks_count
	,wait_time_ms * 100.0 / SUM(wait_time_ms) OVER () AS pct
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
		'SLEEP_TASK'
		,'BROKER_TO_FLUSH'
		,'SQLTRACE_BUFFER_FLUSH'
		,'CLR_AUTO_EVENT'
		,'CLR_MANUAL_EVENT'
		,'LAZYWRITER_SLEEP'
		,'CHECKPOINT_QUEUE'
		,'WAIT_XTP_OFFLINE_CKPT_NEW_LOG'
		,'WAITFOR'
		,'XE_TIMER_EVENT'
		,'XE_DISPATCHER_WAIT'
		,'FT_IFTS_SCHEDULER_IDLE_WAIT'
		,'LOGMGR_QUEUE'
		,'REQUEST_FOR_DEADLOCK_SEARCH'
		,'DIRTY_PAGE_POLL'
		,'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
		,'SP_SERVER_DIAGNOSTICS_SLEEP'
		,'BROKER_EVENTHANDLER'
		,'BROKER_RECEIVE_WAITFOR'
		,'BROKER_TASK_STOP'
		,'DBMIRROR_DBM_MUTEX'
		)
ORDER BY wait_time_ms DESC;

Wait stats are the single most useful diagnostic starting point. They tell you whether the bottleneck is CPU (SOS_SCHEDULER_YIELD), disk (PAGEIOLATCH_*), locking (LCK_M_*), memory grants (RESOURCE_SEMAPHORE), or something else entirely. Start here.

2. Check for active blocking chains:

SELECT r.session_id
	,r.blocking_session_id
	,r.wait_type
	,r.wait_time / 1000.0 AS wait_time_sec
	,s.program_name
	,s.host_name
	,s.login_name
	,t.TEXT AS query_text
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0
ORDER BY r.wait_time DESC;

3. Check for expensive queries:

SELECT TOP 20 qs.total_worker_time / qs.execution_count AS avg_cpu_time
	,qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time
	,qs.total_logical_reads / qs.execution_count AS avg_logical_reads
	,qs.execution_count
	,SUBSTRING(qt.TEXT, qs.statement_start_offset / 2 + 1, (
			CASE 
				WHEN qs.statement_end_offset = - 1
					THEN LEN(CONVERT(NVARCHAR(MAX), qt.TEXT)) * 2
				ELSE qs.statement_end_offset
				END - qs.statement_start_offset
			) / 2 + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_worker_time DESC;

This orders by total CPU, which can bias toward frequently-run queries. Consider also sorting by avg_cpu_time or avg_logical_reads to find individual expensive executions. These DMVs only reflect what’s in the plan cache since the last restart or cache flush — if you’ve been restarting frequently, the window is small. If Query Store is enabled, use it for historical perspective.

For a deeper walkthrough of these DMVs and what the results mean, see our guide to the top 7 DMV queries for finding slow workloads.

4. Check statistics health on high-change tables:

SELECT OBJECT_NAME(sp.object_id) AS table_name
	,sp.stats_id
	,s.name AS stats_name
	,sp.last_updated
	,sp.rows
	,sp.rows_sampled
	,sp.modification_counter
FROM sys.dm_db_stats_properties(NULL, NULL) sp -- SQL 2016+ / compat 130+
JOIN sys.stats s ON sp.object_id = s.object_id
	AND sp.stats_id = s.stats_id
WHERE sp.modification_counter > 0
ORDER BY sp.modification_counter DESC;

Don’t rely on “last updated X days ago” as a proxy for whether stats are a problem. A stat updated yesterday can be useless if the table changed massively afterward. A stat updated 30 days ago can be fine if the table is stable. Focus on the modification_counter relative to row count — that tells you where stats are most likely to be causing plan quality issues.

5. Review missing index suggestions (with caveats):

SELECT TOP 20 OBJECT_NAME(mid.object_id) AS table_name
	,mid.equality_columns
	,mid.inequality_columns
	,mid.included_columns
	,migs.user_seeks
	,migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;

Important: these are suggestions, not prescriptions. The missing index DMVs can recommend redundant or harmful indexes. They also reset on restart, so if you’ve been restarting frequently, you’re seeing a narrow window. Use these as starting points for investigation, and consolidate overlapping recommendations before creating indexes.

6. Check session counts by application and host:

SELECT s.program_name
	,s.host_name
	,s.login_name
	,COUNT(*) AS session_count
	,SUM(CASE 
			WHEN s.STATUS = 'sleeping'
				THEN 1
			ELSE 0
			END) AS sleeping_sessions
FROM sys.dm_exec_sessions s
WHERE s.is_user_process = 1
GROUP BY s.program_name
	,s.host_name
	,s.login_name
ORDER BY COUNT(*) DESC;

High session counts don’t automatically mean a leak — connection pooling can produce many sleeping sessions as normal behavior. What you’re looking for is unusual growth over time, or a specific application accumulating sessions that never close. Track this periodically to establish a baseline.

These six checks cover the highest-value diagnostics. If you want historical perspective beyond the current uptime window, enable Query Store — it persists query performance data across restarts and gives you regression detection, plan comparison, and workload analysis that the DMVs alone can’t provide.

When Restarts Are Actually Fine

Let’s be clear: restarts aren’t inherently evil. There are legitimate reasons to restart SQL Server:

Windows patching. OS updates frequently require a reboot. This is expected and normal. On large systems with big buffer pools, plan for the cold-cache penalty after restart — some environments use HA replicas to avoid cold starts on the busiest node, or have warm-up procedures to preload critical data.
SQL Server cumulative updates or upgrades. Applying CUs or moving to a new version requires a restart. Plan for it.
Configuration changes. Some sp_configure settings need a restart to take effect. So do certain environment changes like enabling Instant File Initialization or Lock Pages in Memory (which require service account privilege changes followed by a service restart).

The problem isn’t restarting. It’s restarting as a performance fix.

If your only reason for hitting restart is “it’s slow,” you have a diagnostics problem, not a server problem.

The Real Cost of Restart Culture

When restarting becomes the default fix, other bad habits follow.

Databases end up in Simple recovery mode because “Full recovery requires too much work.” Fine — until you need point-in-time recovery and realize you don’t have it.
Hard reboots happen “because it was taking too long to restart.” SQL Server can recover after a crash, but recovery takes time — longer if the transaction log has uncommitted work to roll back. And if the underlying storage is unhealthy, a hard reboot can increase corruption risk, especially if storage write caches are not properly protected. Either way, you’ve turned a performance problem into a longer outage.
Diagnostic information gets destroyed. Wait stats, plan cache contents, and session state all reset on restart. If you restart before anyone investigates, you’ve thrown away the evidence needed to fix the real problem. (Query Store preserves query performance data across restarts — one more reason to enable it.)
And the irony: the restart itself introduces downtime. The server that was “too slow” is now completely unavailable. Degraded service is almost always better than no service. A controlled outage can be the right call if cascading timeouts are causing wider failures — but that should be a deliberate decision, not a habit.

Bottom Line

In a stable environment, SQL Server should run for months without performance-related restarts. If yours can’t, the answer isn’t more restarts. It’s finding out why.

Start with wait stats to understand where the bottleneck actually is. Then look at blocking, your most expensive queries, statistics health, and index gaps. That’s where most performance problems live.

Your restart button isn’t a solution. It’s a snooze alarm.

Speak with a SQL Expert

In just 30 minutes, we will show you how we can eliminate your SQL Server headaches and provide  operational peace of mind

Schedule My Call Now ➜

Article by

Mark Varnas

Founder | CEO | SQL Veteran

Hey, I'm Mark, one of the guys behind Red9. I make a living performance tuning SQL Servers and making them more stable.