Our FULL SQL Server health check covers a complete checklist of items for both on-premises and cloud-based DBMS services.
Hardware, operating system, SQL Server configuration, security (users), availability, load (amount and type of access), operation (maintenance and monitoring plans), disaster recovery, and much more will be analyzed.
Some clients want to check the health of the whole server (FULL SQL Health Check), while others only want a deep analysis of a SINGLE database.
Even though it is a single DB health analysis, we still check for best practices and optimal settings in server and instance-level items, covering performance, reliability, and security points.
That is intentional.
After the SQL health check report is done and presented to you, we will go over everything on the report until you run out of questions.
All the SQL checks are non-intrusive, very lightweight, and will not affect the production SQL Server. Data collection can be done during regular business operating hours.
This is about 90% of what will happen. The remaining 10% will be spent on items discovered during analysis that require more investigation. We will drill deeper into those as we find them.
Here is the complete list of items verified for an on-premises Microsoft SQL Server environment:
MS SQL Server Health Check Checklist
Server level configurations
Ensure that all your SQL Server services are working correctly according to your needs.
A comprehensive methodology will be applied to look at the various layers SQL Server depends on: NUMA, clustering and failover, BIOS, VMware setup, networking, and a few more areas.
It may seem like overkill to look at all the layers, but Red9's practice shows that comprehensive analysis is useful, especially when stability and speed matter.
Check out some of the items below.
Multiple SQL Server instances
Unnecessary SQL services
SSMS missing updates
System BIOS updates
SQL services using non SA account
Instant file initialization access right
Dangerous SQL Server builds
SQL engine startup settings
CPU schedulers offline
Windows operating system settings
Windows page file configuration
Windows visual effects settings
Inspect task scheduler
Windows power plan optimal
Lock pages in memory setting
Page verification not optimal
Windows updates settings
Multiple active RDP sessions
Instance parameters and options
Is your SQL Server running the latest Service Pack and Cumulative Updates? How about SQL Server Management Studio? Do the parallelism settings match the workload? Is deadlocking happening? How heavy is tempDB usage?
Are the features being appropriately utilized? Make sure you are taking advantage of some of the lesser-known features.
We will isolate the top server waits, both since the SQL instance started and over a 30-60 minute sample of production workload, to determine the client's main pain points.
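To illustrate the idea of a 30-60 minute wait sample, here is a minimal Python sketch. It assumes two snapshots of cumulative wait times have already been collected (for example from the `wait_type` and `wait_time_ms` columns of `sys.dm_os_wait_stats`) and ranks wait types by how much they grew in between. The function name and all numbers are hypothetical, and this is not the tooling used for the actual report.

```python
from typing import Dict, List, Tuple


def top_waits_delta(
    before: Dict[str, int],
    after: Dict[str, int],
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Rank wait types by wait time (ms) accumulated between two snapshots."""
    deltas = {}
    for wait_type, after_ms in after.items():
        # A wait type missing from the first snapshot is treated as starting at 0.
        delta = after_ms - before.get(wait_type, 0)
        if delta > 0:
            deltas[wait_type] = delta
    # Largest growth first; ties are broken by name so the order is stable.
    ranked = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical snapshots: wait type -> cumulative wait_time_ms (made-up numbers).
snapshot_start = {"PAGEIOLATCH_SH": 120_000, "CXPACKET": 300_000, "WRITELOG": 50_000}
snapshot_end = {"PAGEIOLATCH_SH": 180_000, "CXPACKET": 310_000, "WRITELOG": 95_000}

top = top_waits_delta(snapshot_start, snapshot_end, top_n=2)
```

In this made-up sample, CXPACKET has the largest total since startup but grew the least during the window, which is why both views are worth looking at.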
When your database is corrupt, hacked, or destroyed, slow queries will not matter. That is why we address reliability and security first.
Instance options and features
Trace flags usage
SQL max memory settings
Priority boost enabled
Deprecated features in use
Query store not in use
SQL Server blocks, locks, and deadlocks
Orphaned data files
DBCC shrink ran recently
Change tracking enabled
Default cost threshold for parallelism
Default max degree of parallelism
SQL Server tempDB configuration
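A few of the instance-level items above boil down to comparing configured values against the shipped defaults. The sketch below assumes the values were already read (for example from `sp_configure`) into a plain dictionary keyed by option name. The function name is hypothetical. A setting left at its default is not automatically wrong; it is only a prompt to check whether it fits the workload.

```python
from typing import Dict, List

# Shipped defaults for two parallelism-related sp_configure options.
DEFAULT_COST_THRESHOLD = 5
DEFAULT_MAXDOP = 0


def review_instance_options(config: Dict[str, int]) -> List[str]:
    """Return plain-text findings for a few instance-level settings."""
    findings = []
    if config.get("cost threshold for parallelism") == DEFAULT_COST_THRESHOLD:
        findings.append("cost threshold for parallelism is still at the default (5)")
    if config.get("max degree of parallelism") == DEFAULT_MAXDOP:
        findings.append("max degree of parallelism is still at the default (0)")
    # Priority boost is off (0) by default; any other value means it was enabled.
    if config.get("priority boost", 0) != 0:
        findings.append("priority boost is enabled")
    return findings


# Hypothetical instance configuration (made-up values).
sample_config = {
    "cost threshold for parallelism": 5,
    "max degree of parallelism": 8,
    "priority boost": 1,
}
sample_findings = review_instance_options(sample_config)
```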
Database properties and level checks
The purpose is to determine if your database’s properties for each of your SQL Server instances are appropriately configured for your particular environment.
How are SQL Agent jobs set up? Is there a maintenance plan? How are DB backups set up? When was the last corruption check? Is the transaction log larger than the DB data file?
Our goal is to ensure the highest SQL Server performance and availability of your databases.
Database options and Agent Jobs
Delayed durability not in use
SQL agent jobs starting simultaneously
SQL agent jobs owned by non SA account
SQL agent jobs history settings
SQL agent jobs without notifications
Maintenance plans missing
Missing failsafe operator
Backup Health Checks
Backups on live DB drives (bkp files on C drive)
MSDB BKPs history not being purged
Snapshot BKPs occurring
Missing corruption checks
DBs missing BKPs
Backups compression settings
Full recovery model without log backups
Backups consuming high resources
BKPs over UNC
Simple recovery model usage
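Two of the backup checks above (databases missing backups, and the full recovery model without log backups) can be sketched as follows. This assumes the recovery model and the most recent backup dates per database have already been gathered, for example from `sys.databases` and the backup history in `msdb`. The `DbBackupInfo` type, the function name, and the 7-day age limit are illustrative assumptions, not fixed rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class DbBackupInfo:
    name: str
    recovery_model: str  # "FULL", "BULK_LOGGED" or "SIMPLE"
    last_full_backup: Optional[datetime]
    last_log_backup: Optional[datetime]


def backup_findings(
    dbs: List[DbBackupInfo],
    now: datetime,
    max_full_age: timedelta = timedelta(days=7),  # assumed limit, adjust per RPO
) -> List[str]:
    """Flag databases with missing or stale full backups, or FULL recovery without log backups."""
    findings = []
    for db in dbs:
        if db.last_full_backup is None:
            findings.append(f"{db.name}: no full backup on record")
        elif now - db.last_full_backup > max_full_age:
            findings.append(f"{db.name}: last full backup is older than {max_full_age.days} days")
        # Without log backups, a FULL recovery database's log keeps growing
        # and point-in-time restore is not possible.
        if db.recovery_model == "FULL" and db.last_log_backup is None:
            findings.append(f"{db.name}: FULL recovery model but no log backups")
    return findings


# Hypothetical inventory (made-up databases and dates).
check_time = datetime(2024, 1, 31)
inventory = [
    DbBackupInfo("Sales", "FULL", datetime(2024, 1, 30), None),
    DbBackupInfo("Archive", "SIMPLE", None, None),
    DbBackupInfo("HR", "FULL", datetime(2024, 1, 1), datetime(2024, 1, 31)),
]
backup_report = backup_findings(inventory, check_time)
```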
Architectural design overview
How does the majority of T-SQL arrive? Is it ad-hoc SQL queries or stored procs? Are files placed on the best matching drives? How often do DB files grow? What causes the delay? Are you using third-party monitoring tools?
What is the read vs. write ratio at the DB file level? This will show if the workload is read or write-intensive, which changes how tuning is done.
We will test drive latencies and review how well DB files are placed across storage drives.
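As a rough illustration of the read vs. write ratio and latency figures, the sketch below derives them from cumulative per-file I/O counters. It assumes the inputs come from the `num_of_reads`, `num_of_writes`, `io_stall_read_ms`, and `io_stall_write_ms` columns of `sys.dm_io_virtual_file_stats`. The function name and sample numbers are hypothetical.

```python
from typing import Dict


def io_profile(
    num_of_reads: int,
    num_of_writes: int,
    io_stall_read_ms: int,
    io_stall_write_ms: int,
) -> Dict[str, float]:
    """Summarize one database file's I/O mix and average latency per operation."""
    total_ops = num_of_reads + num_of_writes
    # Guard every division so an idle file yields zeros instead of an error.
    read_pct = 100.0 * num_of_reads / total_ops if total_ops else 0.0
    avg_read_ms = io_stall_read_ms / num_of_reads if num_of_reads else 0.0
    avg_write_ms = io_stall_write_ms / num_of_writes if num_of_writes else 0.0
    return {
        "read_pct": read_pct,
        "avg_read_latency_ms": avg_read_ms,
        "avg_write_latency_ms": avg_write_ms,
    }


# Hypothetical counters for a single data file (made-up numbers).
profile = io_profile(
    num_of_reads=9000,
    num_of_writes=1000,
    io_stall_read_ms=45000,
    io_stall_write_ms=2000,
)
```

These counters accumulate since the instance started, so they describe the long-run average rather than what is happening right now.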
Also, our team will find top used objects: look at the sample data, data types in use, indexes, foreign keys, etc.
Disks and storage configurations
Filegroups and files layout
DB files layout optimal
System or user DBs on OS drive
High Virtual log files (VLF) counts
File growth options
Optimize for ad-hoc workloads
Large LDF file
Log files larger than MDF files
User tables in sys DBs
DB compatibility setting
Objects created with SET Options
Max DB file size set
DBs owned by non SA account
DB state offline
Data compression usage
Database with non-aligned indexes
Data and IXs within single filegroup/file
Filestream for large databases
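Two of the file-level items above (log files larger than data files, and high VLF counts) lend themselves to a simple comparison once the sizes and counts are collected. The sketch below assumes that has been done already (file sizes from `sys.master_files`, VLF counts from `sys.dm_db_log_info` where available). The `DbFiles` type, the function name, and the VLF limit of 1000 are illustrative assumptions; what counts as "high" depends on log size and version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DbFiles:
    name: str
    data_mb: float   # total size of data files
    log_mb: float    # total size of log files
    vlf_count: int   # number of virtual log files


def file_layout_findings(dbs: List[DbFiles], vlf_limit: int = 1000) -> List[str]:
    """Flag databases whose log outgrew the data files or has many VLFs."""
    findings = []
    for db in dbs:
        if db.log_mb > db.data_mb:
            findings.append(f"{db.name}: log ({db.log_mb:.0f} MB) is larger than data ({db.data_mb:.0f} MB)")
        # vlf_limit is an assumed, adjustable threshold, not an official one.
        if db.vlf_count > vlf_limit:
            findings.append(f"{db.name}: high VLF count ({db.vlf_count})")
    return findings


# Hypothetical databases (made-up sizes and counts).
layout_report = file_layout_findings([
    DbFiles("Sales", data_mb=2048, log_mb=8192, vlf_count=2500),
    DbFiles("HR", data_mb=512, log_mb=128, vlf_count=24),
])
```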
Top SQL objects and performance checks
What are the tables with the most activity? What are the largest tables?
Are there tables without clustering keys? Are there any foreign keys that are not trusted?
Red9 will review the top objects (usually 5-10 is sufficient) to see how well data types, indexes, constraints, triggers, and statistics were chosen on those tables.
Which T-SQL runs most frequently? Which takes the longest to execute? Which consumes the most CPU, RAM, and I/O (disk)?
Does anything cost too much in resources? If T-SQL duration is sometimes fast and sometimes very slow, this is an opportunity for performance tuning.
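One way to spot "sometimes fast, sometimes very slow" queries is to compare the fastest and slowest recorded executions. The sketch below assumes the inputs come from the `min_elapsed_time`, `max_elapsed_time` (both in microseconds), and `execution_count` columns of `sys.dm_exec_query_stats`. The function name, the 10x ratio, and the minimum execution count are arbitrary assumptions for illustration.

```python
def is_erratic(
    min_elapsed_us: int,
    max_elapsed_us: int,
    execution_count: int,
    ratio_threshold: float = 10.0,  # assumed cut-off, tune per workload
    min_executions: int = 2,
) -> bool:
    """Return True when the slowest run is at least ratio_threshold times the fastest."""
    # A single execution cannot show variation.
    if execution_count < min_executions:
        return False
    # Treat the fastest run as at least 1 microsecond to avoid dividing by zero.
    fastest = max(min_elapsed_us, 1)
    return max_elapsed_us / fastest >= ratio_threshold


# Hypothetical query stats (made-up numbers, microseconds).
steady_query = is_erratic(min_elapsed_us=4_000, max_elapsed_us=6_000, execution_count=500)
erratic_query = is_erratic(min_elapsed_us=4_000, max_elapsed_us=900_000, execution_count=500)
```

A large spread is only a starting point. Parameter sniffing, blocking, and cold cache are among the usual suspects, and the query plan analysis is what tells them apart.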
Is there implicit data conversion happening? Are there forced join hints? Are triggers being used for auditing? Are there UDFs in use? If yes, we will review how much they hurt.
Additionally, our team will check how up to date statistics are and review index fragmentation levels.
The database indexes will be carefully analyzed, since indexing is often the fastest way to improve DB performance.
Review top worst performers
Query plan analysis
Database objects and tables designs
Indexes and statistics
Want a free health check sample report? Send an email using our contact us form, and say that you saw this post, and I will send you a report sample.
For fellow DBAs – what did I miss? What can be done better?