Service King

Service King reduces SQL Server outages from 90 per quarter down to zero.

Challenges

Unstable SQL Server
Slowness
Lack of experience

Benefits

Stability
SQL Speed
Improved SLAs
Faster processing
Increased SQL Server capacity
Savings in hardware upgrade

Background

Service King Collision is one of the largest and most respected automotive repair shops in the US. They have been in business for over 40 years, starting with a single location, and have since grown to 345 locations across 24 states.

When Service King approached Red9, they had opened 68 locations and were looking to expand further.

Two major technical issues were holding them back: daily SQL Server outages and severe performance problems. Their SQL Server failed almost every day, causing 30-45 minute outages (sometimes several in a single day). During these outages, agents across all of their locations were locked out of checking in new customers and carrying out their other administrative tasks.

The lost productivity was not only hurting their bottom line but also inconveniencing their customers and preventing further expansion. On top of that, performance issues made every action in their systems move at a snail’s pace, adding to employee and customer frustration.

Service King’s leadership team was worried that these outages and performance issues would hold them back from continuing to acquire new locations and expand their business if left unresolved.

Red9 was able to identify and resolve the core issues causing the downtime and slowness without an additional hardware purchase and ultimately saved Service King Collision hundreds of thousands of dollars in the process.

Problem

There were many:

  • Unscheduled Windows Cluster Failovers:
    • Occurred during business hours.
    • Each failover lasted 30-45 minutes.
    • Only 25% of cluster nodes were functional; out of four nodes, only one could handle the production workload.
  • The SAN storage array was misconfigured and delivering only 40 MB/s (roughly USB 2.0 speeds).
  • SQL Server Challenges:
    • SQL backups could not be taken because full backups took over 24 hours to complete.
    • The SQL Server FILESTREAM feature was in use but had grown beyond what could be managed.
    • Multiple databases had SQL corruption.
  • Configuration Issues:
    • Significant gaps in maintenance, monitoring, alerting, and feature usage.
    • Severe performance problems.
  • Massive Security and Configuration Problems:
    • VMware was improperly configured, causing further stability and performance issues.
    • Admin-level access was accidentally granted to all employees.
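To put the 40 MB/s figure in perspective, here is a rough back-of-the-envelope sketch in Python. It only converts a sustained throughput into data volume and time; the function names are illustrative, and the numbers say nothing about Service King's actual database sizes, which are not stated here. Real backup duration also depends on compression, read contention, and the backup target.

```python
# Back-of-the-envelope arithmetic only. The 40 MB/s value comes from the
# list above; everything else is a hypothetical helper for illustration.
SAN_MB_PER_S = 40
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000  # decimal terabytes


def max_data_moved_tb(hours: float, mb_per_s: float = SAN_MB_PER_S) -> float:
    """Upper bound on data moved in `hours` at a sustained `mb_per_s`."""
    return hours * SECONDS_PER_HOUR * mb_per_s / MB_PER_TB


def hours_to_move(tb: float, mb_per_s: float = SAN_MB_PER_S) -> float:
    """Hours needed to move `tb` terabytes at a sustained `mb_per_s`."""
    return tb * MB_PER_TB / mb_per_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    # At 40 MB/s, a full day of uninterrupted reading moves about 3.46 TB.
    print(round(max_data_moved_tb(24), 2))
    # Moving a single terabyte at that rate takes close to 7 hours.
    print(round(hours_to_move(1), 2))
```

At that rate, any full backup of a multi-terabyte data set cannot fit inside a nightly window, which is consistent with the backup problem described above.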

At Red9, we always start with our proprietary 145-Point SQL Server Health Check, which in this case uncovered a variety of technical issues.

Solution

We fixed the Windows Failover Cluster misconfiguration, made networking configuration changes, improved SAN storage speed, and addressed backup and restore issues.

Additionally, we corrected SQL data corruption in a key database (which even Microsoft Support couldn’t fix), resolved database corruption caused by third-party software, and tuned SQL Server with best practices.

Finally, we configured the VMware hosts and guests according to best practices and implemented monitoring, alerting, and proactive care.

Results

We’ve achieved remarkable improvements in system performance and reliability.

Here are some highlights:

  1. We reduced unscheduled Windows Cluster failovers from about 90 per quarter to zero.
  2. Full backups were taking 30+ hours. We cut backup duration in half and set up a proper backup schedule for full, differential, and transaction log backups.
  3. Enhanced SAN storage speeds by 11-15 times and optimized data storage through strategic file grouping and layering:
    • Created multiple data file groups.
    • Handpicked the most used tables and indexes, placing them on the new file groups.
    • Organized these file groups across three SAN speed layers: fast (SSD array), medium (10,000 RPM drives), and slowest (7,500 RPM drives).
    • Automated the process of migrating unneeded data out of production databases to keep DB files small and fast.
  4. We fixed data corruption one corrupted DB page at a time. Microsoft Support’s recommended solution would have caused too much downtime and was not practical. Luckily, all of the corruption was in FILESTREAM data.
  5. We reconfigured the third-party antivirus software that was causing the corruption.
  6. We changed multiple SQL Server settings, making SQL Server more stable and faster.
  7. Massively improved RPO (recovery point objective) and RTO (recovery time objective).
  8. Fixed major security issues, one of which gave 1,000+ employees sysadmin rights on the server.
  9. Implemented multiple new SQL Server features, such as compression, partial availability, Always On, FILESTREAM optimizations, filegroup backups, Resource Governor, columnstore indexes, and partitioning.
  10. Migrated old data out of the production database into a data warehouse.
  11. Deployed proper monitoring and alerting.
  12. We used SQL performance tuning to speed up about 200 of the most frequently run and most resource-intensive T-SQL queries.
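The filegroup layering described in item 3 can be illustrated with a small Python sketch. This is not Red9's actual tooling: the tier names, table names, access counts, and cut-off sizes are all hypothetical, and a real placement decision would also weigh object size, I/O patterns, and maintenance windows. The sketch only shows the basic idea of ranking objects by usage and mapping them onto three storage speed layers.

```python
from typing import Dict

# Hypothetical tier labels standing in for the three SAN speed layers.
TIERS = ("fast", "medium", "slow")


def assign_tiers(access_counts: Dict[str, int], fast_n: int, medium_n: int) -> Dict[str, str]:
    """Rank objects by access count (ties broken by name) and place the
    top `fast_n` on the fast tier, the next `medium_n` on the medium tier,
    and everything else on the slow tier."""
    ranked = sorted(access_counts, key=lambda name: (-access_counts[name], name))
    placement: Dict[str, str] = {}
    for position, name in enumerate(ranked):
        if position < fast_n:
            placement[name] = TIERS[0]
        elif position < fast_n + medium_n:
            placement[name] = TIERS[1]
        else:
            placement[name] = TIERS[2]
    return placement


if __name__ == "__main__":
    # Made-up tables and read counts, purely for illustration.
    example = {"Orders": 9000, "Customers": 4000, "AuditLog": 50, "Archive2009": 3}
    for table, tier in assign_tiers(example, fast_n=1, medium_n=2).items():
        print(f"{table}: {tier}")
```

In practice the access counts could come from index usage statistics, and the resulting mapping would drive which filegroup each table or index is rebuilt onto.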
