kohera-logo-regular.svg

Nasty ballooning kills your SQL performance

Vector_BG.png

Nasty ballooning kills your SQL performance

Vector_BG.png

This post might sound a bit like a game of Cluedo, but it happened on a production SQL Server…

But now you want to prove it
This post might sound a bit like a game of Cluedo, but it happened on a production SQL Server…
The issue showed itself on a hosted production VM with 64GB of Ram, where SQL could allocate memory from 32GB to 55GB. The agreements with the hosting partner clearly mentioned that while the CPU could be overcommitted, memory couldn’t.
All my alerts suddenly started to show that the machine was using all the available and became completely unresponsive (I couldn’t even open the Event Viewer to unload the logs).
Remote connection with management studio gave me this worrisome error:

 

But ping responses proved to be ok (less than 3 ms) so I didn’t suspect memory, page life expectancy gave a hint that something was certainly not OK

The Microsoft Performance dashboard showed worrying sings, so I RDP’d onto the server, and saw the horror of any SQL DBA this SQL Server was enduring Swapping Hell… whilst it should never even have to use its swap file… Ok something or someone murdered my SQL Server… let’s start the CSI investigation…

 

In the Recource Monitor we see that SQL is “only” getting ±20GB and is struggling to run. Swap file is overloaded and the server is running like a snail with arthritis.

 

So I suspected VMWare ballooning was the cruel pit here, but as this was especially agreed upon that this machine wouldn’t use memory ballooning I had to prove that it was happening Unfortunately as this is a hosted environment I could not check the ESX console to see the ballooning parameters, so I had to look for another way to find the smoking gun.
Luckily I found good documentation on the VMWare site explaining that there is a command file in the VMWare tools folder that just could to the trick, so I tried this on our murder(ed) victim and Hurray!  It gave me the proof I needed

 

So the murder weapon was known… VM Ware’s ballooning was to blame…

Cluedo summary
– Murder victim: Production Server
– Killed in VMWare ESX environment
– By the VM Ware ballooning service

Now I can go to the hosting partner and ask them to fix this, could become interesting because in the contracts, we especially agreed on not turning on ballooning on production SQL Servers ;-)
On a side note, normally VMWare should have released memory when the (SQL) Server started to ask more resources, but somehow it didn’t, so I also send them the following link to a known issue where the Balloon driver does not releases its memory when using default parameters:
Balloon driver retains hold on memory causing virtual machine guest operating system performance issues (1003470)
I’m interested how this is going to go on ;-)

On the other hand, the VMWareToolBoxCMD stat seems to be extremely useful, in many situations, so I pasted the most interesting ones from the complete list here:

Photo of successful woman coder hacker web creator sitting armchair comfortable workspace workstation indoors.
The hurdles and pitfalls of moving or migrating a System-versioned temporal table cross database
Maybe you already have your own way of doing this and are wondering about alternative methods, or maybe you are...
Group of computer programmers working in the office. Focus is on blond woman showing something to her colleague on PC.
Updating your Azure SQL server OAuth2 credentials in Power BI via PowerShell for automation purposes
The better way to update OAuth2 credentials in Power BI is by automating the process of updating Azure SQL Server...
2401-under-memory-pressure-featured-image
Under (memory) pressure
A few weeks ago, a client asked me if they were experiencing memory pressure and how they could monitor it...
2402-fabric-lakehouse-featured-image
Managing files from other devices in a Fabric Lakehouse using the Python Azure SDK
In this blogpost, you’ll see how to manage files in OneLake programmatically using the Python Azure SDK. Very little coding...
2319-blog-database-specific-security-featured-image
Database specific security in SQL Server
There are many different ways to secure your database. In this blog post we will give most of them a...
kohera-2312-blog-sql-server-level-security-featured-image
SQL Server security made easy on the server level
In this blog, we’re going to look at the options we have for server level security. In SQL Server we...