Blog

SQL Server Problem Tombola: MAXDOP 1

As you all may know, many vendors tend to force you to put the MAXDOP of your SQL Server to 1 (SAP, SHAREPOINT, AX). For these applications, this is by design because another MAXDOP setting might interfere with the otherwise performant workload of the application. Today I would like to demonstrate you the actual impact on your server of setting your MAXDOP to 1.

To start off I would like to give you a small introduction on what MAXDOP actually does. The MAXDOP setting limits the amount of CPUs used by one query in your SQL server instance. For example: if you put your MD to 2, then the query receives the option to create a parallel plan with 2 cores to execute a query. Fortunately, in SQL Server 2016 you can do this on database level. The MD setting works in parallel with another setting on instance level: Cost Threshold for Parallelism (default 5). The CTFP is the value you set for your instance to start evaluating the possibility to use a parallel plan if you have MD 2/4/8/…. For example, if your CTFP is set to 60 and your query plan only has a cost of 10, your query will not be evaluated to run as parallel. If then, however, you have another query with a cost of 80, the query optimizer will check the possibility to run this query in parallel (depending on your MD setting).

Approach

To show you what the impact to your system is, I am going to create an OLTP workload and OLAP workload to my virtual machine running on my laptop. The OLTP workload will do update inserts and deletes, the OLAP workload will do some read intensive queries onto the system. The OLTP workload will be small queries with a low cost, the OLAP workload will have more impact and calculations which will make them have a higher cost.The virtual machine will be configured with 4 cores. To run the workload, I will be using SQLQueryStress which can be found here. One thread will be used and will do 100 iterations for the two workloads. I will be running all my test against in-memory tables to reduce any disk overhead, and purely to look at the CPU usage. I will be running 5 tests with following setup.

  1. MAXDOP 1 OLAP & OLTP together
  2. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  3. MAXDOP 4 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  4. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together
  5. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together

I have altered my CTFP setting to 6 because only then my OLAP queries will be eligible for a parallel plan since they have a plan cost of 7. Here are the results:

When looking at the full elapsed time, we can see that following configurations were the fastest.

When Looking at the CPU user for every iteration, we can see that following configurations used the least amount of CPU.

When looking at the amount of seconds used by the client for every iteration we see the following test to take the least amount of time

When looking at the full elapsed time following configuration were the slowest


When looking at the CPU usage for every iteration we can see that following configuration uses the most of the CPU


When looking at the amount of seconds used by every client for every iteration we see that following test were the slowest.

Conclusion OLTP and OLAP

As we can see in the results, the MAXDOP 1 setting is the slowest in comparison to all other tests in full. This means that, in most cases, you will limit the throughput of your SQL Server if you set your MAXDOP to one. The fastest configuration for my laptop has been proven to be MAXDOP 2 with COST THRESHOLD FOR PARALLELISM 6 for this workload with this hardware configuration. This is, however, not an exact science; this has to be tested for every system. The MAXDOP 1 was the slowest because we are limiting our read intensive workload to use one core for every query it executes. This increases the time for every read query in total, this makes that the workload was the slowest.

The fastest configuration is when we allow the read intensive workload to use 2 cores and limit the write/update/delete workload to 1 core. Because we are running both workloads together with 1 thread, we are letting our read intensive workload use 2 cores and leaving the 2 other cores available for our OLTP workload. This means that we are getting an ideal throughput for our workload as seen to our hardware.

What about datawarehouses?

Our datawarehouse will primarily have a read intensive aka OLAP workload, with one or more daily loads (INSERT/UPDATES/DELETES) happening. This means that we should get the most throughput on our read queries. I have been to some clients who have set their MAXDOP to 1 for their datawarehouse. This will have a negative effect on the speed at which queries are executed. I demonstrate this by testing the OLAP workload without the OLTP workload interfering For this I will do 3 tests.
1. 100 Iterations 1 thread OLAP workload MAXDOP 1
2. 100 Iterations 1 thread OLAP workload MAXDOP 2
3. 100 Iterations 1 thread OLAP workload MAXDOP 4

These are the results on my laptop:

As you can see in the table above the fastest way of doing read queries is by giving the workload as much cores as possible to increase the amount of reads which can be done in the least amount of time. MAXDOP 1 is the slowest run, with MAXDOP 2 being 17 SECONDS faster. There is not a lot of difference between the run of MAXDOP 4 and MAXDOP 2 because of hardware limitations.

Conclusion OLAP

Because of the read intended workload of a datawarehouse, we should not put our MAXDOP to 1. As seen in the test above, the more cores you have, the more read throughput you will have. This will make your queries in your DWH go faster. In these test we were 60% faster when using multiple cores for our workload instead of one core.

What about pure write workload?

I would also like to know what MAXDOP setting is best when just writing data(UPDATE INSERT DELETE). To test this, I created 3 test scenarios:
1. 100 Iterations 1 thread OLTP workload MAXDOP 1
2. 100 Iterations 1 thread OLTP workload MAXDOP 2
3. 100 Iterations 1 thread OLTP workload MAXDOP 4
These are the results on my laptop:

Conclusion OLTP

When inserting/updating/deleting data, the fastest option is running queries in a serial way. When running these processes in parallel, we will increase the amount of CPU needed for the query and this will increase the time elapsed. When writing to in-memory tables as we are doing in this test, we see that writing data in amount of CPU time is 50% faster on MAXDOP 1. This means that going in a serial way will be faster than doing this in a parallel way.

Overall Conclusion

We should use caution wwith concern to our MAXDOP setting. We could see that for the two different workloads, we have different needs. The read intensive workload will, in most cases, benefit from a parallel execution, while the write intensive workload will benefit of a serial execution. Because most systems are not READ or WRITE only, I generally think the way to go is try to filter out those read-intensive high cost plans and allow these plans to make use of parallel execution. Thus limiting the amount of queries to be executed in parallel, but not removing the option by default. This way we will not hold back our OLTP queries and we will be able to speed up those queries that can really benefit of the parallel execution. This means you should always use the MAXDOP setting in combination with your COST THRESHOLD FOR PARALLELISM SETTING. There is, however, no golden bullet for these settings and they will always need to be tested out to cope with the workload of your system.

Thank you for reading and stay tuned!