
Natively Compiled Stored Procs and In-Memory Tables: NextGen loading!

Today I was creating a dummy database and wanted to fill it with a certain amount of data. Since I was going to use this database for demo purposes, I created the tables with primary keys, foreign keys and some indexes. The way I was loading the data turned out to be quite slow, so I decided to load it using natively compiled stored procedures and in-memory tables instead, since this should be the fastest way. The result really baffled me!

Some background information: I’m using a Dell Inspiron 5220 with 16 GB of RAM. My virtual machine has access to all cores but is capped at 90%. It has 4096 MB of memory, of which 3584 MB is allocated to SQL Server. The VM is a Server Core installation with SQL Server 2016, running on SSDs.
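As a side note: if you want to check the memory cap on your own instance before reproducing this, a quick catalog query does the trick (this check is my addition, not part of the original setup):

-- Show the configured memory cap for the instance
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max server memory (MB)'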

In the first scenario I loaded my data into SQL Server without using in-memory tables or natively compiled stored procedures. A lot of code will follow, but this way you can test it yourself. I used the following script.

-- CREATE DATABASE PART
USE master
GO
-- Kick everyone out and drop any old copy of the database first
IF DB_ID('RealtimeAnalytics') IS NOT NULL
ALTER DATABASE RealtimeAnalytics SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
DROP DATABASE IF EXISTS RealtimeAnalytics
GO
CREATE DATABASE [RealtimeAnalytics]
CONTAINMENT = NONE
ON  PRIMARY
( NAME = N'RealtimeAnalytics_1', FILENAME = N'G:\DATA\RealtimeAnalytics_1.mdf' , SIZE = 131072KB , FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_2', FILENAME = N'G:\DATA\RealtimeAnalytics_2.ndf' , SIZE = 131072KB , FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_3', FILENAME = N'G:\DATA\RealtimeAnalytics_3.ndf' , SIZE = 131072KB , FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_4', FILENAME = N'G:\DATA\RealtimeAnalytics_4.ndf' , SIZE = 131072KB , FILEGROWTH = 131072KB )
LOG ON
( NAME = N'RealtimeAnalytics_log', FILENAME = N'C:\Data\LOG\RealtimeAnalytics_log.ldf' , SIZE = 131072KB , FILEGROWTH = 131072KB )
GO

This creates a database called RealtimeAnalytics with four data files. Next, I add four tables with foreign keys between them.


-- CREATE TABLE PART
USE RealtimeAnalytics
GO
CREATE TABLE dbo.Clients
(
ClientID int IDENTITY(-2147483648,1) PRIMARY KEY NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
GO
CREATE TABLE dbo.SalesPersons
(
SalesPersonID int IDENTITY(-2147483648,1) PRIMARY KEY NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
GO
CREATE TABLE dbo.Item
(
ItemID int IDENTITY(-2147483648,1) PRIMARY KEY NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
GO
CREATE TABLE dbo.Production
(
ProductionID int IDENTITY(-2147483648,1) PRIMARY KEY NOT NULL,
ClientID int NOT NULL FOREIGN KEY REFERENCES dbo.Clients(ClientID),
SalesPersonID int NOT NULL FOREIGN KEY REFERENCES dbo.SalesPersons(SalesPersonID),
AmountOfItemsSold int,
ItemID int NOT NULL FOREIGN KEY REFERENCES dbo.Item(ItemID),
DateOfSale datetime NOT NULL DEFAULT(GETDATE())
)
GO

After creating the tables, I insert data into them in a while loop. It’s not the best solution but, hey, I will do exactly the same thing with the natively compiled procedure, so the comparison stays fair.

 

---- FILL THE TABLES WITH DATA
DECLARE @Count int = 1
--FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients
select 'Name' + cast(@Count as varchar), '1111-1111-111'+cast(@Count as varchar)
SET @Count = @Count +1
END
SET @Count = 1
--FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons
select 'Name' + cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count +1
END
SET @Count = 1
--FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item
select 'Item' + cast(@Count as varchar),cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count +1
END
SET @Count = 1
DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
--FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients where ClientName = 'Name'+CAST(FLOOR(RAND()*(1000-1)+1)AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons where SalesPersonName = 'Name'+CAST(FLOOR(RAND()*(10-1)+1)AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item where ItemName = 'Item'+CAST(FLOOR(RAND()*(800-1)+1)as varchar))

INSERT INTO dbo.Production
select @ClientID,@SalesPersonID,FLOOR(RAND()*(100-1)+1),@ItemID,dateadd(second, -floor(rand()*(604800-1)+1),getdate())
SET @Count = @Count +1
END

When we run this script, the load lands almost entirely on the C drive, where the log file lives: the SSD is getting hammered while CPU usage stays quite modest. That makes sense, since every single-row INSERT commits its own transaction and thus forces a log flush per row.

I decided to stop the script after about an hour and a half, because this was taking way too long.
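As an aside, and not something I tested for this post: because each row commits separately, batching the commits usually speeds up a disk-based load considerably. A rough sketch of that idea, reusing the Production loop from above:

-- Hypothetical variant of the Production loop: commit every 10 000 rows instead of per row
DECLARE @Count int = 1
DECLARE @ClientID int, @SalesPersonID int, @ItemID int
BEGIN TRAN
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients where ClientName = 'Name'+CAST(FLOOR(RAND()*(1000-1)+1) AS varchar))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons where SalesPersonName = 'Name'+CAST(FLOOR(RAND()*(10-1)+1) AS varchar))
SET @ItemID = (select ItemID from dbo.Item where ItemName = 'Item'+CAST(FLOOR(RAND()*(800-1)+1) AS varchar))
INSERT INTO dbo.Production
select @ClientID,@SalesPersonID,FLOOR(RAND()*(100-1)+1),@ItemID,dateadd(second, -floor(rand()*(604800-1)+1),getdate())
IF @Count % 10000 = 0
BEGIN
COMMIT TRAN -- one log flush per batch instead of per row
BEGIN TRAN
END
SET @Count = @Count + 1
END
COMMIT TRAN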


I then checked how many records had actually been inserted into the Production table.
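The check itself is nothing more than a row count:

-- How many rows made it into Production before the script was stopped?
SELECT COUNT(*) AS RowsInserted FROM dbo.Production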

In all that time, only about 1,000,000 rows had been inserted, which is unacceptable. So I thought: let’s use those improved in-memory tables and natively compiled stored procedures and see what the difference is. I edited the script with the following code.

To start, I enabled the database for memory-optimized data by adding a memory-optimized filegroup and a container:

use master  
go  
alter database RealtimeAnalytics add filegroup RealtimeAnalytics_mod contains memory_optimized_data  
go  
-- adapt filename as needed  
alter database RealtimeAnalytics add file (name='RealtimeAnalytics_mod', filename='G:\data\RealtimeAnalytics_mod') to filegroup RealtimeAnalytics_mod  
go
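If you want to verify that the filegroup and container were created correctly (a sanity check I’m adding here, not part of the original script), the catalog views will show them:

-- List the filegroups and files of the database
USE RealtimeAnalytics
GO
SELECT fg.name AS filegroup_name, fg.type_desc, f.name AS file_name, f.physical_name
FROM sys.filegroups AS fg
LEFT JOIN sys.database_files AS f ON f.data_space_id = fg.data_space_id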

I then created the same table structure, but as memory-optimized tables.

--MEMORY OPTIMIZED
use RealtimeAnalytics
go
CREATE TABLE dbo.Clients_MON
(
ClientID int IDENTITY(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
with (memory_optimized=on)
GO

CREATE TABLE dbo.SalesPersons_MON
(
SalesPersonID int IDENTITY(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
with (memory_optimized=on)
GO

CREATE TABLE dbo.Item_MON
(
ItemID int IDENTITY(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
with (memory_optimized=on)
GO

CREATE TABLE dbo.Production_MON
(
ProductionID int IDENTITY(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientID int NOT NULL FOREIGN KEY REFERENCES dbo.Clients_MON(ClientID),
SalesPersonID int NOT NULL FOREIGN KEY REFERENCES dbo.SalesPersons_MON(SalesPersonID),
AmountOfItemsSold int,
ItemID int NOT NULL FOREIGN KEY REFERENCES dbo.Item_MON(ItemID),
DateOfSale datetime NOT NULL DEFAULT(GETDATE())
)
with (memory_optimized=on)
GO

Notice that I can’t seed my identity at -2147483648: memory-optimized tables only support IDENTITY(1,1) (more info on that here). Why would they do that? But that discussion is out of scope, so let’s continue with the stored procedure I created. It does exactly the same thing as the previous test, only now as a natively compiled stored procedure.
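To make that restriction concrete, the following definition (hypothetical Seed_Test_MON table, my example) is rejected, while the same table with IDENTITY(1,1) works fine:

-- Fails: memory-optimized tables only accept IDENTITY(1,1)
CREATE TABLE dbo.Seed_Test_MON
(
ID int IDENTITY(-2147483648,1) PRIMARY KEY NONCLUSTERED NOT NULL
)
with (memory_optimized=on)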

USE RealtimeAnalytics
GO

create procedure dbo.FillDatabase  
with native_compilation, schemabinding, execute as owner  
as  
begin atomic  
with (transaction isolation level=snapshot, language=N'us_english')  
  
DECLARE @Count int = 1

--FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients_MON
select 'Name' + cast(@Count as varchar), '1111-1111-111'+cast(@Count as varchar)
SET @Count = @Count +1
END

SET @Count = 1
--FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons_MON
select 'Name' + cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count +1
END

SET @Count = 1
--FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item_MON
select 'Item' + cast(@Count as varchar),cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count +1
END

SET @Count = 1

DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
--FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients_MON where ClientName = 'Name'+CAST(FLOOR(RAND()*(1000-1)+1)AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons_MON where SalesPersonName = 'Name'+CAST(FLOOR(RAND()*(10-1)+1)AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item_MON where ItemName = 'Item'+CAST(FLOOR(RAND()*(800-1)+1)as varchar))

INSERT INTO dbo.Production_MON
select @ClientID,@SalesPersonID,FLOOR(RAND()*(100-1)+1),@ItemID,dateadd(second, -floor(rand()*(604800-1)+1),getdate())
SET @Count = @Count +1
END
END
GO


After this, I used the following statements to load the data into the in-memory tables and then move it over to my regular disk-based tables with INSERT ... SELECT, using IDENTITY_INSERT to preserve the generated IDs:

EXEC dbo.FillDatabase
GO
SET IDENTITY_INSERT dbo.Clients ON
insert into dbo.Clients (ClientID, ClientName, ClientCreditcard)
select ClientID, ClientName, ClientCreditcard from dbo.Clients_MON
SET IDENTITY_INSERT dbo.Clients OFF
GO
SET IDENTITY_INSERT dbo.SalesPersons ON
insert into dbo.SalesPersons (SalesPersonID, SalesPersonName, SalesPersonDiscount)
select SalesPersonID, SalesPersonName, SalesPersonDiscount from dbo.SalesPersons_MON
SET IDENTITY_INSERT dbo.SalesPersons OFF
GO
SET IDENTITY_INSERT dbo.Item ON
insert into dbo.Item (ItemID, ItemName, ItemPrice)
select ItemID, ItemName, ItemPrice from dbo.Item_MON
SET IDENTITY_INSERT dbo.Item OFF
GO
SET IDENTITY_INSERT dbo.Production ON
insert into dbo.Production (ProductionID, ClientID, SalesPersonID, AmountOfItemsSold, ItemID, DateOfSale)
select ProductionID, ClientID, SalesPersonID, AmountOfItemsSold, ItemID, DateOfSale from dbo.Production_MON
SET IDENTITY_INSERT dbo.Production OFF
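Once the copy has finished, the _MON tables still keep all those rows in memory. If you no longer need them, you can reclaim that memory by dropping the staging objects. This cleanup is my suggestion, not part of the original script; note that the natively compiled procedure is schemabound to the tables, so it has to go first:

-- Optional cleanup: the proc is schemabound to the _MON tables, so drop it first
DROP PROCEDURE dbo.FillDatabase
GO
-- Production_MON has foreign keys to the other _MON tables, so it goes next
DROP TABLE dbo.Production_MON
DROP TABLE dbo.Clients_MON
DROP TABLE dbo.SalesPersons_MON
DROP TABLE dbo.Item_MON
GO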


While executing the last statement, the workload looks completely different: the disk is no longer getting hammered. That is to be expected, because now we are hammering memory instead, and CPU usage rises accordingly.


After about 8 minutes I got a ping that my query had executed. 8 MINUTES!!!!


So, for doing exactly the same thing but loading the data into memory first, the result was in my disk-based tables after 8 minutes and 48 seconds, whereas the classic approach (no in-memory tables and no natively compiled procedures) had only managed about 1,000,000 rows after an hour and a half. That is an excellent performance gain if you ask me! To conclude, let’s just say that memory-optimized tables and natively compiled stored procedures are fast, very fast, extremely fast! If you have a high insert rate and your table can’t keep up, I suggest giving this scenario a try, because you may well benefit greatly from it.
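If you want to time a run yourself, a simple wall-clock wrapper around the procedure call is enough (my addition, not part of the original test):

-- Rough wall-clock timing of the native load
DECLARE @Start datetime2 = SYSDATETIME()
EXEC dbo.FillDatabase
SELECT DATEDIFF(SECOND, @Start, SYSDATETIME()) AS DurationInSeconds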

Thanks for reading and stay tuned!
