A historically correct approach to the evolution of the Microsoft Data Platform

As we are nearing the end of a very interesting year, I found myself reminiscing about the changes we saw this year. This took me back to what I’d later call the very beginning of my personal IT career, when animals were still able to talk and we were all happily typing away on those awesome 16-bit 80286 machines. Salespeople told us that we’d never be able to fill that enormous 40 MB hard disk, nor would we ever run out of that 1 MB of RAM, hence the now famous quotes “DOS addresses only 1 Megabyte of RAM because we cannot imagine any applications needing more” and the even more famous “640 kB is enough”.

We all worked in either Microsoft Works or Lotus 1-2-3; both had an integrated spreadsheet, word processor and database program that we could use to process data, and they ran with a minimum of 256 KB and a recommended 512 KB of memory. Well, let’s say processing data was simple, simply because there wasn’t much data that we would want to analyze on a computer.

Although this process looked simple, it can be seen as the Mesopotamian era of data processing: people had to write their own datasets and could only share insights by passing the data around on floppy disks. This meant the data was very small and suited to sustain a small insight or drive small decisions. The included “database” program, while still a “flat file” (i.e., non-relational) database management system, even allowed the novice user to perform transformations through formulas and to create user-defined reports, which then had to be copied as text to the clipboard.

Thinking further about this, I looked into the history of data as a whole and found some fun facts I’d love to share with you all 😊

1974

In 1974, the IBM San Jose Research Center developed a relational DBMS, System R, to implement Codd’s concepts. A key development of the System R project was SQL. To apply the relational model, Codd needed a relational-database language, which he named DSL/Alpha. At the time, IBM didn’t believe in the potential of Codd’s ideas and left the implementation to a group of programmers not under Codd’s supervision, who violated several fundamentals of Codd’s relational model; the result was the Structured English QUEry Language, or SEQUEL.

When IBM released its first relational-database product, it wanted to have a commercial-quality sublanguage as well, so it overhauled SEQUEL and renamed the basically new language Structured Query Language (SQL).

1984-1985

In this era Robert Epstein, Mark Hoffman, Jane Doughty, and Tom Haggin founded Sybase (initially as Systemware). They set out to create a relational database management system (RDBMS) that would organize information and make it available to computers within a network. In 1986, Sybase entered into talks with Microsoft to license Data Server, a database product built to run on UNIX computers. Those talks led to a product called Ashton-Tate/Microsoft SQL Server 1.0, which shipped in May 1989.

Remember, the average computer in 1985 was an IBM-compatible machine built around an 80286 chip and had:

  • A 16 KB internal cache
  • An Intel 80287 co-processor for arithmetic operations
  • An average clock speed of 6 to 12 MHz
  • A 24-bit address bus
  • Between 640 KB and 1 MB of RAM
  • A 20 MB hard disk

1995

Now let’s fast forward to the mid-nineties. Machines were still following Moore’s law, and somewhere along the route the animals had lost their speech. IBM had unleashed DB2, or IBM Database 2, back in 1983 on its MVS mainframe platform, Microsoft released SQL Server 6, Oracle had created the first read-consistent database in 1994, and Intel introduced the Pentium series CPU. In many ways this led to a revolution: switching from a 32-bit (486) to a 64-bit external data bus (the Pentium P5) gave us the possibility to use more resources, and machines slowly became more powerful.

Remember, the average PC in 1995 was built around a Pentium chip and had:

  • A 16 KB internal cache
  • An average clock speed of 90 MHz
  • A 16-bit ISA bus
  • A 32-bit EISA bus
  • A 64-bit PCI bus
  • Between 16 and 32 MB of SIMM memory
  • A 1 GB EIDE hard drive

This context is important because it was in this world that Ralph Kimball wrote the first edition of “The Data Warehouse Toolkit” [ISBN 978-0-471-15337-5]. Datasets were still reasonably small, but the hardware struggled to handle even these smaller datasets. In Ralph Kimball’s dimensional design approach (also called bottom-up design), data marts facilitating reports and analysis are created first; these are then combined to create a broad data warehouse. Built around the most important business aspects or departments, the data marts provide a thin view into the organizational data and, as and when required, can be combined into a larger data warehouse. Kimball defines a data warehouse as “a copy of transaction data specifically structured for query and analysis”.

Kimball’s data warehousing architecture focuses on ease of end-user accessibility and provides a high level of performance to the data warehouse.
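
To make the dimensional approach a bit more tangible, here is a minimal sketch of a Kimball-style star schema in T-SQL. The table and column names (DimDate, DimProduct, FactSales) are purely hypothetical and only serve to show one fact table at a declared grain, surrounded by descriptive dimension tables.

-- Dimension tables hold the descriptive business context.
CREATE TABLE dbo.DimDate
(
    DateKey      INT      NOT NULL PRIMARY KEY,  -- e.g. 19950315
    CalendarDate DATE     NOT NULL,
    [Year]       SMALLINT NOT NULL,
    [Month]      TINYINT  NOT NULL
);

CREATE TABLE dbo.DimProduct
(
    ProductKey  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    ProductCode NVARCHAR(20)      NOT NULL,              -- business key
    ProductName NVARCHAR(100)     NOT NULL,
    Category    NVARCHAR(50)      NOT NULL
);

-- The fact table is the "copy of transaction data": one row per product per day.
CREATE TABLE dbo.FactSales
(
    DateKey     INT           NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  INT           NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity    INT           NOT NULL,
    SalesAmount DECIMAL(18,2) NOT NULL
);

-- A typical analytical query simply slices the facts by the dimensions.
SELECT d.[Year], p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.[Year], p.Category;

Several such marts, conformed on shared dimensions like the date dimension, can then be combined into the broader data warehouse Kimball describes.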

2000-2005

After everyone had survived the Millennium bug, and the absence of any cataclysmic disasters, a new major revolution changed the data processing world: AMD released their x86-64 processors (Opteron). It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode. With 64-bit mode and the new paging mode, it supported vastly larger amounts of virtual memory and physical memory than were possible on its 32-bit predecessors, allowing programs to store larger amounts of data in memory.

Until then, SQL Server 2000 (32-bit) could only access as much as 32 GB of memory, and it needed to use Address Windowing Extensions (AWE) to access any memory it needed beyond the virtual memory limit. The important point about how SQL Server used AWE memory was that only the database page cache (i.e., data and index pages) could utilize physical memory outside the virtual memory of the process. All the other uses of memory (including plan cache, query workspace memory, locks, and other structures such as user connections, cursors, and space used by utilities such as backup and restore) were still limited to the virtual address space. Therefore, if your SQL Server application faced memory pressure in these parts of the system, adding memory beyond 3 GB on 32-bit systems didn’t yield much benefit.
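
For those who never had to fight it: squeezing that extra memory out of a 32-bit box was an explicit configuration exercise. The snippet below is a minimal sketch, assuming a 32-bit SQL Server 2000/2005 Enterprise instance on Windows with /PAE enabled in boot.ini and the service account granted the “Lock pages in memory” privilege; the 12 GB cap is just an illustrative value.

-- Make the advanced options visible.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Allow the 32-bit instance to map AWE memory beyond its virtual
-- address space; remember, only data and index pages benefit.
EXEC sp_configure 'awe enabled', 1;
RECONFIGURE;

-- Cap the instance so the OS and the non-AWE memory consumers
-- (plan cache, locks, connections, ...) still have room to breathe.
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;

-- A service restart was needed before AWE actually took effect.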

To cope with that, Microsoft released the (almost unknown) IA64 version of SQL Server 2000, but in 2005 it finished the complete revision of the old Sybase code into Microsoft code and released SQL Server 2005, which would become the first x64 edition of SQL Server.

With this change, Microsoft effectively removed the memory boundaries the data platform had been suffering from so heavily.

2010

While the SSD was invented in 1978 (not a typo: the early ones used DRAM), the real breakthrough came with the introduction of MLC cells in Enterprise Flash Drives, or EFDs, in approximately 2012, which finally broke the IOPS barrier in a consistent way. As SSDs do not need to spin or seek to locate data, they proved vastly superior to HDDs in random read tests. However, SSDs had challenges with mixed reads and writes, and their performance may degrade over time. Most of the advantages of solid-state drives over traditional hard drives are due to their ability to access data completely electronically instead of electromechanically, resulting in superior transfer speeds and mechanical ruggedness. Also, to achieve the same throughput as a single SSD, you needed between 12 and 24 HDDs in a RAID 1+0 configuration. In sequential IO these configurations could outperform a single SSD and were more resilient to data loss.

Once you started to really use SSDs where the IOPS bottlenecks lived, the SSD-equipped servers easily started to outperform disk arrays.
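
If you are curious where the IOPS bottlenecks live on your own instance, the virtual file stats are still the easiest place to look. Below is a minimal sketch against the sys.dm_io_virtual_file_stats DMV (available since SQL Server 2005); what counts as “slow” is of course workload-dependent.

-- Average read/write latency per database file since the last restart.
SELECT  DB_NAME(vfs.database_id)   AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        CASE WHEN vfs.num_of_reads  = 0 THEN 0
             ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_latency_ms,
        CASE WHEN vfs.num_of_writes = 0 THEN 0
             ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;

Spinning-disk arrays typically show their pain here on the random read side, which is exactly where the SSDs made the difference.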

The other game changer was… Azure.

The original Azure host operating system was a fork of the Windows OS called the ‘Red Dog OS’. Azure was pioneering functionality important to data centers everywhere. Running a fork of an OS is not ideal (in terms of the additional cost and complexity), so the Azure team talked to the Windows team.

2014

During Build 2014 the Azure team announced more than 40 updates to the Microsoft Azure cloud platform, including significant enhancements to Virtual Machines, Storage, Web Sites, Mobile Services and Azure SQL Database. In addition, a new Azure Preview Portal was released that provides a customizable end-to-end DevOps experience, along with Azure Resource Manager, a new framework for composing and managing cloud applications.

Windows Azure was also renamed to Microsoft Azure to reflect the strategy and focus on Azure as an enterprise-grade cloud platform that supports one of the broadest sets of operating systems, languages and services of any public cloud.

This was just the beginning though…

2018

Azure Gen 2 is rolling out, and now the data platform is ready to shake your world, as it will challenge you to question the old gods and the old ways of working.

My personal feeling about this: Challenge Accepted!

 
