A historically correct look at the evolution of the Microsoft Data Platform

As we are nearing the end of a very interesting year, I was reminiscing about the changes we saw this year. This took me back to what I’d later call the very beginning of my personal IT career: animals were still able to talk and we were all happily typing away on those awesome 16-bit 80286 machines. Salespeople told us that we’d never be able to fill that enormous 40 MB hard disk, nor would we ever run out of that 1 MB of RAM; hence the now famous quote “DOS addresses only 1 Megabyte of RAM because we cannot imagine any applications needing more.” and the even more famous “640 kB is enough”.

We all worked in either Microsoft Works or Lotus 1-2-3; both had an integrated spreadsheet, word processor and database program that we could use to process data, and they ran with a minimum of 256 KB and a recommended 512 KB of memory. Let’s just say processing data was simple, simply because there wasn’t much data that we would want to analyze on a computer.

Although this process looked simple, it can be seen as the Mesopotamian era of data processing: people had to write their own datasets and could share insights by passing the data around on floppy disks. This meant the data was very small and suited to sustain a small insight or drive small decisions. The included “database” program, while still a “flat file” (i.e., non-relational) database management system, even allowed the novice user to perform transformations through formulas and to create user-defined reports, which then had to be copied as text to the clipboard.

Thinking further about this, I looked into the history of data as a whole and found some fun facts I’d love to share with you all 😊

1974

In 1974, the IBM San Jose Research Center developed a relational DBMS, System R, to implement Codd’s concepts; a key deliverable of the System R project was SQL. To apply the relational model, Codd had designed a relational-database language he named DSL/Alpha. At the time, IBM didn’t believe in the potential of Codd’s ideas and left the implementation to a group of programmers not under Codd’s supervision, who violated several fundamentals of Codd’s relational model; the result was the Structured English QUEry Language, or SEQUEL.

When IBM released its first relational-database product, it wanted to have a commercial-quality sublanguage as well, so it overhauled SEQUEL and renamed the basically new language Structured Query Language (SQL).

1984-1985

In this era, Robert Epstein, Mark Hoffman, Jane Doughty, and Tom Haggin founded Sybase (initially as Systemware). They set out to create a relational database management system (RDBMS) that would organize information and make it available to computers within a network. In 1986, Sybase entered into talks with Microsoft to license DataServer, a database product built to run on UNIX computers. Those talks led to a product called Ashton-Tate/Microsoft SQL Server 1.0, which shipped in May 1989.

Remember, the average computer in 1985 was an IBM-compatible machine built around the Intel 80286 chip and had:

  • No internal cache (on-chip CPU caches only arrived with the 80486),
  • An Intel 80287 co-processor for arithmetic operations,
  • An average clock speed of 6 to 12 MHz,
  • A 24-bit address bus,
  • Between 640 KB and 1 MB of RAM,
  • A 20 MB hard disk

1995

Now let’s fast-forward to the mid-nineties. Machines were still following Moore’s law, and somewhere along the route the animals had lost their speech. IBM had unleashed DB2, or IBM Database 2, back in 1983 on its MVS mainframe platform; Microsoft released SQL Server 6.0; Oracle had created the first read-consistent database in 1994; and Intel introduced the Pentium series CPU. In many ways this led to a revolution: switching from a 32-bit (486) to a 64-bit external data bus (the Pentium P5) gave us the possibility to use more resources, and machines slowly became more powerful.

Remember, the average PC in 1995 was built around a Pentium chip and had:

  • A 16 KB internal cache,
  • An average clock speed of 90 MHz,
  • A 16-bit ISA bus,
  • A 32-bit EISA bus,
  • A 64-bit PCI bus,
  • Between 16 and 32 MB of SIMM memory,
  • A 1 GB EIDE hard drive

This context is important because it is in this world that Ralph Kimball wrote the first edition of “The Data Warehouse Toolkit” [ISBN 978-0-471-15337-5]. Datasets were still reasonably small, but the hardware struggled to handle even these smaller datasets. In Ralph Kimball’s dimensional design approach (also called bottom-up design), data marts facilitating reports and analysis are created first, organized around the most important business processes or departments. These provide a thin view into the organizational data and, as and when required, can be combined into a broader data warehouse. Kimball defines a data warehouse as “a copy of transaction data specifically structured for query and analysis”.

Kimball’s data warehousing architecture focuses on ease of end-user accessibility and on a high level of query performance against the data warehouse.
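A minimal sketch of what such a Kimball-style star schema might look like; the table and column names here are hypothetical, not taken from the book:

```sql
-- Hypothetical dimension tables: small, descriptive, denormalized.
CREATE TABLE DimDate (
    DateKey       INT PRIMARY KEY,  -- surrogate key, e.g. 19950315
    CalendarYear  INT,
    CalendarMonth INT
);
CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,
    ProductName VARCHAR(50),
    Category    VARCHAR(50)
);

-- The fact table: one row per transaction, with a foreign key to each dimension.
CREATE TABLE FactSales (
    DateKey     INT NOT NULL REFERENCES DimDate(DateKey),
    ProductKey  INT NOT NULL REFERENCES DimProduct(ProductKey),
    Quantity    INT NOT NULL,
    SalesAmount DECIMAL(10, 2) NOT NULL
);

-- A typical analytical query: join the fact table to its dimensions
-- and aggregate, here total sales per year and product category.
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimDate d    ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category;
```

The fact table holds the measurable transaction data, while the dimensions carry the descriptive context users filter and group by; that separation is what Kimball means by “structured for query and analysis”.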

2000-2005

After everyone had survived the Millennium bug, and in the absence of any cataclysmic disasters, a new major revolution changed the data-processing world: AMD released their x86-64 processors (Opteron). The architecture introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode. With 64-bit mode and the new paging mode, it supported vastly larger amounts of virtual and physical memory than was possible on its 32-bit predecessors, allowing programs to store larger amounts of data in memory.

Until then, SQL Server 2000 (32-bit) could access at most 32 GB of memory, and it needed to use Address Windowing Extensions (AWE) to reach any memory beyond the virtual-address-space limit. The important point about how SQL Server used AWE memory is that only the database page cache (i.e., data and index pages) could utilize physical memory outside the virtual address space of the process. All the other uses of memory, including the plan cache, query workspace memory, locks, and other structures such as user connections, cursors, and the space used by utilities such as backup and restore, were still limited to the virtual address space. Therefore, if your SQL Server application faced memory pressure in these parts of the system, adding memory beyond 3 GB on 32-bit systems didn’t yield much benefit.
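For reference, turning AWE on was a server-level switch via `sp_configure`; a sketch of what that looked like (the exact memory cap is illustrative, and the instance’s service account also needed the “Lock pages in memory” Windows privilege, with /PAE set on the OS side):

```sql
-- 'awe enabled' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Enable AWE on the 32-bit instance (takes effect after a restart).
EXEC sp_configure 'awe enabled', 1;
RECONFIGURE;

-- With AWE on, SQL Server 2000 did not release memory dynamically,
-- so the buffer pool was usually capped explicitly (value in MB).
EXEC sp_configure 'max server memory', 16384;
RECONFIGURE;
```

Even with this in place, only the buffer pool benefited, which is exactly the limitation described above.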

To cope with that, Microsoft released the (almost unknown) IA-64 version of SQL Server 2000, but in 2005 it finished the complete revision of the old Sybase code into Microsoft code and released SQL Server 2005. This would become the first x64 edition of SQL Server.

With this change, Microsoft effectively removed the memory boundaries the data platform had been suffering from.

2010

While the SSD was invented in 1978 (not a typo; it used DRAM), the real breakthrough came with the introduction of MLC cells in Enterprise Flash Drives, or EFDs, around 2012, which finally broke the IOPS barrier in a consistent way. As SSDs do not need to spin or seek to locate data, they proved vastly superior to HDDs in random-read tests. However, SSDs had challenges with mixed reads and writes, and their performance could degrade over time. Most of the advantages of solid-state drives over traditional hard drives are due to their ability to access data completely electronically instead of electromechanically, resulting in superior transfer speeds and mechanical ruggedness. To achieve the same throughput as a single SSD, you needed between 12 and 24 HDDs in a RAID 1+0 configuration; in serial IO those configurations could outperform a single SSD and were more resilient to data loss.

Once you started to really use SSDs where the IOPS bottlenecks lived, SSD-equipped servers easily started to outperform disk arrays.

The other game changer was… Azure.

The original Azure host operating system was a fork of the Windows OS called the ‘Red Dog OS’. Azure was pioneering functionality important to data centers everywhere, but running a fork of an OS is not ideal (in terms of additional cost and complexity), so the Azure team talked to the Windows team.

2014

During Build 2014 the Azure team announced more than 40 updates to the Microsoft Azure cloud platform, including significant enhancements to Virtual Machines, Storage, Web Sites, Mobile Services and Azure SQL Database. In addition, a new Azure Preview Portal was released that provides a customizable end-to-end DevOps experience, along with Azure Resource Manager, a new framework for composing and managing cloud applications.

Windows Azure was also renamed Microsoft Azure, to reflect the strategy and focus on Azure as an enterprise-grade cloud platform that supports one of the broadest sets of operating systems, languages and services of any public cloud.

This was just the beginning though…

2018

Azure Gen 2 is rolling out, and now the data platform is ready to shake your world, as it will challenge you to question the old gods and the old ways of working.

My personal feeling about this: Challenge Accepted!