Let me begin by introducing the main characters of this blog. We have Marc, a senior system administrator with a deep technical knowledge about Windows PowerShell, networks and everything related to Active Directory and DNS. There’s also Richard, CTO of the company, a great visionary and in the old days one of the best system administrators the company had. Finally, there is Stephanie, who has been working with SQL Server on various projects for almost ten years, working as an external consultant. Stephanie loves consultancy because of the variety of SQL Server environments she gets to work with. Her main motivation is pushing the limits of SQL Server while focussing on the needs of the customer in order to achieve the perfect design every time.
In this particular case, Stephanie is the only DBA, responsible for the new database server’s architecture. Apart from doing an initial health check and inventory, the goal is to migrate a scattered SQL Server landscape consisting of different versions and patch levels without touching the legacy systems. Stephanie doesn’t like running servers without or with hardly any backups, but, with a limited budget, default maintenance plans were made a long time ago. Besides, backup files were still there, someone from storage said. What could go wrong?
Preparation is key
This is the plan: a new, full-blown three node AlwaysOn High Availability cluster running on SQL Server 2016, nice CPUs and decent-sized memory and storage capacity. Stephanie is proud of the technical design so far, also thanks to Marc and Richard who supported her for making such choices. After all, SQL Server is not like an average file server.
After discussing the technical details of the cluster setup and automation with Marc, Stephanie is ready to start scripting and installing the SQL test cluster to move the first databases. With a smile she takes out her headphones, opens up Spotify and chooses the perfect soundtrack for a nice and smooth day of developing. This is what she likes. Marc doesn’t mind at all, he does the same whenever he needs deep focus to write some dark magic PowerShell scripts.
Murphy is around
Stephanie admits that the script was harder than she had anticipated. Several lines of code forming a beautiful, colourful landscape on her screen showed a working script, though. Time for a well-deserved coffee, also for Marc, who had been struggling with a Windows 2003 Server. Stephanie smiles and says: ”Soon, Marc. Soon we will be running on our new machines.”
Before Stephanie and Marc have the pleasure of enjoying their coffee, Richard shouts: “Stephanie, we have some serious issues with the production instances running our core CRM system”. The server Richard was talking about, was one of the oldest models that should have been replaced ages ago. Long story short, the disk had crashed and when restarting the database, it did not come back online. It was completely corrupt. When trying to restore the database so that users could continue their work, the backups appeared to be corrupt as well.
The importance of doing database restores
Stephanie took a large cup of the darkest and strongest coffee, her favourite when she was going to get her hands dirty. After almost two hours of non-stop typing and scrolling through Books Online, she was able to recover 98% of the database. Corruption in certain tables was so bad, they had lost 2% of the system data. This was the first time ever Stephanie was not able to recover everything inside the database. If only they had made backups at the beginning. Just another day in paradise, the extraordinary life of a DBA.
Backups are only the beginning
Regularly restoring databases is the most important step in keeping your data safe in case of a problem or disaster. It allows you to verify whether data can be retrieved up to point in time where the database went dead. Unfortunately, this is something which is often overlooked. It is also crucial that you have an a procedure ready to initiate the restore. When disaster strikes, databases have to be back online in no time. Every minute it takes, the longer the end users are unable to work. It’s a lifesaver to have a script and procedure ready to keep the downtime to a minimum.
Make sure you and the management team have an agreement on the RPO and RTO. Know how long it takes to get your system back online. Know how much data you can ‘afford’ to lose. This all might sound logical, but many organisations forget to discuss this.
Next up: a story on whether you know how many databases you have. Stay posted!