Azure pt.1: A word on Linked Services in Azure

How best to describe Azure Data Factory (ADF) V2?  A friend of mine described it as the ETL of the skies.  And since I have based almost my entire career on ETL tools, this sounded like music to my ears.  After running a nice project in ADF I can now look back on that statement and conclude there is some truth to it, but it’s more an ELT than an ETL tool.

ADF is a data integration tool that can manage and orchestrate data streams between different Azure data components like Azure Blob Storage and Azure Data Lake Store, but also Azure Data Lake Analytics and Azure SQL Data Warehouse.  It can even grab data from the on-premises data sources in your company.  The real transformation part isn’t its cup of tea, however.

But hey, we now have other, more powerful tools for this inside Azure, and we can even use ADF as a runtime environment for SSIS packages that will do all the lookup and transform work for you.  Since version 2, ADF also comes with a graphical user interface.  Too bad for all the die-hard fans out there writing and debugging JSON code by hand: we can now use drag and drop as well 🙂

This series will consist of three blog posts.  Below I will go deeper into Linked Services, the glue you’ll need to keep all these Azure services together.  In a second blog post I will explain how to load the data into our SQL table using PolyBase.  In a third post we’ll work with parameters and a parameter table, so you can make your workflow more dynamic.

The glue

I have my file in Azure Data Lake Store and I want to load it into my Azure Database using Data Factory.

In order to achieve this, we need to set up a linked service between Azure Data Lake Store and Data Factory.  You can think of a Linked Service as a connection string.  And of course every connection string needs to be able to authenticate itself to the service.  For this, we won’t be using our own user ID, but a service principal.

The service principal fits into the concept of Role-Based Access Control (RBAC) in Azure.  RBAC helps you manage who has access to Azure resources, what they can do with those resources, and which areas they have access to.  So you can define, in a fine-grained way, what each user or group can see or do in your environment, instead of giving everyone full access.  For communication between Azure applications, service principals are used; think of one as a user identity for an Azure resource.  To the service principal you then assign roles that define what exactly it has access to, and at which level (Reader, Contributor, Owner).

Let’s get started

The first step is to create an ADF instance in your Azure resource group and click Author &amp; Monitor to create your first project.  Let’s skip the sissy wizards and go straight to the real-men author section.

Go to Connections, Linked Services, click New and choose to create a linked service to Azure Data Lake Store.

 

Here you need to provide a name, choose your Azure subscription and pick the Data Lake Store you want to connect to.  As the authentication type, choose Service Principal.

Next up, populate the Service Principal ID and Service Principal Key.  Microsoft here chose a rather cryptic way of saying the user ID and password of the service principal.
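For the die-hards authoring by hand, this linked service boils down to a small JSON definition.  Below is a minimal sketch for an ADF V2 linked service to a Data Lake Store account; all placeholder values (account name, IDs, key) are of course illustrative and need to be replaced with your own:

```json
{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<application id of the service principal>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<the key you will generate below>"
            },
            "tenant": "<your Azure AD tenant id>",
            "subscriptionId": "<subscription id>",
            "resourceGroupName": "<resource group of the Data Lake Store>"
        }
    }
}
```

The `servicePrincipalId` and `servicePrincipalKey` fields map one-to-one to the two boxes in the UI.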

 

I’ll assume here that you have not created a service principal yet, so let me guide you through the process of creating one.

 

Go to Azure Active Directory

 

in the Azure Portal.  There, choose App Registrations:

 

There, choose New App Registration.

 

Provide a name for your application connector.  The Sign-on URL can be anything here, as it’s not important in this case.  What is important is that you choose Web app / API as the application type.

Your application will now appear in the list of applications.

 

Click on TestAppKris and you will land on the following screen.

 

The Application ID is the Service Principal ID you were asked to provide when creating the linked service.  The password we still need to generate.  To do that, click Settings and then Keys.

 

In the Keys screen, provide a description for the new key and set the expiry date.
You can choose between 1 year, 2 years and never expires.

 

If you now save, you will see the value of the key.  This is the service principal key.  Store it very carefully.

 

So we now populate the service principal ID and service principal key in the linked service, case closed, and we go home happily?

 

Not entirely.

[{"code":9083,"message":"Access to https://datalakekris.azuredatalakestore.net/webhdfs/v1/ is denied. Make sure the ACL and firewall rule is correctly configured in the Azure Data Lake Store account. Service request id: 34c05f82-e638-4717-a616-9be3aa7024ff Response details: {\"RemoteException\":{\"exception\":\"AccessControlException\",\"message\":\"Permission denied on / [34c05f82-e638-4717-a616-9be3aa7024ff][2018-06-18T06:23:25.4259617-07:00]\",\"javaClassName\":\"org.apache.hadoop.security.AccessControlException\"}}"}]

That’s a nasty error message.  Thank you Microsoft.

The reason for the error is that while we did generate the service principal, we never specified that it may access our Data Lake Store.

 

How can we do that?

In the portal, go back to the resource group and open the Data Lake Store that holds our data.

 

Click on the Access Control (IAM) and click Add.

 

Assign your service principal (here called TestAppKris) the role of contributor and click save.

All clear?  Go back to the Azure Data Factory tab in your browser and test again?

 

 

What the heck might be going on?  I did all the steps exactly as the dude said, and still the computer says no… Fear not, my friend, we have one more thing to do.

As it turns out, Azure Data Factory now has clearance to read/write in the Data Lake Store account, but it can’t access the individual folders yet.

 

 

Go to Data Explorer and, once there, choose Access.

Choose Add, and provide the name of the service principal app.

 

Click Select.

 

Click OK, and the permissions will be granted.

 

 

Looking a lot better 🙂  Click Finish to create the linked service.

 

Connecting the Linked Service to the database

We have now completed step 1 of 3.

We still need to set up the connection to our Azure database and pump the data over from our CSV file.  In this blog post I will explain in detail how to set up the linked service to the Azure database.  Note that this could just as easily be an on-premises database if you wish.

The connection to the database:

In ADF, go back to Connections, New Linked Service and choose Azure SQL Database.

Click Continue.

 

 

Populate the name of the linked service, choose your Azure Subscription, server name and database name of the Azure SQL Database.

As the authentication type I choose SQL Authentication.  Provide a valid user name and password for your SQL database and click Finish.

 

 

Once you click Finish, you have a working connection to the Azure database.
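And again, for the JSON-inclined: behind the scenes this linked service is little more than a connection string.  A minimal sketch, with placeholder server, database and credential values that you would swap for your own:

```json
{
    "name": "AzureSqlDatabaseLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>;Password=<password>;Encrypt=True"
            }
        }
    }
}
```

Wrapping the connection string as a `SecureString` keeps the password out of plain view once the definition is published.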

That wasn’t so hard, right?