Oh, this is cheesy. I can’t believe we would ever dare to mention Mariah Carey in our blog post series. Well, the holiday season is starting, so there is no escape at all 😊.
What I meant by the title was: All I need for Christmas is you… Azure Data Flow. Data what? Azure Data Flow is a new functionality in Azure Data Factory. For more info on Azure Data Factory itself, I would like to refer you to the series of blog posts on Azure Data Factory that I published on our website last July. Please follow this link here, and don’t forget to come back afterwards.
But back on topic: we were talking about Azure Data Flow. As mentioned, Azure Data Flow is a new functionality that Microsoft still has to release publicly. Right now I’m as excited as a child waiting for Santa Claus (hence my title) to play with it at a customer’s site, because I got private preview access to it, and that means I can already share some things with you. How did I manage that? Well, let’s say that I know people who know people…
What’s all the fuss about? To explain this, I’ll have to go back half a year. We were happily writing our pipelines in Azure Data Factory V2, using ADLA (Azure Data Lake Analytics) as an engine to clean and transform data while it was travelling down the road. ADLA uses U-SQL, a combination of SQL and C# that we ‘classical’ ETL developers picked up fairly quickly, as it was within the range we felt comfy in. Happy days…
But then a new kid arrived on the block. It had already made quite a reputation for itself in other parts of town, and it quickly proved its worth in ADF as a welcome addition to our stack: an even more powerful way to transform data in the cloud. Enter Databricks. Because of its flexibility, all eyes turned to Databricks, and suddenly ADLA was no longer any good; it didn’t even get a place in the sun when using Data Lake Gen2.
Okay, as Azure consultants, we love evolutions in technology. What else would we do with our weekends if we couldn’t read blogs anymore? 😉 The big problem: this new kid talks Python and Scala, languages that, to us, sounded about as familiar as Perl. And let’s be honest, that’s way off our radar. So we panicked; we even screamed for our mothers. Unfortunately, they couldn’t help us either (mine, at least, thought I had gone crazy: a python, a pearl, on a Spark??? I quickly left before the nurse came). It became even more fun when we tried to explain this to our BI customers. We have this great, powerful new tool, but it only speaks Python… Yeah, complicated… Usually there was some 50-plus-year-old guy in the room who raised his hand and suddenly felt fashionable again. I know that shit… Python and flat files, they love it 😊.
Azure Data Flow allows you to build data transformation logic using a graphical interface. The way you build your logic is very similar to how we did it back in SSIS, but with the very powerful Databricks engine underneath. All this without having to write Python or Scala code yourself? Yes Sir! Or Madam 😊. When I was told about it at DataMinds, I was all ears, because an SSIS-like component on Azure is right up my alley. I know that shit…
Does that mean you will never ever have to write Python code again when using Azure Databricks? Probably not. It’s still early days, so let’s see how far this new tool can take us. But at least your basics are covered. And maybe one day…
I promised you a sneak preview. You get a sneak preview.
This is the Microsoft Taxi Demo Data Flow:
As you can see, we are importing data, joining it, aggregating it and sinking it while it is travelling down the pipeline.
For the join you can choose between an inner, left, right, full outer, and even a cross join.
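For readers coming from SQL, here is how those join types differ. The following is a minimal pure-Python sketch of the join semantics on in-memory rows, not how Azure Data Flow or Spark actually executes a join; the `join` helper and the trip and vendor rows are hypothetical, made up for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

JOIN_TYPES = {"inner", "left", "right", "full", "cross"}


def join(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Row]:
    """Hypothetical helper: join two lists of rows on `key` using SQL-style semantics."""
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how}")
    if how == "cross":
        # Cartesian product: every left row paired with every right row.
        # On a column-name clash the right-hand value wins in this sketch.
        return [{**l, **r} for l in left for r in right]

    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    result: List[Row] = []
    matched_right = set()

    for l in left:
        matches = [i for i, r in enumerate(right) if r[key] == l[key]]
        for i in matches:
            matched_right.add(i)
            result.append({**l, **right[i]})
        if not matches and how in ("left", "full"):
            # Keep the unmatched left row; right-hand columns become None (SQL NULL).
            result.append({**{c: None for c in right_cols}, **l})

    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row; left-hand columns become None.
                result.append({**{c: None for c in left_cols}, **r})
    return result


# Hypothetical sample data, loosely inspired by the taxi demo.
trips = [{"vendor_id": 1, "fare": 10.0}, {"vendor_id": 2, "fare": 5.0}]
vendors = [{"vendor_id": 1, "name": "A"}, {"vendor_id": 3, "name": "C"}]

for how in ("inner", "left", "right", "full", "cross"):
    print(how, len(join(trips, vendors, "vendor_id", how)))
# inner 1, left 2, right 2, full 3, cross 4
```

Keeping unmatched rows with `None` in the missing columns mirrors how SQL fills outer-join gaps with NULL.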
I can even go into debug mode and see a preview of my data. I can optimize the performance of my flow by defining partitions and choosing to keep one side of my join in memory. Once all this is set, I can inspect my settings and see how each individual column is used and handled by the join component.
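Keeping one side of a join in memory is what Spark calls a broadcast join: the small side is shipped to every worker and indexed there, so the large side can be matched without being shuffled across the cluster. The minimal single-machine Python sketch below only illustrates the build-and-probe idea behind it; the `broadcast_hash_join` name and the sample rows are hypothetical, and it says nothing about how Data Flow configures this internally.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def broadcast_hash_join(large: Iterable[Row], small: List[Row], key: str) -> Iterator[Row]:
    """Hypothetical inner join: index the small side in memory, stream the large side."""
    # Build phase: index the small side once, entirely in memory.
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in small:
        index[row[key]].append(row)

    # Probe phase: one dictionary lookup per large-side row.
    for row in large:
        for match in index.get(row[key], []):
            yield {**row, **match}


# Hypothetical sample data: many trips, a small vendor lookup table.
trips = [
    {"vendor_id": 1, "fare": 10.0},
    {"vendor_id": 2, "fare": 5.0},
    {"vendor_id": 1, "fare": 7.5},
]
vendors = [{"vendor_id": 1, "name": "A"}]

result = list(broadcast_hash_join(trips, vendors, "vendor_id"))
for joined in result:
    print(joined)
```

The trade-off is memory: this only pays off when the side you keep in memory is genuinely small.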
And this is the aggregator, where we first define our group by clause…
…and then our aggregate clause.
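The aggregator follows the same two steps as SQL's GROUP BY: rows are first bucketed by the group-by columns, then each aggregate expression is evaluated per bucket. Here is a minimal pure-Python sketch of that idea; the `aggregate` function, the format of its aggregate specification and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
AggSpec = Dict[str, Tuple[str, Callable[[List[Any]], Any]]]


def aggregate(rows: List[Row], group_by: Sequence[str], aggs: AggSpec) -> List[Row]:
    """Hypothetical helper: group rows, then apply each (column, function) aggregate per group."""
    # Step 1: the "group by" part, bucketing rows by their key columns.
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_by)].append(row)

    # Step 2: the "aggregate" part, producing one output row per group.
    result: List[Row] = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for name, (column, fn) in aggs.items():
            out[name] = fn([m[column] for m in members])
        result.append(out)
    return result


# Hypothetical sample trips.
trips = [
    {"vendor_id": 1, "fare": 10.0},
    {"vendor_id": 1, "fare": 20.0},
    {"vendor_id": 2, "fare": 5.0},
]
summary = aggregate(
    trips,
    group_by=["vendor_id"],
    aggs={"trips": ("fare", len), "total_fare": ("fare", sum), "avg_fare": ("fare", mean)},
)
print(summary)
```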
Pretty cool, isn’t it? I personally think so. By the way, did you know that the expression language used in Azure Data Factory, Azure Data Flow and Azure Logic Apps is shared among them? So if you are familiar with the expression language of Azure Logic Apps, you already know the one in Azure Data Factory and in the still-in-private-preview Azure Data Flow as well.
Here is another Data Flow. This time, two sources are combined with a union, and then a calculated column computes the new currency rate.
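The union-plus-calculated-column pattern boils down to two small operations. The sketch below assumes, purely for illustration, that the "new currency rate" column is an amount converted with a per-row exchange rate; the `union_all` and `derive` helpers, the column names and the rates are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def union_all(*sources: Iterable[Row]) -> List[Row]:
    """Hypothetical helper: stack rows from all sources, keeping duplicates (UNION ALL)."""
    return [dict(row) for source in sources for row in source]


def derive(rows: Iterable[Row], name: str, expr: Callable[[Row], Any]) -> List[Row]:
    """Hypothetical helper: add a calculated column `name`, computed per row by `expr`."""
    return [{**row, name: expr(row)} for row in rows]


# Hypothetical sources and exchange rates, for illustration only.
source_a = [{"currency": "USD", "amount": 100.0, "rate_to_eur": 0.5}]
source_b = [{"currency": "GBP", "amount": 10.0, "rate_to_eur": 1.25}]

# Assumed calculation: converted amount = amount * rate.
combined = derive(
    union_all(source_a, source_b),
    "amount_eur",
    lambda r: r["amount"] * r["rate_to_eur"],
)
for row in combined:
    print(row["currency"], row["amount_eur"])
```

Because `derive` returns new dictionaries rather than mutating its input, the unioned rows stay untouched, which loosely echoes how each transformation in a Data Flow produces a new stream.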
SSIS, anyone? But this time with an oh-so-powerful workhorse under the hood.
Having built my Azure Data Flows, I can integrate them into my regular ADF V2 pipeline.
Under Move & Transform, I choose the Data Flow component, hook it up to the Data Flow I just created, and link it to a Databricks node and an Azure Blob Storage account containing my data. And we are good to go. Not a single line of Python code written 😊. Sweet…
All I want for Christmas is Azure Data Flow. And I really mean it!
Please Microsoft…