We live in a world where an enormous amount of data is available. But having access to the data is not the same as having insight. If we combine this data access with the right tools and a little bit of effort, you might be able to transform your data into information in no time.
I have had this particular example at a client where the results of a nightly ETL load is sent on a daily basis. It merely contains the status (success and error) and the start/end times (the status is also part of the email subject). This is for both their test and production environment. All these emails are dropped into an Exchange folder of my choice. This is an example email:
Dear,
Here is the result of the job :
30/11/2016 22:12:47 : Script DataMart_LoadData has started
30/11/2016 22:12:47 : Executing SSIS Package Datamart_SYN_Master.dtsx With Parameter :
01/12/2016 00:03:47 : SSIS Package Datamart_SYN_Master.dtsx Successfully executed
01/12/2016 00:03:47 : Script DataMart_LoadData has ended
Regards,
AutoSys
The log can be consulted here :
\\SomeFolder\LeftOutDetailsForSecurityReasons.log
The emails are being sent for more than a year now, but by looking at the emails one by one, it is very difficult to see a trend or a pattern. In the ideal world, start and end dates of an ETL should be coming from the log tables from SSIS, not from a collection of emails. But sometimes the ideal world is not at reach. Luckily with Power BI, we have a tool set that can connect to many sources and even integrate them in a model. One of the connectors that comes with Power BI is the Microsoft Exchange connector.
After choosing the Microsoft Exchange connector, we’re prompted to enter a mailbox. In the next step, we will choose which folder to load: Mail in our case.
We’re interested in filtering values from a subfolder of the \Inbox\ and further on, we only want to include emails with a certain topic. So, we filter the emails on their Folder Path and Subject to include only the emails we’re interested in.
Because we need to extract data from the email body, we go to the Body column and choose to keep the TextBody.
We’re having a lot of redundant columns and therefore we only keep the columns Subject and Body.TextBody.
Next, we’ll clean up some of the prefix and suffix strings, so we only have the Environment (PROD and TEST) in our Subject column (we actually replace it with an empty string).
Now it looks like this:
We then rename the column Subject to Environment. The tricky part is to retrieve the start and end times of the ETL, which are part of the message body. So, first, we clean up the introduction text and the signature of the emails, as they’re always the same. Even though they contain line breaks, we can do this by checking the Replace using special characters and insert the necessary line breaks (carriage return + line feed).
What’s left is a column Body.TextBody with four lines in each cell: four dates followed by some text. We will now split each cell over as many columns, again by using Split using special characters.
We remove the two unwanted columns (the two in the middle) and remove the suffix from both columns.
Finally rename the columns with something more appropriate: Start and End.
The dates are still in the Text data type, so we change the data type to Date/Time. In our case this works just fine, but if it wouldn’t have worked, for example if the month and the day were switched, we would still be able to extract the date by using a custom column. We also add a custom column to calculate the time difference in minutes between Start and End date. To accomplish this, we use the M function Duration.TotalMinutes.
At the end, we duplicate the Start column to get the start date and change the data type from Date/Time to Date (only the start date is important, not the start time). Also, the data type for the duration is set to Decimal Number. Now we’re done with cleaning up, click Close & Apply. We end up with a table that looks like a regular data table, not like a folder in your inbox.
We could have left out the Start and End columns in the output, but I figured they could be used some time as a back reference.We can start creating some reports now: we create a simple line chart with a trend line, that tells us how the load times behave over a longer period of time (in this case it was filtered from April 2016 until October 2016). When showing both the test and production environment, it all seems pretty stable.
This seems true for the load of the production environment.
But when we zoom in little (starting in july), we see that the production ETL is going faster over time.
The stability we did see at first was partly caused by the some outliners, but the visual effect was also masked by the same outliners. To prevent this, you can overrule the Start/End values on the Y-Axis. On the test environment we see the opposite trend.
And this is true: production has received some minor hardware-upgrades over the last few months, while this was not the case for our test environment. Also, the test environment is used for more purposes than just ETL, while the production environment is dedicated for ETL.
By gathering emails, and having the right toolset, you can make your emails speak more than just words. And how cool is it if you can go to your manager and show him/her a visual with insights and then saying that you extracted this from your emails!
Cookie | Duration | Description |
---|---|---|
ARRAffinity | session | ARRAffinity cookie is set by Azure app service, and allows the service to choose the right instance established by a user to deliver subsequent requests made by that user. |
ARRAffinitySameSite | session | This cookie is set by Windows Azure cloud, and is used for load balancing to make sure the visitor page requests are routed to the same server in any browsing session. |
cookielawinfo-checkbox-advertisement | 1 year | Set by the GDPR Cookie Consent plugin, this cookie records the user consent for the cookies in the "Advertisement" category. |
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
CookieLawInfoConsent | 1 year | CookieYes sets this cookie to record the default button state of the corresponding category and the status of CCPA. It works only in coordination with the primary cookie. |
elementor | never | The website's WordPress theme uses this cookie. It allows the website owner to implement or change the website's content in real-time. |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Cookie | Duration | Description |
---|---|---|
__cf_bm | 30 minutes | Cloudflare set the cookie to support Cloudflare Bot Management. |
pll_language | 1 year | Polylang sets this cookie to remember the language the user selects when returning to the website and get the language information when unavailable in another way. |
Cookie | Duration | Description |
---|---|---|
_ga | 1 year 1 month 4 days | Google Analytics sets this cookie to calculate visitor, session and campaign data and track site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognise unique visitors. |
_ga_* | 1 year 1 month 4 days | Google Analytics sets this cookie to store and count page views. |
_gat_gtag_UA_* | 1 minute | Google Analytics sets this cookie to store a unique user ID. |
_gid | 1 day | Google Analytics sets this cookie to store information on how visitors use a website while also creating an analytics report of the website's performance. Some of the collected data includes the number of visitors, their source, and the pages they visit anonymously. |
ai_session | 30 minutes | This is a unique anonymous session identifier cookie set by Microsoft Application Insights software to gather statistical usage and telemetry data for apps built on the Azure cloud platform. |
CONSENT | 2 years | YouTube sets this cookie via embedded YouTube videos and registers anonymous statistical data. |
vuid | 1 year 1 month 4 days | Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos on the website. |
Cookie | Duration | Description |
---|---|---|
ai_user | 1 year | Microsoft Azure sets this cookie as a unique user identifier cookie, enabling counting of the number of users accessing the application over time. |
VISITOR_INFO1_LIVE | 5 months 27 days | YouTube sets this cookie to measure bandwidth, determining whether the user gets the new or old player interface. |
YSC | session | Youtube sets this cookie to track the views of embedded videos on Youtube pages. |
yt-remote-connected-devices | never | YouTube sets this cookie to store the user's video preferences using embedded YouTube videos. |
yt-remote-device-id | never | YouTube sets this cookie to store the user's video preferences using embedded YouTube videos. |
yt.innertube::nextId | never | YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen. |
yt.innertube::requests | never | YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen. |
Cookie | Duration | Description |
---|---|---|
WFESessionId | session | No description available. |