Send emails from Azure Databricks

How to send emails with an SMTP server in Azure Databricks

A client asked if we could provide a simple form of monitoring on a part of a provided solution. The data platform we developed for them ingested a source that was afterwards used by a business team and our client’s clients. For this particular source, our client asked us to send a simple email with record counts to a mailing list. No problem! Let’s get to work.

SMTP and Databricks

To get this working, there are a multitude of options you can explore. In this particular case, we were working with Azure components – mostly Azure Data Factory and Azure Databricks with a Python cluster – and we were looking for a quick solution with some flexibility. We opted to use SendGrid's SMTP relay in our Python Databricks scripts. Given that it's a free, third-party service, we're of course not going to be sending company secrets over it. A simple email with record counts, however, is not a problem.

Three easy steps

1. Set up your SMTP server

The first step is setting up your SMTP server. With SendGrid this was very easy: we created an account, set up an email address and created a login. The process is self-explanatory and takes maybe five minutes.

2. Install a library on your Databricks cluster

Next, make sure a suitable SMTP library is available on your Databricks cluster. The smtplib module ships with Python's standard library, so it comes pre-installed on Databricks Python clusters and needs no separate install. If you do need a third-party package, click 'Clusters' in the sidebar on the left, click on your cluster and finally 'Install New' under 'Libraries'. Upload the wheel (.whl) file while making sure you've selected the correct extension and you're good to go.
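Since smtplib is part of the standard library, you can confirm it's available directly in a notebook cell before installing anything:

```python
# smtplib ships with CPython's standard library, so no cluster library
# install is needed for it on a Databricks Python cluster.
import smtplib

print(smtplib.__name__)
```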

3. Create the right function

To actually get the mail sent, you need to create a function to send emails and call it where needed. You'll find the code for the function and an example call below. Make sure to set the SMTP server and port to the correct settings for your provider, and don't forget to fill out the proper names of the Azure Key Vault secrets you need (we've redacted them for obvious reasons 😉). Of course, this implies that these secrets exist in the first place, so create those as well if you haven't already. That's it, nothing more to it. You can now send emails through an SMTP server from Databricks.

The function

We'd like to think the code is quite readable. But in short: we import the SMTP library from step 2, then we define our function. We'd suggest putting this in a separate notebook that you can call on when needed. Finally, we make use of our function in any notebook we want.

Defining the function

# Send an email through SendGrid

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def SendEmail(recipient, subject, message):
  server = smtplib.SMTP('smtp.sendgrid.net', 587) # check server and port with your provider
  server.ehlo()
  server.starttls()
  server.login("apikey", dbutils.secrets.get(scope = "key-vault-secrets", key = "")) # insert secret name
  sender = dbutils.secrets.get(scope = "key-vault-secrets", key = "") # insert secret name

  msg = MIMEMultipart()
  msg['Subject'] = subject
  msg['From'] = sender
  msg['To'] = recipient
  msg.attach(MIMEText(message))

  server.sendmail(sender, recipient, msg.as_string())
  server.close()
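If you want to check what gets sent without opening an SMTP connection, the message-building part can be pulled out on its own. This is our own sketch, not part of the original solution, and the addresses below are placeholders:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_message(sender, recipient, subject, body):
    """Assemble the same MIME message the send function transmits, as a raw string."""
    msg = MIMEMultipart()
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = recipient
    msg.attach(MIMEText(body))
    return msg.as_string()

# Inspect the raw message locally; no server connection is made.
raw = build_message("sender@example.com", "team@example.com",
                    "Record counts", "123 records loaded")
```

Printing `raw` shows the headers and body exactly as `server.sendmail` would transmit them, which makes it easy to verify the subject, sender and recipient before wiring in the real secrets.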

Running the notebook that defines the function, then calling the function itself

%run /Shared/YourFolder/NotebookHoldingFunction # change according to your Databricks setup

recipient = dbutils.secrets.get(scope = "key-vault-secrets", key = "") # insert secret name
message = "Your message here"
subject = "Your subject here"

SendEmail(recipient, subject, message)
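In our case the message was a set of record counts. A minimal way to build that body (table names and counts below are made up for illustration) could be:

```python
# Hypothetical helper: turn per-table record counts into the email body.
def format_record_counts(counts):
    """Return a plain-text summary, one line per table."""
    lines = [f"{table}: {n} records" for table, n in sorted(counts.items())]
    return "Daily load summary\n" + "\n".join(lines)

# In Databricks you would get the counts from e.g. spark.table(...).count();
# here we use fixed numbers so the sketch is self-contained.
body = format_record_counts({"sales": 1200, "customers": 87})
```

The resulting string plugs straight into the message argument of SendEmail.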

Meeting business demands quickly

Now, Spider-Man's Uncle Ben told us that with great power comes great responsibility. So, in developing this power of sending emails through Databricks, we must ask ourselves: is it the right way to go? We discussed this part of the project with Competence Leader Ronny. He raised the valid point that sending these types of emails is something you would typically do from the controlling process or component – in our case, Azure Data Factory.

It's something to be discussed with our client. This alternative would take a bit longer, though, both in planning and execution. But we needed to tide the business over as soon as possible, so we chose this quick and flexible solution. We're not trying to milk a cow with our hands in our pants: the show can go on. And if we later decide to go in the other direction, as suggested by our dear colleague, we now have the time to set it up properly. Great!