T-SQL Tuesday #108: Non-SQL Server Technologies

This month’s T-SQL Tuesday is hosted by Mala Mahadevan [b|t], who has invited people to discuss new skills and technologies that they’d like to learn in the near future, with the twist that she’s particularly interested in non-SQL Server items this time out.

This ties in rather well with my post yesterday about my New (SQL) Year’s resolutions, one of which is to spend at least thirty minutes five times a week on professional development.  I’ve got two main non-SQL areas that I’m looking to learn more about between now and next year’s PASS Summit: Microsoft Azure and Python.

Although none of my clients are currently primed to move into the Azure cloud, I need to make sure that when they’re ready, I am too!  I’m looking to dive into the business intelligence and analytics offerings, and I’m particularly intrigued by Azure Cosmos DB.  The first goal is to truly get my head around as many of the 100+ available services as possible so that I can understand why and when to use each one; the how portion can come later.

On the Python side I’ve got three main goals:

1.  Upgrade my pandas skills to be truly “pro-level”.   No more googling every other data frame manipulation I need to write!

2.  Learn Seaborn to take my plotting and visualization skills to the next level.

3.  Work on some fun areas of Python outside of the data science ecosystem to develop my overall Python skills.

I hope you’ll check back in on me over the next twelve months to find out how I’m doing!

Introduction To sp_execute_external_script With Python, Part 1

I’ve been speaking on integrating Python and SQL Server for a while now, and having given the same presentation at nearly a dozen SQL Saturdays, PASS Summit, and multiple user groups, it’s about time to retire my favourite session.  Just in case there’s anyone out there who is interested in the subject but hasn’t managed to see it yet, I thought I’d start sharing some of the code and use-cases in a series of blog posts.

We’ll start off with a simple step-by-step introduction to the sp_execute_external_script stored procedure.  This is the glue that enables us to integrate SQL Server with the Python engine by sending the output of a T-SQL query over to Python and getting a result set back.  For example, we could develop a stored procedure to be used as a data set in an SSRS report that returns statistical data produced by a Python library such as SciPy.  In this post we’ll introduce the procedure and how to pass a simple data set into and back out of Python, and we’ll get into manipulating that data in a future post.

If you’d like to follow along with me, you’ll need to make sure that you’re running SQL Server 2017 or later, and that Machine Learning Services has been installed with the Python option checked.  Don’t worry if you installed both R and Python, as they play quite nicely together.  The SQL Server Launchpad service should be running, and you should make sure that the ability to execute the sp_execute_external_script procedure is enabled by running the following code:
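A minimal sketch of that configuration step, assuming the standard “external scripts enabled” server option and a login with permission to change server-level settings.  Depending on your version, you may also need to restart the SQL Server service before the new value takes effect:

```sql
-- Allow sp_execute_external_script to run external (Python/R) code.
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- Check the setting: run_value should show 1 once it is active.
EXEC sp_configure 'external scripts enabled';
```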

As with any introduction to a new language, we’re contractually obligated to kick things off with a “Hello, world!”-style script.  You’ll note that our first procedure call below has only two parameters.  The @language parameter says that we’ll be using Python code as opposed to R, and that the Launchpad service should start instantiating the Python engine.   The @script parameter then contains the Python code that we’ll be executing, which in this case is a simple print statement:
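A two-parameter call along these lines should do the trick.  This is a sketch rather than a definitive listing; the printed text comes back as a message from the external script rather than as a result set:

```sql
-- Simplest possible call: no input data, no output data set.
EXEC sp_execute_external_script
    @language = N'Python',                  -- use the Python engine rather than R
    @script   = N'print("Hello, world!")';  -- Python code to execute
```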

Now that we know our Python integration is working, the next step is to send some data over from SQL Server and get it back out.  We specify a T-SQL query to run in the @input_data_1 parameter; the SQL engine will execute this query and send the resulting data set over to Python.  The @script parameter has now been updated to simply say that we will take that data, identified by its default name of InputDataSet, copy it over to OutputDataSet (another default) and then return it to SQL Server.  I’ve got a tally table (a table of sequential numbers) set up in my Admin database, and I’ll begin by just sending across the first ten numbers from that table:
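Here is a sketch of what that round trip might look like.  The table name Admin.dbo.Tally and its column N are assumptions made for illustration, so substitute whatever your own tally table is called:

```sql
-- Send ten rows to Python and return them unchanged.
-- Admin.dbo.Tally and column N are hypothetical names.
EXEC sp_execute_external_script
    @language     = N'Python',
    @script       = N'OutputDataSet = InputDataSet',  -- copy input to output
    @input_data_1 = N'SELECT TOP (10) N FROM Admin.dbo.Tally ORDER BY N';
```

Without a WITH RESULT SETS clause the column in the returned result set will most likely come back unnamed, which is fine for a quick smoke test.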

If for some reason you don’t have a tally table and aren’t able to add one to your system, you can use the ROW_NUMBER() function against the sys.objects catalog view instead:
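A sketch of that alternative, assuming sys.objects in the current database holds at least ten rows.  The alias N and the cast to INT are my own choices, made to keep the data type simple on the Python side:

```sql
-- Generate the numbers 1 through 10 without a tally table.
EXEC sp_execute_external_script
    @language     = N'Python',
    @script       = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'
SELECT TOP (10)
    CAST(ROW_NUMBER() OVER (ORDER BY object_id) AS INT) AS N
FROM sys.objects
ORDER BY N';
```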

There’s no need to keep the default name of “InputDataSet” for our input data set.  I’m quite partial to “myQuery” myself.  Renaming it is as simple as adding another parameter called @input_data_1_name.  Of course, we have to remember to update the Python code in @script to reflect the new name:
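Something like the following sketch, again assuming a hypothetical tally table named Admin.dbo.Tally with a column N:

```sql
-- Rename the input data frame from InputDataSet to myQuery.
-- Admin.dbo.Tally and column N are hypothetical names.
EXEC sp_execute_external_script
    @language          = N'Python',
    @script            = N'OutputDataSet = myQuery',  -- must match the new name
    @input_data_1      = N'SELECT TOP (10) N FROM Admin.dbo.Tally ORDER BY N',
    @input_data_1_name = N'myQuery';
```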

Unsurprisingly, we can also change the name of our output data set by adding the @output_data_1_name parameter, and updating @script accordingly:
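A sketch using the @output_data_1_name parameter.  The output name “Results” is simply one I picked for illustration, and the tally table name and column are hypothetical as before:

```sql
-- Rename both the input and the output data frames.
-- Admin.dbo.Tally, column N, and the name Results are illustrative assumptions.
EXEC sp_execute_external_script
    @language           = N'Python',
    @script             = N'Results = myQuery',  -- both names must match the parameters below
    @input_data_1       = N'SELECT TOP (10) N FROM Admin.dbo.Tally ORDER BY N',
    @input_data_1_name  = N'myQuery',
    @output_data_1_name = N'Results';
```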

I should pause here to point out that although having parameters named @input_data_1, @input_data_1_name, and @output_data_1_name may imply that we can manage multiple input and output data sets with sp_execute_external_script, that is sadly not the case.  I’m hoping that this naming convention was chosen to allow room for that to happen in a future release (hint, hint, product team!).

Although this input query is very simple, in the real world we might be sending over the output of some very complex queries.  The input query in one of the use cases in the session I’ve been presenting is over 150 lines!  To make this more manageable in our integration, rather than embedding the whole query in our procedure call and making an unholy mess, we’ll instead store it in a variable called @InputQuery and pass that in to @input_data_1.  This allows us to use our standard T-SQL coding style, and has the added benefit of making the query easy to copy and paste into a new query window for development and/or, heaven forbid, debugging.
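A sketch of the pattern, using the same deliberately tiny query so the shape is easy to see.  The tally table name, its column, and the output name “Results” are illustrative assumptions:

```sql
-- Keep the T-SQL input query in its own variable, formatted normally.
-- Admin.dbo.Tally and column N are hypothetical names.
DECLARE @InputQuery NVARCHAR(MAX) = N'
SELECT TOP (10)
    N
FROM Admin.dbo.Tally
ORDER BY N
';

EXEC sp_execute_external_script
    @language           = N'Python',
    @script             = N'Results = myQuery',
    @input_data_1       = @InputQuery,  -- pass the variable, not a literal
    @input_data_1_name  = N'myQuery',
    @output_data_1_name = N'Results';
```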

(Yes, “variablizing” is a real word.  I checked.)

We can do the same thing with our Python code, which in this example we’ve stored in a variable named @PyScript and then passed in to the @script parameter:
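Putting both variables together might look like the sketch below.  As before, the tally table name, its column, and the output name “Results” are assumptions for illustration.  Note that the Python code starts at the left margin inside the string, because Python is sensitive to leading whitespace:

```sql
-- T-SQL input query in one variable...
-- Admin.dbo.Tally and column N are hypothetical names.
DECLARE @InputQuery NVARCHAR(MAX) = N'
SELECT TOP (10)
    N
FROM Admin.dbo.Tally
ORDER BY N
';

-- ...and the Python code in another.
DECLARE @PyScript NVARCHAR(MAX) = N'
# Pass the input data frame straight through to the output.
Results = myQuery
';

EXEC sp_execute_external_script
    @language           = N'Python',
    @script             = @PyScript,
    @input_data_1       = @InputQuery,
    @input_data_1_name  = N'myQuery',
    @output_data_1_name = N'Results';
```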

Look at that nice clean procedure call!  Marvel at how the business logic in our T-SQL query and our Python code is separated out, formatted nicely, and ready to copy into another window or IDE!  Can we all now agree to use this format for our sp_execute_external_script calls going forward?

This concludes part one, but we’ll return with part two very soon, in which we start to manipulate our data set in Python and see what happens when things don’t go so well…

Kentuckiana Mini-Tour

Having just finished leading the efforts on the very successful SQL Saturday Albuquerque, by all rights I should be taking this week off to recuperate.  Instead I’m starting a whirlwind “mini-tour” of Kentuckiana tomorrow, with three stops at which I’ll be presenting on SQL Server’s Machine Learning Services components.

Tuesday night I’ll be at Blue Stallion Brewing in Lexington, KY talking to the Lexington PASS group about how to leverage Python for statistical analysis in SQL Server 2017.  I’m looking forward to the unique experience of presenting with a pint glass in my hand rather than a bottle of water.  As long as I don’t try to use the pint glass like a laser pointer, I’ll be fine!  Register Here.

I don’t have any speaking engagements on Wednesday night, but will instead be heading up to Columbus, OH with my friend Matt Gordon [b|t] (who founded the Lexington PASS group) and his son to watch the Columbus Crew take on the Philadelphia Union.  As many of you know, I’m a devout soccer (real football) fan, and I’m really looking forward to visiting one of the most storied environments in US soccer.

Thursday is a double-header, starting in Evansville, IN at 11:00 AM and finishing in Louisville, KY at 6:00 PM.  With a two-hour drive between them, not to mention the one-hour time difference, this will be an intense day.  It’s a good thing then that the session I’ll be presenting is the same for both groups!  Like the Lexington session I’ll be talking about statistical analysis, but this time I’ll be focusing on the R programming language, so the talk will be directly applicable to SQL 2016 as well as 2017.  Register Here for Evansville, IN.  Register Here for Louisville, KY.

If you live close to any of these three presentations, I’d love for you to come out so I can meet you and talk data and statistics with you!