With the launch of the latest preview of SQL Server 2016 (CTP 3), the long-awaited fruit of Microsoft’s acquisition of Revolution Analytics has become public. It is now possible to install what has been named R Services and evaluate the benefits of in-database R. As I mentioned in the BlueGranite blog when the integration was first announced, this is revolutionary—for both the R and SQL Server communities. Whereas my initial assessment was all based on early speculation, I can say that having now seen demos as well as having used it in preview, the reality of R Services is better than I had initially envisioned.
Customer Clustering Scenario
As an initial foray into R Services, I used customer data from the AdventureWorksDW database as input for k-means clustering in R. I like clustering examples because people at all levels can appreciate the concept even if they do not know anything about the algorithm. Pick [x] number of clusters, provide a set of features, and each observation is assigned to one of those [x] clusters. Clusters are also easy to visualize in a scatter or bar graph. After creating my clustering stored procedure, I then used Reporting Services to visualize how many customers were assigned to each cluster. Finally, I pinned the SSRS chart to Power BI.
R Services Code
R Services combines R scripts with SQL in stored procedures. I created the following k-means sample (code) and provide six key points explaining the structure below. For a more permanent implementation, it would be best to add parameters for the number of clusters and anything else that is hard-coded.
Sample k-means stored procedure for R Services
1) sp_execute_external_script is the new stored procedure that executes the R Services code.
2) The language parameter specifies “R”. Apparently Microsoft is leaving the door open for additional language support in the future.
3) The input_data_1 parameter contains the SQL input query. The R script uses the results of this query. I selected three attributes from the Customer dimension in AdventureWorksDW and sampled one hundred customers. Forgive the lazy feature selection, but remember, it’s only a quick demo.
4) The script parameter contains the R script. I tried to present the k-means script in a clear and explicit manner so that it can be followed easily by anyone who is not familiar with R. Everything comes together as a data frame in the final R output: “OutputDataSet”.
5) WITH RESULT SETS allows you to specify the final SQL output of the stored procedure. In this case, a new “ClusterResult” column is added to the original data.
6) A new procedure that I called dbo.spRServicesKMeansSample contains all of the code. This is the final procedure that client tools can execute.
Using the R Services Output
Running a stored procedure in Management Studio returns the results to a grid. Any client that can execute SQL can obtain the results, however. For example, it is possible to use Reporting Services with R Services (SSRS with SS…R…S).
Reporting Services executing the R Services procedure
The algorithm takes the input of income, children, and house ownership for the one hundred sample customers and assigns each customer to one of four groups. Visualizing the customer counts in Reporting Services, I see my four clusters.
Four customer clusters assigned based on income, children, and house ownership
Once the Reporting Services report has been deployed, it is also easy to pin the chart to a dashboard in the Power BI service. Simply click on the new icon with the yellow Power BI logo. Pinning also creates a standard SSRS report subscription for refreshing the report.
Pin SSRS charts to Power BI dashboards
Select a dashboard and refresh schedule
SSRS chart pinned in Power BI
Who Uses R Services?
After seeing it in action, hopefully anyone who uses SQL Server feels that they can leverage R Services in some capacity. Whether a developer, an analyst, a data scientist, or even a database administrator; there can be something for everyone. For developers and analysts, one benefit of running R code in SQL Server is that you do not need to know R to take advantage of R Services. Developers can execute stored procedures that run R scripts as they would with any other stored procedure. They may never choose to go deeper into the R language. Advanced analytics practitioners can reuse their existing scripts. All groups can connect to R Services with familiar tools.
Why Use R Services?
Advanced Analytics: R excels at analytics where SQL and other languages fall short. Its capabilities for advanced statistics, machine learning, and other data science work far exceed what has traditionally been possible with SQL Server — even Analysis Services. With the type of analysis for which R allows as well as the existing community of R users, it made sense to integrate rather than reinvent.
Enterprise-Ready R: There have been historical drawbacks to R that limited it to a smaller, select audience. RevoScaleR from Revolution Analytics brings increased performance and scalability to R, making it more appropriate for enterprise workloads.
Ease of Use: With R Services, there is no new client tool. Work with familiar tools as long as they can connect.
Overall, I am excited to have R in-database in SQL Server, and I look forward to testing out additional capabilities and code. The release of R Services opens up an avenue to large-scale deployment of R scripts across enterprises just as R is gaining adoption by the broader business community. What Azure ML does for Cortana Analytics in the cloud, R Services now does for SQL Server on-premise. Taking advantage of advanced analytics on the Microsoft data platform has never been this promising.