SQL Server 2016 R Services: executing R code examples

Curious about running your fist R script inside SQL Server and want to learn how it looks like? Unfortunately, it’s not a mix of R and T-SQL code, because R commands will be executed in a separate engine, which is installed with SQL Server. Therefore, sp_execute_external_script procedure is introduced with a bunch of parameters.

To give an illustration of what I mean, let’s look at the required parameters @language and @script. First can now take the only value – R. In the future Microsoft is planning to support more languages, for example, they talked about Python and Julia at PASS Summit 2015. The second parameter is used to pass an R script to it. Let’s look at the first very simple example, which will return information about internal iris dataset, that is available in R by default. R language comes with some data integrated inside it in the form of datasets you can play with. It’s very convenient and makes it easier to start experiments.

The output information tells us, that iris dataset has 150 rows and 5 columns. It shows the name of the columns, its datatypes as well as some values.

All other parameters are optional, but probably you’ll use them actively, especially those, that are responsible for data exchange between SQL Server and R environments. For instance, you can send some data into R using and ordinary SELECT statement and pass it to @input_data_1 parameter. Inside R script, this data will be available in the form of a dataset with the default name of InputDataSet, if you haven’t changed it using @input_data_1_name parameter. Now, only one input dataset is supported. Another key thing to remember is that if you want to return some data back from R to the output of sp_execute_external_script procedure, you should put it inside R script into a dataset with the default name OutputDataSet, or any other in case you define its name in @output_data_1_name parameter. For instance, in the second example, I’m giving a simple SELECT statement as an input with the default name to R script and returning it back.

If you don’t want to use the default names, or you already have a script with other names and don’t want to touch it, you can define the names of input and output datasets.

The result of the execution will be the same.

You should have already noticed, that the output has no column names. This is because I used with result sets undefined option. With this in mind, you can also define the returned dataset in order to better control the results.

Furthermore, there is an option to send just a couple of variables using @params argument, where you define them, and later just pass the values there. Use output clause if you want to return some values.

That is all you need to start with executing basic R scripts inside SQL Server. In the next posts, I will show how to control parallel execution, work with large data volumes and some data type considerations you should know when working with R.

SQL Server 2016 R Services: offline installation and adding additional R packages

Let’s discover, how to install R Services with SQL Server 2016. First, you need to check the respective component it during the installation process. In the early CTP versions it was called the Advanced Analytics Extensions, but in RTM it was renamed to R Services (In-Database).

If your server is connected to the Internet, the installation process will be easy and straightforward, all the required components will be automatically downloaded and installed. But, the real world is different, and most of the servers don’t have a direct Internet connection because of the security requirements. In that case, you still should run SQL Server installation and do the same choice as I mentioned earlier, but on the next step, you’ll see the following window where you should agree with the R Open installation.

Click Accept, then Next. The installation program will ask you for the directory, where R Open and R Server distributives are located as well as the links, where you can download them.

If you don’t have these distributives, go download them from the computer, where there is the Internet connection and then put somewhere, so that the installation program will have access to this folder. After it click Next and wait for the installation to finish. As you see, the process is quite easy too. Now it’s time to activate the R Services after SQL Server 2016 installation. You can do it with sp_configure command.

Attention! You need to restart SQL Server instance. After it, you could run the basic R scripts inside you SQL Server, but as we learned earlier, the real power of R language is in its extensions, called packages. So, in the future, you’ll need some of them too. The packages installation is described in the official documentation, but I suggest you one more way, which is more convenient, I believe.

Go to the C:\Program files\MSSQL13.MSSQLSERVER\R_SERVICES\bin or C:\Program files\MSSQL13.<instanceName>\R_SERVICES\bin\x64 folder depending, whether you installed the default of named instance and run R.exe with administrative privileges (it is required to have write permissions the packages directory).

You’ll see common R language console, where you can enter any R command.

Let’s see, how to install the very popular ggplot2 packages, which is often used to create rich graphics. Again, if your server is connected to the Internet, you just need to run the following command.

If not, the process could be somewhat complicated. First of all, you need to download the package and all of its requirements from the Internet. To do that, we need to go to the CRAN repository. For the ggplot2 package the URL will be https://cran.r-project.org/web/packages/ggplot2/index.html.

On the bottom, you see the link for the Windows binary ZIP archive for this package, and on top in Depends and Imports sections the list of required packages, which should be also downloaded. And of course, the required packages for the required too. So, you have to follow the links recursively to download the complete set. In this moment there are 13 of them, in the future, this number could change. I put them in C:\temp directory on my server, but you can choose any other.

Not I need to create an index file for downloaded packages, the PACKAGES file. This could be completed using the following commands in the R console.

 

Now we are ready to start the installation process. Don’t forget to define the directory with downloaded packages.

After successful installation, we can check it by trying to load the package.

If you forget or miss to download any required package, you will see an error on this step with the name of absent package. Don’t give up, just go, download it too, put in the same folder, create the new PACKAGES file and repeat the ggplot2 installation. Hope, that this article will help you to save some time with offline installation and help to start using R Services faster. In the next post, we’ll see how to execute R scripts inside SQL Server.

SQL Server 2016 R Services: introduction

This article is a short introduction to R programming language and a new R Services feature in SQL Server 2016. I hope, it will be a good starting point for those, who haven’t heard about it at all. R is a software environment and programming language for statistical computing, predictive analytics, and rich graphics. R can be easily extended through user-created packages, thousands of which are available in the CRAN (The Comprehensive R Archive Network), the central repository for all R binaries and third-party packages. Installing and updating packages from this source is an easy and straightforward process, which you can do just by one command or through graphical interface.

R is an open source project and is freely available for various operating systems, which is one of the main reasons why it has become substantially popular among statisticians and data scientists in recent years for data analysis, prototyping, and even software development. R community is huge and you can find an answer to almost any question or problem over the Internet, tons of scripts and examples are available for learning, downloading and modifying. Of course, such popularity has caused the inclusion of R language support to some Microsoft products and services.

On April 6th, 2015 Microsoft announced the acquisition of Revolution Analytics company, which has made R enterprise-ready by offering such products as R Open and R Server, which can scale and run R scripts number of times faster than an open source version by providing optimized versions of popular R built-in functions, multi-threaded processing and loading data to memory by chunks as needed.

Without a doubt, R Services in SQL Server 2016 is one of the biggest results of this unity. It brings R language support to one of the most popular RDBMS in the world, allowing to implement advanced analytics scenarios inside database engine and process large amounts of data using Revolution Analytics technologies. It’s possible to extend T-SQL code with more powerful R functions for data processing and predictive analytics. Moreover, you can now easily create rich graphics visualizations by using R capabilities and use these images inside your applications and reports.
In the next posts of this series, I will start to discover this new feature for you step by step and try to show its pros and cons, from the installation nuances up to processing large volumes of data in parallel. Hope, all this information will help you learn R Services and start using it.