2019-06-18
Microsoft Azure ML provides over 500 individual R packages for use in R scripts. It is almost certain, however, that at some point you will wish to use an R package not available by default.
Several years ago, before Revolution Analytics was acquired by Microsoft, Andrie de Vries created a very useful package called miniCRAN. This package makes it easy to create local repositories from which users can install R packages. Often, this is necessary for security reasons, because users are prevented from downloading executable files from external repositories.
However, miniCRAN also serves another valuable purpose which will suit our needs precisely. It is easy to upload a single independent R package into Microsoft Azure ML. Many R packages, however, have multiple dependencies, and keeping track of all of them quickly becomes a headache. A very practical solution is to define an R script on a local machine that creates a repository of the desired packages. This repository will automatically include required dependencies. The entire repository can then be zipped and loaded into your AzureML space, and desired packages can be installed as needed, just as easily as if you were using RStudio. Furthermore, the zipped repository can become a shared resource for all the R developers on the team.
Creating the Local Repository
In this example, we will install and load the package "sn", which is helpful for generating skewed probability distributions.
The following script is run in your local RStudio to generate a local CRAN-like repository:
library(miniCRAN)
# cf https://blog.revolutionanalytics.com/2014/10/introducing-minicran.html
options(repos = c(CRAN = "https://cran.at.r-project.org/"))
# we create a vector of all the CRAN packages we
# would like to include in our local repository
pkgs <- c("numDeriv", "sn")
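localCRAN <- "~/localMiniCRAN"
dir.create(localCRAN, showWarnings = FALSE)  # no warning if the folder already exists
# resolve the dependency list once and reuse it for both package types
pkgList <- pkgDep(pkgs)
makeRepo(pkgList, path = localCRAN, type = "source")
makeRepo(pkgList, path = localCRAN, type = "win.binary")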
When the script finishes, we can view the resulting repository files in the Windows File Explorer. In this example we explicitly build both package types, source packages and Windows binaries, and the repository keeps them in separate subfolders.
Note that the local repository includes mnormt, which is a requirement of sn but which we did not explicitly mention in the repository script.
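The same check can be made from R instead of the File Explorer. The following is a minimal sketch in base R, not a miniCRAN function: list_repo_packages is a hypothetical helper name, and it assumes the repository folder contains the PACKAGES index that a CRAN-like repository normally carries.

```r
# Hypothetical helper (not part of miniCRAN): list the packages that a
# local CRAN-like repository advertises in its PACKAGES index.
list_repo_packages <- function(repo_dir, type = "source") {
  path <- normalizePath(repo_dir, winslash = "/", mustWork = TRUE)
  # Windows paths (C:/...) need "file:///"; Unix paths already start with "/"
  prefix <- if (startsWith(path, "/")) "file://" else "file:///"
  index_url <- contrib.url(paste0(prefix, path), type = type)
  sort(rownames(available.packages(contriburl = index_url)))
}

# Assumed usage with the repository created above:
# list_repo_packages("~/localMiniCRAN")
# "mnormt" %in% list_repo_packages("~/localMiniCRAN")
```

If the dependency resolution worked as described, the second call should report that mnormt is present even though it was never named in pkgs.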
Loading the Zipped Repository into AzureML
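The zipped repository is uploaded just as you would upload any data file. The resulting "dataset" is then connected to the Script Bundle (Zip) input port of an Execute R Script module in Azure ML Studio, which unpacks its contents into the src folder.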
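The zip file itself can be produced from R on the local machine. The sketch below is one possible way to do it, under a few assumptions: zip_repo is a hypothetical helper name, the archive is built so that the repository folder is its top-level entry (which matches a path such as src/localMiniCRAN/ after unpacking), and utils::zip relies on an external zip program that may not be present on a default Windows setup.

```r
# Hypothetical helper: zip a repository folder so that the folder itself
# (e.g. localMiniCRAN/) is the top-level entry of the archive.
zip_repo <- function(repo_dir, zip_file) {
  repo_dir <- normalizePath(repo_dir, mustWork = TRUE)
  out_dir <- normalizePath(dirname(zip_file), mustWork = TRUE)
  zip_file <- file.path(out_dir, basename(zip_file))
  # zip from the parent directory so that archive paths stay relative
  old_wd <- setwd(dirname(repo_dir))
  on.exit(setwd(old_wd), add = TRUE)
  status <- utils::zip(zipfile = zip_file, files = basename(repo_dir))
  if (status != 0) stop("zip failed with status ", status)
  invisible(zip_file)
}

# Assumed usage with the repository created earlier:
# zip_repo("~/localMiniCRAN", "~/localMiniCRAN.zip")
```

Any other zip tool works equally well, as long as the folder structure inside the archive is preserved.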
The following code installs the required package(s) for use by the R script. In this example, R code to view a table of packages is included, but this is solely to observe the repository packages in a test environment. It is not necessary in the deployed script.
# setting-up the repository
uri_repo <- "file:///C:/src/localMiniCRAN/"
options(repos = uri_repo)
# extracting the list of available packages
table_packages <- data.frame(package = rownames(available.packages()))
# installing a required package
install.packages("sn")
library(sn)
# do something with the newly loaded libraries
# outputting the list of packages
maml.mapOutputPort("table_packages")
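When the same script is run repeatedly, it may be worth skipping packages that are already installed. The following is a minimal sketch of such a guard, not part of the original script: install_if_missing is a hypothetical helper name, and repo_uri is assumed to point at the unpacked repository as in the code above.

```r
# Hypothetical helper: install only those packages that are not yet
# present in the current R library, using the given repository.
install_if_missing <- function(pkgs, repo_uri) {
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, repos = repo_uri)
  }
  # return the names that had to be installed (possibly none)
  invisible(missing_pkgs)
}

# Assumed usage inside the Execute R Script code:
# install_if_missing("sn", "file:///C:/src/localMiniCRAN/")
# library(sn)
```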
Conclusion
The miniCRAN package is an excellent tool for managing package dependencies for Azure ML projects involving R scripts. The example here includes only a single package with two dependencies, but a real miniCRAN repository would typically include all the packages required by a multitude of Azure ML experiments. The miniCRAN repository then becomes a single easy-to-use and easy-to-manage resource for Azure ML scripts that require additional packages not supplied on Azure.