2019-06-27
When examining the available selection of machine learning environments, even the most ardent R user may suffer some pangs of Python envy. Most machine learning environments, such as Google's TensorFlow, are programmed in C++ for maximum performance and maximum utilization of hardware resources such as GPUs (graphics processing units). The TensorFlow API, however, is designed to give Python programmers easy access to the functionality of the core TensorFlow libraries. This brings us to a kind of "good-news, bad-news" joke. The good news is that the RStudio folks have released an R package that gives R users access to the popular TensorFlow toolset. The bad news is that TensorFlow's API is still a Python interface, so any R code must be translated into Python-compatible instructions if it is to be applied correctly. The fundamental R data structures and data types, which differ from those of Python, must be scrutinized and managed precisely if the interface to TensorFlow is to work at all.
Our goal here is not to learn TensorFlow, of course, but to examine some of the ways that R code must be crafted if it is to work with TensorFlow. Another aspect of the "bad news" is that virtually all TensorFlow tutorials are written for the Python programmer. Our ironic task, therefore, is to learn to translate Python examples into R code, which is then translated back into Python calls.
Of course, we are still obliged to create a "Hello, world" program for our first effort. This assumes you have already installed the R "tensorflow" package. As a nice touch, the package can also take care of installing TensorFlow itself if it is not already present (it provides an install_tensorflow() function for this).
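library(tensorflow)
# build a constant tensor holding the greeting (Python: tf.constant('Hello, World!'))
hello <- tf$constant('Hello, World!')
# open a session (TensorFlow 1.x graph-mode API), evaluate the tensor, then release the session
sess <- tf$Session()
sess$run(hello)
sess$close()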
You should note immediately that there has been no instantiation of any object in our R code. We simply start using the "tf" variable to access TensorFlow. "tf" is created when you load the tensorflow package using the library function. You must be careful here: accidentally creating a new variable called tf will mask the tf from the library. Removing the stray variable with rm(tf) should make the package's tf visible again; if it does not, reload the library. In the meantime the original can still be reached as tensorflow::tf.
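The masking effect is ordinary R scoping and can be seen without TensorFlow. The following is a minimal sketch that uses base R's pi as a stand-in for the package's tf; the variable names masked and qualified are only illustrative.

```r
# Stand-in demonstration: base R's `pi` plays the role of the package's `tf`.
pi <- "oops"            # accidental assignment in the current environment
masked <- pi            # the new variable wins the lookup
qualified <- base::pi   # the package object is still reachable through `::`

print(masked)
print(qualified)

rm(pi)                  # clean up the stand-in variable
```

The same `::` form applies to the real package, so tensorflow::tf refers to the package's object regardless of what else is named tf.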
In the simple "Hello, world" example it would seem that the main difference is that R uses the "$" notation, so that what would be tf.constant( ) in Python is now tf$constant( ) in R. However, a great deal more attention to detail is necessary.
For example, [2,2] would be a list of two integers in Python. c(2,2) would seem to be the obvious R equivalent, but if we run class(c(2,2)) we find that the R version is a numeric (double-precision) vector, not an integer vector, because bare number literals in R are doubles. We must use as.integer(c(2,2)) or write the literals as c(2L,2L). The following example uses such a vector as the shape in the declaration of a TensorFlow matrix containing random values.
library(tensorflow)
# accessing "tf" takes a long time if the tensorflow library has just been loaded
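W1 <- tf$ones(shape(2L, 2L)) # from Python: W1 = tf.ones((2,2))
# W2 similarly. The Python shape tuple (2,2) becomes shape(2L, 2L) in R;
# tf$ones(2,2) would pass the second 2 as the dtype argument, not as a dimension
W2 <- tf$Variable(tf$zeros(shape(2L, 2L)), name="weights")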
sess<-tf$Session()
print(sess$run(W1))
# tf$initialize_all_variables() is deprecated
# sess$run(tf$initialize_all_variables())
sess$run(tf$global_variables_initializer())
print(sess$run(W2))
print(class(c(2,2)))
# note that the vector c(2,2) is numeric in R, not integer
# tensorflow demands an integer in this context
R <- tf$Variable(tf$random_normal(as.integer(c(2,2))), name="random_weights")
sess$run(tf$global_variables_initializer())
print(sess$run(R))
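The numeric-versus-integer distinction can be checked in base R alone, with no TensorFlow session. This is a small sketch; the names dims_numeric, dims_literal and dims_coerced are only illustrative.

```r
# Base R only: no TensorFlow is needed to inspect vector types.
dims_numeric <- c(2, 2)              # bare literals are doubles
dims_literal <- c(2L, 2L)            # the L suffix makes integer literals
dims_coerced <- as.integer(c(2, 2))  # explicit coercion, as in the example above

print(class(dims_numeric))                    # "numeric"
print(class(dims_literal))                    # "integer"
print(identical(dims_literal, dims_coerced))  # TRUE
```

Either integer form should be acceptable wherever a Python example passes a list or tuple of whole numbers as a shape.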
Python code is likely to use dictionaries, which are analogous to, but not identical to, named lists in R. I have found that in some cases a list object will fail if the TensorFlow API expects a dictionary. Fortunately the R package "reticulate" (reticulate python, get it?) provides a "dict" function that builds a genuine Python dictionary and can be used successfully in these cases. Reticulate is installed as a part of the R tensorflow installation.
library(tensorflow)
# placeholders are filled in at run time through feed_dict
input1 <- tf$placeholder(tf$float32)
input2 <- tf$placeholder(tf$float32)
output <- tf$multiply(input1, input2)
sess <- tf$Session()
# Python: print(sess.run([output], feed_dict={input1:[7.], input2:[2.]}))
print(sess$run(output, feed_dict=dict(input1=c(7.0), input2=c(2.0))))
# dict is in the reticulate namespace. The R "list" primitive does not work here
sess$close()
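One plausible reason a plain list fails in this case, offered as an assumption rather than a diagnosis: the names of an R list are always character strings, so a named list can only carry the text "input1" as a key, whereas feed_dict expects the placeholder object itself. The reticulate documentation describes dict() as able to create a dictionary whose keys are Python objects. The base R sketch below illustrates only the list half of that contrast, and its input1 is a hypothetical stand-in, not a real tensor.

```r
# Base R only: the names of a list are always character strings.
input1 <- "stand-in for a placeholder tensor"  # hypothetical object, not a real tensor
feed <- list(input1 = 7.0)

key <- names(feed)[[1]]
print(is.character(key))         # TRUE: the key is text ...
print(identical(key, "input1"))  # TRUE: ... the name, not the object bound to input1
```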
Conclusion
The tensorflow package provides the necessary functionality for the R programmer to develop code that will execute on Google's TensorFlow ML platform. However, because virtually all TensorFlow books and tutorials are written using Python, the R programmer must be able to read the Python examples with sufficient understanding before he or she can craft the R equivalent. Unlike many R packages, from which only a few simple functions might be required, tensorflow requires an understanding of the full workings of Google's TensorFlow itself, which is a substantial undertaking.