LIS 4370 R Programming: R Objects, S3 vs S4

This week was all about the object-oriented side of R. To be honest, in the six months that I have been working with the language, this is the first time I have really gone into detail with R and object-oriented programming. I have used object-oriented languages before, such as Java, C++, and Swift. The C-based languages all approach object orientation in a similar way, and then there is R, which to me seems to have evolved into an object-oriented language. To begin with, R has two major categories of objects, and they are not necessarily compatible. There are S3 objects, which have been around since the language was first developed, and then there are S4 objects, which were developed more recently and make the language much more powerful.

To discuss the object-oriented side of R, I wrote an R script, which can be found on my GitHub through the link at the bottom of this post.

To demonstrate some of the features of objects in R, I am going to use the iris data set. To load it into my script, I assign it to the object my_iris.

#Load in some data
my_iris <- iris

Then I can attempt to use generic functions on the data set. Generic functions are functions in R that accept many different types of object as an argument and dispatch to a class-specific method on the backend to return a sensible result. Some of the most frequently used R functions are generics, such as head(), print(), and str(). Below I try head() on my data set, along with list(), which is not a generic itself but will happily wrap any object.

#Attempting Generic Functions
head(my_iris)
list(my_iris)

These functions perform just as expected and the following is printed to the console. I only include the first few lines from the list() function to improve readability.

> head(my_iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> list(my_iris)
[[1]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
Max size reached, 144 rows excluded

The functions worked just fine, in part because the argument passed to them is an S3 object. Had I passed an S4 object to head(), it would have failed, because S4 objects are not subsettable unless their class defines its own subsetting method. list() works fine either way because it simply creates a new object, a list containing that one object.
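To make that concrete, here is a minimal sketch of what happens when an S4 object is passed to head() and list(). The Point class is hypothetical and exists only for this illustration; it is not part of my script.

```r
# Hypothetical S4 class, used only for this illustration
setClass("Point", representation(x = "numeric", y = "numeric"))
p <- new("Point", x = 1, y = 2)

# head() has to subset its argument, so capture the error instead of stopping
head_result <- tryCatch(head(p), error = function(e) e)
inherits(head_result, "error")

# list() only wraps the object, so it works for anything
wrapped <- list(p)
length(wrapped)
isS4(wrapped[[1]])
```

The tryCatch() call is only there so the script keeps running after head() fails.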

But how do I know that my_iris is an S3 object? Since S4 objects cannot always be passed into the same functions as S3 objects, there is some need to know which is which. To tell the difference, one can use the function isS4(). Pass my_iris to the function and you get the following:

#Is the data set an S3, or an S4?
isS4(my_iris)

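This returns a logical TRUE if the object is an S4 object and FALSE otherwise. A FALSE result on its own only says the object is not S4; it is an S3 object if it also carries a class attribute, which a data frame does. The output for the above line of code is thus: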

[1] FALSE

So we can infer from this that my_iris is not an S4 object, and since it has the class data.frame, it is an S3 object.
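One way to make that check explicit is to combine isS4() with is.object(), which reports whether an object has a class attribute. The helper name oo_system below is my own invention for this sketch and is not a base R function.

```r
# Hypothetical helper (not in base R): report which object system x belongs to
oo_system <- function(x) {
  if (isS4(x)) {
    "S4"
  } else if (is.object(x)) {
    "S3"    # has a class attribute but is not S4
  } else {
    "base"  # no class attribute at all
  }
}

oo_system(iris)   # a data frame carries the class attribute "data.frame"
oo_system(1:10)   # a plain integer vector has no class attribute
```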

So we can tell S3 from S4, but how about checking the base type of the data held inside the object? For a list-based S3 object, each element can be accessed with the $ syntax. To demonstrate this, I will create an S3 object and an S4 object holding the same data. The object will represent an employee, who has a name, a job title, and a pay.

For an S3 object, creating the class and a new object can be done by creating a list and then setting its class attribute to the desired class name. To check the mode of any individual element, we can use the object$element pattern and pass it as an argument to mode(). The code will be as follows:

#Creating an S3 Object
my_s3 <- list(name = "Bob", job_title = "Coder", pay = 30000)
class(my_s3) <- "Employee"
mode(my_s3)
attributes(my_s3)
mode(my_s3$name)

Running this gives the following in the console; the last three calls are the ones that produce output.

> my_s3 <- list(name = "Bob", job_title = "Coder", pay = 30000)
> class(my_s3) <- "Employee"
> mode(my_s3)
[1] "list"
> attributes(my_s3)
$names
[1] "name"      "job_title" "pay"      

$class
[1] "Employee"
> mode(my_s3$name)
[1] "character"

Here we can see that R still reports the mode of the S3 object as list, because the class is only an attribute attached to an ordinary list. If you were to check the class, though, you would see that it is Employee.
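A short sketch of that class check, reusing the same my_s3 object. The reassignment of pay at the end is my own addition, included to show that S3 does not check element types.

```r
my_s3 <- list(name = "Bob", job_title = "Coder", pay = 30000)
class(my_s3) <- "Employee"

class(my_s3)                  # the class attribute set above
inherits(my_s3, "Employee")   # TRUE when "Employee" is in the class vector
mode(my_s3)                   # the underlying storage is still a list

# S3 does no type checking, so this assignment is accepted silently
my_s3$pay <- "a lot"
mode(my_s3$pay)
```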

S4 objects are a little more complicated to create. First the class needs to be defined, along with the type of each slot. This is done with the setClass() function. Unlike S3 objects, which have elements in a list, S4 objects have slots. Once the class is defined, the number and names of the slots are fixed. We can check the base type of any slot using the mode() function. To access a slot in an S4 object, we can either use the slot() function, passing the object and the name of the slot as a string, or use the object@name pattern. The code to do all of this would look like this:

setClass("Employee", 
         representation(
           name = "character",
           job_title = "character",
           pay = "numeric"
         ))

my_s4 <- new("Employee", name = "Bob", job_title = "Coder", pay = 30000)
my_s4
mode(my_s4)
mode(slot(my_s4, "name"))
mode(my_s4@name)

And the following is printed to the console.

> setClass("Employee", 
+          representation(
+            name = "character",
+            job_title = "character",
+            pay = "numeric"
+          ))
> my_s4 <- new("Employee", name = "Bob", job_title = "Coder", pay = 30000)
> my_s4
An object of class "Employee"
Slot "name":
  [1] "Bob"

Slot "job_title":
  [1] "Coder"

Slot "pay":
  [1] 30000

> mode(my_s4)
[1] "S4"
> mode(slot(my_s4, "name"))
[1] "character"
> mode(my_s4@name)
[1] "character"

Here is where some of the differences between S3 and S4 begin to become more apparent. We can see that R recognizes the mode of the S4 object as S4 instead of another base type. In this output R prints out the S4 object by slot instead of by list element like the S3 does.
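Because the slot types are fixed by the class definition, S4 also rejects values of the wrong type. The sketch below repeats the Employee definition from above and then tries to put a character string into the numeric pay slot; the tryCatch() wrappers are my own addition so the script keeps running.

```r
setClass("Employee",
         representation(
           name = "character",
           job_title = "character",
           pay = "numeric"
         ))

my_s4 <- new("Employee", name = "Bob", job_title = "Coder", pay = 30000)
slotNames(my_s4)   # the slot names fixed by setClass()

# Assigning a character value to the numeric slot is an error
assign_result <- tryCatch({
  my_s4@pay <- "a lot"
  "accepted"
}, error = function(e) "rejected")
assign_result

# Creating a new object with a wrongly typed slot fails as well
new_result <- tryCatch(
  new("Employee", name = "Bob", job_title = "Coder", pay = "a lot"),
  error = function(e) "rejected"
)
```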

To summarize, some of the key differences between S3 and S4 are in how the class and object are initialized and then accessed. There are other differences where functions and methods come into play, specifically generic functions. S3 declares a generic with UseMethod() and implements its methods as ordinary functions named generic.classname(). Meanwhile, S4 declares a generic with setGeneric() and implements its methods with setMethod(). These differences exist because of the different uses of R. For example, when I am doing exploratory analysis or cleaning large sets of data, I do not pay much attention to S3 versus S4. In that case I am not worried about writing the best code, only about getting results from the computer. S4, on the other hand, is much more methodical. S4 is for when you want to build your own package and write code that anyone can use. In those cases, making bulletproof code through defensive programming, type checking, and bounds checking becomes a priority.
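The two ways of writing a generic can be sketched side by side. The names describe, describeS4, and EmployeeS4 are hypothetical and chosen only for this example; they do not appear in my script.

```r
# S3: declare the generic with UseMethod(), name the method generic.classname
describe <- function(x, ...) UseMethod("describe")
describe.Employee <- function(x, ...) {
  paste(x$name, "works as a", x$job_title)
}

bob_s3 <- structure(
  list(name = "Bob", job_title = "Coder", pay = 30000),
  class = "Employee"
)
describe(bob_s3)

# S4: declare the generic with setGeneric(), register the method with setMethod()
setClass("EmployeeS4",
         representation(name = "character",
                        job_title = "character",
                        pay = "numeric"))
setGeneric("describeS4", function(x, ...) standardGeneric("describeS4"))
setMethod("describeS4", "EmployeeS4", function(x, ...) {
  paste(x@name, "works as a", x@job_title)
})

bob_s4 <- new("EmployeeS4", name = "Bob", job_title = "Coder", pay = 30000)
describeS4(bob_s4)
```

Both calls build the same sentence. The S3 version finds its method by function name, while the S4 version looks it up among the methods registered for the class.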

Object-oriented programming in R is interesting to me as I compare it to what I have experienced in Java, C++, and Swift. Working with objects in R really highlights the two sides of working in R. On one side, you have people writing small scripts to clean, transform, and analyze large data sets. On the other side are the people who develop the tools and packages used by the analysis side. One of the coolest parts of R, I have found, is how relatively simple code has a lot running underneath the hood, and then there is a whole other layer of R that lets me put my own stuff underneath the hood.

Relevant Links:

GitHub: https://github.com/SimonLiles/LIS4370RProgramming/blob/main/LIS4370Mod7.R