Object Oriented Programming (OOP) is a programming paradigm that is based on the concept of “objects” that contain both data and methods for manipulating that data. OOP is a popular programming paradigm in many languages, including R, and it can be a powerful tool for organizing and managing complex code.
The main reason to use OOP is polymorphism (literally: many shapes). Polymorphism means that a developer can consider a function’s interface separately from its implementation, making it possible to use the same function form for different types of input. This is closely related to the idea of encapsulation: the user doesn’t need to worry about details of an object because they are encapsulated behind a standard interface.
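As a small illustration of polymorphism, the sketch below calls the same base R function, summary(), on two different kinds of input. The variable names are made up for this example; only summary() and factor() come from base R.

```r
# The same call form, summary(), adapts to the kind of input it receives.
num <- c(1.5, 2.5, 4)
fct <- factor(c("a", "b", "a"))

num_summary <- summary(num)  # numeric input: quantiles and mean
fct_summary <- summary(fct)  # factor input: counts per level

print(num_summary)
print(fct_summary)
```

The caller only needs to know the interface (summary(x)); which implementation runs is decided by the input.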
There are two main paradigms of object-oriented programming, which differ in how methods and classes are related:
In encapsulated OOP, methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (with fields) and behaviour (with methods), and is the paradigm found in most popular languages.
In functional OOP, methods belong to generic functions, and method calls look like ordinary function calls: generic(object, arg2, arg3). This is called functional because from the outside it looks like a regular function call, and internally the components are also functions.
Base R provides three OOP systems: S3, S4, and reference classes (RC):
S3 is R’s first OOP system. It is an informal implementation of functional OOP that relies on common conventions rather than ironclad guarantees. This makes it easy to get started with, providing a low-cost way of solving many simple problems.
S4 is a formal and rigorous rewrite of S3. It requires more upfront work than S3, but in return provides more guarantees and greater encapsulation. S4 is implemented in the base {methods} package, which is always installed with R.
RC implements encapsulated OO. RC objects are a special type of S4 objects that are also mutable, i.e., instead of using R’s usual copy-on-modify semantics, they can be modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve in the functional OOP style of S3 and S4.
Some OOP systems are also provided as CRAN packages. Most notably, R6 implements encapsulated OOP like RC.
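To make the difference between the functional style (S4) and the encapsulated, mutable style (RC) more concrete, here is a minimal sketch using only the {methods} package. The class, generic, and field names (PersonS4, describe, Counter, n, inc) are hypothetical and chosen only for illustration.

```r
library(methods)

# S4 (functional OOP): the method belongs to the generic describe().
setClass("PersonS4", slots = c(name = "character"))
setGeneric("describe", function(obj) standardGeneric("describe"))
setMethod("describe", "PersonS4", function(obj) paste("Person:", obj@name))

p <- new("PersonS4", name = "Ada")
describe(p)  # called like a regular function

# RC (encapsulated OOP): the method belongs to the object and mutates it in place.
Counter <- setRefClass("Counter",
  fields = list(n = "numeric"),
  methods = list(
    inc = function() {
      n <<- n + 1
      invisible(.self)
    }
  )
)

c1 <- Counter$new(n = 0)
c2 <- c1   # no copy is made: both names refer to the same object
c1$inc()
c2$n       # the change made through c1 is visible through c2
```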
Of course, everything in R is an object - but not in the OOP sense. We can use the base::is.object function to see whether the object has a "class" attribute. This is roughly shorthand for checking whether base::attr(1:10, "class") or base::attr(mtcars, "class") is non-NULL.
# a base object
is.object(1:10)
[1] FALSE
# An OO object
is.object(mtcars)
[1] TRUE
However, since it only returns TRUE or FALSE, it doesn’t say which OO system the object uses. Enter the {sloop} package:
library(sloop)
otype(1:10)
[1] "base"
otype(mtcars)
[1] "S3"
While only OO objects have a "class" attribute, every object has a type. These types are defined within R and can only be added or modified by the R Core Team.
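The base type of any object can be inspected with base::typeof(). The short sketch below collects the types of a few familiar objects; the vector name types is just for illustration.

```r
# typeof() reports the base type, regardless of any class attribute.
types <- c(
  int_vector = typeof(1:10),   # "integer"
  data_frame = typeof(mtcars), # "list": a data frame is built on top of a list
  closure    = typeof(mean),   # "closure": a regular R function
  primitive  = typeof(sum)     # "builtin": a primitive function
)
types
```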
Before moving on to S3 objects, it is useful to review attributes in R. This is because S3 objects feature a unique attribute, called class. We can set attributes arbitrarily with base::attr:
a <- 1:3
attr(a, "x") <- "abcdef"
attr(a, "x")
[1] "abcdef"
attr(a, "y") <- 4:6
str(attributes(a))
List of 2
$ x: chr "abcdef"
$ y: int [1:3] 4 5 6
Or, equivalently:
a <- structure(
  1:3,
  x = "abcdef",
  y = 4:6
)
str(attributes(a))
List of 2
$ x: chr "abcdef"
$ y: int [1:3] 4 5 6
Some well-known examples of attributes are names and dim:
# by direct assignment
x <- c(a = 1, b = 2, c = 3)
# or separately
x <- 1:3
names(x) <- c("a", "b", "c")
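The dim attribute works the same way. As a minimal sketch (the variable name m is arbitrary), setting it on a plain vector is enough for R to treat the vector as a matrix:

```r
m <- 1:6
dim(m) <- c(2, 3)  # setting "dim" turns the vector into a 2 x 3 matrix

attributes(m)      # the only attribute is now dim
is.matrix(m)
```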
S3
S3 is R’s first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.
S3 object
An S3 object is a base type with at least a class attribute. A factor, for example, has integer as its base type, factor as its class attribute, and a levels attribute to store the levels.
f <- factor(c("a", "b", "c"))
typeof(f)
[1] "integer"
attributes(f)
$levels
[1] "a" "b" "c"
$class
[1] "factor"
An S3 object behaves differently from its underlying base type whenever it’s passed to a generic (short for generic function). The easiest way to tell if a function is a generic is to use sloop::ftype() and look for “generic” in the output:
library(sloop)
ftype(print)
[1] "S3" "generic"
ftype(str)
[1] "S3" "generic"
A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument).
print(f)
[1] a b c
Levels: a b c
print(unclass(f))
[1] 1 2 3
attr(,"levels")
[1] "a" "b" "c"
unclass() is a primitive function, not a generic (see the output of ftype(unclass)), that strips the class attribute from an S3 object.
You can use sloop::s3_dispatch to see which methods a generic considers for a given call, and which one it ends up using:
s3_dispatch(print(f))
=> print.factor
* print.default
The print generic has a print.factor method. You should never call the method directly, but instead rely on the generic to find it for you - i.e., let the dispatcher do the work.
We can see the implementation details of a method with sloop::s3_get_method.
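If {sloop} is not at hand, a similar lookup can be sketched with utils::getS3method(), which ships with R. This is a stand-in rather than the {sloop} function itself; the variable names are illustrative.

```r
# Look up the print method for factors without calling it.
method <- getS3method("print", "factor")
is.function(method)
names(formals(method))  # the arguments the method accepts

# With optional = TRUE, a missing method yields NULL instead of an error.
missing_method <- getS3method("print", "no_such_class", optional = TRUE)
is.null(missing_method)
```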
S3
S3 has no dedicated syntax for defining a class, such as Python’s class MyClass. Instead, it is enough to set the class attribute.
x <- structure(list(), class = "my_class")
And we can use class(x) to inspect the class. Since no formal way to define a class is provided, it is up to us to define a class constructor. A common best practice is to define one called new_myclass(), where myclass is your class name.
This new_* function will mainly be used by the developer. As an example, let’s reimplement the Date class:
new_date <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "Date")
}
new_date(c(-1, 0, 1))
[1] "1969-12-31" "1970-01-01" "1970-01-02"
These new_* functions can be less comprehensive and “safe” than proper constructors, because they are meant to be used frequently during development. On the other hand, we can also define validator functions to ensure that the correct attributes are passed. These can (and should) be used within the helper function, named myclass, that will be used by the end-user. Helper functions must have informative error messages and sensible default values, and call the new_myclass constructor at the end:
myclass <- function(x) {
  validate_myclass(x)
  new_myclass(x)
}
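The constructor / validator / helper split can be sketched for a small made-up class. Everything here (the percent class and the functions new_percent, validate_percent, and percent) is hypothetical and only meant to show where type checks, informative errors, and default values could live.

```r
# Low-level constructor: cheap type check only, meant for developers.
new_percent <- function(x = double()) {
  stopifnot(is.double(x))
  structure(x, class = "percent")
}

# Validator: checks the values and fails with an informative message.
validate_percent <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.", call. = FALSE)
  }
  if (any(x < 0 | x > 100, na.rm = TRUE)) {
    stop("All values of `x` must be between 0 and 100.", call. = FALSE)
  }
  invisible(x)
}

# User-facing helper: sensible default, validation, then construction.
percent <- function(x = double()) {
  validate_percent(x)
  new_percent(as.double(x))
}

p <- percent(c(10L, 55L))  # integers are accepted and coerced to double
class(p)
```

Calling percent(150) or percent("a") stops with one of the messages above, while new_percent() stays cheap for internal use.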
S3
Let’s create a student class:
new_student <- function(name, age) {
  structure(list(name = name, age = age), class = "Student")
}
validate_student <- function(name, age) {
  stopifnot(is.character(name))
  stopifnot(is.double(age))
}
student <- function(name, age) {
  validate_student(name, age)
  new_student(name, age)
}
andrew <- student(name = "Andrew", age = 25)
We can inspect its fields (the elements of the underlying list) with the $ accessor:
andrew$name
[1] "Andrew"
andrew$age
[1] 25
We can define a generic function greet:
greet <- function(person) {
  UseMethod("greet")
}
greet.Student <- function(person) {
  cat("Hello", person$name)
}
greet(andrew)
Hello Andrew
In this way we created a new method for the generic greet. We could call greet.Student() directly too, but UseMethod does the job of dispatching to the correct method for us.
Let’s also create a new class and a new method for greet:
new_prof <- function(name, age) {
  structure(list(name = name, age = age), class = "Prof")
}
validate_prof <- function(name, age) {
  stopifnot(is.character(name))
  stopifnot(is.double(age))
}
prof <- function(name, age) {
  validate_prof(name, age)
  new_prof(name, age)
}
max <- prof(name = "Max", age = 25)
We can easily add a new method:
# the argument is named person to match the signature of the greet generic
greet.Prof <- function(person) {
  cat("Good day,", person$name)
}
And to see them in action:
greet(andrew)
Hello Andrew
greet(max)
Good day, Max
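What happens when greet() receives an object whose class has no method? Without a fallback, UseMethod() raises an error. A default method can catch those cases, much like print.default does for print. The sketch below is self-contained; the Visitor class and the greeting texts are hypothetical.

```r
greet <- function(person) {
  UseMethod("greet")
}
greet.Student <- function(person) {
  cat("Hello ", person$name, "\n", sep = "")
}
# Fallback used when no class-specific method exists.
greet.default <- function(person) {
  cat("Hello, stranger\n")
}

student <- structure(list(name = "Andrew"), class = "Student")
visitor <- structure(list(name = "Sam"), class = "Visitor")  # no greet.Visitor method

greet(student)  # dispatches to greet.Student
greet(visitor)  # falls back to greet.default
```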
S3: Advanced concepts
R6
R6 is available as a package, so make sure you install it if you don’t already have it:
install.packages('R6')
library(R6)
R6 objects use the encapsulated OOP paradigm, like RC (and unlike S3 and S4). As Hadley Wickham puts it:
If you’ve learned OOP in another programming language, it’s likely that R6 will feel very natural, and you’ll be inclined to prefer it over S3. Resist the temptation to follow the path of least resistance: in most cases R6 will lead you to non-idiomatic R code.
R6
R6 classes follow a more concise template:
MyClass <- R6Class("MyClass",
  list(
    ...
  )
)
Where the list() argument contains fields and methods. Inside a method, we can access them with self$attribute or self$method():
Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  })
)
Accumulator
<Accumulator> object generator
Public:
sum: 0
add: function (x = 1)
clone: function (deep = FALSE)
Parent env: <environment: R_GlobalEnv>
Locked objects: TRUE
Locked class: FALSE
Portable: TRUE
(Keep the invisible() function in mind for a moment; we will get to it shortly.) To instantiate a new object of the class we use MyClass$new():
acc <- Accumulator$new()
acc$add(4)
acc$sum
[1] 4
invisible return value
To enable method chaining, methods that have side effects (i.e., that modify the internal data/state of the object) should always return self, but invisibly. With this, we can write the following:
acc$
  add(10)$
  add(10)$
  sum
[1] 24
Much like Python’s __init__(), we can define an $initialize() method to override the default behaviour of $new(). Similarly, $print() plays the role of __repr__() and should return invisible(self). We can also implement a $validate() method to check the arguments.
Unlike Python (where every method/attribute is public, even though conventions are in place to denote fields that should not be touched), R6 lets us define private fields and methods. We simply pass a private argument to R6Class(), which is a list() just like the public one.
As a side note: this means that the list passed right after the class name is silently assigned to public. In other words:
SecretAgent <- R6::R6Class("SecretAgent",
  public = list(
    validate = function(name, age) {
      stopifnot(is.numeric(age))
      stopifnot(is.character(name))
    },
    # NA_real_ rather than NA, so the default passes the is.numeric() check
    initialize = function(name, age = NA_real_) {
      self$validate(name, age)
      private$name <- name
      private$age <- age
    },
    print = function(...) {
      cat("SecretAgent: \n")
      cat("  Name: ", private$name, "\n")
      cat("  Age: ", private$age, "\n")
      invisible(self)
    }
  ),
  private = list(
    age = NA,
    name = NULL
  )
)
We cannot access any private field from outside the object, not even $private itself:
# these are capital "O"s, not zeros
OO7 <- SecretAgent$new("James Bond", 47)
OO7$name
NULL
OO7$age
NULL
OO7$private
NULL
OO7$private$name
NULL
OO7$print()
SecretAgent:
Name: James Bond
Age: 47
Active fields look like attributes from the outside - i.e., they are accessed like self$attr, without parentheses - even though they are defined with functions. Much like private members, active fields are defined within the active = list(...) argument of the class.
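A minimal sketch of an active field, assuming the {R6} package is installed. The Circle class and its radius and area members are hypothetical. The function behind an active field takes a single argument, which is missing when the field is read and holds the new value when the field is assigned to.

```r
library(R6)

Circle <- R6Class("Circle",
  public = list(
    radius = 1,
    initialize = function(radius = 1) {
      self$radius <- radius
    }
  ),
  active = list(
    # Computed on every access; assignment is rejected to keep it read-only.
    area = function(value) {
      if (missing(value)) {
        pi * self$radius^2
      } else {
        stop("`area` is read-only.", call. = FALSE)
      }
    }
  )
)

circle <- Circle$new(radius = 2)
circle$area  # accessed like a field, no parentheses
```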
We can add new fields and methods to an existing class at any time with the default $set method. Note that they are only available to objects created after $set is called:
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
  self$sum <- self$sum + x
  invisible(self)
})
And we can use the inherit argument of R6Class() to provide the parent class. Much like Python’s super(), we can access the methods defined in the parent with super$.
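Inheritance can be sketched as follows, assuming the {R6} package is installed. The Accumulator class is redefined so the block runs on its own. The VerboseAccumulator subclass is hypothetical: it overrides add() and delegates the actual work to the parent through super$.

```r
library(R6)

Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  }
))

# Child class: logs each addition, then defers to the parent implementation.
VerboseAccumulator <- R6Class("VerboseAccumulator",
  inherit = Accumulator,
  public = list(
    add = function(x = 1) {
      cat("Adding", x, "\n")
      super$add(x)  # the parent returns invisible(self), so chaining still works
    }
  )
)

vacc <- VerboseAccumulator$new()
vacc$add(10)$add(5)
vacc$sum
class(vacc)  # the parent class appears in the S3 class vector
```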
R6 and S3
Every R6 object has an S3 class - this means we can use class() to obtain information about its class:
class(acc)
[1] "Accumulator" "R6"
names(acc)
[1] ".__enclos_env__" "clone" "sum" "add"
Let’s go back to the accumulator: what will happen if we assign an instance of Accumulator to another variable?
acc2 <- Accumulator$new()
acc3 <- acc2
acc2$add(10)
c(acc2 = acc2$sum, acc3 = acc3$sum)
acc2 acc3
10 10
The two names acc2 and acc3 actually refer to the same object. To create a copy, we need to use the $clone() method:
acc2 <- Accumulator$new()
acc3 <- acc2$clone()
acc2$add(10)
c(acc2 = acc2$sum, acc3 = acc3$sum)
acc2 acc3
10 0
S4 and functional OOP in R
S4 tutorial
R6 vs RC