Does subsetting (matrices or arrays) always perform a partial copy?
Some large datasets are pushing memory and some functions I’m writing to the limit. I wanted to ask some questions about subsetting, of matrices and arrays in particular:
- Does defining a variable as a subset of another lead to a copy? For instance:
```r
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- x[, 1:10]
```
Some exploration with `object_size` (https://rdrr.io/cran/pryr/man/object_size.html) from pryr seems to indicate that a copy is made when `y` is created, but I’d like to be sure.
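A minimal sketch of one way to run that kind of check, assuming the pryr package is installed. The name `x_alias` is made up for illustration. It relies on `object_size` counting memory shared between its arguments only once, so it contrasts a plain assignment with a subset rather than proving anything about R internals:

```r
# Sketch only: assumes pryr is installed; x_alias is a hypothetical name.
library(pryr)

x <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)
x_alias <- x          # plain assignment: both names point at the same data
y <- x[, 1:10]        # subset: the case in question

size_x <- as.numeric(object_size(x))

# Shared memory is counted once, so x and its alias together are no bigger than x.
shares_with_alias <- as.numeric(object_size(x, x_alias)) == size_x

# If y has its own storage, x and y together exceed x alone by at least
# the size of y's data (20 rows * 10 columns * 8 bytes per double).
extra_bytes <- as.numeric(object_size(x, y)) - size_x
subset_adds_memory <- extra_bytes >= 20 * 10 * 8

print(shares_with_alias)
print(subset_adds_memory)
```

If `subset_adds_memory` comes out `TRUE` while `shares_with_alias` is also `TRUE`, that would be consistent with the subset getting its own storage.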
- If I pass a subset of a matrix/array as an argument to a function, does it get copied before the function starts? For instance, in
```r
x <- matrix(rnorm(20*30), nrow=20, ncol=30)
y <- dnorm(0, mean=x[,1:10], sd=1)
```
I wonder if the data in `x[,1:10]` are copied and then given as input to `dnorm`.
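One part of this can be probed in base R: arguments to ordinary R functions are evaluated lazily, on first use inside the function. The sketch below uses hypothetical helpers (`subset_with_flag`, `ignores_arg`, `uses_arg`) to show when the subset expression is actually evaluated. It does not show whether `dnorm` makes any further copies internally.

```r
# Sketch only: all helper names here are hypothetical, for illustration.
x <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)

evaluated <- FALSE
subset_with_flag <- function() {
  evaluated <<- TRUE   # record that the subset was actually computed
  x[, 1:10]
}

ignores_arg <- function(m) NULL     # never touches its argument
uses_arg    <- function(m) sum(m)   # forces its argument

ignores_arg(subset_with_flag())
evaluated_after_ignore <- evaluated   # the subset expression was never run

uses_arg(subset_with_flag())
evaluated_after_use <- evaluated      # the subset was built when m was first used

print(evaluated_after_ignore)
print(evaluated_after_use)
```

So the subset is not built before the call as such; it is built the first time the function body touches that argument.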
I’ve heard that `data.table` allows one to work with subsets without copies being made (unless necessary), but it seems that one is then constrained to two dimensions only, with no arrays.
Cheers!