# apply family in R

The apply functions in R are used when you need to repeat the same task over the elements of an object, such as the rows of a matrix or the items of a list.

For example, suppose you want to know the standard deviation of every column in your data frame. You could write a for loop to do the job, but loops can become messy.

The apply() family in R, on the other hand, provides a cleaner solution for such situations.
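To make the comparison concrete, here is a small sketch that computes the standard deviation of the four numeric columns of the built-in iris data set, first with a for loop and then with apply(). The variable names (num_cols, sd_loop, sd_apply) are made up for this illustration.

```r
# Numeric columns of the built-in iris data set (columns 1 to 4)
num_cols <- iris[, 1:4]

# Loop version: pre-allocate a named result vector, then fill it column by column
sd_loop <- numeric(ncol(num_cols))
names(sd_loop) <- names(num_cols)
for (j in seq_along(num_cols)) {
  sd_loop[j] <- sd(num_cols[[j]])
}

# apply() version: MARGIN = 2 means "call sd() once per column"
sd_apply <- apply(num_cols, 2, sd)

# Check that both approaches agree
all.equal(sd_loop, sd_apply)
```

The loop needs bookkeeping for the result vector and its names, while the apply() call expresses the same idea in one line.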

### apply()

Syntax: apply(X, MARGIN, FUN)

where,

- X: the object (a matrix or an array) on which you want to apply FUN
- MARGIN: a vector giving the subscripts over which the function will be applied. E.g., for a matrix, 1 indicates rows, 2 indicates columns, and c(1,2) indicates both rows and columns. If X has named dimnames, MARGIN can also be a character vector selecting dimension names.
- FUN: the function to be applied; it can be a predefined R function or a user-defined function
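The last two points are easier to see with a tiny example. The sketch below uses a made-up matrix (scores) whose dimnames list carries the hypothetical names "row" and "col", and a user-defined function (spread) that is not part of R itself.

```r
# A small matrix whose dimnames list is itself named ("row" and "col");
# these names are invented for this example
scores <- matrix(1:6, nrow = 2,
                 dimnames = list(row = c("r1", "r2"), col = c("a", "b", "c")))

# A user-defined function: difference between the largest and smallest value
spread <- function(v) max(v) - min(v)

# Selecting the dimension by position or by name gives the same result
by_index <- apply(scores, 1, spread)
by_name  <- apply(scores, "row", spread)

# An anonymous function can be passed as FUN as well
col_spread <- apply(scores, "col", function(v) max(v) - min(v))

by_name
col_spread
```

Note that the character form of MARGIN only works when the dimnames themselves are named; plain row and column names are not enough.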

Let’s create a sample matrix to start with:

```
m1<-matrix(1:16,nrow=4)
m1
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
```

Let’s say we want to calculate the sum of each row:

```
apply(X=m1,MARGIN = 1,FUN=sum)
```

```
## [1] 28 32 36 40
```

In the example above, R has applied the function sum() to each row of the matrix.

Let’s calculate the average of each column:

```
apply(X=m1,MARGIN=2,FUN=mean)
```

```
## [1] 2.5 6.5 10.5 14.5
```

In the example above, R has applied the function mean() to each column of the matrix.

Let’s take the square root of each entry in matrix m1:

```
apply(X=m1,MARGIN=c(1,2),FUN=sqrt)
```

```
##          [,1]     [,2]     [,3]     [,4]
## [1,] 1.000000 2.236068 3.000000 3.605551
## [2,] 1.414214 2.449490 3.162278 3.741657
## [3,] 1.732051 2.645751 3.316625 3.872983
## [4,] 2.000000 2.828427 3.464102 4.000000
```

Here, we have asked R to apply function sqrt() over each element of matrix.
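As a side note, the three apply() calls above are good for learning how MARGIN works, but base R also ships vectorised shortcuts for these particular tasks. The sketch below compares them; the variable names (rows_match and so on) are made up for this illustration.

```r
# Same sample matrix as above
m1 <- matrix(1:16, nrow = 4)

# rowSums() and colMeans() are built-in shortcuts for the first two examples
rows_match <- all.equal(apply(m1, 1, sum), rowSums(m1))
cols_match <- all.equal(apply(m1, 2, mean), colMeans(m1))

# sqrt() is already vectorised, so it can be called on the matrix directly
sqrt_match <- all.equal(apply(m1, c(1, 2), sqrt), sqrt(m1))

c(rows_match, cols_match, sqrt_match)
```

apply() becomes more useful when no such shortcut exists, for example with a user-defined function.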

What if we have a multi-dimensional array?

Let’s create one:

```
md<-array(1:32,dim=c(4,4,2))
md
```

```
## , , 1
##
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
##
## , , 2
##
##      [,1] [,2] [,3] [,4]
## [1,]   17   21   25   29
## [2,]   18   22   26   30
## [3,]   19   23   27   31
## [4,]   20   24   28   32
```

Let’s try MARGIN=3

```
apply(X=md, MARGIN=3, FUN=sum)
```

```
## [1] 136 392
```

Basically, MARGIN takes the dimension index of an object. Remember that an element of a three-dimensional array can be identified by:

object[row, column, page]. So, if you want to apply FUN over rows give 1, over columns give 2, over pages give 3, and so on.
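MARGIN can also combine several dimensions of an array. The following sketch reuses the md array from above and shows which dimensions are kept and which are collapsed; the result names (page_sums, row_by_page, cell_sums) are made up for this illustration.

```r
# Same 4 x 4 x 2 array as above
md <- array(1:32, dim = c(4, 4, 2))

# A single element is addressed as [row, column, page]
md[2, 3, 2]

# MARGIN = 3: keep pages, collapse rows and columns -> one sum per page
page_sums <- apply(md, 3, sum)

# MARGIN = c(1, 3): keep rows and pages, collapse columns -> 4 x 2 matrix
row_by_page <- apply(md, c(1, 3), sum)

# MARGIN = c(1, 2): keep rows and columns, collapse pages -> 4 x 4 matrix
cell_sums <- apply(md, c(1, 2), sum)

dim(row_by_page)
dim(cell_sums)
```

A handy way to read this: the dimensions listed in MARGIN survive in the result, and everything else is handed to FUN.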

# Other variants of apply

R provides a range of apply() functions, which are collectively known as the apply family.

Apply Family Member | Input Data Type | Output Data Type |
---|---|---|
apply | Matrix or array | Vector, matrix, array or list |
sapply | Vector or list | Vector, matrix or list (simplified where possible) |
lapply | Vector or list | List |
vapply | Vector or list | Vector, matrix or array of a pre-specified type |
mapply | Multiple vectors or lists | Vector, matrix or list (simplified where possible) |
rapply | Recursive (nested) list | Vector or (recursive) list |
tapply | Vector plus grouping factor(s) | Vector or array |

### lapply

lapply returns a list of the same length as ‘X’, where each element is the result of applying FUN to the corresponding element of ‘X’.

Let’s create a list, which contains some objects:

```
lst<-list(a=1, b=c(1:10), c=c('a','b'))
lapply(lst,length)
```

```
## $a
## [1] 1
##
## $b
## [1] 10
##
## $c
## [1] 2
```

lapply() here has taken each element of the list and returned the results in a list.

The input to lapply can be any vector or list, but the output will always be a list.

For example, let’s calculate the standard deviation of the first 4 columns in the iris data:

```
lapply(iris[,c(1:4)],sd)
```

```
## $Sepal.Length
## [1] 0.8280661
##
## $Sepal.Width
## [1] 0.4358663
##
## $Petal.Length
## [1] 1.765298
##
## $Petal.Width
## [1] 0.7622377
```
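lapply() also works on a plain vector, and any extra arguments are passed on to FUN. Here is a small sketch with made-up inputs (the names squares and means are just for illustration):

```r
# A plain integer vector as input: the result is still a list
squares <- lapply(1:3, function(i) i^2)
is.list(squares)

# Extra arguments after FUN (here na.rm = TRUE) are forwarded to mean()
means <- lapply(list(a = c(1, NA, 3), b = c(4, 5, 6)), mean, na.rm = TRUE)
means
```

The names of the input list (a and b) are carried over to the result, just as with the iris columns above.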

### sapply

sapply is similar to lapply. The only difference is that the output of lapply is always a list, while sapply tries to simplify its output to a vector or matrix and falls back to a list only when simplification is not possible.

Let’s re-run the examples discussed above using sapply:

```
lst<-list(a=1, b=c(1:10), c=c('a','b'))
sapply(lst,length)
```

```
##  a  b  c
##  1 10  2
```

```
sapply(iris[,c(1:4)],sd)
```

```
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##    0.8280661    0.4358663    1.7652982    0.7622377
```

We can get the same output from lapply by wrapping it in unlist():

```
unlist(lapply(iris[,c(1:4)],sd))
```

```
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##    0.8280661    0.4358663    1.7652982    0.7622377
```
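How far sapply() simplifies depends on what FUN returns for each element. The sketch below shows the two other cases, a matrix and a list; the names ranges and uneven are made up for this illustration.

```r
# range() returns two numbers per column, so sapply() builds a 2 x 4 matrix
ranges <- sapply(iris[, 1:4], range)
dim(ranges)

# seq_len() returns vectors of different lengths (1, 2 and 3 elements),
# so nothing can be simplified and a list comes back
uneven <- sapply(1:3, seq_len)
is.list(uneven)
```

Because the shape of the result depends on the data, it is worth checking what sapply() actually returned before using it further.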

sapply can also be made to return the same list as lapply by using “simplify=FALSE” and “USE.NAMES=TRUE”:

```
sapply(iris[,c(1:4)],sd,simplify = FALSE,USE.NAMES = TRUE)
```

```
## $Sepal.Length
## [1] 0.8280661
##
## $Sepal.Width
## [1] 0.4358663
##
## $Petal.Length
## [1] 1.765298
##
## $Petal.Width
## [1] 0.7622377
```
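The table above also lists vapply, which behaves like sapply but asks you to declare the type and length of each result through the FUN.VALUE argument. A minimal sketch, with variable names (sd_checked, mismatch) made up for this illustration:

```r
# Each call to sd() must return exactly one number, as declared by numeric(1)
sd_checked <- vapply(iris[, 1:4], sd, FUN.VALUE = numeric(1))
sd_checked

# range() returns two numbers, which contradicts numeric(1),
# so vapply() stops with an error instead of silently changing shape
mismatch <- tryCatch(
  vapply(iris[, 1:4], range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
mismatch
```

This stricter behaviour makes vapply a reasonable choice inside functions and scripts, where a surprise change in output shape would be hard to spot.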

### mapply

mapply is a multivariate version of sapply. It is mainly used when you have multiple objects and want to apply a function to the first element of each, then the second element of each, and so on.

Syntax of mapply:

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

Let’s see an example:

```
mapply(FUN=sum,1:4,2:5,3:6)
```

```
## [1] 6 9 12 15
```

In the example above, we have added the elements of the vectors index by index, i.e., sum(1,2,3), sum(2,3,4), sum(3,4,5), sum(4,5,6).

Let’s see one more example:

```
mapply(FUN=rep,1:4,1:4)
```

```
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2 2
##
## [[3]]
## [1] 3 3 3
##
## [[4]]
## [1] 4 4 4 4
```

In this example, 1 (first element of first vector) is repeated once (first element of second vector), 2 (second element of first vector) is repeated twice (second element of second vector) and so on.
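mapply() also accepts your own function, and arguments that should stay the same for every call can be supplied through MoreArgs. A small sketch with made-up inputs (powers and labels are illustrative names):

```r
# An anonymous function of two arguments: computes 2^1, 3^2 and 4^3
powers <- mapply(function(base, exponent) base^exponent, 2:4, 1:3)
powers

# MoreArgs holds arguments shared by every call, here the separator for paste()
labels <- mapply(paste, c("a", "b"), c("x", "y"), MoreArgs = list(sep = "-"))
unname(labels)
```

Since the first vector passed to paste() is a character vector, mapply() uses its values as names for the result, which is why unname() is used when printing.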

### rapply

rapply is a recursive version of lapply and comes in very handy when we have nested lists.

```
lst<-list(list(c("a","b"),c("I","Love","R")),list(c("apply","family","is","very","interesting"),c("apply")))
rapply(lst,length)
```

```
## [1] 2 3 5 1
```

Here, rapply() has returned the length of each vector inside the nested lists.

Let’s see what we would have got using lapply():

```
lapply(lst,length)
```

```
## [[1]]
## [1] 2
##
## [[2]]
## [1] 2
```

lapply only processes the two top-level elements (each a list of length 2), whereas rapply descends into the nested lists.
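By default rapply() flattens its result into a vector. If you would rather keep the nested structure, the how argument can be set to "list". A minimal sketch using the same lst as above (nested_lengths and shouted are made-up names):

```r
# Same nested list as above
lst <- list(list(c("a", "b"), c("I", "Love", "R")),
            list(c("apply", "family", "is", "very", "interesting"), c("apply")))

# how = "list" keeps the nesting instead of flattening the result
nested_lengths <- rapply(lst, length, how = "list")
str(nested_lengths)

# classes restricts FUN to leaves of a given class, here character vectors
shouted <- rapply(lst, toupper, classes = "character", how = "list")
shouted[[1]][[2]]
```

The result mirrors the shape of the input, so each value can still be traced back to the vector it came from.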

### tapply

tapply can be considered an alternative to aggregate. It is used to apply FUN over a vector, grouped by one or more other vectors (factors).

Let’s see an example.

Assume we want to calculate the mean of Sepal.Length for each species:

```
tapply(iris$Sepal.Length,iris$Species,FUN=mean)
```

```
##     setosa versicolor  virginica
##      5.006      5.936      6.588
```
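Since tapply was introduced as an alternative to aggregate, here is a short side-by-side sketch of the two on the same task. The variable names (by_tapply, by_aggregate) are made up for this illustration.

```r
# tapply() returns a named vector (a one-dimensional array)
by_tapply <- tapply(iris$Sepal.Length, iris$Species, FUN = mean)

# aggregate() with a formula returns a data frame with one row per group
by_aggregate <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

by_tapply
by_aggregate

# The group means themselves are the same
all.equal(unname(as.vector(by_tapply)), by_aggregate$Sepal.Length)
```

Which one to pick mostly depends on the shape you want back: a named vector for quick lookups, or a data frame for further processing.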

#### analyticsfreak
