# Lists and factor variables in R

Lists and factor variables are two powerful kinds of objects in R. Let’s start with lists.

**Lists in R**

As you already know, vectors are one-dimensional objects, matrices are two-dimensional and arrays are multi-dimensional objects in R. But all of them can hold values of only one data type.

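Dataframes can hold columns of different data types, but all the columns must have the same length (a shorter column is recycled only if the longest column’s length is a multiple of it). For example, if we run the following piece of code in R, it returns an error.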

```
> df<-data.frame(v1=c("a","b","c"),v2=c(1:10))
```

```
## Error in data.frame(v1 = c("a", "b", "c"), v2 = c(1:10)): arguments imply differing number of rows: 3, 10
```

What if we want to store objects of different types and structures together? In such scenarios we can use lists.

```
> lst<-list(v1=c("a","b","c"),v2=c(1:10))
> lst
```

```
## $v1
## [1] "a" "b" "c"
##
## $v2
## [1] 1 2 3 4 5 6 7 8 9 10
```

```
> class(lst)
```

```
## [1] "list"
```
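Unlike a dataframe, a list does not require its components to have the same length. In fact, a dataframe is itself a special kind of list whose components (the columns) all share one length. A minimal sketch to check both points (the small `df` here is just an example):

```r
# Components of different lengths are fine in a list
lst <- list(v1 = c("a", "b", "c"), v2 = c(1:10))
length(lst$v1)  # 3
length(lst$v2)  # 10

# A dataframe is a list of equal-length columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.list(df)         # TRUE
is.data.frame(lst)  # FALSE: a plain list is not a dataframe
```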

A list can contain any kind of object:

```
> lst_1<-list(v1=c("a","b","c"),v2=matrix(1:9,3,3),v3=head(iris))
> lst_1
```

```
## $v1
## [1] "a" "b" "c"
##
## $v2
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
##
## $v3
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
```

Here, we have combined a vector, a matrix and a dataframe into a list.
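A list can even contain other lists, which is how R represents nested data. A small sketch; the names `lst_nested`, `config`, `method` and `trim` are made up purely for illustration:

```r
lst_nested <- list(
  id = 1L,
  config = list(method = "mean", trim = 0.1),  # a list inside a list
  values = c(2.5, 3.1, 4.8)
)

# $ extracts a component by name; chain it to reach into the inner list
lst_nested$config$method  # "mean"
length(lst_nested)        # 3: the inner list counts as one component
```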

**Working with Lists in R**

Before we start working with lists, a question for you:

What will the following line of code return: 3, or more than that?

```
> length(lst_1)
```

```
## [1] 3
```

It is 3, i.e., lst_1 contains three objects:

1. Character vector

2. Matrix

3. Dataframe
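length() only counts the top-level components. To see how long each component is, base R provides lengths(), which applies length() to every element. For a matrix that is the number of cells (9), and for a dataframe it is the number of columns (5). A quick sketch:

```r
lst_1 <- list(v1 = c("a", "b", "c"), v2 = matrix(1:9, 3, 3), v3 = head(iris))

length(lst_1)   # 3 top-level components
lengths(lst_1)  # length of each component: v1 = 3, v2 = 9, v3 = 5
```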

Let’s look at the summary and the structure of this list.

```
> summary(lst_1)
```

```
##    Length Class      Mode
## v1 3      -none-     character
## v2 9      -none-     numeric
## v3 5      data.frame list
```

```
> str(lst_1)
```

```
## List of 3
## $ v1: chr [1:3] "a" "b" "c"
## $ v2: int [1:3, 1:3] 1 2 3 4 5 6 7 8 9
## $ v3:'data.frame': 6 obs. of 5 variables:
## ..$ Sepal.Length: num [1:6] 5.1 4.9 4.7 4.6 5 5.4
## ..$ Sepal.Width : num [1:6] 3.5 3 3.2 3.1 3.6 3.9
## ..$ Petal.Length: num [1:6] 1.4 1.4 1.3 1.5 1.4 1.7
## ..$ Petal.Width : num [1:6] 0.2 0.2 0.2 0.2 0.2 0.4
## ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1
```

Subsetting a list is easy: just use square brackets, as with other objects. Let’s try:

```
> lst_1[1]
```

```
## $v1
## [1] "a" "b" "c"
```

Let’s check the class of this subset:

```
> class(lst_1[1])
```

```
## [1] "list"
```

So, subsetting a list with single square brackets returns a list. But what if we want the first object in its original class, i.e., as a character vector? In such cases, use double square brackets instead of single square brackets.

```
> lst_1[[1]]
```

```
## [1] "a" "b" "c"
```

```
> class(lst_1[[1]])
```

```
## [1] "character"
```
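Because lst_1 has named components, you can also extract them by name with `$` or `[["name"]]`, pick several components at once with single brackets, and chain indices to reach inside a component. A minimal sketch (`lst_sub` is just an illustrative name):

```r
lst_1 <- list(v1 = c("a", "b", "c"), v2 = matrix(1:9, 3, 3), v3 = head(iris))

# Extract a single component by name (same result as lst_1[[1]])
lst_1$v1
lst_1[["v1"]]

# Single brackets with several names return a smaller list
lst_sub <- lst_1[c("v1", "v3")]
class(lst_sub)   # "list"
length(lst_sub)  # 2

# Chain indices: row 2, column 3 of the matrix stored in v2
lst_1[["v2"]][2, 3]  # 8
```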

**Unlisting a list**

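Applying the unlist() function to a list returns a single vector containing all the atomic components of the list. Since a vector can hold only one data type, the numbers in v2 are converted to character strings here: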

```
> unlist(lst)
```

```
##  v11  v12  v13  v21  v22  v23  v24  v25  v26  v27  v28  v29 v210
##  "a"  "b"  "c"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9" "10"
```
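Two details of unlist() are worth knowing. The names are built from each component name plus a position (v11, v12, ...), and `use.names = FALSE` drops them. The result takes the highest type in R’s coercion order among the components, so a list holding only numbers stays numeric. A short sketch (`nums` is an illustrative name):

```r
lst <- list(v1 = c("a", "b", "c"), v2 = c(1:10))

# Mixed character/integer content is coerced to character
is.character(unlist(lst))  # TRUE

# Drop the generated names
unlist(lst, use.names = FALSE)

# An all-numeric list unlists to a numeric vector
nums <- unlist(list(a = 1:2, b = 3))
nums              # named numeric vector: a1 = 1, a2 = 2, b = 3
is.numeric(nums)  # TRUE
```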

That’s it about lists for now. Let’s talk about factors.

**Factor variables in R**

Factor variables are how R represents categorical variables, and they are useful for several reasons.

Firstly, statistical models treat them as categorical variables. For example, if your dataset has a response variable coded as 0 and 1 and you want R to treat those values as categories rather than numbers, just convert it to a factor variable.
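The same idea applies to predictors, where the difference is easy to see in the design matrix that a model formula builds. A sketch using base R’s model.matrix(), with a made-up variable coded 1, 2, 3: as a number it gets a single slope column, while as a factor it is expanded into dummy (indicator) columns, one for each level after the first.

```r
x_num <- c(1, 2, 3, 1, 2, 3)  # hypothetical codes stored as numbers
x_fct <- factor(x_num)         # the same codes treated as categories

# Numeric: intercept + one slope column
colnames(model.matrix(~ x_num))  # "(Intercept)" "x_num"

# Factor: intercept + one dummy column per non-reference level
colnames(model.matrix(~ x_fct))  # "(Intercept)" "x_fct2" "x_fct3"
```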

Secondly, they can save space. When you convert a string variable into a factor, R stores each distinct category once, as a level, and keeps the values themselves as integer codes.

For example, assume your dataset has a gender variable whose possible values are ‘male’ and ‘female’. When you convert it to a factor variable, R sorts the levels alphabetically and internally stores ‘female’ as 1 and ‘male’ as 2, which saves space.
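You can inspect both parts of this representation directly: levels() shows the stored categories and as.integer() shows the codes. A minimal sketch with a few made-up values:

```r
gender <- c("male", "female", "female", "male")  # example data
gender_fct <- factor(gender)

levels(gender_fct)      # "female" "male" (sorted alphabetically)
as.integer(gender_fct)  # 2 1 1 2: the codes R actually stores
```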

Let’s create a few factor variables.

```
> fct<-factor(rep(c("I","Love","R"),100))
> head(fct)
```

```
## [1] I Love R I Love R
## Levels: I Love R
```

The last line, starting with ‘Levels:’, tells us that the fct variable has three categories: ‘I’, ‘Love’ and ‘R’.
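The levels can also be queried directly. A short sketch using base R’s levels(), nlevels() and table() on the same fct variable:

```r
fct <- factor(rep(c("I", "Love", "R"), 100))

levels(fct)   # "I" "Love" "R"
nlevels(fct)  # 3
table(fct)    # how often each level occurs: 100 times each
```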

Let’s create a character variable with the same values:

```
> chr<-rep(c("I","Love","R"),100)
> head(chr)
```

```
## [1] "I" "Love" "R" "I" "Love" "R"
```

Now, let’s compare the sizes of the two objects.

```
> object.size(fct)
```

```
## 1776 bytes
```

```
> object.size(chr)
```

```
## 2584 bytes
```

The character object occupies more space than the factor variable. (The exact byte counts may differ across R versions and platforms.)

When you create a factor variable, R assigns the levels in alphabetical order by default, but you can change this with the ‘levels’ argument:

```
> fct_1<-factor(rep(c("Low","Medium","High"),10))
> fct_1
```

```
## [1] Low Medium High Low Medium High Low Medium High Low
## [11] Medium High Low Medium High Low Medium High Low Medium
## [21] High Low Medium High Low Medium High Low Medium High
## Levels: High Low Medium
```

Because of the alphabetical ordering, High comes first, then Low and then Medium.

But don’t worry, we can change the ordering:

```
> fct_2<-factor(rep(c("Low","Medium","High"),10),levels=c("Low","Medium","High"))
> fct_2
```

```
## [1] Low Medium High Low Medium High Low Medium High Low
## [11] Medium High Low Medium High Low Medium High Low Medium
## [21] High Low Medium High Low Medium High Low Medium High
## Levels: Low Medium High
```
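The level order is not just cosmetic: it decides which integer code each value gets, and it is also the order used by functions such as table(). A quick check on the first three values (Low, Medium, High) of both factors:

```r
fct_1 <- factor(rep(c("Low", "Medium", "High"), 10))
fct_2 <- factor(rep(c("Low", "Medium", "High"), 10),
                levels = c("Low", "Medium", "High"))

as.integer(fct_1[1:3])  # 2 3 1: High = 1, Low = 2, Medium = 3
as.integer(fct_2[1:3])  # 1 2 3: Low = 1, Medium = 2, High = 3

# Both factors still hold the same labels
identical(as.character(fct_1), as.character(fct_2))  # TRUE
```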

**Beware of Factors**

Factors are useful, but you have to be careful while using them. Let’s look at an example that shows why:

```
> numbr<-c(100,200,300)
> fct_3<-as.factor(numbr)
> fct_3
```

```
## [1] 100 200 300
## Levels: 100 200 300
```

Here we first created a vector ‘numbr’ containing the values 100, 200 and 300, and then converted that vector to a factor variable using as.factor().

as.factor() is one of R’s type conversion functions; we’ll explore them in detail in upcoming tutorials. For now, think of it as a function that converts a vector (here, a numeric one) into a factor variable.

Let’s calculate the mean of fct_3:

```
> mean(fct_3)
```

```
## Warning in mean.default(fct_3): argument is not numeric or logical:
## returning NA
```

```
## [1] NA
```

That’s weird! We were expecting 200.

This is because mean() expects a numeric (or logical) vector. A factor is neither, so instead of computing the mean, R returns NA with a warning.
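Under the hood a factor is an integer vector with a levels attribute, yet R deliberately does not treat it as numeric. A quick check of that distinction, which is the test mean() fails here:

```r
fct_3 <- as.factor(c(100, 200, 300))

typeof(fct_3)      # "integer": stored as integer codes
is.numeric(fct_3)  # FALSE: factors are not treated as numbers
is.factor(fct_3)   # TRUE
```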

Let’s try again, this time converting the factor to a numeric vector first:

```
> numbr_1<-as.numeric(fct_3)
> mean(numbr_1)
```

```
## [1] 2
```

We expected 200, but it returned 2. Why?

R stores a factor as integer codes that point to its levels: 100 is stored as 1, 200 as 2 and 300 as 3. as.numeric() returns these codes, not the original values.

Let’s see what numbr_1 actually contains:

```
> numbr_1
```

```
## [1] 1 2 3
```

But how do we get the correct result? We need to convert the factor first to character and then to numeric:

```
> char_1<-as.character(fct_3)
> numbr_2<-as.numeric(char_1)
> mean(numbr_2)
```

```
## [1] 200
```

Here you go!!!

There is one more way to get the same result without using as.character(): use the factor’s integer codes to look up its levels.

```
> # levels(fct_3) is c("100","200","300"); indexing by fct_3 uses its integer codes
> numbr_3<-as.numeric(levels(fct_3))[fct_3]
> mean(numbr_3)
```

```
## [1] 200
```
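If you do this conversion often, you could wrap the idiom in a small function. fct_to_num() below is a hypothetical helper, not part of base R. It converts the levels to numbers once and then looks the values up by integer code, the approach the R FAQ recommends for converting factors to numbers. It also gives the right answer when the values are unsorted, where as.numeric() alone silently returns codes:

```r
# Hypothetical helper: recover the original numbers stored in a factor
fct_to_num <- function(f) {
  stopifnot(is.factor(f))
  # convert each level once, then look values up by integer code
  as.numeric(levels(f))[f]
}

f <- factor(c(300, 100, 200, 100))
as.numeric(f)        # 3 1 2 1: the codes, not the values
fct_to_num(f)        # 300 100 200 100
mean(fct_to_num(f))  # 175
```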

Whichever method you use, the message here is to be careful when dealing with factor variables.

#### analyticsfreak
