# Basics of Data Frames

- Introduction
- Creating dataframe using vectors
- Few functions for dataframes
- Summary and str functions
- Quiz

### Introduction

If you are reading this article about data frames, I assume you already have a basic idea of vectors, arrays and matrices.

What is a dataframe? Think of a dataframe as a collection of vectors of equal length, one vector per column.

Any table you have seen in your SQL classes can be considered a kind of dataframe.

Consider the following example, which shows the first six rows of R’s built-in iris dataset, a table with 5 columns.

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
```

Each column is a vector. All values within a column share one data type, but different columns can have different types.
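To check the type of every column in one go, one option is to apply `class()` to each column. This is a minimal sketch that assumes the table above is `head(iris)` from R's built-in datasets (the printed values match it); the name `flowers` is made up for illustration.

```r
# Assumption: the table above comes from the built-in iris dataset
flowers <- head(iris)

# sapply() calls class() on every column and returns a named character vector
col_types <- sapply(flowers, class)
col_types
# The four measurement columns are "numeric"; Species is a "factor"
```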

### Creating dataframe using vectors

Let’s create a dataframe on our own.

To start with, we will create a few vectors and then combine them.

```
> v1<-c(1:5)
> v1
```

```
## [1] 1 2 3 4 5
```

```
> v2<-seq(1,10,by = 2)
> v2
```

```
## [1] 1 3 5 7 9
```

```
> v3<-c("a","b","c","d","e")
> v3
```

```
## [1] "a" "b" "c" "d" "e"
```

Now let’s combine them to create a dataframe:

```
> df<-data.frame(v1,v2,v3)
> df
```

```
##   v1 v2 v3
## 1  1  1  a
## 2  2  3  b
## 3  3  5  c
## 4  4  7  d
## 5  5  9  e
```

The same result can be achieved without creating the vectors first:

```
> df2<-data.frame(1:5,seq(1,10,by = 2),c("a","b","c","d","e"))
> df2
```

```
##   X1.5 seq.1..10..by...2. c..a....b....c....d....e..
## 1    1                  1                          a
## 2    2                  3                          b
## 3    3                  5                          c
## 4    4                  7                          d
## 5    5                  9                          e
```

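What happened to the names? They look weird because we did not supply any, so R generated them from the expressions we passed in, replacing characters that are not valid in names with dots.

Nothing to be afraid of: we can always set the column names while creating the dataframe.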

Let’s create our dataframe again.

```
> df3<-data.frame(v1=1:5,v2=seq(1,10,by = 2),v3=c("a","b","c","d","e"))
> df3
```

```
##   v1 v2 v3
## 1  1  1  a
## 2  2  3  b
## 3  3  5  c
## 4  4  7  d
## 5  5  9  e
```

Now let’s verify whether ‘df’ and ‘df3’ are identical:

```
> identical(df,df3)
```

```
## [1] TRUE
```
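Keep in mind that `identical()` is strict: it compares the column types as well as the values. As a small sketch (the names `a` and `b` are made up for illustration), two dataframes that print the same can still differ when one column holds integers and the other holds doubles.

```r
a <- data.frame(v1 = 1:3)          # 1:3 creates an integer vector
b <- data.frame(v1 = c(1, 2, 3))   # c(1, 2, 3) creates a double (numeric) vector

identical(a, b)   # FALSE: same values, different storage types
class(a$v1)       # "integer"
class(b$v1)       # "numeric"
```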

Now let’s check the class of this dataframe:

```
> class(df)
```

```
## [1] "data.frame"
```
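One caveat when combining vectors: `data.frame()` expects them to have compatible lengths. The sketch below (the column names `id` and `grp` are made up) shows that a shorter vector is recycled when its length divides the longer one evenly, and that R raises an error otherwise.

```r
# 3 divides 6, so grp is recycled twice to fill 6 rows
ok <- data.frame(id = 1:6, grp = c("x", "y", "z"))
nrow(ok)   # 6

# 3 does not divide 5, so data.frame() stops with an error;
# tryCatch() captures the message instead of halting the script
res <- tryCatch(
  data.frame(id = 1:5, grp = c("x", "y", "z")),
  error = function(e) conditionMessage(e)
)
res   # a message saying the arguments imply differing numbers of rows
```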

### A few functions for dataframes

#### nrow

The nrow() function returns the number of rows in a dataframe:

```
> nrow(df)
```

```
## [1] 5
```

Remember that length() returns the number of entries in a vector. What do you think length() will return for a dataframe?

#### length

```
> length(df)
```

```
## [1] 3
```

So length() returns the number of constituent elements of an R object. For a vector, the elements are the individual values, which is why we got the number of entries.

For a dataframe, the elements are the column vectors, so length() returns the number of columns.

The same result can be obtained with the ncol() function:

#### ncol

```
> ncol(df)
```

```
## [1] 3
```
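The reason `length()` counts columns is that a dataframe is stored as a list with one element per column. A short sketch, rebuilding `df` so the block runs on its own:

```r
df <- data.frame(v1 = 1:5, v2 = seq(1, 10, by = 2), v3 = c("a", "b", "c", "d", "e"))

is.list(df)              # TRUE: each column is one list element
length(df) == ncol(df)   # TRUE
dim(df)                  # 5 3, i.e. rows first, then columns
```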

#### colnames

If we want to know the column names of a dataframe, we can use the colnames() function:

```
> colnames(df)
```

```
## [1] "v1" "v2" "v3"
```

colnames() can be used not only to get the column names but also to set them.

Remember the df2 dataframe we created earlier? No? Let’s print it out again.

```
> df2
```

```
##   X1.5 seq.1..10..by...2. c..a....b....c....d....e..
## 1    1                  1                          a
## 2    2                  3                          b
## 3    3                  5                          c
## 4    4                  7                          d
## 5    5                  9                          e
```

Now let’s change the column names of df2 using colnames():

```
> colnames(df2)<-c("v1","v2","v3")
> colnames(df2)
```

```
## [1] "v1" "v2" "v3"
```

```
> df2
```

```
##   v1 v2 v3
## 1  1  1  a
## 2  2  3  b
## 3  3  5  c
## 4  4  7  d
## 5  5  9  e
```
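The assignment form of `colnames()` also works with an index, which helps when only one name needs to change. A minimal sketch on a fresh copy of the data (the new name `odd` is just an illustration); for dataframes, `names()` returns the same vector as `colnames()`.

```r
df2 <- data.frame(v1 = 1:5, v2 = seq(1, 10, by = 2), v3 = c("a", "b", "c", "d", "e"))

colnames(df2)[2] <- "odd"              # rename only the second column
colnames(df2)                          # "v1" "odd" "v3"
identical(names(df2), colnames(df2))   # TRUE
```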

So far we have been using a very small dataframe with only 5 rows.

Now let’s create a somewhat bigger dataframe:

```
> df4<-data.frame(v1=1:100,v2=101:200,v3=201:300)
```

Printing all of ‘df4’ would fill the console with 100 rows, which is hard to read. Still, we want a quick look at the structure of our data.

#### head

The head() function prints only the first few rows on the console instead of all of them.

```
> head(df4,n=10)
```

```
##    v1  v2  v3
## 1   1 101 201
## 2   2 102 202
## 3   3 103 203
## 4   4 104 204
## 5   5 105 205
## 6   6 106 206
## 7   7 107 207
## 8   8 108 208
## 9   9 109 209
## 10 10 110 210
```

So we just printed the first 10 rows of the dataframe df4.

#### tail

Let’s see what is stored at the end of it:

```
> tail(df4,n=10)
```

```
##      v1  v2  v3
## 91   91 191 291
## 92   92 192 292
## 93   93 193 293
## 94   94 194 294
## 95   95 195 295
## 96   96 196 296
## 97   97 197 297
## 98   98 198 298
## 99   99 199 299
## 100 100 200 300
```

If you omit the second parameter, i.e. ‘n’, R prints 6 rows by default:

```
> head(df4)
```

```
##   v1  v2  v3
## 1  1 101 201
## 2  2 102 202
## 3  3 103 203
## 4  4 104 204
## 5  5 105 205
## 6  6 106 206
```

```
> tail(df4)
```

```
##      v1  v2  v3
## 95   95 195 295
## 96   96 196 296
## 97   97 197 297
## 98   98 198 298
## 99   99 199 299
## 100 100 200 300
```
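Both functions also accept a negative `n`, which means "everything except that many rows". A brief sketch, rebuilding `df4` so the block runs on its own (the names `first5` and `last5` are made up for illustration):

```r
df4 <- data.frame(v1 = 1:100, v2 = 101:200, v3 = 201:300)

first5 <- head(df4, n = -95)   # all rows except the last 95, i.e. rows 1 to 5
last5  <- tail(df4, n = -95)   # all rows except the first 95, i.e. rows 96 to 100

first5$v1   # 1 2 3 4 5
last5$v1    # 96 97 98 99 100
```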

### Summary and str functions

R has two very good functions, which you will be using very frequently.

- summary()
- str()

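summary() gives a statistical summary of the object. For a dataframe with numeric columns, it reports the minimum, quartiles, median, mean and maximum of each column.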

```
> summary(df4)
```

```
##        v1               v2              v3
##  Min.   :  1.00   Min.   :101.0   Min.   :201.0
##  1st Qu.: 25.75   1st Qu.:125.8   1st Qu.:225.8
##  Median : 50.50   Median :150.5   Median :250.5
##  Mean   : 50.50   Mean   :150.5   Mean   :250.5
##  3rd Qu.: 75.25   3rd Qu.:175.2   3rd Qu.:275.2
##  Max.   :100.00   Max.   :200.0   Max.   :300.0
```

str() is another useful function, which tells us how an object is structured.

```
> str(df4)
```

```
## 'data.frame':    100 obs. of  3 variables:
##  $ v1: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ v2: int  101 102 103 104 105 106 107 108 109 110 ...
##  $ v3: int  201 202 203 204 205 206 207 208 209 210 ...
```

Here we can see that the object is a data.frame containing 100 observations of 3 variables.

It also gives a variable-level summary: each variable’s name, its type, and the first few values stored in it.
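The figures reported by `summary()` can be reproduced with the individual statistics functions, which makes a handy sanity check. A sketch for the `v1` column, rebuilding `df4` so the block runs on its own:

```r
df4 <- data.frame(v1 = 1:100, v2 = 101:200, v3 = 201:300)

min(df4$v1)                      # 1
unname(quantile(df4$v1, 0.25))   # 25.75 (1st quartile)
median(df4$v1)                   # 50.5
mean(df4$v1)                     # 50.5
unname(quantile(df4$v1, 0.75))   # 75.25 (3rd quartile)
max(df4$v1)                      # 100
```

`unname()` only drops the "25%" and "75%" labels that `quantile()` attaches to its result.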

In the next tutorial we will learn about accessing elements of a data frame through various methods.

### Quiz

You have gone through the first tutorial on dataframes. Let’s check what you have learned!

#### analyticsfreak
