# Introduction to Quantile Regression

Quantile regression is commonly used in statistics and econometrics. In ordinary linear regression we model the conditional mean of Y given X = x, whereas in quantile regression we model a conditional quantile of Y given X = x.

Quantile regression is especially useful when we care more about the extremes of a distribution than about its center. Consider the following example.

A telecom company (ABC) has around 1M customers and wants to know whether it is capturing the maximum possible share of wallet from them. For example, if Robert spends $100 on telecom altogether but only $20 of that with ABC, then ABC's share of Robert's wallet is 20%.

ABC wants to increase its share of wallet, and to do that it needs to identify customers with untapped potential. The problem is that it doesn’t have its competitors’ data, so there is no direct way to measure share of wallet.

One option is to group similar customers and find the maximum spend within each group. The ratio of a customer’s actual spend (e.g., Nick’s $35) to the maximum spend in his group (e.g., $100) can then be taken as a proxy for share of wallet, here 35%. But this approach can easily require thousands or tens of thousands of such groups.
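The grouping idea can be sketched in a few lines of base R. The customer names, segments, and spend figures below are made up for illustration; only Nick's $35 against a group maximum of $100 comes from the example above.

```r
# Hypothetical customers; segment labels and spend values are illustrative only
customers <- data.frame(
  name    = c("Nick", "Asha", "Wei", "Lena", "Omar", "Tara"),
  segment = c("A", "A", "A", "B", "B", "B"),
  spend   = c(35, 100, 60, 20, 80, 40)
)

# Maximum spend within each segment, repeated for every member of that segment
customers$group_max <- ave(customers$spend, customers$segment, FUN = max)

# Proxy for share of wallet: actual spend relative to the segment maximum
customers$share_proxy <- customers$spend / customers$group_max

print(customers)
```

With real data the segments would have to be built from customer attributes first, which is where the number of groups grows quickly.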

The other option is to fit a linear regression to a high quantile (the 85th or 90th percentile, say) instead of to the mean.

Quantile regression, developed by Koenker and Bassett (1978), does exactly this.
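To see what "fitting to a quantile" means, it helps to look at the simplest case with no predictors. Quantile regression minimises an asymmetric absolute loss, often called the check or pinball loss, and with only a constant to estimate the minimiser is the sample quantile. The sketch below uses base R only; the function name `check_loss`, the simulated sample, and the seed are assumptions made for illustration.

```r
# Check (pinball) loss: residuals above the candidate value q are weighted by tau,
# residuals below it by (1 - tau)
check_loss <- function(q, y, tau) {
  u <- y - q
  sum(u * (tau - (u < 0)))
}

set.seed(1)        # arbitrary seed, for reproducibility only
y <- rexp(1001)    # illustrative skewed sample
tau <- 0.9

# Minimise the check loss over a single constant q
fit <- optimize(check_loss, interval = range(y), y = y, tau = tau)

fit$minimum                         # minimiser of the check loss
quantile(y, probs = tau, type = 1)  # sample 0.9 quantile, for comparison
```

The two printed values should agree up to the tolerance of `optimize()`. Replacing the constant q with a linear function of x gives the regression version of the same problem.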

If you are wondering, a quantile is the same thing as a percentile; the only difference is that quantiles are expressed as fractions, e.g. 0.25 instead of 25%.
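Base R reflects both conventions: `quantile()` takes its probabilities as fractions and labels the results as percentages. The spend values below are illustrative.

```r
spend <- c(12, 18, 20, 25, 31, 35, 40, 52, 60, 100)  # illustrative values

# quantile() expects probabilities as fractions in [0, 1]
q <- quantile(spend, probs = c(0.25, 0.50, 0.90))
print(q)  # the names are shown as percentages: 25%, 50%, 90%
```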

### Quantile Regression in R

Roger Koenker developed the quantreg package for R.

### Quantile Regression Using ggplot

Before we implement quantile regression, let’s simulate some data to work with:

```
x <- seq(0,100,length.out = 100) # independent variable
sig <- 0.1 + 0.06*x # non-constant variance
b0 <- 7 # true intercept
b1 <- 0.4 # true slope
set.seed(2) # make the next line reproducible
e <- rnorm(100,mean = 0, sd = sig) # normal random error with non-constant variance
y <- b0 + b1*x + e # dependent variable
df <- data.frame(x,y)
```
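Because the errors in this simulation are normal with standard deviation `0.1 + 0.06*x`, the true conditional quantiles can be worked out by hand, which gives a reference for the fitted models later. The sketch below computes the true 0.9 quantile line; the variable names `true_intercept` and `true_slope` are introduced here for illustration.

```r
b0 <- 7; b1 <- 0.4   # true intercept and slope, as in the simulation
tau <- 0.9
z <- qnorm(tau)      # standard normal 0.9 quantile

# Q_tau(y | x) = b0 + b1*x + (0.1 + 0.06*x) * z, which is again linear in x
true_intercept <- b0 + 0.1 * z
true_slope     <- b1 + 0.06 * z

round(c(intercept = true_intercept, slope = true_slope), 3)
```

Since the spread grows with x, the 0.9 quantile line is steeper than the mean line (slope 0.4). A fitted 0.9 quantile regression should land near these values, up to sampling noise.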

Let’s visualize the data using ggplot2:

```
library(ggplot2)
ggplot(df, aes(x,y)) + geom_point()
```

Let’s first use the geom_smooth() function to regress y on x:

```
ggplot(df, aes(x,y)) + geom_point() + geom_smooth(method="lm")
```

The blue line here shows the fitted linear regression line.

Now let’s use ggplot2 to plot the fitted line for the 90th percentile:

```
ggplot(df, aes(x,y)) + geom_point() + geom_quantile(quantiles=0.9)
```

```
## Smoothing formula not specified. Using: y ~ x
```
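`geom_quantile()` also accepts several quantiles at once, and passing the formula explicitly avoids the message above. The following sketch regenerates the same simulated data so that it runs on its own; the choice of 0.1, 0.5, and 0.9 is only an example.

```r
library(ggplot2)  # geom_quantile() also needs the quantreg package installed

# Same simulated data as above
x <- seq(0, 100, length.out = 100)
set.seed(2)
y <- 7 + 0.4 * x + rnorm(100, mean = 0, sd = 0.1 + 0.06 * x)
df <- data.frame(x, y)

# One fitted line per requested quantile; the formula is given explicitly
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), formula = y ~ x)
print(p)
```

The three lines should fan out as x grows, reflecting the non-constant variance built into the data.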

Behind the scenes, ggplot2 calls the quantreg package to compute the fitted line.

### Using quantreg package in R

Let’s use the quantreg package to fit a quantile regression model. The syntax and output are very similar to those of linear regression:

```
library(quantreg) # provides rq(); library() fails loudly if the package is missing
qr<-rq(y~x,data=df,tau=.9)
summary(qr)
```

```
##
## Call: rq(formula = y ~ x, tau = 0.9, data = df)
##
## tau: [1] 0.9
##
## Coefficients:
##             coefficients lower bd upper bd
## (Intercept) 7.28141      6.97494  7.96709
## x           0.49964      0.48822  0.52040
```

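The lower bd and upper bd columns give confidence bounds for the intercept and the slope of the fitted 0.9 quantile line. They are intervals for the coefficients, not for an individual predicted value of y. For a sample of this size, summary() computes them with the rank-inversion method by default.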
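The coefficient table can also be pulled out of the summary object for further use. The sketch below refits the model on the same simulated data so that it runs on its own. It assumes that an `alpha` argument given to `summary()` is passed through to the rank-inversion routine, as the quantreg documentation describes; `alpha = 0.05` is an example value.

```r
library(quantreg)

# Same simulated data as above
x <- seq(0, 100, length.out = 100)
set.seed(2)
y <- 7 + 0.4 * x + rnorm(100, mean = 0, sd = 0.1 + 0.06 * x)
df <- data.frame(x, y)

fit <- rq(y ~ x, data = df, tau = 0.9)

# Coefficient table from the default summary (rank-inversion bounds)
ci_default <- summary(fit)$coefficients

# Assumption: alpha is forwarded to the rank-inversion routine; 0.05 asks for 95% bounds
ci_95 <- summary(fit, alpha = 0.05)$coefficients

ci_default[, c("lower bd", "upper bd")]
ci_95[, c("lower bd", "upper bd")]
```

A smaller `alpha` should give bounds at least as wide as the default ones.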

Let’s see whether the line drawn by ggplot2 matches this model:

```
ggplot(df, aes(x,y)) + geom_point() + geom_abline(intercept=coef(qr)[1], slope=coef(qr)[2])
```
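Once the model is fitted, `predict()` returns the estimated 90th percentile of y for new values of x, and a quick sanity check is that roughly 90% of the observations lie on or below the fitted line. The sketch below regenerates the same simulated data so that it runs on its own; the new x values and the small numerical tolerance are arbitrary choices.

```r
library(quantreg)

# Same simulated data as above
x <- seq(0, 100, length.out = 100)
set.seed(2)
y <- 7 + 0.4 * x + rnorm(100, mean = 0, sd = 0.1 + 0.06 * x)
df <- data.frame(x, y)

fit <- rq(y ~ x, data = df, tau = 0.9)

# Predicted 90th percentile of y at a few new x values (chosen arbitrarily)
new_x <- data.frame(x = c(25, 50, 75))
pred <- predict(fit, newdata = new_x)
print(pred)

# Share of observations on or below the fitted line; should be close to tau
below <- mean(df$y <= fitted(fit) + 1e-8)
print(below)
```

In the share-of-wallet setting, the ratio of a customer's observed spend to the predicted 90th percentile for similar customers would play the role of the proxy described earlier.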

#### analyticsfreak
