# Multi-label classification

- Introduction
- Various approaches to solve multi-label problems
- Using multi-label classification in R

In machine learning, multi-label classification comes into the picture when your dependent variable is not a single class but a set of multiple classes altogether.

In other words, a single observation in your data might have more than one category assigned to it. This is different from multi-class problems, where there are multiple mutually exclusive classes and each observation can belong to only one of them.

There are various applications of multi-label problems, e.g., text categorization, semantic image labeling, and gene functionality classification, and the scope and interest are growing with modern applications.

Let’s take the example of movie genres: the famous movie “Fight Club” falls under the action, drama, and dark comedy classes.

Another example is news classification – a single news article can fall into multiple classes based on its content.
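Concretely, a multi-label target is usually represented as a binary indicator matrix: one row per observation, one column per label. A minimal sketch in Python (the second movie and its genre are made up purely for illustration):

```python
import numpy as np

# Rows = observations (movies), columns = labels (genres).
genres = ["action", "drama", "dark comedy", "thriller"]
movies = ["Fight Club", "Some Thriller"]   # second movie is hypothetical
Y = np.array([[1, 1, 1, 0],   # Fight Club: action, drama, dark comedy
              [0, 0, 0, 1]])  # hypothetical movie: thriller only

print(Y.sum(axis=1))  # number of labels per movie: [3 1]
```

Unlike a multi-class target (a single column of class IDs), each row here can contain several 1s at once.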

### Various approaches to solve Multi Label Problems

Multiple approaches have been developed to solve this kind of problem. One of the basic approaches, suggested by Tsoumakas, is to train a separate binary classifier for each label (or class); this method is known as Binary Relevance (BR).
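Binary Relevance can be sketched in a few lines: split the label matrix into its columns and fit one independent binary classifier per column. Below is an illustrative Python sketch on synthetic data (scikit-learn's `LogisticRegression` stands in for any base classifier; the data-generating choices are my own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # 200 observations, 5 features
# Synthetic label matrix: 3 labels, each loosely tied to one feature
Y = (X[:, :3] + rng.normal(scale=0.1, size=(200, 3)) > 0).astype(int)

# Binary Relevance: one independent binary classifier per label column
models = [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

# Prediction: stack the per-label predictions back into a label matrix
Y_hat = np.column_stack([m.predict(X) for m in models])
print(Y_hat.shape)  # (200, 3)
```

The appeal of BR is its simplicity, but because each label is modeled in isolation, any correlation between labels is ignored.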

Further improvements to the Binary Relevance approach have been suggested: e.g., Dembczynski and Hariharan proposed incorporating label correlations, while Bi & Kwok proposed using the label hierarchy during training. However, as the number of classes increases, the complexity of these methods grows tremendously.

Hsu et al. in 2009 suggested a 3-step approach to deal with this kind of problem:

- Step 1 – Project the high-dimensional label vector to a low-dimensional vector space using a random transformation
- Step 2 – Build a regression model for each dimension of the transformed label vector
- Step 3 – Re-project the predicted low-dimensional label vector back to the high-dimensional label space
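The three steps above can be sketched as follows. This is a simplified illustration on synthetic data: a random Gaussian projection for step 1, ridge regression for step 2, and a plain least-squares decode with thresholding for step 3 (the original work decodes with compressed-sensing techniques):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, d, L, k = 300, 10, 8, 3      # observations, features, labels, reduced dim
X = rng.normal(size=(n, d))
Y = (X @ rng.normal(size=(d, L)) > 0).astype(float)  # synthetic label matrix

# Step 1: random projection of the L-dim label vectors down to k dims
P = rng.normal(size=(L, k)) / np.sqrt(k)
Z = Y @ P

# Step 2: one regression model per compressed dimension
models = [Ridge().fit(X, Z[:, j]) for j in range(k)]
Z_hat = np.column_stack([m.predict(X) for m in models])

# Step 3: re-project to label space (least-squares decode + threshold)
Y_hat = (Z_hat @ np.linalg.pinv(P) > 0.5).astype(float)
print(Y_hat.shape)  # (300, 8)
```

The payoff is that only k regression models are trained instead of L, which matters when the label space is very large.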

Tai and Lin found this random transformation to be ineffective and instead suggested applying Principal Component Analysis (PCA) to the label matrix (Y), a method known as Principal Label Space Transformation (PLST).
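In PLST, the random projection is replaced by the top principal directions of the (centered) label matrix, so the encoding preserves as much label variance as possible. A minimal sketch on synthetic data (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
Y = (rng.normal(size=(100, 6)) > 0).astype(float)  # synthetic 6-label matrix
k = 3                                              # reduced dimension

# PCA on the label matrix via SVD of the centered Y
y_mean = Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
V_k = Vt[:k].T                  # L x k matrix of top principal directions

Z = (Y - y_mean) @ V_k          # encode: reduced label representation
Y_rec = Z @ V_k.T + y_mean      # decode: re-project to label space
print(Z.shape, Y_rec.shape)     # (100, 3) (100, 6)
```

Because `V_k` has orthonormal columns, the decode step is simply the transpose of the encode step.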

Chen & Lin in 2012 proposed Conditional Principal Label Space Transformation (CPLST), which simultaneously reduces both the encoding and training errors.

Similarly, Canonical Correlation Analysis (CCA) was proposed by Zhang & Schneider, which takes both the output and input matrices into consideration.

More recently, the Multiple Output Prediction Landmark Selection Method (MOPLMS) was suggested by Balasubramanian & Lebanon; it is based on the assumption that all the output labels can be recovered from a smaller subset of "landmark" labels.

### Using multi-label classification in R

This field of machine learning is fairly new, but if you want to experiment, you can use the ‘mldr’ and ‘rFerns’ packages in R.

#### analyticsfreak
