vignettes/tutorial/multilabel.Rmd (+21 -21)
@@ -19,16 +19,16 @@ set.seed(123)

 Multilabel classification is a classification problem where multiple target labels can be assigned to each observation instead of only one like in multiclass classification.

-Two different approaches exist for multilabel classification.
-*Problem transformation methods* try to transform the multilabel classification into binary or multiclass classification problems.
+Two different approaches exist for multilabel classification.
+*Problem transformation methods* try to transform the multilabel classification into binary or multiclass classification problems.
 *Algorithm adaptation methods* adapt multiclass algorithms so they can be applied directly to the problem.

 # Creating a task

 The first thing you have to do for multilabel classification in `mlr` is to
-get your data in the right format.
-You need a `data.frame` which consists of the features and a logical vector for each label which indicates if the label is present in the observation or not. After that you can create a `MultilabelTask` (`Task()`) like a normal `ClassifTask` (`Task()`).
-Instead of one target name you have to specify a vector of targets which correspond to the names of logical variables in the `data.frame`.
+get your data in the right format.
+You need a `data.frame` which consists of the features and a logical vector for each label which indicates if the label is present in the observation or not. After that you can create a `MultilabelTask` (`Task()`) like a normal `ClassifTask` (`Task()`).
+Instead of one target name you have to specify a vector of targets which correspond to the names of logical variables in the `data.frame`.
 In the following example we get the yeast data frame from the already existing `yeast.task()`, extract the 14 label names and create the task again.

 ```{r}
@@ -48,18 +48,18 @@ Multilabel classification in `mlr` can currently be done in two ways:

 ## Algorithm adaptation methods

-Currently the available algorithm adaptation methods in **R** are the multivariate random forest in the [%randomForestSRC] package and the random ferns multilabel algorithm in the [%rFerns] package.
+Currently only the random ferns multilabel algorithm in the [%rFerns] package is available for multilabel classification tasks.
+
 You can create the learner for these algorithms like in multiclass classification problems.
-For generating a wrapped multilabel learner first create a binary (or multiclass) classification learner with `makeLearner()`.
+For generating a wrapped multilabel learner first create a binary (or multiclass) classification learner with `makeLearner()`.
 Afterwards apply a function like `makeMultilabelBinaryRelevanceWrapper()`, `makeMultilabelClassifierChainsWrapper()`, `makeMultilabelNestedStackingWrapper()`, `makeMultilabelDBRWrapper()` or `makeMultilabelStackingWrapper()` on the learner to convert it to a learner that uses the respective problem transformation method.

 You can also generate a binary relevance learner directly, as you can see in the example.
-Prediction can be done as usual in `mlr` with `predict` (`predict.WrappedModel()`) and by passing a trained model and either the task to the ``task`` argument or some new data to the ``newdata`` argument.
+Prediction can be done as usual in `mlr` with `predict` (`predict.WrappedModel()`) and by passing a trained model and either the task to the ``task`` argument or some new data to the ``newdata`` argument.
 As always you can specify a ``subset`` of the data which should be predicted.

 ```{r}
@@ -166,9 +166,9 @@ listMeasures("multilabel")

 # Resampling

-For evaluating the overall performance of the learning algorithm you can do some [resampling](resample.html){target="_blank"}.
-As usual you have to define a resampling strategy, either via `makeResampleDesc()` or `makeResampleInstance()`.
-After that you can run the `resample()` function.
+For evaluating the overall performance of the learning algorithm you can do some [resampling](resample.html){target="_blank"}.
+As usual you have to define a resampling strategy, either via `makeResampleDesc()` or `makeResampleInstance()`.
+After that you can run the `resample()` function.
 Below the default measure Hamming loss is calculated.

 ```{r echo = FALSE, results='hide'}
@@ -204,7 +204,7 @@ r
 # Binary performance

 If you want to calculate a binary performance measure like, e.g., the [accuracy](measures.html){target="_blank"}, the [mmce](measures.html){target="_blank"} or the [auc](measures.html){target="_blank"} for each label, you can use function `getMultilabelBinaryPerformances()`.
-You can apply this function to any multilabel prediction, e.g., also on the resample multilabel prediction.
+You can apply this function to any multilabel prediction, e.g., also on the resample multilabel prediction.
 For calculating the [auc](measures.html){target="_blank"} you need predicted probabilities.
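
Note on the changed text: the data layout and the measures that the vignette describes (one logical column per label, Hamming loss, per-label accuracy, mmce and auc) can be illustrated in base R without `mlr`. The following is only a sketch under stated assumptions. The toy values, the column names `label1` and `label2`, and the helpers `hamming_loss`, `per_label_performance` and `auc_one_label` are made up for illustration and are not part of `mlr`; in the vignette the per-label measures come from `getMultilabelBinaryPerformances()`.

```r
# Toy multilabel data: one feature plus one logical column per label.
# (Hypothetical values and names, for illustration only.)
d <- data.frame(
  x      = c(0.2, 1.5, 3.1, 4.0),
  label1 = c(TRUE, FALSE, TRUE, TRUE),
  label2 = c(FALSE, FALSE, TRUE, FALSE)
)
labels <- c("label1", "label2")

# Every target column has to be a logical vector.
stopifnot(all(vapply(d[labels], is.logical, logical(1))))

truth <- as.matrix(d[labels])

# Hypothetical predictions: hard labels and per-label probabilities.
response <- cbind(
  label1 = c(TRUE, TRUE, TRUE, FALSE),
  label2 = c(FALSE, FALSE, TRUE, TRUE)
)
prob <- cbind(
  label1 = c(0.9, 0.6, 0.8, 0.4),
  label2 = c(0.1, 0.2, 0.7, 0.6)
)

# Hamming loss: share of (observation, label) pairs predicted wrongly.
hamming_loss <- function(truth, response) mean(truth != response)

# Accuracy and mean misclassification error, computed label by label.
per_label_performance <- function(truth, response) {
  acc <- colMeans(truth == response)
  cbind(acc = acc, mmce = 1 - acc)
}

# AUC for a single label via the rank-sum (Mann-Whitney) formula.
# It needs scores or probabilities, not hard labels.
auc_one_label <- function(truth, prob) {
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(rank(prob)[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

hl   <- hamming_loss(truth, response)            # 3 of 8 pairs wrong: 0.375
perf <- per_label_performance(truth, response)
aucs <- vapply(labels,
               function(l) auc_one_label(truth[, l], prob[, l]),
               numeric(1))
```

The auc helper takes the probability matrix rather than the hard responses, which mirrors the last line of the hunk above: without predicted probabilities there is no ranking to compute the auc from.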