Multiple outputs for regression task #1296

tdhock · 2025-04-25T08:15:17Z

Following up from mlr-org/mlr3torch#385 (review), I would like to request a feature: support for regression tasks with multiple targets / outputs (several columns to predict, not just one).
@sebffischer

sebffischer · 2025-04-25T08:55:59Z

Thanks @tdhock. This would probably be a relatively large change, so this needs to be discussed in depth.

tdhock · 2025-04-25T09:17:00Z

I can see two different ways forward:

- Change the existing TaskRegr (single output is a special case, so this is hopefully backward compatible).
- Create a new TaskRegrMulti.

I do have a current project for which this would be useful.

tdhock · 2025-04-25T09:22:43Z

Currently I get an error when instantiating the task:

N_row <- 100
D_in <- 10
D_out <- 2
set.seed(1)
df <- data.frame(
  feature=matrix(rnorm(N_row*D_in), N_row, D_in),
  target=matrix(rnorm(N_row*D_out), N_row, D_out))
df[1,]
reg_task <- mlr3::TaskRegr$new(
  "example", df, target=paste0("target.", 1:D_out))

I got:

> df[1,]
   feature.1  feature.2 feature.3 feature.4 feature.5  feature.6 feature.7
1 -0.6264538 -0.6203667 0.4094018 0.8936737  1.074441 0.07730312 -0.341067
   feature.8 feature.9 feature.10 target.1  target.2
1 -0.7075682 -1.086909  -1.541403 1.134965 0.2418959
> reg_task <- mlr3::TaskRegr$new(
+ "example", df, target=paste0("target.", 1:D_out))
Error in .__TaskRegr__initialize(self = self, private = private, super = super,  : 
  Assertion on 'target' failed: Must have length 1.
> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.37        backports_1.5.0      R6_2.6.1            
 [4] codetools_0.2-20     lgr_0.4.4            parallel_4.5.0      
 [7] palmerpenguins_0.1.1 mlr3misc_0.16.0      parallelly_1.43.0   
[10] future_1.34.0        mlr3_0.23.0          data.table_1.17.0   
[13] compiler_4.5.0       paradox_1.0.1        globals_0.16.3      
[16] tools_4.5.0          checkmate_2.3.2      listenv_0.9.1       
[19] crayon_1.5.3         uuid_1.2-1

tdhock · 2025-04-25T09:25:48Z

For comparison, scikit-learn has some learners that natively support multiple outputs, for example MultiTaskLasso: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskLasso.html

It also has an adaptor class that converts a single-output regression learner into a multi-output one: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html

sebffischer · 2025-04-25T09:31:55Z

This feature was already on the roadmap at some point; we just never got around to implementing it.
And yes, we could either modify the existing TaskRegr (and give Learners a property akin to the "twoclass" and "multiclass" properties for classification) or add a new task type.
In either case, we would at least have to:

- Go over all mlr3::MeasureRegrs and adjust them to also work for multi-output regression.
- Check all places in the code where we assume there is exactly one regression target.
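
To illustrate the first point, here is a minimal sketch of what a multi-output MSE could look like. It assumes truth and predictions arrive as N x D_out matrices and that outputs are averaged uniformly. The helper name mse_multi is hypothetical and not part of mlr3; other aggregation choices are equally possible.

```r
# Hypothetical helper, not an mlr3 API: MSE over a matrix of targets.
# truth and response are numeric matrices with one column per output.
mse_multi <- function(truth, response) {
  stopifnot(identical(dim(truth), dim(response)))
  # One MSE per output column.
  per_output <- colMeans((truth - response)^2)
  # Uniform average across outputs (an assumption, other weightings exist).
  list(per_output = per_output, average = mean(per_output))
}

truth <- cbind(c(1, 2, 3), c(0, 0, 0))
response <- cbind(c(1, 2, 5), c(1, 1, 1))
mse_multi(truth, response)
```

With a single column this reduces to the usual single-output MSE, which is the backward-compatibility property one would want from an adjusted measure.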

Regarding the conversion of single-output regression to multi-output:
We are working on something similar for time-series classification (where we re-train the same learner for different time horizons) here: https://github.com/mlr-org/mlr3forecast (cc @m-muecke), so we will possibly already have some code for this.

What exactly are you trying to do in your current project? Maybe we can find an easy workaround for now.

tdhock · 2025-04-25T11:00:11Z

In the current project we can use a workaround: make one TaskRegr for each output and fit one single-output model for each (each scored with a single-output measure such as MSE). But this is suboptimal for two reasons:

- We can't use learners that natively support multiple outputs.
- We can't use measures that are defined on multiple outputs.
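
A minimal sketch of that workaround, rebuilding the data frame from the example above. The learner regr.featureless and the variable names are only illustrative choices, and the score is computed on the training data just to keep the example short.

```r
library(mlr3)

# Rebuild the example data: 10 features, 2 targets.
N_row <- 100
D_in <- 10
D_out <- 2
set.seed(1)
df <- data.frame(
  feature = matrix(rnorm(N_row * D_in), N_row, D_in),
  target = matrix(rnorm(N_row * D_out), N_row, D_out))
target_names <- paste0("target.", 1:D_out)
feature_names <- setdiff(names(df), target_names)

# One TaskRegr, one learner, and one MSE per output.
mse_per_output <- sapply(target_names, function(tn) {
  # Keep only the features plus this one target, so the other
  # target columns are not used as features.
  task <- TaskRegr$new(tn, df[, c(feature_names, tn)], target = tn)
  learner <- lrn("regr.featureless")
  learner$train(task)
  # Scored on the training data, for brevity only.
  learner$predict(task)$score(msr("regr.mse"))
})
mse_per_output
```

Each model here sees only its own target column, which is exactly why natively multi-output learners and measures are out of reach with this approach.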

sebffischer · 2025-04-25T11:06:59Z

@berndbischl @mb706 @be-marc

sebffischer added Type: Enhancement Status: Discussion Needed labels Apr 25, 2025
