Multiple outputs for regression task #1296

tdhock opened this issue Apr 25, 2025 · 7 comments

@tdhock
Contributor

tdhock commented Apr 25, 2025

Following up from mlr-org/mlr3torch#385 (review) I would like to request addition of a feature to support regression tasks with multiple targets / outputs (several columns to predict, not just one).
@sebffischer

@sebffischer
Member

Thanks @tdhock. This would probably be a relatively large change, so this needs to be discussed in depth.

@tdhock
Contributor Author

tdhock commented Apr 25, 2025

I can see two different ways forward:

  • change the existing TaskRegr (single output is a special case so hopefully back compatible)
  • create a new TaskRegrMulti

I do have a current project for which this would be useful.

@tdhock
Contributor Author

tdhock commented Apr 25, 2025

Currently I get an error when instantiating the task:

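# Simulate a data set with 10 feature columns and 2 target columns.
N_row <- 100
D_in <- 10
D_out <- 2
set.seed(1)
df <- data.frame(
  feature=matrix(rnorm(N_row*D_in), N_row, D_in),
  target=matrix(rnorm(N_row*D_out), N_row, D_out))
df[1,]
# Passing two target column names fails: TaskRegr accepts exactly one target.
reg_task <- mlr3::TaskRegr$new(
  "example", df, target=paste0("target.", 1:D_out))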

I got:

> df[1,]
   feature.1  feature.2 feature.3 feature.4 feature.5  feature.6 feature.7
1 -0.6264538 -0.6203667 0.4094018 0.8936737  1.074441 0.07730312 -0.341067
   feature.8 feature.9 feature.10 target.1  target.2
1 -0.7075682 -1.086909  -1.541403 1.134965 0.2418959
> reg_task <- mlr3::TaskRegr$new(
+ "example", df, target=paste0("target.", 1:D_out))
Error in .__TaskRegr__initialize(self = self, private = private, super = super,  : 
  Assertion on 'target' failed: Must have length 1.
> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.37        backports_1.5.0      R6_2.6.1            
 [4] codetools_0.2-20     lgr_0.4.4            parallel_4.5.0      
 [7] palmerpenguins_0.1.1 mlr3misc_0.16.0      parallelly_1.43.0   
[10] future_1.34.0        mlr3_0.23.0          data.table_1.17.0   
[13] compiler_4.5.0       paradox_1.0.1        globals_0.16.3      
[16] tools_4.5.0          checkmate_2.3.2      listenv_0.9.1       
[19] crayon_1.5.3         uuid_1.2-1          

@tdhock
Contributor Author

tdhock commented Apr 25, 2025

For comparison, scikit-learn supports some learners that are natively multi-output, for example https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskLasso.html

It also provides an adaptor class that converts a single-output regression learner into a multi-output one: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html
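
To make the distinction concrete in R, here is a minimal base-R sketch (no mlr3 or scikit-learn involved). The helper names `fit_per_output` and `predict_per_output` are hypothetical, and the simulated data are only for illustration. The first part mimics the adaptor idea by fitting one independent `lm` per target column; the second part uses the fact that `lm` accepts a matrix response, which makes it a natively multi-output learner.

```r
set.seed(1)
N_row <- 100
X <- data.frame(matrix(rnorm(N_row * 3), N_row, 3))  # columns X1, X2, X3
Y <- cbind(
  y1 = X$X1 + rnorm(N_row, sd = 0.1),
  y2 = -2 * X$X2 + rnorm(N_row, sd = 0.1))

# Adaptor approach (hypothetical helpers): one single-output model per target.
fit_per_output <- function(X, Y) {
  models <- lapply(seq_len(ncol(Y)), function(j) {
    lm(y ~ ., data = cbind(X, y = Y[, j]))
  })
  names(models) <- colnames(Y)
  models
}
predict_per_output <- function(models, newdata) {
  # Bind the per-target prediction vectors into an N x D_out matrix.
  pred <- do.call(cbind, lapply(models, predict, newdata = newdata))
  colnames(pred) <- names(models)
  pred
}
pred_adaptor <- predict_per_output(fit_per_output(X, Y), X)

# Native approach: lm with a matrix response fits all targets in one call.
fit_native <- lm(Y ~ ., data = X)
pred_native <- predict(fit_native, newdata = X)
```

For ordinary least squares the two approaches give the same predictions, because the targets do not interact in the loss. That is not true in general: a learner such as MultiTaskLasso couples the outputs, which is exactly what a per-output adaptor cannot express.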

@sebffischer
Member

sebffischer commented Apr 25, 2025

This feature was already on the roadmap at some point; we just never got around to implementing it.
And yes, we could either modify the existing TaskRegr (and give Learners a property akin to the "twoclass" and "multiclass" properties for classification) or add a new task type.
In either case, we would at least have to:

  • go over all mlr3::MeasureRegrs and adjust them to also work for multi-output regression.
  • check all places in the code where we are assuming there is exactly one regression target.
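
As a rough illustration of what adjusting a regression measure for multiple outputs could mean, here is a base-R sketch of a mean squared error that accepts matrices of truth and response values. `multi_output_mse` is a hypothetical helper, not an existing mlr3 measure, and averaging the per-output errors with equal weights is only one possible aggregation, assumed here for simplicity.

```r
# Hypothetical helper: MSE per output column, plus their unweighted mean.
multi_output_mse <- function(truth, response) {
  truth <- as.matrix(truth)
  response <- as.matrix(response)
  stopifnot(identical(dim(truth), dim(response)))
  per_output <- colMeans((truth - response)^2)
  c(per_output, average = mean(per_output))
}

truth <- cbind(target.1 = c(1, 2, 3), target.2 = c(0, 0, 0))
response <- cbind(target.1 = c(1, 2, 5), target.2 = c(1, 1, 1))
mse <- multi_output_mse(truth, response)
print(mse)
```

With a single target column this reduces to the usual MSE, which is the sense in which single-output regression is a special case.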

Regarding the conversion of single-output regression to multi-output:
We are working on something similar for time-series classification (where we re-train the same learner for different time horizons) here: https://github.com/mlr-org/mlr3forecast (cc @m-muecke), so we will possibly already have some code for this.


What exactly are you trying to do in your current project? Maybe we can find an easy workaround for now.

@tdhock
Contributor Author

tdhock commented Apr 25, 2025

In the current project we can work around this by creating one TaskRegr per output and fitting one single-output model per output (each evaluated with a single-output measure such as MSE). But this is sub-optimal for two reasons:

  • can't use learners which natively support multiple outputs
  • can't use measures which are defined on multiple outputs
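
For reference, the per-output workaround could be sketched roughly as follows with mlr3. It reuses the simulated data from earlier in this thread; the featureless learner, the 3-fold cross-validation, and the variable names are placeholder assumptions, not the setup of the actual project.

```r
library(mlr3)

N_row <- 100
D_in <- 10
D_out <- 2
set.seed(1)
df <- data.frame(
  feature = matrix(rnorm(N_row * D_in), N_row, D_in),
  target = matrix(rnorm(N_row * D_out), N_row, D_out))
target_names <- paste0("target.", 1:D_out)
feature_names <- setdiff(names(df), target_names)

# One single-output TaskRegr per target; the other targets are dropped
# so that they are not used as features.
task_list <- lapply(target_names, function(tn) {
  TaskRegr$new(id = tn, backend = df[, c(feature_names, tn)], target = tn)
})
names(task_list) <- target_names

# One model and one single-output measure (MSE) per task.
mse_per_output <- sapply(task_list, function(task) {
  rr <- resample(task, lrn("regr.featureless"), rsmp("cv", folds = 3))
  unname(rr$aggregate(msr("regr.mse")))
})
```

Each task is resampled independently here, so the folds differ between outputs and nothing is shared across the models, which is the limitation the two points above describe.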

@sebffischer
Member

@berndbischl @mb706 @be-marc
