Weird bug when setting val_check_interval dynamically in setup() #20894


Open
davidgill97 opened this issue Jun 11, 2025 · 4 comments
Labels
bug (Something isn't working) · needs triage (Waiting to be triaged by maintainers) · ver: 2.5.x

Comments


davidgill97 commented Jun 11, 2025

Bug description

I want to set val_check_interval dynamically based on the total number of training steps. Specifically, I calculate it as self.trainer.estimated_stepping_batches // 10 in the setup() method, aiming for roughly 10 validation runs over training.

When I assign a constant value to self.trainer.val_check_interval, it works as expected, but when I use the dynamically calculated value (self.trainer.estimated_stepping_batches // 10), it doesn't take effect, even though the calculated value is correct and everything else is identical. I also set self.trainer.check_val_every_n_epoch = None, as the documentation instructs.

What could be causing this weird bug, and how can I ensure that the dynamically calculated value is applied properly?

What version are you seeing the problem on?

v2.5

Reproduced in studio

No response

How to reproduce the bug

from lightning.pytorch import LightningModule

class Dummy(LightningModule):
    def setup(self, stage: str):
        if self.trainer:
            # This does not seem to change the validation interval ...
            self.trainer.val_check_interval = self.trainer.estimated_stepping_batches // 100
            # ... but this does (the two assignments were tried one at a time):
            # self.trainer.val_check_interval = 10

dummy = Dummy()
# trainer is a Trainer created with check_val_every_n_epoch=None, as described above
trainer.fit(dummy)

Error messages and logs

No response

Environment

Current environment
- PyTorch Lightning Version: 2.5.0.post0
- PyTorch Version: 2.6.0+cu126
- Python version: 3.10.11
- OS: Windows 11
- CUDA/cuDNN version: 12.8
- GPU models and configuration:
- How you installed Lightning (`conda`, `pip`, source): pip

More info

No response

@davidgill97 added the bug and needs triage labels Jun 11, 2025
@MrAnayDongre

Reproduced it. It seems val_check_interval and check_val_every_n_epoch need to be set at Trainer initialization to reliably control validation scheduling. Querying estimated_stepping_batches inside setup() appears to make the FitLoop configure its validation schedule right then, from whatever val_check_interval holds at that moment, so reassigning the attribute afterwards doesn't change the loop's validation frequency.
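
For illustration, a minimal sketch of the init-time approach (the step budget and flags below are made up for the example, not taken from the issue):

from lightning.pytorch import Trainer

# Assumption: the total step budget is known (or chosen) before fit,
# so the interval can be derived up front instead of inside setup().
total_steps = 10_000
trainer = Trainer(
    max_steps=total_steps,
    val_check_interval=total_steps // 10,  # validate ~10 times over the run
    check_val_every_n_epoch=None,  # count global steps rather than per-epoch batches
)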


davidgill97 commented Jun 12, 2025

Just found another internal variable, val_check_batch, used by the FitLoop; setting it together with val_check_interval to the dynamically calculated value does the job. The behavior is still odd, though, since assigning a constant to val_check_interval alone works as expected.
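
For anyone hitting the same thing, a rough sketch of that workaround (val_check_batch is an internal attribute, so this may break across versions; treat it as a stopgap until the behavior is documented or changed):

from lightning.pytorch import LightningModule

class Dummy(LightningModule):
    def setup(self, stage: str):
        if stage == "fit":
            # Reading estimated_stepping_batches makes the fit loop set up its
            # dataloaders, which derives val_check_batch from the val_check_interval
            # current at that moment. Overwriting both makes the new interval stick.
            interval = self.trainer.estimated_stepping_batches // 10
            self.trainer.val_check_interval = interval
            self.trainer.val_check_batch = interval  # internal, per the finding above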

@MrAnayDongre

Good find. Depending on feedback from @lantiga and @Borda, maybe we can add this to the documentation to guide future users.


Borda commented Jun 13, 2025

Reproduced it. It seems val_check_interval and check_val_every_n_epoch need to be set at Trainer initialization to reliably control validation scheduling. Querying estimated_stepping_batches inside setup() appears to make the FitLoop configure its validation schedule right then, from whatever val_check_interval holds at that moment, so reassigning the attribute afterwards doesn't change the loop's validation frequency.

Yes, that sounds like a reasonable change.
