Description
Hi, your work is outstanding, but I have a question.
In the file gaussian_diffusion.py, around line 36, the code reads as follows:
elif schedule_name == 'warmup-decay':
    warmup_steps = max(1, int(warmup_steps_ratio * num_diffusion_timesteps))
    sqrt_steps = get_named_beta_schedule('sqrt', num_diffusion_timesteps)
    beta_mid = sqrt_steps[-warmup_steps]
    warmup = np.linspace(beta_mid, 0.0001, warmup_steps)
    return np.concatenate([sqrt_steps[:-warmup_steps], warmup])
Why are the beta values of the last steps reduced to 0.0001, and what is the benefit?
Shouldn't the warm-up happen at the very beginning instead, with beta increasing from 0.0001, like this:
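elif schedule_name == 'warmup-decay':
    warmup_steps = max(1, int(warmup_steps_ratio * num_diffusion_timesteps))
    sqrt_steps = get_named_beta_schedule('sqrt', num_diffusion_timesteps)
    # Ramp up linearly from 0.0001 to the sqrt schedule's value at warmup_steps
    warmup = np.linspace(0.0001, sqrt_steps[warmup_steps], warmup_steps)
    return np.concatenate([warmup, sqrt_steps[warmup_steps:]])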
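
To make the comparison concrete, here is a minimal standalone sketch that builds both variants side by side. Note that `sqrt_betas` is only my stand-in for `get_named_beta_schedule('sqrt', ...)`: I am assuming it uses alpha_bar(t) = 1 - sqrt(t + 0.0001) capped at 0.999, which may not match the repository exactly. The helper names `decay_at_end` and `ramp_at_start` are also mine, not from the code base.

```python
import numpy as np


def sqrt_betas(num_steps, max_beta=0.999):
    """Stand-in for the 'sqrt' schedule (assumed form, not taken from the repo)."""
    def alpha_bar(t):
        return 1.0 - np.sqrt(t + 0.0001)

    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)


def decay_at_end(base, ratio):
    """Current behaviour: the last k betas fall linearly to 0.0001."""
    k = max(1, int(ratio * len(base)))
    tail = np.linspace(base[-k], 0.0001, k)
    return np.concatenate([base[:-k], tail])


def ramp_at_start(base, ratio):
    """Proposed behaviour: the first k betas rise linearly from 0.0001.

    Assumes ratio < 1, otherwise base[k] is out of range.
    """
    k = max(1, int(ratio * len(base)))
    head = np.linspace(0.0001, base[k], k)
    return np.concatenate([head, base[k:]])


base = sqrt_betas(100)
current = decay_at_end(base, 0.1)
proposed = ramp_at_start(base, 0.1)

for name, betas in [("sqrt", base), ("current", current), ("proposed", proposed)]:
    # The final cumulative product of (1 - beta) shows how much signal is left at the last step.
    final_alpha_bar = np.cumprod(1.0 - betas)[-1]
    print(name, "first beta:", betas[0], "last beta:", betas[-1],
          "final alpha_bar:", final_alpha_bar)
```

Printing the first and last beta together with the final cumulative product of (1 - beta) should show where the two variants differ: the current code changes only the tail of the schedule, while my proposal changes only the head.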