Hi, I'm Junmo Cho.
I've read the paper, which was pretty interesting. Sorry for taking your time, but while running the code I have some questions.
- Does the minus sign on the `binary_cross_entropy` between `img` and `pred_img` come from modeling the reward distribution as a Bernoulli distribution? I thought each pixel `y` in `img` (the ground-truth target, with value 0 or 1) is scored as Ber(y | pi) = pi^y * (1 - pi)^(1 - y), where pi is the corresponding pixel probability from `pred_img`. Please correct me if my understanding is wrong (a small sanity-check sketch is below this list).
- Another thing: why do we divide `logprobs` and the reward by `steps` (the length of the GFN generation sequence, 16 here) when calculating the TB loss? I thought `logprobs` is itself the log of the product of P_F(s_i | s_{i-1}) from i = 1 to n, as in the paper (see the comparison sketch below the list).
- Also, why is there no backward-policy term in the TB loss? Are we assuming the backward policy is uniform and absorbing it into logZ? (My guess is sketched below.)
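
To make the first question concrete, here is a minimal sanity check of my understanding (the shapes and values are made up, not taken from the repo): the negative of `binary_cross_entropy` with `reduction='none'` is exactly the element-wise Bernoulli log-likelihood.

```python
import torch
import torch.nn.functional as F

img = torch.randint(0, 2, (4, 16)).float()            # ground-truth pixels y in {0, 1}
pred_img = torch.rand(4, 16).clamp(1e-6, 1 - 1e-6)    # predicted Bernoulli means pi

# Bernoulli log-likelihood: log Ber(y | pi) = y*log(pi) + (1-y)*log(1-pi)
log_lik = img * torch.log(pred_img) + (1 - img) * torch.log(1 - pred_img)
# PyTorch's BCE is the negative of that, element-wise
neg_bce = -F.binary_cross_entropy(pred_img, img, reduction="none")

assert torch.allclose(log_lik, neg_bce, atol=1e-6)
```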
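For the second question, this is the discrepancy I mean, written out with hypothetical tensors (`logZ`, `logprobs`, `log_reward` are placeholder names, not necessarily the repo's exact variables):

```python
import torch

steps = 16                           # length of the GFN generation sequence
logZ = torch.zeros(1, requires_grad=True)
logprobs = torch.randn(8)            # sum_{i=1}^{n} log P_F(s_i | s_{i-1}), per sample
log_reward = torch.randn(8)          # log R(x), e.g. the summed negative BCE

# TB loss as I read it in the paper:
tb_paper = (logZ + logprobs - log_reward).pow(2).mean()
# What the code seems to do instead (both terms divided by steps):
tb_code = (logZ + logprobs / steps - log_reward / steps).pow(2).mean()
```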
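And for the third question, here is my guess written out (purely my own reasoning, not something I found in the paper or code):

```python
import math

# If generation follows a fixed order, each state has exactly one parent,
# so P_B(s_{i-1} | s_i) = 1 and the backward term contributes 0.
# If instead P_B is uniform over K parents at each of n steps, the term is
# a constant -n*log(K) that could be folded into logZ.
n, K = 16, 4                         # made-up numbers for illustration
sum_log_pb = -n * math.log(K)        # constant w.r.t. the forward policy's parameters
print(sum_log_pb)                    # -22.18...
```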
I would be grateful for some answers! Thanks.