I have a custom environment where the legal actions depend on the state of the board and the current player. When I try to train my first agent, the legal_actions mask isn't computed correctly for the agent, but it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper). Since the legal_actions depend on current_player_num, and agent_player_num != current_player_num, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.
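One way around this (a sketch only; the class and method names here are illustrative assumptions, not SIMPLE's actual internals) is to make the wrapper delegate to the env's legal_actions at call time, so the mask is always computed for whichever player is actually to move, rather than being cached from the agent's perspective:

```python
import numpy as np

class BoardEnv:
    """Toy two-player env where the legal mask depends on BOTH the board
    and the current player: each player may only mark cells of their own
    parity (hypothetical rule, just to make the dependence visible)."""
    def __init__(self, n_cells=6):
        self.n_cells = n_cells
        self.board = np.zeros(n_cells, dtype=int)
        self.current_player_num = 0

    def legal_actions(self):
        mask = np.zeros(self.n_cells, dtype=int)
        for a in range(self.n_cells):
            if self.board[a] == 0 and a % 2 == self.current_player_num:
                mask[a] = 1
        return mask

class SelfPlayWrapperSketch:
    """Minimal stand-in for a self-play wrapper (not SIMPLE's actual
    SelfPlayWrapper). It never caches the mask; it recomputes it from
    the wrapped env on every call."""
    def __init__(self, env, agent_player_num=0):
        self.env = env
        self.agent_player_num = agent_player_num

    def legal_actions(self):
        # Delegate instead of caching: the env computes the mask for
        # env.current_player_num, so it is correct for agent and
        # opponent alike, regardless of agent_player_num.
        return self.env.legal_actions()
```

With this delegation, the mask flips correctly as current_player_num changes, which is exactly what goes wrong when the wrapper only computes it from the agent's fixed perspective.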
Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent doesn't care about legal_actions whatsoever... it doesn't call it at all and just chooses a random action num.
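If the training loop ignores the mask entirely, a common workaround (again just a sketch, independent of SIMPLE's actual training code) is to apply the legal-action mask to the policy logits before sampling: set illegal logits to -inf so those actions get zero probability after the softmax.

```python
import numpy as np

def masked_sample(logits, legal_mask, rng):
    """Sample an action index from softmax(logits), but only among
    actions where legal_mask is nonzero."""
    logits = np.asarray(logits, dtype=float)
    # Illegal actions get -inf, so exp() maps them to probability 0.
    masked = np.where(np.asarray(legal_mask, dtype=bool), logits, -np.inf)
    shifted = masked - masked.max()  # subtract max for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)
```

For example, with logits `[1.0, 2.0, 3.0]` and mask `[1, 0, 1]`, action 1 is never sampled no matter how high its logit is.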
Yeah, this is my exact issue. I haven't found a solution yet.