
Issue when legal actions mask is dependent on current player #39

Open
AdamLang96 opened this issue Sep 24, 2023 · 3 comments

Comments

@AdamLang96

I have a custom environment where the legal actions depend on both the state of the board and the current player. When I try to train my first agent, the legal_actions mask isn't computed correctly for the agent, although it is for the opponent. I'm guessing the issue comes from the code below (found in SelfPlayWrapper): since legal_actions depends on current_player_num, and agent_player_num != current_player_num inside this loop, it cannot calculate the correct mask for the agent. Please let me know if you have any ideas on how to fix this.

    def continue_game(self):
        observation = None
        reward = None
        done = None
        # Let the opponent agent(s) move until it is the learning agent's turn again
        while self.current_player_num != self.agent_player_num:
            action = self.current_agent.choose_action(self, choose_best_action=False, mask_invalid_actions=True)
            observation, reward, done, _ = super(SelfPlayEnv, self).step(action)
            logger.debug(f'Rewards: {reward}')
            logger.debug(f'Done: {done}')
            if done:
                break

        return observation, reward, done, None
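One possible workaround, sketched here on a hypothetical toy game rather than on SIMPLE itself: give the base environment a way to build the legal-action mask (and the observation) for an explicit player, and have continue_game rebuild the observation it returns for agent_player_num once the opponents have moved. ToyEnv, ToySelfPlayEnv, legal_actions_for, observation_for and opponent_action are illustrative names, not part of the real SelfPlayEnv API, and the random opponent only stands in for current_agent.choose_action.

```python
import random
from typing import List, Optional, Tuple

N_ACTIONS = 4


class ToyEnv:
    """Hypothetical 2-player game: player p may only pick unused actions a with a % 2 == p,
    so the legal-action mask depends on both the board and the player."""

    n_players = 2

    def __init__(self) -> None:
        self.used = [False] * N_ACTIONS
        self.current_player_num = 0

    def legal_actions_for(self, player: int) -> List[int]:
        # Mask for an explicit player, not implicitly for whoever is to move.
        return [int(not self.used[a] and a % 2 == player) for a in range(N_ACTIONS)]

    def observation_for(self, player: int) -> dict:
        return {"used": list(self.used), "legal_actions": self.legal_actions_for(player)}

    def step(self, action: int) -> Tuple[dict, List[float], bool, dict]:
        if not self.legal_actions_for(self.current_player_num)[action]:
            raise ValueError(f"illegal action {action}")
        self.used[action] = True
        self.current_player_num = (self.current_player_num + 1) % self.n_players
        done = not any(self.legal_actions_for(self.current_player_num))
        # This observation is built for whoever moves next.
        return self.observation_for(self.current_player_num), [0.0] * self.n_players, done, {}


class ToySelfPlayEnv(ToyEnv):
    def __init__(self, agent_player_num: int, seed: int = 0) -> None:
        super().__init__()
        self.agent_player_num = agent_player_num
        self.rng = random.Random(seed)

    def opponent_action(self) -> int:
        # Hypothetical stand-in for the opponent's masked choose_action call.
        mask = self.legal_actions_for(self.current_player_num)
        return self.rng.choice([a for a, ok in enumerate(mask) if ok])

    def continue_game(self) -> Tuple[Optional[dict], Optional[List[float]], Optional[bool], None]:
        observation, reward, done = None, None, None
        while self.current_player_num != self.agent_player_num:
            observation, reward, done, _ = super().step(self.opponent_action())
            if done:
                break
        if observation is not None:
            # Rebuild the observation for the agent explicitly, so its mask is the
            # agent's no matter which player the base step() built it for.
            observation = self.observation_for(self.agent_player_num)
        return observation, reward, done, None


env = ToySelfPlayEnv(agent_player_num=1)
obs, reward, done, _ = env.continue_game()  # the opponent (player 0) moves first
print(obs["legal_actions"])  # [0, 1, 0, 1]: only the agent's (odd) actions
```

The design choice is simply to stop deriving the mask from current_player_num implicitly: whichever turn the opponent loop stops on, the agent receives its own mask.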
@laymelek

laymelek commented Nov 8, 2023

Did you find a solution to this? I have the same problem when running Test. On the other hand, while running Train, my agent doesn't care about legal_actions whatsoever: it doesn't call it at all and just chooses a random action number.
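The training-side symptom above (actions chosen with no regard to legal_actions) is what you would expect if the mask never reaches action selection. A common remedy, usually called invalid-action masking, gives illegal actions zero probability before sampling. The plain-Python sketch below shows the idea; masked_softmax and choose_masked_action are hypothetical helpers, and this makes no claim about how SIMPLE's own policy networks consume the mask.

```python
import math
import random
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over logits in which actions with mask == 0 get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask contains no legal actions")
    # Subtract the largest legal logit for numerical stability.
    top = max(x for x, m in zip(logits, mask) if m)
    exps = [math.exp(x - top) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def choose_masked_action(logits: Sequence[float], mask: Sequence[int],
                         rng: random.Random) -> int:
    """Sample an action index; illegal actions have zero weight and are never drawn."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Action 1 has the highest raw logit but is illegal, so it can never be sampled.
logits = [2.0, 5.0, 1.0, 0.5]
mask = [1, 0, 1, 0]
print(masked_softmax(logits, mask))  # approximately [0.731, 0.0, 0.269, 0.0]
print(choose_masked_action(logits, mask, random.Random(0)))
```

Many implementations instead add a large negative constant to the logits of illegal actions before the softmax, which has the same effect; either way, the mask passed in has to belong to the player who is acting.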

@AdamLang96
Author

Yeah, this is my exact issue. I haven't found a solution yet.

@sakapadia

Has anyone found a solution?
