How does “state” influence the output? #2

Open
AmuroPeng opened this issue Apr 18, 2021 · 0 comments

Comments

@AmuroPeng

Hi tocom242242, sorry to bother you. I found that you have done really great work on MARL, and I have been following your GitHub recently. I’m interested in this minimax_q_learning repo; may I ask you a quick question about “state”?

It seems that in your code there is only one state, the default state “nonstate”, and you use dicts to store the Q, pi, and V matrices for each state separately. It runs correctly when there is one state, but when I tried multiple states, I couldn’t tell how they influence the output. I was wondering whether this “state” is the same as the state in Q-learning, since I understand the state in Q-learning to be something like a combination of the opponent’s previous action and my previous action. Based on the state S(a, a'), the Q matrix can tell me that in state S1, if I choose action a1, the Q-value would be x; in the same state S1, if I choose action a2, the Q-value would be y. But when I try to understand the state in your repo, it seems each state has its own Q matrix, and the state is determined only by the opponent’s action.
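To make my question concrete, here is a minimal sketch of how I picture the per-state tables; the names and the update rule are my own illustration of minimax-Q under these assumptions, not your actual code:

```python
import numpy as np
from collections import defaultdict

N_MY_ACTIONS = 2       # my actions
N_OPP_ACTIONS = 2      # opponent's actions
ALPHA, GAMMA = 0.1, 0.9

# One Q matrix per state, indexed by (my_action, opponent_action).
Q = defaultdict(lambda: np.zeros((N_MY_ACTIONS, N_OPP_ACTIONS)))
# One mixed policy pi per state, over my actions.
pi = defaultdict(lambda: np.ones(N_MY_ACTIONS) / N_MY_ACTIONS)
# One scalar value V per state.
V = defaultdict(float)

def update(state, my_a, opp_a, reward, next_state):
    """One minimax-Q backup: the TD target bootstraps from V[next_state]."""
    target = reward + GAMMA * V[next_state]
    Q[state][my_a, opp_a] += ALPHA * (target - Q[state][my_a, opp_a])
    # Full minimax-Q would now recompute pi[state] by solving the small
    # linear program: max_pi min_o sum_a pi(a) * Q[state][a, o].
    # As a stand-in here, I just evaluate the current pi against the
    # worst-case opponent action to refresh V[state].
    V[state] = (pi[state] @ Q[state]).min()
```

With multiple states, this is exactly where I get confused: each `state` key gets its own Q matrix, so I don’t see how the state itself encodes my previous action rather than only the opponent’s.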

I would be grateful if you could let me know how the state works. I really appreciate it!
