Hi tocom242242, sorry to interrupt you. I found that you have done really great work on MARL, and I have been following your GitHub recently. I'm interested in this minimax_q_learning repo; may I ask you a quick question about "state"?
It seems like in your code there is only one state, the default state "nonstate", and you use dicts to save the Q, pi, and V matrices separately for each state. It runs correctly when there is one state, but when I tried multiple states, I couldn't tell how the state influences the output. I was wondering whether this "state" is the same as the state in Q-learning, since I guess the state in Q-learning would be something like a combination of the opponent's previous action and my previous action. Based on a state S(a, a'), the Q matrix can tell me that in state S1, if I choose my action a1, the Q value would be xxx, and if I choose action a2, the Q value would be yyy. But when I try to understand the state in your repo, it seems that each state has its own Q matrix, and the state is determined only by the opponent's action.
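To check that I'm reading the data structures correctly, here is a rough sketch of what I think the per-state dicts look like. This is only my own guess, not your actual code: the action counts, the state keys, and the simplified policy/value update are all my assumptions.

```python
import numpy as np
from collections import defaultdict

# Sketch only: every state key gets its own Q matrix (my action x opponent action),
# its own mixed policy pi over my actions, and its own value V, kept in separate dicts.
N_MY_ACTIONS = 2
N_OPP_ACTIONS = 2
ALPHA, GAMMA = 0.1, 0.95

q = defaultdict(lambda: np.zeros((N_MY_ACTIONS, N_OPP_ACTIONS)))     # Q[s][a, o]
pi = defaultdict(lambda: np.full(N_MY_ACTIONS, 1.0 / N_MY_ACTIONS))  # policy per state
v = defaultdict(float)                                               # V[s]

def update(state, my_action, opp_action, reward, next_state):
    """One minimax-Q style update, bootstrapping from V of the next state."""
    q[state][my_action, opp_action] += ALPHA * (
        reward + GAMMA * v[next_state] - q[state][my_action, opp_action]
    )
    # Minimax-Q would solve a small linear program here to pick pi[state] that
    # maximizes the worst-case expected payoff; as a placeholder I only evaluate
    # the current pi against the worst opponent reply.
    v[state] = float((pi[state] @ q[state]).min())

# My assumption: the state is the pair (my previous action, opponent's previous action).
update(state=("a1", "a2'"), my_action=0, opp_action=1, reward=1.0, next_state=("a2", "a1'"))
print(q[("a1", "a2'")], v[("a1", "a2'")])
```

Is this roughly how the states are meant to be keyed, or is the state supposed to depend only on the opponent's action?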
I would be grateful if you could let me know how the state works. I really appreciate it!