Thanks for the very interesting work. Have you tried DeepSeek models on your benchmark? It would be interesting to see the results.