
visualisation of benchmark results #530


Open
jmatejcz opened this issue Apr 16, 2025 · 6 comments
Assignees
Labels
enhancement New feature or request priority/major Important work that comes next after all critical and blocking tasks are completed.

Comments

@jmatejcz
Contributor

jmatejcz commented Apr 16, 2025

Is your feature request related to a problem? Please describe.

Describe the solution you'd like
Upgrade and unify how results are gathered from the benchmarks.
Add a Python script that shows the results as different charts.
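
A minimal sketch of what a unified result record and a first chart could look like. The `BenchmarkResult` fields, the CSV layout, and all function names are assumptions made for illustration, not the project's actual schema; the text rendering stands in for a real plotting library.

```python
import csv
import io
from collections import defaultdict
from dataclasses import asdict, dataclass, fields


@dataclass
class BenchmarkResult:
    # Hypothetical unified record shared by all benchmarks (assumed fields).
    benchmark: str
    model: str
    task: str
    success: bool


def write_results(results, stream):
    """Write every result with the same columns, whatever benchmark produced it."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(BenchmarkResult)])
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))


def read_results(stream):
    """Read results back; the csv module stores booleans as 'True'/'False'."""
    return [
        BenchmarkResult(row["benchmark"], row["model"], row["task"], row["success"] == "True")
        for row in csv.DictReader(stream)
    ]


def success_rate_by_model(results):
    totals = defaultdict(lambda: [0, 0])  # model -> [successes, runs]
    for result in results:
        totals[result.model][0] += int(result.success)
        totals[result.model][1] += 1
    return {model: wins / runs for model, (wins, runs) in totals.items()}


def text_bar_chart(rates, width=20):
    """Render one bar per model, sorted by model name."""
    lines = []
    for model, rate in sorted(rates.items()):
        bar = "#" * round(rate * width)
        lines.append(f"{model:<10} {bar} {rate:.0%}")
    return "\n".join(lines)
```

Keeping one record type for every benchmark means the charting script only has to understand a single file layout.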

Describe alternatives you've considered

Additional context

@jmatejcz jmatejcz added the enhancement New feature or request label Apr 16, 2025
@maciejmajek maciejmajek added the priority/major Important work that comes next after all critical and blocking tasks are completed. label Apr 17, 2025
@jmatejcz
Contributor Author

jmatejcz commented Apr 23, 2025

  • add task categories (manipulation, spatial, etc.) to the results in tool_agent_benchmark
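
One possible shape for this, as a sketch only: the `TaskCategory` enum and `success_rate_by_category` helper are hypothetical names, and only the two categories named above are listed.

```python
from collections import Counter
from enum import Enum


class TaskCategory(Enum):
    # Only the categories named in this comment; the rest are left open.
    MANIPULATION = "manipulation"
    SPATIAL = "spatial"


def success_rate_by_category(results):
    """results: iterable of (TaskCategory, success) pairs."""
    runs, wins = Counter(), Counter()
    for category, success in results:
        runs[category.value] += 1
        wins[category.value] += int(success)
    return {name: wins[name] / runs[name] for name in runs}
```

Storing the category next to each result lets later charts be split per category without re-running the benchmark.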

@jmatejcz
Contributor Author

jmatejcz commented Apr 23, 2025

  • how should an error be assigned to a subtask in all validators?

@jmatejcz
Contributor Author

jmatejcz commented Apr 24, 2025

  • group results by task complexity
  • count extra tool calls used per model/task/subtask?
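
A sketch of how both points could be computed from the same records. It assumes "extra tool calls" means calls made beyond the number the task requires, and that each record is a dict with hypothetical `calls_made`, `calls_required`, `complexity` and `model` keys.

```python
from collections import defaultdict
from statistics import mean


def extra_tool_calls(calls_made, calls_required):
    """Assumed definition: calls beyond the required count, never negative."""
    return max(0, calls_made - calls_required)


def mean_extra_calls_by(records, key):
    """Average extra tool calls, grouped by any record key (e.g. complexity or model)."""
    groups = defaultdict(list)
    for record in records:
        extra = extra_tool_calls(record["calls_made"], record["calls_required"])
        groups[record[key]].append(extra)
    return {group: mean(values) for group, values in groups.items()}
```

The same helper groups by `"complexity"`, `"model"`, or any other key present in the records.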

@jmatejcz
Contributor Author

jmatejcz commented Apr 28, 2025

  • update unit tests to match the changed validators and subtasks

@jmatejcz
Contributor Author

jmatejcz commented Apr 29, 2025

  • restructure new code
  • README update

@jmatejcz
Contributor Author

jmatejcz commented Apr 29, 2025

  • integrate the manipulation o3de results in the same way
  • compute summary results after every step, in case of an early stop
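
A rough sketch of the second point, assuming a "step" is one finished task and that the summary is a small JSON file. `run_with_incremental_summary`, `run_task` and the summary keys are hypothetical names, not existing project code.

```python
import json
import os


def run_with_incremental_summary(tasks, run_task, summary_path):
    """Run tasks in order and rewrite the summary file after each one."""
    completed, passed = 0, 0
    summary = {"completed": 0, "passed": 0, "success_rate": 0.0}
    for task in tasks:
        success = run_task(task)  # may raise if the run is stopped early
        completed += 1
        passed += int(success)
        summary = {
            "completed": completed,
            "passed": passed,
            "success_rate": passed / completed,
        }
        # Write to a temporary file and rename it, so an interrupted run
        # never leaves a half-written summary behind.
        tmp_path = summary_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(summary, f)
        os.replace(tmp_path, summary_path)
    return summary
```

If the run stops partway through, the file on disk still describes every task that finished before the stop.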

Projects
None yet
Development

No branches or pull requests

2 participants