
feat: custom ros2 interfaces benchmark #487


Closed
wants to merge 28 commits into from


jmatejcz
Contributor

@jmatejcz jmatejcz commented Mar 27, 2025

Purpose

Extend the existing tool benchmark to test custom interfaces.

Proposed Changes

For now, this adds only mocks of PublishROS2MessageTool and MockGetROS2MessageInterfaceTool.
For now, there is one test: PublishROS2CustomMessageTask.

Issues

Cause of this PR: #63
Issue with OpenAI models: #492
Issue with the get interface tool not returning types: #497

Testing

python src/rai_bench/rai_bench/examples/tool_calling_agent_test_bench.py

Check results in src/rai_bench/rai_bench/experiments/tool_calling_agent_test_bench

):
    return "Message published successfully"
else:
    return "Failed to publish message"
Member


I suggest returning the same message that the original tool returns when something goes wrong. In the future, we may let the agent recover after a mistaken tool call (e.g. the agent calls the tool with a wrong argument and, based on the tool output, calls it again with the corrected argument). The mock therefore needs to reflect the behaviour of the original tool.
Please keep in mind that the message can differ depending on what went wrong (e.g. the original tool may return a different message for a wrong topic than for a wrong message_type).
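
A minimal sketch of what this suggestion could look like. The original tool's actual error messages are not shown in this thread, so the class name `MockPublishToolSketch`, its constructor arguments, and both error strings are hypothetical placeholders; the real mock would copy the strings from the original tool.

```python
from typing import Any, Dict, List


class MockPublishToolSketch:
    """Hypothetical mock; names and error texts are illustrative only."""

    def __init__(
        self, available_topics: List[str], available_message_types: List[str]
    ) -> None:
        self.available_topics = available_topics
        self.available_message_types = available_message_types

    def run(self, topic: str, message: Dict[str, Any], message_type: str) -> str:
        # Each failure mode returns its own message, so the agent can tell
        # which argument to correct on the next call.
        if topic not in self.available_topics:
            # Placeholder text: replace with the original tool's message.
            return f"Topic {topic} is not available."
        if message_type not in self.available_message_types:
            # Placeholder text: replace with the original tool's message.
            return f"Message type {message_type} is not available."
        return "Message published successfully"
```

With distinct messages per failure, a benchmark task could later check whether the agent fixes the right argument on a retry.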

if msg_type in self.mock_interfaces:
    return self.mock_interfaces[msg_type]
else:
    return f"Interface for {msg_type} not found."
Member

@MagdalenaKotynia MagdalenaKotynia Mar 27, 2025


The same as in #487 (comment)

Comment on lines 1659 to 1664
"topic": "/to_human",
"message": {
    "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
    "text": "Hello!",
    "images": [],
    "audios": [],
Member


I suggest making it a class attribute: it is repeated twice in this task, so the code will be easier to maintain.
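
A rough sketch of the class-attribute approach, assuming a simplified task class. The class name `CustomMessageTaskSketch`, the attribute names, and the helper method are hypothetical; the dictionary holds only the fields visible in the snippet above, and the real message may contain more.

```python
from typing import Any, Dict


class CustomMessageTaskSketch:
    """Hypothetical stand-in for the task class; structure is assumed."""

    # Defined once at class level and reused wherever the task needs it.
    TOPIC: str = "/to_human"
    EXPECTED_MESSAGE: Dict[str, Any] = {
        "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
        "text": "Hello!",
        "images": [],
        "audios": [],
    }

    def expected_tool_args(self) -> Dict[str, Any]:
        # Hypothetical helper: both places that previously repeated the
        # literal would read the shared attributes instead.
        return {"topic": self.TOPIC, "message": self.EXPECTED_MESSAGE}
```

Any later change to the expected message then happens in one place instead of two.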

@jmatejcz jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch from 84c5582 to ea172d6 Compare March 28, 2025 08:38
@jmatejcz jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch 4 times, most recently from d797495 to 0d3e7f8 Compare March 31, 2025 14:58
@jmatejcz
Contributor Author

A new issue came up. Until it is solved, these tasks don't make much sense, as the agent can't build proper messages:
#497

@jmatejcz jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch 2 times, most recently from 8db2816 to fcdd601 Compare April 2, 2025 11:42
@jmatejcz jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch from 8bdd62b to 348a557 Compare April 2, 2025 13:01
@jmatejcz jmatejcz mentioned this pull request Apr 14, 2025
@jmatejcz
Contributor Author

Closing, as these changes were refactored and applied here:
#515

@jmatejcz jmatejcz closed this Apr 24, 2025

3 participants