feat: custom ros2 interfaces benchmark #487
):
    return "Message published successfully"
else:
    return "Failed to publish message"
I suggest returning the same message the original tool would return when something goes wrong. In the future we may allow the agent to recover after it makes a mistake in calling the tool (e.g. the agent calls the tool with a wrong argument and, based on the tool output, calls the tool again with the corrected argument). Therefore, the mock needs to reflect the behaviour of the original tool.
Please take into account that the message can differ depending on what went wrong (e.g. when the wrong topic is passed, the message from the original tool may be different than when the wrong message_type is passed).
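A minimal sketch of what this suggestion could look like, under stated assumptions: the class name `MockPublishTool`, its constructor arguments, and both error strings are hypothetical placeholders. The real strings would have to be copied from the original tool's output, which is not shown in this thread.

```python
from typing import Any, Dict, List

# Placeholder templates (assumptions): replace with the exact strings
# the original tool emits for each failure mode.
UNKNOWN_TOPIC_TEMPLATE = "Topic {topic} is not available"
UNKNOWN_TYPE_TEMPLATE = "Message type {message_type} is not available"


class MockPublishTool:
    """Hypothetical mock that reports a distinct message per failure cause."""

    def __init__(self, available_topics: List[str], available_message_types: List[str]) -> None:
        self.available_topics = available_topics
        self.available_message_types = available_message_types

    def run(self, topic: str, message: Dict[str, Any], message_type: str) -> str:
        # A wrong topic and a wrong message type are reported differently,
        # so an agent could tell which argument needs correcting.
        if topic not in self.available_topics:
            return UNKNOWN_TOPIC_TEMPLATE.format(topic=topic)
        if message_type not in self.available_message_types:
            return UNKNOWN_TYPE_TEMPLATE.format(message_type=message_type)
        return "Message published successfully"
```

Keeping the templates as module-level constants means the mock's wording can be updated in one place if the original tool's messages change.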
if msg_type in self.mock_interfaces:
    return self.mock_interfaces[msg_type]
else:
    return f"Interface for {msg_type} not found."
The same as in #487 (comment)
"topic": "/to_human",
"message": {
    "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
    "text": "Hello!",
    "images": [],
    "audios": [],
I suggest making it a class attribute, as it is repeated twice in this task, so it will be easier to maintain the code.
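As an illustration only, the repeated payload could be defined once on the task class and reused wherever it is needed. The class name `ExampleCustomMessageTask` and its methods are hypothetical, and the dictionary contains only the fields visible in the snippet above, which may be truncated.

```python
from typing import Any, Dict


class ExampleCustomMessageTask:
    """Hypothetical task class; not the actual class from this PR."""

    # Defined once so the prompt and the verification share one source of truth.
    TOPIC: str = "/to_human"
    EXPECTED_MESSAGE: Dict[str, Any] = {
        "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
        "text": "Hello!",
        "images": [],
        "audios": [],
    }

    def get_prompt(self) -> str:
        # First use of the shared attributes.
        text = self.EXPECTED_MESSAGE["text"]
        return f"Publish a message with text '{text}' to the topic {self.TOPIC}."

    def verify(self, topic: str, message: Dict[str, Any]) -> bool:
        # Second use of the shared attributes.
        return topic == self.TOPIC and message == self.EXPECTED_MESSAGE
```

With this layout, changing the expected message means editing one dictionary rather than two copies.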
84c5582 to ea172d6
d797495 to 0d3e7f8
new issue encountered; until it is solved these tasks don't make much sense, as agent can build proper messages
moved mocked topics to class variable for easier management
adjusted topic task prompt, removed unnecessary code
fixed available topics values passed to mocked tool
…l subtypes and descriptions just like ros2 interface show command
moved custom interfaces tasks to new file, created common parent class
8db2816 to fcdd601
8bdd62b to 348a557
… with default values
tools validate input with pydantic models
formatting changes
added examples to system prompt
empty request handle
Closing, as these changes were refactored and applied here:
Purpose
Extend existing tool benchmark to test custom interfaces
Proposed Changes
For now, only mocks of PublishROS2MessageTool and MockGetROS2MessageInterfaceTool.
For now, one test: PublishROS2CustomMessageTask.
Issues
cause of this PR -> #63
issue with openai models -> #492
issue with get interface tool not returning types -> #497
Testing
Check results in
src/rai_bench/rai_bench/experiments/tool_calling_agent_test_bench