feat: custom ros2 interfaces benchmark #487

jmatejcz · 2025-03-27T11:09:26Z

Purpose

Extend the existing tool benchmark to test custom interfaces.

Proposed Changes

For now, this adds only mocks of PublishROS2MessageTool and MockGetROS2MessageInterfaceTool.
For now, there is one test: PublishROS2CustomMessageTask.

Issues

cause of this PR -> #63
issue with openai models -> #492
issue with get interface tool not returning types -> #497

Testing

python src/rai_bench/rai_bench/examples/tool_calling_agent_test_bench.py

Check results in src/rai_bench/rai_bench/experiments/tool_calling_agent_test_bench

MagdalenaKotynia · 2025-03-27T12:28:09Z

src/rai_bench/rai_bench/tool_calling_agent_bench/mocked_tools.py

+        ):
+            return "Message published successfully"
+        else:
+            return "Failed to publish message"


I suggest returning the same message the original tool would return when something goes wrong. In the future we may allow the agent to recover after it makes a mistake in calling a tool (e.g. the agent calls the tool with a wrong argument and, based on the tool output, calls it again with the corrected argument). Therefore, the mock needs to reflect the behaviour of the original tool.
Please take into account that the message can differ depending on what went wrong (e.g. when the wrong topic is passed, the message from the original tool may be different than when the wrong message_type is passed).
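A minimal sketch of what this suggestion could look like, assuming a simplified mock. The class name MockPublishTool, the run signature, the message type string, and every error string here are placeholders for illustration; the real wording would have to be copied from the original tool.

```python
from typing import Any, Dict, List


class MockPublishTool:
    """Hypothetical stand-in for the mocked publish tool (not the real rai API)."""

    def __init__(
        self, available_topics: List[str], available_message_types: List[str]
    ) -> None:
        self.available_topics = available_topics
        self.available_message_types = available_message_types

    def run(self, topic: str, message: Dict[str, Any], message_type: str) -> str:
        # Each failure mode returns its own message, so the agent can tell
        # which argument to correct on a follow-up call.
        # The strings below are assumed, not taken from the original tool.
        if topic not in self.available_topics:
            return (
                f"Topic {topic} is not available. "
                f"Available topics: {self.available_topics}"
            )
        if message_type not in self.available_message_types:
            return f"Message type {message_type} is not available."
        return "Message published successfully"


# Assumed topic and message type, used only to exercise the sketch.
tool = MockPublishTool(["/to_human"], ["rai_interfaces/msg/HRIMessage"])
print(tool.run("/wrong_topic", {"text": "Hello!"}, "rai_interfaces/msg/HRIMessage"))
```

With distinct outputs per failure, a benchmark run can later check whether the agent corrected the right argument.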

MagdalenaKotynia · 2025-03-27T12:39:01Z

src/rai_bench/rai_bench/tool_calling_agent_bench/mocked_tools.py

+        if msg_type in self.mock_interfaces:
+            return self.mock_interfaces[msg_type]
+        else:
+            return f"Interface for {msg_type} not found."


The same as in #487 (comment)

MagdalenaKotynia · 2025-03-27T13:10:53Z

src/rai_bench/rai_bench/tool_calling_agent_bench/ros2_agent_tasks.py

+                    "topic": "/to_human",
+                    "message": {
+                        "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
+                        "text": "Hello!",
+                        "images": [],
+                        "audios": [],


I suggest making it a class attribute, as it is repeated twice in this task, so it will be easier to maintain the code.
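Roughly what I have in mind, as a sketch only. The attribute names expected_topic and expected_message and the methods get_prompt and verify_call are hypothetical; the real task class has more members and may name these differently.

```python
from typing import Any, Dict


class PublishROS2CustomMessageTask:
    """Illustrative outline, not the actual task implementation."""

    # Defined once at class level, so the prompt and the verification
    # logic cannot drift apart when the expected values change.
    expected_topic: str = "/to_human"
    expected_message: Dict[str, Any] = {
        "header": {"stamp": {"sec": 0, "nanosec": 0}, "frame_id": ""},
        "text": "Hello!",
        "images": [],
        "audios": [],
    }

    def get_prompt(self) -> str:
        # Hypothetical method: builds the task prompt from the shared values.
        return (
            f"Publish the message {self.expected_message} "
            f"to the topic {self.expected_topic}."
        )

    def verify_call(self, args: Dict[str, Any]) -> bool:
        # Hypothetical method: compares tool-call arguments to the shared values.
        return (
            args.get("topic") == self.expected_topic
            and args.get("message") == self.expected_message
        )
```

The dict is shared between instances, so it should be treated as read-only.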

jmatejcz · 2025-03-31T16:23:42Z

New issue encountered. Until it is solved, these tasks don't make much sense, as agent can build proper messages:
#497

jmatejcz · 2025-04-24T06:43:07Z

closing, as these changes refactored and applied here:
#515

MagdalenaKotynia reviewed Mar 27, 2025

jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch from 84c5582 to ea172d6 on March 28, 2025 08:38

jmatejcz mentioned this pull request Mar 28, 2025

Agent does not see message argument in ToolMock #492

Open

jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch 4 times, most recently from d797495 to 0d3e7f8 on March 31, 2025 14:58

jmatejcz and others added 13 commits April 2, 2025 11:22

feat: publish message and get interfaces tools mock, test for publish

fef6d27

refactor: mock tools outputs and throws errors formatted like real tools

16ec2da

refactor: change from expected to available variables

9d854e5

moved mocked topics to class variable for easier management

feat: add call service mock and call service task

9a2ae6e

feat: add action tools mocks

1bd2bca

feat: start action task

9fcfb35

adjusted topic task prompt, removed unnecessary code

feat: publish message tasks for other messages types

e71374b

fixed available topics values passed to mocked tool

fix: import fixes after merge

0cb33fe

feat: mock new tools

398518e

feat: tasks for rest of the service types

c77db63

feat: GetROS2MessageInterfaceTool now return output as string with all subtypes and descriptions just like ros2 interface show command

962da6b

refactor: updated mock interfaces output to new tool

19f918f

moved custom interfaces tasks to new file, created common parent class

feat: separate parent classes for topic, service and actions tasks

8db2816

jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch 2 times, most recently from 8db2816 to fcdd601 on April 2, 2025 11:42

maciejmajek and others added 3 commits April 2, 2025 15:00

feat: enhance destroy_subscribers behavior (#499)

9cac642

feat: adjusted service and action tasks to parent classes

0a2f854

refactor: update tasks file

348a557

jmatejcz force-pushed the jm/feat/tool-benchmark-custom-interfaces branch from 8bdd62b to 348a557 on April 2, 2025 13:01

refactor: moved all tasks back into one file

5a08dde

MagdalenaKotynia and others added 11 commits April 2, 2025 16:13

refactor: actions as pydantic basemodels - for easier validation

a70f21a

feat: add pydantic models for messages

c875531

feat: verify messages based on pydantic models

4a80a84

feat: adjust HRIMessage mocks to new fields

7a3f843

refactor: all fields in models are optional now as ros fills the rest with default values

12eb88a

feat: changed task validation to make it more flexible

6b41cf3

tools validate input with pydantic models

refactor: result file saves task name to make it more readable

4a2e86d

formatting changes

chore: add licenses

b9323ac

feat: add task with various extra calls values,

235dd40

added examples to system prompt

fix: what i see service typo fix

62bfec0

empty request handle

refactor: adjust interfaces for topics and services

b676d44

jmatejcz mentioned this pull request Apr 14, 2025

refactor: rai_bench #517

Merged

jmatejcz closed this Apr 24, 2025
