
Commit ae49685

feat: add bridge mode for extension (#228)
1 parent bacfef0 commit ae49685


50 files changed, +1831 -279 lines changed

README.md

-1
```diff
@@ -2,7 +2,6 @@
 <img alt="Midscene.js" width="260" src="https://github.com/user-attachments/assets/f60de3c1-dd6f-4213-97a1-85bf7c6e79e4">
 </p>
 
-
 <h1 align="center">Midscene.js</h1>
 <div align="center">
 
```

@@ -0,0 +1,94 @@
# Bridge Mode by Chrome Extension

import { PackageManagerTabs } from '@theme';

The bridge mode in the Midscene Chrome extension lets you control the desktop version of Chrome from local scripts. Your scripts can connect to either a new tab or the currently active tab.

Using the desktop version of Chrome lets you reuse all your cookies, plugins, page state, and anything else you need, while automation scripts work alongside you to complete your tasks. In the context of automation, this way of working is commonly called 'man-in-the-loop'.

![bridge mode](/midscene-bridge-mode.jpg)

:::info Demo Project
You can check the demo project of bridge mode here: [https://github.com/web-infra-dev/midscene-example/blob/main/bridge-mode-demo](https://github.com/web-infra-dev/midscene-example/blob/main/bridge-mode-demo)
:::

## Preparation

Install the [Midscene extension from the Chrome Web Store](https://chromewebstore.google.com/detail/midscene/gbldofcpkknbggpkmbdaefngejllnief). We will use it later.

## Step 1. Install dependencies

<PackageManagerTabs command="install @midscene/web tsx --save-dev" />

## Step 2. Write scripts

Write and save the following code as `./demo-new-tab.ts`.

```typescript
import { AgentOverChromeBridge } from "@midscene/web/bridge-mode";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const agent = new AgentOverChromeBridge();

    // This will connect to a new tab on your desktop Chrome.
    // Remember to start your Chrome extension and click the 'Allow connection' button; otherwise you will get a timeout error.
    await agent.connectNewTabWithUrl("https://www.bing.com");

    // These methods are the same as on a normal Midscene agent
    await agent.ai('type "AI 101" and hit Enter');
    await sleep(3000);

    await agent.aiAssert("there are some search results");
    await agent.destroy();
  })()
);
```
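
The script above reaches `agent.destroy()` only when every step succeeds; if `aiAssert` throws, the async function rejects before `destroy` runs, so the bridge connection may be left open. Below is a minimal sketch of a cleanup wrapper, assuming only the `connectNewTabWithUrl`, `ai`, `aiAssert`, and `destroy` methods named on this page. `BridgeAgentLike`, `runBridgeTask`, and `FakeAgent` are hypothetical names, and the fake agent stands in for `AgentOverChromeBridge` so the sketch runs without Chrome.

```typescript
// Hypothetical structural type covering only the methods used on this page.
interface BridgeAgentLike {
  connectNewTabWithUrl(url: string): Promise<unknown>;
  ai(prompt: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<unknown>;
  destroy(): Promise<unknown>;
}

// Hypothetical helper: connect, run the steps, and always call destroy(),
// even if connecting or one of the steps throws.
async function runBridgeTask(
  agent: BridgeAgentLike,
  url: string,
  steps: (agent: BridgeAgentLike) => Promise<void>,
): Promise<void> {
  try {
    await agent.connectNewTabWithUrl(url);
    await steps(agent);
  } finally {
    await agent.destroy();
  }
}

// Fake agent that records calls, so the wrapper can be exercised without Chrome.
class FakeAgent implements BridgeAgentLike {
  calls: string[] = [];
  failAssert: boolean;

  constructor(failAssert = false) {
    this.failAssert = failAssert;
  }

  async connectNewTabWithUrl(url: string): Promise<void> {
    this.calls.push(`connect ${url}`);
  }

  async ai(prompt: string): Promise<void> {
    this.calls.push(`ai ${prompt}`);
  }

  async aiAssert(assertion: string): Promise<void> {
    this.calls.push(`assert ${assertion}`);
    if (this.failAssert) throw new Error(`assertion failed: ${assertion}`);
  }

  async destroy(): Promise<void> {
    this.calls.push("destroy");
  }
}
```

With the real library, `agent` would be `new AgentOverChromeBridge()` and the steps would be the `ai` / `aiAssert` calls from the script above; the `finally` block is what guarantees cleanup on the failure path.
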

## Step 3. Run

Launch your desktop Chrome. Open the Midscene extension, switch to the 'Bridge Mode' tab, and click "Allow connection".

Run your script:

```bash
npx tsx demo-new-tab.ts
```

After you run the script, the status shown in the Chrome extension should switch to 'connected', and a new tab should open. This tab is now controlled by your script.

:::info
It does not matter whether you run the script before or after clicking 'Allow connection' in the browser.
:::
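
The note above says the order of starting the script and clicking 'Allow connection' does not matter, and the script comment warns of a timeout if the connection is never allowed. A common way for a client to behave like this is to poll until the other side is ready and give up after a deadline. The sketch below only illustrates that general pattern: `waitUntil`, its parameters, and the timeout values are hypothetical and are not taken from Midscene's implementation.

```typescript
// Hypothetical helper: resolve once isReady() returns true; reject after timeoutMs.
async function waitUntil(
  isReady: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isReady()) return;
    // Sleep between checks so polling does not busy-wait.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`timed out after ${timeoutMs} ms waiting for the connection`);
}

// Usage: simulate the user clicking "Allow connection" 300 ms after the script starts.
let allowed = false;
setTimeout(() => {
  allowed = true;
}, 300);
waitUntil(() => allowed, 2000).then(() => console.log("connected"));
```

Either order works in this model: if the user allows the connection first, the first check succeeds immediately; if the script starts first, it keeps polling until the deadline.
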

## API

In addition to [the normal agent interface](./api), `AgentOverChromeBridge` provides a few extra methods for controlling the desktop Chrome.

:::info
Always call `connectCurrentTab` or `connectNewTabWithUrl` before performing any other actions.

Each agent instance can connect to only one tab, and it cannot reconnect after `destroy` has been called.
:::

### `connectCurrentTab`

Connect to the currently active tab in Chrome.

### `connectNewTabWithUrl(url: string)`

Create a new tab with the given URL and connect to it immediately.

### `destroy`

Destroy the connection.
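
The rules in the info box above (connect before acting, one tab per agent, no reconnecting after `destroy`) amount to a small state machine. Below is a hypothetical model of those documented rules, not the library's implementation; `BridgeLifecycle`, `connect`, and `act` are illustrative names only.

```typescript
type BridgeState = "idle" | "connected" | "destroyed";

// Hypothetical model of the documented AgentOverChromeBridge lifecycle rules.
class BridgeLifecycle {
  private state: BridgeState = "idle";

  // Stands for connectCurrentTab / connectNewTabWithUrl: allowed once, only from "idle".
  connect(): void {
    if (this.state === "connected") throw new Error("already connected to a tab");
    if (this.state === "destroyed") throw new Error("cannot reconnect after destroy");
    this.state = "connected";
  }

  // Stands for any agent action (ai, aiAssert, ...): requires an active connection.
  act(): void {
    if (this.state !== "connected") throw new Error(`cannot act while ${this.state}`);
  }

  destroy(): void {
    this.state = "destroyed";
  }

  get current(): BridgeState {
    return this.state;
  }
}
```

In this model a second `connect()` throws instead of switching tabs; to control another tab, you would create a new agent instance.
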

## Use bridge mode in YAML scripts

We are still building this, and it will be ready soon.

apps/site/docs/en/index.mdx

+3-2
```diff
@@ -46,11 +46,12 @@ await aiAssert("There is a category filter on the left");
 
 ## Multiple ways to integrate
 
-To start experiencing the core feature of Midscene, we recommend you use [The Chrome Extension](./quick-experience). You can call Action / Query / Assert by natural language on any webpage, without needing to set up a code project.
+To start experiencing the core feature of Midscene, we recommend you use [the Chrome Extension](./quick-experience). You can call Action / Query / Assert by natural language on any webpage, without needing to set up a code project.
 
 Also, there are several ways to integrate Midscene into your code project:
 
-* [Automate with Scripts in YAML](./automate-with-scripts-in-yaml)
+* [Automate with Scripts in YAML](./automate-with-scripts-in-yaml), use this if you prefer writing YAML files instead of code
+* [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension), use this to control desktop Chrome with scripts
 * [Integrate with Puppeteer](./integrate-with-puppeteer)
 * [Integrate with Playwright](./integrate-with-playwright)
 
```

apps/site/docs/en/model-provider.md

+11
````diff
@@ -38,7 +38,9 @@ export OPENAI_MAX_TOKENS=2048
 Use ADT token provider
 
 ```bash
+# set this to 1 when using Azure OpenAI Service
 export MIDSCENE_USE_AZURE_OPENAI=1
+
 export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
 export AZURE_OPENAI_ENDPOINT="..."
 export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
@@ -110,6 +112,15 @@ export OPENAI_API_KEY="..."
 export MIDSCENE_MODEL_NAME="ep-202....."
 ```
 
+## Example: configure request headers (e.g. for OpenRouter)
+
+```bash
+export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
+export OPENAI_API_KEY="..."
+export MIDSCENE_MODEL_NAME="..."
+export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"defaultHeaders":{"HTTP-Referer":"...","X-Title":"..."}}'
+```
+
 ## Troubleshooting LLM Service Connectivity Issues
 
 If you want to troubleshoot connectivity issues, you can use the 'connectivity-test' folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)
````
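
The OpenRouter example passes extra client options as JSON through `MIDSCENE_OPENAI_INIT_CONFIG_JSON`; `defaultHeaders` is a constructor option of the OpenAI Node SDK. Below is a minimal sketch of how such a variable could be parsed and merged into client options, assuming the JSON must be an object and is spread after the base options. `parseInitConfig` and `buildClientOptions` are hypothetical names, not Midscene's code, and the header values in the usage are placeholders.

```typescript
// Hypothetical: parse MIDSCENE_OPENAI_INIT_CONFIG_JSON into an options object.
function parseInitConfig(raw: string | undefined): Record<string, unknown> {
  if (!raw || raw.trim() === "") return {}; // unset or empty: no extra options
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("MIDSCENE_OPENAI_INIT_CONFIG_JSON is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("MIDSCENE_OPENAI_INIT_CONFIG_JSON must be a JSON object");
  }
  return parsed as Record<string, unknown>;
}

// Hypothetical: base options first, extra config spread last so it can add
// fields such as defaultHeaders (or override the base ones).
function buildClientOptions(
  env: Record<string, string | undefined>,
): Record<string, unknown> {
  return {
    baseURL: env.OPENAI_BASE_URL,
    apiKey: env.OPENAI_API_KEY,
    ...parseInitConfig(env.MIDSCENE_OPENAI_INIT_CONFIG_JSON),
  };
}

// Usage with OpenRouter-style values (key and header values are placeholders).
const options = buildClientOptions({
  OPENAI_BASE_URL: "https://openrouter.ai/api/v1",
  OPENAI_API_KEY: "sk-placeholder",
  MIDSCENE_OPENAI_INIT_CONFIG_JSON:
    '{"defaultHeaders":{"HTTP-Referer":"https://example.com","X-Title":"my-app"}}',
});
console.log(options.defaultHeaders);
```

Single quotes around the JSON in the shell `export` keep the inner double quotes intact, which is why the example above quotes the value that way.
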
(binary image file, 232 KB)
@@ -0,0 +1,94 @@
# Bridge Mode with the Chrome Extension

import { PackageManagerTabs } from '@theme';

With the bridge mode of Midscene's Chrome extension, you can use local scripts to control the desktop version of Chrome. Your scripts can connect to a new tab or to the currently active tab.

Using the desktop version of Chrome lets you reuse existing cookies, plugins, page state, and so on. Automation scripts can work interactively with you, the operator, to complete your tasks.

![bridge mode](/midscene-bridge-mode.jpg)

:::info Demo Project
You can check the demo project of bridge mode here: [https://github.com/web-infra-dev/midscene-example/blob/main/bridge-mode-demo](https://github.com/web-infra-dev/midscene-example/blob/main/bridge-mode-demo)
:::

## Preparation

Install the [Midscene extension](https://chromewebstore.google.com/detail/midscene/gbldofcpkknbggpkmbdaefngejllnief).

## Step 1: Install dependencies

<PackageManagerTabs command="install @midscene/web tsx --save-dev" />

## Step 2: Write the script

Write and save the following code as `./demo-new-tab.ts`.

```typescript
import { AgentOverChromeBridge } from "@midscene/web/bridge-mode";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const agent = new AgentOverChromeBridge();

    // This method connects to a new tab in your desktop Chrome.
    // Remember to start your Chrome extension and click the 'Allow connection' button; otherwise you will get a timeout error.
    await agent.connectNewTabWithUrl("https://www.bing.com");

    // These methods are the same as on a normal Midscene agent
    await agent.ai('type "AI 101" and hit Enter');
    await sleep(3000);

    await agent.aiAssert("there are some search results");
    await agent.destroy();
  })()
);
```

## Step 3: Run the script

Launch your desktop Chrome. Start the Midscene extension and switch to the 'Bridge Mode' tab. Click "Allow connection".

Run your script:

```bash
npx tsx demo-new-tab.ts
```

After running the script, you should see the status shown in the Chrome extension switch to 'connected', and a new tab open. This tab is now controlled by your script.

:::info
Running the script and clicking the 'Allow connection' button in the extension can happen in either order.
:::

## API

In addition to [the normal agent interface](./api), `AgentOverChromeBridge` provides some extra interfaces for controlling the desktop Chrome.

:::info
Before performing any other operation, call `connectCurrentTab` or `connectNewTabWithUrl` first.

Each agent instance can connect to only one tab instance, and once it has been destroyed it cannot reconnect.
:::

### `connectCurrentTab`

Connect to the currently active tab.

### `connectNewTabWithUrl(url: string)`

Create a new tab and connect to it immediately.

### `destroy`

Destroy the connection.

## Use bridge mode in YAML scripts

This feature is under development and will be available soon.

apps/site/docs/zh/index.mdx

+2-1
```diff
@@ -37,7 +37,8 @@ console.log("headphones in stock", items);
 
 In addition, there are several ways to integrate Midscene into your code:
 
-* [Automation scripts in YAML format](./automate-with-scripts-in-yaml)
+* [Automation scripts in YAML format](./automate-with-scripts-in-yaml), if you prefer writing YAML files instead of code
+* [Bridge mode with the Chrome extension](./bridge-mode-by-chrome-extension), use it to control desktop Chrome with scripts
 * [Integrate with Puppeteer](./integrate-with-puppeteer)
 * [Integrate with Playwright](./integrate-with-playwright)
 
```

apps/site/docs/zh/model-provider.md

+2
````diff
@@ -35,7 +35,9 @@ export OPENAI_MAX_TOKENS=2048
 Use ADT token provider
 
 ```bash
+# set to 1 when using Azure OpenAI Service
 export MIDSCENE_USE_AZURE_OPENAI=1
+
 export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
 export AZURE_OPENAI_ENDPOINT="..."
 export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
````

apps/site/rspress.config.ts

+8-10
```diff
@@ -26,16 +26,6 @@ export default defineConfig({
 'https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=291q2b25-e913-411a-8c51-191e59aab14d',
 },
 ],
-// footer: {
-// message: `
-// <footer class="footer">
-// <div class="footer-content">
-// <img src="/midscene-icon.png" alt="Midscene.js Logo" class="footer-logo" />
-// <p class="footer-text">&copy; 2024 Midscene.js. All Rights Reserved.</p>
-// </div>
-// </footer>
-// `,
-// },
 locales: [
 {
 lang: 'en',
@@ -70,6 +60,10 @@ export default defineConfig({
 text: 'Automate with Scripts in YAML',
 link: '/automate-with-scripts-in-yaml',
 },
+{
+text: 'Bridge Mode by Chrome Extension',
+link: '/bridge-mode-by-chrome-extension',
+},
 {
 text: 'Integrate with Playwright',
 link: '/integrate-with-playwright',
@@ -127,6 +121,10 @@ export default defineConfig({
 text: 'Automation Scripts in YAML Format',
 link: '/zh/automate-with-scripts-in-yaml',
 },
+{
+text: 'Bridge Mode with the Chrome Extension',
+link: '/zh/bridge-mode-by-chrome-extension',
+},
 {
 text: 'Integrate with Playwright',
 link: '/zh/integrate-with-playwright',
```

nx.json

+1-3
```diff
@@ -5,14 +5,12 @@
 "dependsOn": ["^build"]
 },
 "build": {
-"dependsOn": ["^build"],
-"cache": true
+"dependsOn": ["^build"]
 },
 "build:watch": {
 "dependsOn": ["^build"]
 },
 "test": {
-"dependsOn": ["^build"],
 "cache": false
 },
 "e2e": {
```

packages/midscene/package.json

+5-7
```diff
@@ -38,20 +38,18 @@
 "prepublishOnly": "npm run build"
 },
 "dependencies": {
-"@anthropic-ai/sdk": "0.33.1",
 "@azure/identity": "4.5.0",
-"@langchain/core": "0.3.26",
+"@anthropic-ai/sdk": "0.33.1",
 "@midscene/shared": "workspace:*",
-"dirty-json": "0.9.2",
-"langchain": "0.3.8",
-"openai": "4.57.1",
-"optional": "0.1.4",
-"socks-proxy-agent": "8.0.4"
+"@langchain/core": "0.3.26",
+"socks-proxy-agent": "8.0.4",
+"openai": "4.57.1"
 },
 "devDependencies": {
 "@modern-js/module-tools": "2.60.6",
 "@types/node": "^18.0.0",
 "@types/node-fetch": "2.6.11",
+"dirty-json": "0.9.2",
 "dotenv": "16.4.5",
 "langsmith": "0.1.36",
 "typescript": "~5.0.4",
```

packages/midscene/src/action/executor.ts

+2-1
```diff
@@ -143,7 +143,8 @@ export class Executor {
 taskIndex++;
 } catch (e: any) {
 successfullyCompleted = false;
-task.error = e?.message || 'error-without-message';
+task.error =
+e?.message || (typeof e === 'string' ? e : 'error-without-message');
 task.errorStack = e.stack;
 
 task.status = 'failed';
```
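
The new fallback matters when code throws a plain string: `'boom'.message` is `undefined`, so before this change such a task error became `'error-without-message'` and the thrown text was lost. Below is a standalone sketch of the same expression; `errorMessageOf` is a hypothetical name, and the parameter keeps the `any` type used in `Executor`'s `catch`.

```typescript
// Same expression as the new line in Executor: use e.message when it is truthy,
// otherwise keep a thrown string, otherwise fall back to a fixed placeholder.
function errorMessageOf(e: any): string {
  return e?.message || (typeof e === "string" ? e : "error-without-message");
}

// Usage: a thrown string now survives instead of becoming the placeholder.
console.log(errorMessageOf(new Error("element not found")));
console.log(errorMessageOf("element not found"));
console.log(errorMessageOf({ code: 42 }));
```
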

packages/midscene/src/ai-model/inspect.ts

+4-4
```diff
@@ -161,6 +161,7 @@ export async function AiInspectElement<
 type: 'image_url',
 image_url: {
 url: screenshotBase64WithElementMarker || screenshotBase64,
+detail: 'high',
 },
 },
 {
@@ -228,6 +229,7 @@ export async function AiExtractElementInfo<
 type: 'image_url',
 image_url: {
 url: screenshotBase64,
+detail: 'high',
 },
 },
 {
@@ -251,10 +253,7 @@ export async function AiExtractElementInfo<
 
 export async function AiAssert<
 ElementType extends BaseElement = BaseElement,
->(options: {
-assertion: string;
-context: UIContext<ElementType>;
-}) {
+>(options: { assertion: string; context: UIContext<ElementType> }) {
 const { assertion, context } = options;
 
 assert(assertion, 'assertion should be a string');
@@ -272,6 +271,7 @@ export async function AiAssert<
 type: 'image_url',
 image_url: {
 url: screenshotBase64,
+detail: 'high',
 },
 },
 {
```
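
Three of these hunks add `detail: 'high'` to the `image_url` part of the vision message. In OpenAI's Chat Completions API, `detail` accepts `'low'`, `'high'`, or `'auto'`; `'high'` lets the model look at the image at higher resolution, which generally costs more input tokens. Below is a minimal typed sketch of such a message. The local types and the `imagePart` / `userMessage` helpers are hypothetical stand-ins for the SDK's own types, and the base64 string is a placeholder.

```typescript
type ImageDetail = "low" | "high" | "auto";

interface ImageUrlPart {
  type: "image_url";
  image_url: { url: string; detail?: ImageDetail };
}

interface TextPart {
  type: "text";
  text: string;
}

// Build an image part the way the diff does: screenshot URL plus detail: 'high'.
function imagePart(url: string, detail: ImageDetail = "high"): ImageUrlPart {
  return { type: "image_url", image_url: { url, detail } };
}

// A user message pairing the screenshot with a text instruction.
function userMessage(screenshotBase64: string, instruction: string) {
  const content: Array<ImageUrlPart | TextPart> = [
    imagePart(screenshotBase64),
    { type: "text", text: instruction },
  ];
  return { role: "user" as const, content };
}

// Usage with a placeholder data URL.
const message = userMessage("data:image/png;base64,iVBORw0KGgo=", "Is there a search box on the page?");
console.log(JSON.stringify(message.content[0]));
```
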

packages/midscene/src/ai-model/openai/index.ts

+1
```diff
@@ -118,6 +118,7 @@ async function createChatClient({
 endpoint: getAIConfig(AZURE_OPENAI_ENDPOINT),
 apiVersion: getAIConfig(AZURE_OPENAI_API_VERSION),
 deployment: getAIConfig(AZURE_OPENAI_DEPLOYMENT),
+dangerouslyAllowBrowser: true,
 ...extraConfig,
 ...extraAzureConfig,
 });
```
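
The OpenAI Node SDK throws when a client is constructed in a browser-like environment unless `dangerouslyAllowBrowser: true` is passed, because any key shipped to a browser is visible to its users. This hunk sets the flag on the Azure client, presumably so the client can be created inside the Chrome extension. Since the flag is written before `...extraConfig` and `...extraAzureConfig`, a same-named key in either spread wins. Below is a plain-object sketch of that precedence with no SDK import; `assembleAzureOptions`, the endpoint, and the override values are placeholders.

```typescript
// Hypothetical stand-in for the options object built in createChatClient.
function assembleAzureOptions(
  extraConfig: Record<string, unknown>,
  extraAzureConfig: Record<string, unknown>,
): Record<string, unknown> {
  return {
    endpoint: "https://example-resource.openai.azure.com", // placeholder value
    apiVersion: "2024-05-01-preview",
    dangerouslyAllowBrowser: true,
    // Spreads are applied left to right, so these can override any key above.
    ...extraConfig,
    ...extraAzureConfig,
  };
}

const defaults = assembleAzureOptions({}, {});
const overridden = assembleAzureOptions(
  { dangerouslyAllowBrowser: false },
  { apiVersion: "custom-version" },
);
console.log(defaults.dangerouslyAllowBrowser, overridden.dangerouslyAllowBrowser, overridden.apiVersion);
```
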
