Your AI Operator for Web, Android, Automation & Testing
Midscene.js lets AI act as your web and Android operator 🤖. Describe what you want to achieve in natural language, and Midscene.js will operate the interface, validate content, and extract data for you. Whether you want a quick trial or in-depth development, it is easy to get started.
Instruction | Video
---|---
Post a Tweet (by UI-TARS model) | twitter-video-1080p.mp4
Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (by UI-TARS model) | google-doc-1080p.mp4
Control the Maps app on Android (by Qwen2.5-VL model) | control-maps-app-on-android.mp4
Besides the default model GPT-4o, we have added two new recommended open-source models to Midscene.js: UI-TARS and Qwen2.5-VL. (Yes, open-source models!) UI-TARS is dedicated to UI automation, and Qwen2.5-VL is a vision-language model with strong image recognition; both are known to perform well in UI automation scenarios. Read more in Choose a model.
- Natural Language Interaction 👆: Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- UI Automation 🤖
  - Web Automation 🖥️: Try it in the browser right away with the Chrome extension, or integrate with Puppeteer or Playwright.
  - Android Automation 📱: Try it right away with the Android playground, or integrate the JavaScript SDK with adb.
- Visual Reports for Debugging 🎞️: With the test reports and the Playground, you can understand, replay, and debug the entire process.
- Caching Support 🔄: The first time a task is executed through AI, it is cached, so subsequent executions of the same task run significantly faster.
- Completely Open Source 🔥: Enjoy a whole new automation development experience!
- Understand UI, JSON Format Responses 🔍: Specify the data format you need and receive the response in JSON.
- Intuitive Assertions 🤔: Express assertions in natural language, and AI will understand and evaluate them.
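The caching feature can be illustrated with a minimal, hypothetical sketch: the result of an expensive AI planning step is stored under the task's prompt, so repeating the same task skips the model call. `PlanCache`, `Plan`, and `planWithModel` are illustrative names, not part of the Midscene.js API, and this sketch does not describe Midscene's real cache format or keying rules.

```typescript
// Hypothetical sketch of prompt-keyed caching; NOT Midscene's real cache
// format, keying rules, or API.
type Plan = string[]; // a list of low-level steps produced by a planner

class PlanCache {
  private readonly entries = new Map<string, Plan>();
  private calls = 0;
  private readonly planWithModel: (task: string) => Plan;

  constructor(planWithModel: (task: string) => Plan) {
    this.planWithModel = planWithModel;
  }

  // Number of times the (slow, costly) planner was actually invoked.
  get modelCalls(): number {
    return this.calls;
  }

  plan(task: string): Plan {
    const cached = this.entries.get(task);
    if (cached !== undefined) {
      return cached; // cache hit: reuse the earlier plan, no model call
    }
    this.calls += 1; // cache miss: ask the planner once
    const fresh = this.planWithModel(task);
    this.entries.set(task, fresh);
    return fresh;
  }
}

// Stand-in planner; a real one would send the task (and a screenshot) to an AI model.
const cache = new PlanCache((task) => [`locate element for: ${task}`, "click"]);
const first = cache.plan("click the login button");
const second = cache.plan("click the login button");
console.log(cache.modelCalls); // 1
console.log(first === second); // true
```

A real cache must also decide when a stored result is stale, for example after the page layout changes; this sketch ignores that entirely.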
You can use multimodal LLMs like gpt-4o, or vision-language models like Qwen2.5-VL, gemini-2.5-pro, and UI-TARS. Among these, UI-TARS is an open-source model dedicated to UI automation.
Read more in Choose a model.
There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?
- Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo looks, keeping scripts stable over time requires careful debugging. Midscene.js offers a visual report file, a built-in playground, and a Chrome extension to simplify debugging. These are the tools most developers truly need, and we are continually working to improve the debugging experience.
- Open Source, Free, Deploy as You Want: Midscene.js is an open-source project. It is decoupled from any cloud service or model provider, so you can choose either public or private deployment. There is always a plan that suits your business.
- Integrate with JavaScript: You can always bet on JavaScript 😎
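To make the JavaScript integration more concrete, here is a hedged sketch of a script that mixes ordinary TypeScript control flow with natural-language steps. The method names `aiAction`, `aiQuery`, and `aiAssert` follow Midscene's agent API, but the `UiAgent` interface below is a simplified local stand-in whose signatures are assumptions; `collectConcerts`, `Concert`, and `FakeAgent` are hypothetical. In a real script the agent would come from the SDK (for browser automation, the docs describe a `PuppeteerAgent` wrapping a Puppeteer page), so check the API Reference for the exact signatures. The fake agent returns placeholder data so the flow runs without a browser or model.

```typescript
// Simplified stand-in for a Midscene agent. Method names mirror the SDK,
// but these signatures are assumptions made for this sketch.
interface UiAgent {
  aiAction(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

interface Concert {
  city: string;
  date: string;
}

// Hypothetical flow: plain code handles control flow and data, while each
// UI step is described in natural language.
async function collectConcerts(agent: UiAgent, keyword: string): Promise<Concert[]> {
  await agent.aiAction(`type "${keyword}" in the search box and press Enter`);
  await agent.aiAssert("a list of search results is visible");
  const concerts = await agent.aiQuery<Concert[]>(
    "{city: string, date: string}[], the concerts listed in the results",
  );
  // Ordinary code post-processes the JSON-shaped result, e.g. drops incomplete rows.
  return concerts.filter((c) => c.city.length > 0);
}

// Fake agent that records every call and returns placeholder rows (not real data).
class FakeAgent implements UiAgent {
  readonly log: string[] = [];

  async aiAction(instruction: string): Promise<void> {
    this.log.push(`action: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.log.push(`query: ${dataDemand}`);
    const placeholderRows: Concert[] = [
      { city: "City A", date: "2025-01-01" },
      { city: "", date: "2025-02-01" },
    ];
    return placeholderRows as unknown as T;
  }

  async aiAssert(assertion: string): Promise<void> {
    this.log.push(`assert: ${assertion}`);
  }
}

const agent = new FakeAgent();
collectConcerts(agent, "concert").then((concerts) => {
  console.log(concerts.length); // 1
  console.log(agent.log);
});
```

Keeping the agent behind an interface is a design choice for this sketch: the orchestration logic can be exercised with a fake first and then pointed at a real agent without changes.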
- Home Page: https://midscenejs.com
- Web Browser Automation
- Android Automation
- API Reference
- Choose a model
- Config Model and Provider
We would like to thank the following projects:
- Rsbuild for the build tool.
- UI-TARS for the open-source agent model UI-TARS.
- Qwen2.5-VL for the open-source VL model Qwen2.5-VL.
- scrcpy and yume-chan allow us to control Android devices from a browser.
- appium-adb for the JavaScript bridge to adb.
- YADB for the yadb tool, which improves the performance of text input.
- Puppeteer for browser automation and control.
- Playwright for browser automation, control, and testing.
If you use Midscene.js in your research or project, please cite:
@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, Automation \& Testing},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}
Midscene.js is MIT licensed.