
Commit 7f3ff4a

Merge pull request #9 from ryanontheinside/feat/mediapipe-vision
Feat/mediapipe vision
2 parents 97d028b + de7d9c9 commit 7f3ff4a

File tree: 111 files changed, +12125 −41 lines


README.MD

Lines changed: 100 additions & 17 deletions
@@ -4,7 +4,8 @@ A growing suite of nodes for real-time ComfyUI workflows. Features include value
 
 The intention for this repository is to build a suite of nodes that can be used in the burgeoning real-time diffusion space. Contributions are welcome!
 
-## Nodes
+
+## Control Nodes
 
 ### Value Controls 🎚️
 - **FloatControl**: Outputs a floating point value that changes over time using various patterns (sine wave, bounce, random walk, etc).
@@ -27,17 +28,11 @@ The intention for this repository is to build a suite of nodes that can be used
 - **DTypeConverter**: Convert masks between different data types (float16, uint8, float32, float64).
 - **FastWebcamCapture**: High-performance webcam capture node with resizing capabilities.
 - **SimilarityFilter**: Filter out similar consecutive images and control downstream execution. Perfect for optimizing real-time workflows by skipping redundant processing of similar frames.
+
+### Logic 🧠
 - **LazyCondition**: Powerful conditional execution node that supports any input type. Uses lazy evaluation to truly skip execution of unused paths and maintains state to avoid feedback loops.
 
-## Movement Patterns 🔄
 
-All value and motion controls support various movement patterns:
-- **Sine**: Smooth sinusoidal motion
-- **Triangle**: Linear interpolation with smooth direction changes
-- **Sawtooth**: Linear interpolation with sharp resets
-- **Square**: Instant transitions between min/max values
-- **Static**: No movement (constant value)
-- **and more**
 
 ## Usage 📖
 
@@ -65,6 +60,74 @@ Use utility nodes to optimize and control your workflow:
 - **SimilarityFilter**: Skip processing of similar frames by comparing consecutive images. Great for optimizing real-time workflows by only processing frames that have meaningful changes.
 - **LazyCondition**: Create conditional execution paths that truly skip processing of unused branches. Works with any input type (images, latents, text, numbers) and maintains state of the last successful output to avoid feedback loops.
 
+## 🔮 MediaPipe Vision
+
+### ✨ Overview
+
+This repository provides a complete implementation of Google MediaPipe vision tasks for ComfyUI. It enables computer vision capabilities that can be used for interactive AI art, responsive interfaces, motion tracking, and advanced masking workflows.
+
+### 🚀 Features
+
+| Category | Available Tools |
+|----------|-----------------|
+| **Face Analysis** | Face detection, face mesh (478 points), blendshapes, head pose |
+| **Body Tracking** | Pose estimation (33 landmarks), segmentation masks |
+| **Hand Analysis** | Hand tracking (21 landmarks per hand), gesture recognition |
+| **Image Processing** | Object detection, image segmentation, image embeddings |
+| **Creative Tools** | Face stylization, interactive segmentation |
+
+### 📋 Supported MediaPipe Tasks
+
+* **Face Detection:** Face bounding boxes and keypoints
+* **Face Landmark Detection:** Face mesh landmarks with expression analysis
+* **Hand Landmark Detection:** Hand position tracking with 21 landmarks
+* **Pose Landmark Detection:** Body pose tracking with 33 landmarks
+* **Object Detection:** Common object detection using models like EfficientDet
+* **Image Segmentation:** Category-based image segmentation
+* **Gesture Recognition:** Recognition of common hand gestures
+* **Image Embedding:** Feature vector generation for image similarity
+* **Interactive Segmentation:** User-guided image masking
+* **Face Stylization:** Artistic style application to faces
+* **Holistic Landmark Detection:** Full-body landmark detection (legacy)
+
+> **Note:** Holistic landmark detection uses the legacy MediaPipe API as we await the official Tasks API release.
+
+### ⚙️ Landmark System
+
+The project's landmark system allows extracting and using position data:
+
+#### Position Extraction
+
+**Landmark Position Extractors** access coordinate data from any landmark:
+- Extract x, y, z positions from face, hand, or pose landmarks
+- Access visibility and presence information where available
+- Access world coordinates when available (hand and pose)
+- Input landmark indices directly to access any point
+- Process batches for multi-frame workflows
+
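The position-extraction idea above can be sketched in a few lines of plain Python. The `Landmark` container below is a hypothetical stand-in for MediaPipe's normalized landmarks (x, y, z in 0-1 image coordinates); it is not the node implementation.

```python
from dataclasses import dataclass

@dataclass
class Landmark:
    """Hypothetical stand-in for one MediaPipe normalized landmark."""
    x: float
    y: float
    z: float
    visibility: float = 1.0

def extract_position(landmarks, index, image_width, image_height):
    """Return pixel-space (x, y) and the raw z for one landmark index."""
    lm = landmarks[index]
    return (lm.x * image_width, lm.y * image_height, lm.z)

# Example: landmark index 1 on a 512x512 frame.
face = [Landmark(0.5, 0.5, 0.0), Landmark(0.25, 0.75, -0.01)]
print(extract_position(face, 1, 512, 512))  # → (128.0, 384.0, -0.01)
```

The same lookup generalizes to hand and pose landmark lists; batch processing is just this function mapped over frames.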
+#### Position Processing
+
+Several node types work with landmark position data:
+
+- **Delta Controls** - Track movement and map changes to parameter values
+- **Proximity Nodes** - Calculate distances between landmarks
+- **Masking Nodes** - Generate masks centered at landmark positions
+- **Head Pose Extraction** - Calculate yaw, pitch, roll from face landmarks
+- **Blendshape Analysis** - Extract facial expression parameters
+
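The first two node types in the list reduce to simple geometry. A minimal sketch, assuming Euclidean distance for proximity and frame-to-frame distance for deltas (class and function names are illustrative, not the actual node classes):

```python
import math

def landmark_distance(a, b):
    """Euclidean distance between two (x, y, z) landmark positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

class DeltaControl:
    """Track frame-to-frame movement of one landmark as a scalar value."""
    def __init__(self):
        self.last = None

    def update(self, position):
        delta = 0.0 if self.last is None else landmark_distance(position, self.last)
        self.last = position
        return delta

ctrl = DeltaControl()
ctrl.update((0.5, 0.5, 0.0))          # first frame: no previous position
print(ctrl.update((0.5, 0.46, 0.0)))  # distance moved between frames (≈0.04)
```

A proximity node would call `landmark_distance` on two landmarks from the same frame (e.g. thumb tip vs. index tip for a pinch gesture), while a delta control feeds successive positions of one landmark through `update`.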
+### Example Workflow
+
+```
+Load Face Landmarker → Face Landmarker ← Image Input
+            |
+            ↓ landmarks
+Face Landmark Position (Index: 1) → x,y,z coordinates
+            |
+            ↓ x,y,z
+Position Delta Float Control → value → ComfyUI Parameter
+```
+
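The diagram's three stages can be simulated with plain Python to show how data flows from landmarks to a parameter. The landmark index and the 0-1 coordinate space follow MediaPipe conventions; the function names and the scale factor are illustrative assumptions, not the actual node API.

```python
def face_landmark_position(landmarks, index=1):
    """Stage 1: pick one landmark's (x, y, z) from the detected set."""
    return landmarks[index]

def position_delta(prev, cur):
    """Stage 2: per-axis change between two frames."""
    return tuple(c - p for p, c in zip(prev, cur))

def to_parameter(delta_y, scale=10.0):
    """Stage 3: map the vertical delta into a clamped 0-1 parameter."""
    return max(0.0, min(1.0, 0.5 + delta_y * scale))

frame1 = {1: (0.50, 0.40, 0.0)}
frame2 = {1: (0.50, 0.43, 0.0)}
dx, dy, dz = position_delta(face_landmark_position(frame1),
                            face_landmark_position(frame2))
print(to_parameter(dy))  # ≈ 0.8: downward motion pushes the parameter up
```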
 ## Examples 🎬
 
 ### Value Control Demo
@@ -97,6 +160,7 @@ git clone https://github.com/ryanontheinside/ComfyUI_RealTimeNodes
 cd ComfyUI_RealTimeNodes
 pip install -r requirements.txt
 ```
+> **Note:** For MediaPipe, GPU support varies by platform. For Linux, see [these instructions](https://ai.google.dev/edge/mediapipe/framework/getting_started/gpu_support).
 
 ## Coming Soon 🚀
 
@@ -116,19 +180,37 @@ This is an evolving project that aims to expand the real-time capabilities of Co
 
 ### Contributing 🤝
 
-Your feedback and contributions are more than welcome! This project grows stronger with community input.
+This project provides flexible infrastructure for computer vision in ComfyUI. If you have ideas for:
+
+- Creative AI interactions using vision
+- Specific landmark tracking or detection needs
+- Real-time vision workflows
+- Improvements to the current implementation
+
+Please open an issue, even if you're not sure how to implement it.
+
+The aim is to **iterate quickly** to keep up with this burgeoning field of real-time ComfyUI.
 
-- Have an idea? Open an issue! 💡
-- Found a bug? Open an issue! 🐛
-- Made an improvement? Submit a PR! 🎉
-- Want to help? Join the discussion! 💬
 
 Please visit our [GitHub Issues](https://github.com/ryanontheinside/ComfyUI_RealTimeNodes/issues) page to contribute.
 
 ## Related Projects 🔗
 
-### ComfyUI_RyanOnTheInside - Everything Reactivity ⚡
-Make anything react to anything in your ComfyUI workflows. [ComfyUI_RyanOnTheInside](https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside) - my main custom nodes suite that brings complete reactive control to standard ComfyUI workflows:
+## 🔗 Related Projects
+
+### [ComfyUI_ControlFreak](https://github.com/ryanontheinside/ComfyUI_ControlFreak)
+Universal MIDI & Gamepad Mapping in ComfyUI. Map any MIDI controller or gamepad to any parameter in your ComfyUI workflow for intuitive, hands-on control of your generative art. Perfect for live performances, interactive installations, and streamlined creative workflows.
+
+### [comfystream](https://github.com/yondonfu/comfystream)
+A real-time streaming framework for ComfyUI that enables running workflows continuously on video streams, perfect for combining with MediaPipe vision capabilities.
+
+### [ComfyUI-Stream-Pack](https://github.com/livepeer/ComfyUI-Stream-Pack)
+A collection of ComfyUI nodes for multimedia streaming applications. Combines video processing with generative models for real-time media effects.
+
+### [ComfyUI_RyanOnTheInside](https://github.com/ryanontheinside/ComfyUI_RyanOnTheInside) - Everything Reactivity ⚡
+Make anything react to anything in your ComfyUI workflows. My main custom nodes suite brings complete reactive control to standard ComfyUI workflows:
 
 - Dynamic node relationships
 - React to audio, MIDI, motion, time, depth, color, Whisper, and more
@@ -143,4 +225,5 @@ Make anything react to anything in your ComfyUI workflows. [ComfyUI_RyanOnTheIns
 - Reactive DepthFlow
 - Actually more
 
-Use it alongside these Control Nodes to master parameter control in both the batch and real-time paradigms in ComfyUI! The POWER!!
+Use it alongside these Control Nodes to master parameter control in both the batch and real-time paradigms in ComfyUI! The POWER!!
+
examples/broccoli.png

383 KB

examples/hand_tracking_mask_resizer.json renamed to examples/control_nodes/hand_tracking_mask_resizer.json

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@
     },
     "23": {
       "inputs": {
-        "image": "dead_inside_512.png",
+        "image": "harold.png",
         "upload": "image"
       },
      "class_type": "LoadImage",

examples/mask_string.json renamed to examples/control_nodes/mask_string.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "57": {
     "inputs": {
-      "image": "dead_inside_512.png",
+      "image": "harold.png",
       "upload": "image"
     },
     "class_type": "LoadImage"

examples/motioncontrol_example_API.json renamed to examples/control_nodes/motioncontrol_example_API.json

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@
   },
   "39": {
     "inputs": {
-      "image": "dead_inside_512.png",
+      "image": "harold.png",
       "upload": "image"
     },
     "class_type": "LoadImage",

examples/dead_inside_512.png

-267 KB
Binary file not shown.

examples/harold.png

288 KB
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
{
  "1": {
    "inputs": {
      "image": "broccoli.png",
      "upload": "image"
    },
    "class_type": "LoadImage",
    "_meta": {
      "title": "Load Image"
    }
  },
  "3": {
    "inputs": {
      "num_faces": 1,
      "min_face_detection_confidence": 0.5,
      "min_face_presence_confidence": 0.5,
      "min_tracking_confidence": 0.5,
      "output_blendshapes": true,
      "output_transform_matrix": true,
      "running_mode": "video",
      "delegate": "cpu",
      "image": [
        "6",
        0
      ],
      "model_info": [
        "4",
        0
      ]
    },
    "class_type": "MediaPipeFaceLandmarker",
    "_meta": {
      "title": "Face Landmarker (MediaPipe)"
    }
  },
  "4": {
    "inputs": {
      "model_variant": "default"
    },
    "class_type": "MediaPipeFaceLandmarkerModelLoader",
    "_meta": {
      "title": "Load Face Landmarker Model (MediaPipe)"
    }
  },
  "6": {
    "inputs": {
      "image": "harold.png",
      "upload": "image"
    },
    "class_type": "PrimaryInputLoadImage",
    "_meta": {
      "title": "PrimaryInputLoadImage"
    }
  },
  "7": {
    "inputs": {
      "x": 0,
      "y": 0,
      "resize_source": false,
      "destination": [
        "6",
        0
      ],
      "source": [
        "1",
        0
      ],
      "mask": [
        "18",
        0
      ]
    },
    "class_type": "ImageCompositeMasked",
    "_meta": {
      "title": "ImageCompositeMasked"
    }
  },
  "8": {
    "inputs": {
      "value": [
        "21",
        0
      ],
      "width": 512,
      "height": 512
    },
    "class_type": "SolidMask",
    "_meta": {
      "title": "SolidMask"
    }
  },
  "15": {
    "inputs": {
      "part_name": "FACE_OVAL",
      "face_landmarks": [
        "16",
        0
      ],
      "image_for_dimensions": [
        "6",
        0
      ]
    },
    "class_type": "MaskFromFaceLandmarks",
    "_meta": {
      "title": "Mask From Face Landmarks (MediaPipe)"
    }
  },
  "16": {
    "inputs": {
      "num_faces": 1,
      "min_face_detection_confidence": 0.5,
      "min_face_presence_confidence": 0.5,
      "min_tracking_confidence": 0.5,
      "output_blendshapes": true,
      "output_transform_matrix": true,
      "running_mode": "video",
      "delegate": "cpu",
      "image": [
        "6",
        0
      ],
      "model_info": [
        "17",
        0
      ]
    },
    "class_type": "MediaPipeFaceLandmarker",
    "_meta": {
      "title": "Face Landmarker (MediaPipe)"
    }
  },
  "17": {
    "inputs": {
      "model_variant": "default"
    },
    "class_type": "MediaPipeFaceLandmarkerModelLoader",
    "_meta": {
      "title": "Load Face Landmarker Model (MediaPipe)"
    }
  },
  "18": {
    "inputs": {
      "x": 0,
      "y": 0,
      "operation": "subtract",
      "destination": [
        "8",
        0
      ],
      "source": [
        "15",
        0
      ]
    },
    "class_type": "MaskComposite",
    "_meta": {
      "title": "MaskComposite"
    }
  },
  "21": {
    "inputs": {
      "blendshape_name": "jawOpen",
      "score_min": 0,
      "score_max": 1,
      "output_min_float": 0,
      "output_max_float": 1,
      "clamp": true,
      "blendshapes": [
        "3",
        1
      ]
    },
    "class_type": "BlendshapeControlFloat",
    "_meta": {
      "title": "Blendshape Control (Float)"
    }
  },
  "25": {
    "inputs": {
      "images": [
        "7",
        0
      ]
    },
    "class_type": "PreviewImage",
    "_meta": {
      "title": "Preview Image"
    }
  }
}
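In this workflow, node 21 (`BlendshapeControlFloat`) maps the `jawOpen` blendshape score into a float that drives the `SolidMask` value. A minimal sketch of that remapping, assuming linear interpolation with optional clamping, matching the node's `score_min`/`score_max`/`output_min_float`/`output_max_float`/`clamp` inputs (the node's exact semantics may differ):

```python
def blendshape_to_float(score, score_min=0.0, score_max=1.0,
                        output_min=0.0, output_max=1.0, clamp=True):
    """Remap a blendshape score into an output range, as node 21 configures."""
    t = (score - score_min) / (score_max - score_min)
    if clamp:
        t = max(0.0, min(1.0, t))  # keep the control inside the output range
    return output_min + t * (output_max - output_min)

# A half-open jaw drives the SolidMask value to 0.5.
print(blendshape_to_float(0.5))  # → 0.5
```

With the defaults used in this workflow the mapping is the identity on [0, 1], but widening `output_min_float`/`output_max_float` would let the same expression drive any parameter range.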

0 commit comments