Add Tars
146
TARS/UI-TARS/codes/README.md
Normal file
@@ -0,0 +1,146 @@
# ui-tars

A Python package for parsing VLM-generated GUI action instructions into executable pyautogui code.

---

## Introduction

`ui-tars` parses VLM-generated GUI action instructions, generates pyautogui scripts from them, and handles coordinate conversion and smart image resizing.

- Supports multiple VLM output formats (e.g., Qwen-VL, Seed-VL)
- Automatically handles coordinate scaling and format conversion
- Generates pyautogui automation scripts in a single call

---

## Quick Start

### Installation

```bash
pip install ui-tars
# or
uv pip install ui-tars
```

### Parse output into structured actions

```python
from ui_tars.action_parser import parse_action_to_structure_output, parsing_response_to_pyautogui_code

# Raw model output: a thought followed by an action
response = "Thought: Click the button\nAction: click(point='<point>200 300</point>')"
original_image_width, original_image_height = 1920, 1080

# Parse the raw text into a list of structured action dicts
parsed_dict = parse_action_to_structure_output(
    response,
    factor=1000,
    origin_resized_height=original_image_height,
    origin_resized_width=original_image_width,
    model_type="doubao"
)
print(parsed_dict)

# Convert the structured actions into a pyautogui script string
parsed_pyautogui_code = parsing_response_to_pyautogui_code(
    responses=parsed_dict,
    image_height=original_image_height,
    image_width=original_image_width
)
print(parsed_pyautogui_code)
```
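
To make the `factor` argument concrete, here is a minimal sketch of the arithmetic involved in mapping a model coordinate onto the screenshot. It assumes the model emits coordinates on a `0..factor` grid, as the `factor=1000` call above suggests. The helper name `model_point_to_pixels` is hypothetical and is not part of `ui-tars`.

```python
def model_point_to_pixels(x, y, factor, image_width, image_height):
    """Map a point on a 0..factor grid to pixel coordinates (illustrative only)."""
    # Multiply before dividing so integer inputs stay exact.
    px = round(x * image_width / factor)
    py = round(y * image_height / factor)
    return px, py


# The point from the example response, on a 1920x1080 screenshot.
print(model_point_to_pixels(200, 300, 1000, 1920, 1080))  # (384, 324)
```

Other model types may report coordinates on a different grid, so treat this as an illustration of the idea rather than the package's exact conversion.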

### Generate pyautogui automation script

```python
from ui_tars.action_parser import parsing_response_to_pyautogui_code

pyautogui_code = parsing_response_to_pyautogui_code(parsed_dict, original_image_height, original_image_width)
print(pyautogui_code)
```
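
Because the function returns the script as a plain string, one cautious way to use it is to write it to a file and review it before running anything. The sketch below uses only the standard library. The `save_script` helper and the output filename are hypothetical, and the sample string is a placeholder standing in for real generated code.

```python
import tempfile
from pathlib import Path


def save_script(code: str, path: Path) -> Path:
    """Write a generated script to disk so it can be inspected before execution."""
    path.write_text(code, encoding="utf-8")
    return path


# Placeholder standing in for the string returned by parsing_response_to_pyautogui_code.
sample_code = "import pyautogui\npyautogui.click(384, 324)\n"
script_path = save_script(sample_code, Path(tempfile.gettempdir()) / "ui_tars_action.py")
print(script_path.read_text(encoding="utf-8"))
```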

### Visualize coordinates on the image (optional)

```python
import ast

from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt

image = Image.open("your_image_path.png")
start_box = parsed_dict[0]["action_inputs"]["start_box"]
# Parse the box string without executing arbitrary code
coordinates = ast.literal_eval(start_box)
# Coordinates are relative, so scale them to pixel positions
x1 = int(coordinates[0] * original_image_width)
y1 = int(coordinates[1] * original_image_height)
draw = ImageDraw.Draw(image)
radius = 5
draw.ellipse((x1 - radius, y1 - radius, x1 + radius, y1 + radius), fill="red", outline="red")
plt.imshow(np.array(image))
plt.axis("off")
plt.show()
```
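
The snippet above takes the first two values of `start_box` as a point. As a sketch of a slightly more defensive variant, the helper below parses the box string with `ast.literal_eval` and returns the pixel center whether the box holds two values (a point) or four (a rectangle). It assumes `start_box` is a string of relative coordinates in `[0, 1]`, as the multiplication by image width and height above implies. `box_center_pixels` is a hypothetical name, not a `ui-tars` function.

```python
import ast


def box_center_pixels(start_box: str, image_width: int, image_height: int):
    """Return the pixel center of a relative-coordinate point or box (illustrative)."""
    values = ast.literal_eval(start_box)  # safely parses e.g. "[0.2, 0.3]"
    if len(values) == 2:
        rel_x, rel_y = values
    elif len(values) == 4:
        x1, y1, x2, y2 = values
        rel_x, rel_y = (x1 + x2) / 2, (y1 + y2) / 2
    else:
        raise ValueError(f"expected 2 or 4 coordinates, got {len(values)}")
    return int(rel_x * image_width), int(rel_y * image_height)


print(box_center_pixels("[0.25, 0.5, 0.75, 0.5]", 1920, 1080))  # (960, 540)
```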

---

## API Documentation

### parse_action_to_structure_output

```python
def parse_action_to_structure_output(
    text: str,
    factor: int,
    origin_resized_height: int,
    origin_resized_width: int,
    model_type: str = "qwen25vl",
    max_pixels: int = 16384 * 28 * 28,
    min_pixels: int = 100 * 28 * 28
) -> list[dict]:
    ...
```

**Description:**
Parses the model's output action instructions into structured dictionaries, automatically handling coordinate scaling and box/point format conversion.

**Parameters:**
- `text`: The raw model output string to parse
- `factor`: Scaling factor for the model's coordinates (the Quick Start example uses 1000)
- `origin_resized_height`/`origin_resized_width`: Original image height/width
- `model_type`: Model type (e.g., "qwen25vl", "doubao")
- `max_pixels`/`min_pixels`: Image pixel upper/lower limits
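
The `max_pixels` and `min_pixels` bounds suggest the "smart resize" step used in Qwen-VL-style image preprocessing, where each side is snapped to a multiple of a patch size and the total pixel count is clamped. The sketch below follows that general scheme. It is an assumption about the approach, not a copy of the `ui-tars` implementation, and the `patch` parameter (28, matching the `28 * 28` in the defaults) is our own naming.

```python
import math


def smart_resize(height, width, patch=28,
                 min_pixels=100 * 28 * 28, max_pixels=16384 * 28 * 28):
    """Snap both sides to multiples of `patch`, keeping the area within bounds."""
    h_bar = max(patch, round(height / patch) * patch)
    w_bar = max(patch, round(width / patch) * patch)
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h_bar = math.floor(height / beta / patch) * patch
        w_bar = math.floor(width / beta / patch) * patch
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / patch) * patch
        w_bar = math.ceil(width * beta / patch) * patch
    return h_bar, w_bar


print(smart_resize(1080, 1920))  # (1092, 1932)
```

A 1920x1080 screenshot already falls between the default bounds, so only the snapping to multiples of 28 changes its size.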

**Returns:**
A list of structured actions, each a dict with fields such as `action_type`, `action_inputs`, and `thought`.

---

### parsing_response_to_pyautogui_code

```python
def parsing_response_to_pyautogui_code(
    responses: dict | list[dict],
    image_height: int,
    image_width: int,
    input_swap: bool = True
) -> str:
    ...
```

**Description:**
Converts structured actions into a pyautogui script string, supporting click, type, hotkey, drag, scroll, and more.

**Parameters:**
- `responses`: Structured actions (dict or list of dicts)
- `image_height`/`image_width`: Image height/width
- `input_swap`: Whether to use clipboard paste for typing (default True)
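
The `input_swap` flag is described as choosing clipboard paste over simulated keystrokes. One way to picture the difference is the sketch below, which builds the script text for a type action in both modes. It assumes `pyperclip.copy`, `pyautogui.hotkey`, and `pyautogui.write` as the underlying calls, and `render_type_action` is a hypothetical helper. The code that `ui-tars` actually emits may differ.

```python
def render_type_action(content: str, input_swap: bool = True) -> str:
    """Build script text that types `content` (illustrative, not the ui-tars output)."""
    if input_swap:
        # Paste via the clipboard, which handles non-ASCII text that
        # keystroke simulation often cannot type.
        lines = [
            "import pyautogui",
            "import pyperclip",
            f"pyperclip.copy({content!r})",
            "pyautogui.hotkey('ctrl', 'v')",
        ]
    else:
        # Simulate individual key presses.
        lines = [
            "import pyautogui",
            f"pyautogui.write({content!r}, interval=0.1)",
        ]
    return "\n".join(lines)


print(render_type_action("hello"))
```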

**Returns:**
A pyautogui script string, ready to execute.

---

## Contribution

Contributions, issues, and suggestions are welcome!

---

## License

Apache-2.0 License