Files
JoyD/TARS/UI-TARS/README_deploy.md
2025-10-31 11:12:44 +08:00

5.1 KiB
Raw Permalink Blame History

UI-TARS 1.5 HuggingFace Endpoint Deployment Guide

1. HuggingFace Inference Endpoints Cloud Deployment

We use HuggingFace's Inference Endpoints platform to quickly deploy a cloud-based model.

Deployment Steps

  1. Access the Deployment Interface

  2. Configure Settings

    • Hardware Configuration

      • In the Hardware Configuration section, choose a GPU instance. Here are the recommendations for different model sizes:
        • For the 7B model, select GPU L40S 1GPU 48G (Recommended: Nvidia L4 / Nvidia A100).
          Hardware Configuration
    • Container Configuration

      • Set the following parameters:
        • Max Input Length (per Query): 65536
        • Max Batch Prefill Tokens: 65536
        • Max Number of Tokens (per Query): 65537 Container Configuration
    • Environment Variables

      • Add the following environment variables:
        • CUDA_GRAPHS=0 to avoid deployment failures. For details, refer to issue 2875.
        • PAYLOAD_LIMIT=8000000 to prevent request failures due to large images. For details, refer to issue 1802.
          Environment Variables
    • Create Endpoint

      • Click Create to set up the endpoint.
        Create Endpoint
    • Enter Setup

      • Once the deployment is finished, you will see the confirmation page and need to enter the settings page.
        Complete
    • Update URI -

      • Go to the Container page, set the Container URI to ghcr.io/huggingface/text-generation-inference:3.2.1, and click Update Endpoint to apply the changes. Complete

2. API Usage Example

Python Test Code

# pip install openai
import io
import re
import json
import base64
from PIL import Image
from io import BytesIO
from openai import OpenAI

def add_box_token(input_string):
    # Step 1: Split the string into individual actions
    if "Action: " in input_string and "start_box=" in input_string:
        suffix = input_string.split("Action: ")[0] + "Action: "
        actions = input_string.split("Action: ")[1:]
        processed_actions = []
        for action in actions:
            action = action.strip()
            # Step 2: Extract coordinates (start_box or end_box) using regex
            coordinates = re.findall(r"(start_box|end_box)='\((\d+),\s*(\d+)\)'", action)
            
            updated_action = action  # Start with the original action
            for coord_type, x, y in coordinates:
                # Convert x and y to integers
                updated_action = updated_action.replace(f"{coord_type}='({x},{y})'", f"{coord_type}='<|box_start|>({x},{y})<|box_end|>'")
            processed_actions.append(updated_action)
        
        # Step 5: Reconstruct the final string
        final_string = suffix + "\n\n".join(processed_actions)
    else:
        final_string = input_string
    return final_string

client = OpenAI(
    base_url="https:xxx",
    api_key="hf_xxx"
)

result = {}
messages = json.load(open("./data/test_messages.json"))
for message in messages:
    if message["role"] == "assistant":
        message["content"] = add_box_token(message["content"])
        print(message["content"])

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=messages,
    top_p=None,
    temperature=0.0,
    max_tokens=400,
    stream=True,
    seed=None,
    stop=None,
    frequency_penalty=None,
    presence_penalty=None
)

response = ""
for message in chat_completion:
    response += message.choices[0].delta.content
print(response)

Expected Output

Thought: 我看到Preferences窗口已经打开了但这里显示的都是系统资源相关的设置要设置图片的颜色模式我得先看看左侧的选项列表"Color Management"这个选项看起来很有希望应该就是处理颜色管理的地方让我点击它看看里面有什么选项
Action: click(start_box='(177,549)')