ClawWork is an economic accountability framework for AI agents developed by the Data Intelligence Lab at the University of Hong Kong (HKUDS). The core concept: agents are given a starting budget, earn income by completing professional tasks, and pay for their own token usage, so they must maintain economic solvency across a benchmark of 220 tasks spanning 44 professional sectors.
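The mechanics are easiest to see as a simple ledger. Below is a minimal sketch of that solvency loop; the class, field names, and the flat $0.01-per-1K-token price are illustrative assumptions, not ClawWork's actual accounting code.

```python
from dataclasses import dataclass

@dataclass
class AgentLedger:
    """Hypothetical ledger for tracking an agent's economic solvency."""
    balance: float                      # starting budget in USD
    price_per_1k_tokens: float = 0.01   # assumed flat inference price

    def charge_tokens(self, tokens: int) -> None:
        """Deduct the inference cost of the tokens the agent consumed."""
        self.balance -= (tokens / 1000) * self.price_per_1k_tokens

    def credit_task(self, reward: float) -> None:
        """Credit the income earned for a completed professional task."""
        self.balance += reward

    @property
    def solvent(self) -> bool:
        """An agent survives the benchmark only while its balance stays positive."""
        return self.balance > 0

# Example: start with $100, spend 250K tokens on a task that pays $5.
ledger = AgentLedger(balance=100.0)
ledger.charge_tokens(250_000)  # -$2.50 in inference costs
ledger.credit_task(5.00)       # +$5.00 task income
print(ledger.balance, ledger.solvent)  # 102.5 True
```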
The benchmark measures not just task completion rates but economic efficiency: how much value an agent generates relative to its inference costs. Top-performing agents achieve an hourly equivalent above $1,500, a figure that quantifies the ROI of using AI agents for professional work. The framework supports Claude Sonnet 4.6, Gemini 3.1 Pro, and Qwen-3.5-Plus, enabling direct cross-model economic comparisons.
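The source does not spell out how the per-hour figure is computed. One plausible reading, assumed here, is net earnings (task income minus token cost) divided by wall-clock runtime; the function below is a hypothetical sketch of that reading, not ClawWork's published formula.

```python
def hourly_equivalent(task_income: float, token_cost: float,
                      wall_clock_seconds: float) -> float:
    """Hypothetical $/hour-equivalent: net value per hour of agent runtime."""
    net_value = task_income - token_cost
    return net_value / (wall_clock_seconds / 3600)

# An agent that nets $50 ($52 income minus $2 in tokens) in two
# minutes of runtime rates at $1,500/hour equivalent.
print(hourly_equivalent(task_income=52.0, token_cost=2.0,
                        wall_clock_seconds=120))  # 1500.0
```

Under this reading, the metric rewards both high task value and low latency: the same net earnings in half the time doubles the hourly equivalent.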
ClawWork represents a significant evolution in agent evaluation methodology. Rather than measuring success as binary task completion, it introduces economic pressure that rewards agents for being efficient, prioritizing high-value tasks, and avoiding unnecessary computation. For organizations evaluating OpenClaw or other agent frameworks for real business deployment, ClawWork provides a rigorous, economically grounded benchmark beyond standard accuracy metrics.
Frequently Asked Questions
What is the ClawWork economic benchmark?
ClawWork is an AI agent evaluation framework where agents are given a starting budget and must earn income by completing professional tasks, then pay for their own token usage. The framework measures economic efficiency: not just whether tasks are completed, but whether agents generate more value than they consume in inference costs. Top performers achieve an hourly equivalent above $1,500.
How many tasks does the ClawWork benchmark include?
ClawWork covers 220 professional tasks spanning 44 sectors (an average of five tasks per sector), from software engineering and legal research to financial analysis and scientific writing. This breadth ensures the benchmark captures performance across domains rather than specializing in one area.
Which AI models does ClawWork support?
ClawWork currently supports Claude Sonnet 4.6 (Anthropic), Gemini 3.1 Pro (Google), and Qwen-3.5-Plus (Alibaba). This cross-vendor support enables direct economic comparisons between frontier models, a more actionable signal than accuracy benchmarks for organizations evaluating AI ROI.
How does ClawWork relate to HKUDS ClawTeam?
Both are products of the Data Intelligence Lab at the University of Hong Kong (HKUDS). ClawTeam is the multi-agent orchestration framework (leader + specialist sub-agents). ClawWork is the economic evaluation benchmark. They are complementary: ClawTeam provides the infrastructure for multi-agent work, and ClawWork provides the methodology to measure whether that work is economically worthwhile.