yangyue's blog

building agents that see, think, and act on laptops and in robots. mostly debugging the ones that don't.

also can't stop thinking about self-organized criticality in a world of animals, machines, and planets.

mail git scholar x linkedin

papers

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models Apr 2026
[arxiv] [src]
MultiNet v1.0: A Comprehensive Benchmark for Evaluating Multimodal Reasoning and Action Models Across Diverse Domains Dec 2025
[project] [arxiv] [src]
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models Jun 2025
[project] [arxiv]
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments Jun 2025
[project] [arxiv] [src]
Benchmarking Vision, Language, & Action Models On Robotic Learning Tasks Nov 2024
[project] [arxiv] [src]

systems

GUI-DR 2026
Probing GUI grounding model brittleness via domain randomization.
[blog] [src]
MultiNet 2025
Benchmarking vision-language-action models across robotic learning tasks.
[project] [src]

posts

Hello world! 15 Apr 2026