yangyue's blog

Researching agents that see, think, and act. Somewhere between global minima and the edge of chaos.

Hope you find something that makes you curious here.

mail git scholar x linkedin

papers

MultiNet v1.0: A Comprehensive Benchmark for Evaluating Multimodal Reasoning and Action Models Across Diverse Domains Dec 2025
[project] [arxiv] [src]
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models Jun 2025
[project] [arxiv]
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments Jun 2025
[project] [arxiv] [src]
Benchmarking Vision, Language, & Action Models On Robotic Learning Tasks Nov 2024
[project] [arxiv] [src]

systems

MultiNet 2025
Benchmarking vision-language-action models across robotic learning tasks.
[project] [src]
Software Control 2025
Agents that navigate and control arbitrary software environments.
[src]