
building agents that see, think, and act in laptops and robots. mostly debugging the ones that don't.
also can't stop thinking about self-organized criticality in a world of animals, machines, and planets.
papers
- MultiNet v1.0: A Comprehensive Benchmark for Evaluating Multimodal Reasoning and Action Models Across Diverse Domains
  [project] [arxiv] [src]
- An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
  [project] [arxiv]
- Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
  [project] [arxiv] [src]
- Benchmarking Vision, Language, & Action Models On Robotic Learning Tasks
  [project] [arxiv] [src]