Suhwan Choi
I'm an undergraduate student majoring in Physics and Computer Science at Seoul National University. I'm currently a Principal Researcher
at Maum.ai, where I lead the autonomous robotics research division.
My main research interests are in approximating and imitating human behavior and intelligence across multiple modalities, using end-to-end architectures and scalable training suites. I focus on embodied AI, robotic navigation, vision-language models, and multimodal learning.
Email /
CV /
LinkedIn /
GitHub /
Blog
|
Research & Publications
I work on embodied AI, robotic navigation, and multimodal learning. My research focuses on scaling
vision-action pretraining, commonsense-aware navigation systems, and vision-language model
improvements. Some papers are highlighted.
|
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
Jisu Nam, Yicong Hong, Chun-Hao Paul Huang, Feng Liu, JoungBin Lee, Jiyoung Kim, Siyoon Jin, Yunsung Lee, Jaeyoon Jung, Suhwan Choi, Seungryong Kim†, Yang Zhou†
arXiv 2026
arXiv
/
project page
An interactive gaming world model using camera pose as a unifying geometric representation for precise action control and long-horizon 3D consistency.
|
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
Suhwan Choi, Yunsung Lee, Yubeen Park, Chris Dongjoo Kim, Ranjay Krishna, Dieter Fox, Youngjae Yu
arXiv 2026
arXiv
/
code
/
leaderboard
One framework to evaluate any VLA model on any robot simulation benchmark. Features batch parallel evaluation with 47x throughput, Docker-isolated benchmarks, and the largest unified VLA leaderboard (500+ models x 17 benchmarks).
|
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to
Embodied AI
Suhwan Choi*, Jaeyoon Jung*, Haebin Seong*, Minchan Kim, Minyeong Kim, Yongjun Cho,
Yoonshik Kim, Yubeen Park, Youngjae Yu†, Yunsung Lee†
ICLR 2026
project page
Scaling vision-action pretraining on desktop data enables effective transfer to embodied AI tasks.
|
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents
Haebin Seong*, Sungmin Kim*, Yongjun Cho*, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Samwoo Seong, Youngjae Yu, Yunsung Lee
arXiv 2025
arXiv
/
project page
/
code
A navigation benchmark that shifts evaluation from purely technical metrics to real-world economic cost and revenue, featuring high-fidelity physics simulation with delivery robot dynamics.
|
Revisiting Residual Connections: Orthogonal Updates for Stable and
Efficient Deep Networks
Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Youngjae Yu†
NeurIPS 2025
arXiv
Revisiting residual connections with orthogonal updates for more stable and efficient deep
networks.
|
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot
Interaction
Suhwan Choi*, Yongjun Cho*, Minchan Kim*, Jaeyoon Jung*, Myunchul Joe, Yubeen Park,
Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu†
ICRA 2025 (Outstanding Paper Award at NeurIPS 2024 Workshop, 3%)
project page
A commonsense-aware navigation system that enables intuitive human-robot interaction through
natural language understanding.
|
ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in
Vision-Language Models
Minchan Kim*, Minyeong Kim*, Junik Bae*, Suhwan Choi, Sungkyung Kim, Buru Chang†
ECCV 2024
arXiv
Exploiting semantic reconstruction to mitigate hallucinations in vision-language models.
|
Experience
Principal Researcher at Maum.ai (Feb 2024 – Present)
- Founded the autonomous robotics research division as its first researcher, leading strategic decisions and growing the team to 10 researchers.
- Contributed as first author to the majority of research projects in robotic navigation and embodied AI.
- Led CORE, a Slurm-based DGX cluster construction project (96 H100 GPUs across 12 nodes). [Blog]
- Built a company-wide Notion workspace that improved productivity and streamlined workflows. [Template]
|
Machine Learning Engineer Intern at Hyperconnect (July 2023 – Jan 2024)
- Worked on diffusion-based personalized profile image generation for real-world applications.
|
Open-Source Projects
allenai/vla-evaluation-harness
- Primary author of a unified evaluation framework for Vision-Language-Action models across robot simulation benchmarks. Features batch parallel evaluation with 47x throughput, Docker-isolated benchmarks, and the largest unified VLA leaderboard (500+ models x 17 benchmarks).
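A minimal sketch of the batch-parallel idea behind that throughput number, using toy stand-ins for the simulator and policy (this is not the harness's actual API): N environments step in lockstep, so the model runs one batched forward pass per timestep instead of N separate ones.

```python
import numpy as np

class DummyEnv:
    """Toy stand-in for one simulator instance."""
    def reset(self):
        return np.random.randn(8)                  # fake observation
    def step(self, action):
        success = bool(action.sum() > 0)           # fake success criterion
        return np.random.randn(8), success

def batched_policy(obs_batch, weights):
    """Toy stand-in for a single batched VLA forward pass over all envs."""
    return np.tanh(obs_batch @ weights)            # (num_envs, action_dim)

def evaluate(num_envs=16, horizon=100):
    weights = np.random.randn(8, 7)
    envs = [DummyEnv() for _ in range(num_envs)]
    obs = np.stack([env.reset() for env in envs])
    success = np.zeros(num_envs, dtype=bool)
    for _ in range(horizon):
        actions = batched_policy(obs, weights)     # one forward pass, not num_envs
        steps = [env.step(a) for env, a in zip(envs, actions)]
        obs = np.stack([s[0] for s in steps])
        success |= np.array([s[1] for s in steps])
    return success.mean()

print(f"success rate: {evaluate():.0%}")
```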
|
MilkClouds/awesome-vla-study
- A structured reading list on Vision-Language-Action (VLA) models — from diffusion/flow matching foundations through state-of-the-art robot foundation model architectures to data scaling, RL fine-tuning, and world models.
|
MilkClouds/vla0-trl
- Unofficial reimplementation of VLA-0 using TRL's SFTTrainer. While common VLA codebases run to over 10,000 lines, vla0-trl contains only ~1,200 lines in total. It reaches ~92% on LIBERO simply by fine-tuning Qwen2.5-VL to predict actions as text, without any custom architecture.
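The core trick is serializing continuous actions as plain text the VLM can already emit. A self-contained sketch of that encoding (the bin count and normalized action range here are assumptions for illustration, not vla0-trl's exact settings):

```python
import numpy as np

N_BINS = 1000          # assumed bin count; the repo's exact choice may differ
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def action_to_text(action: np.ndarray) -> str:
    """Map each continuous action dimension to an integer bin, emitted as text."""
    bins = np.clip(np.round((action - LOW) / (HIGH - LOW) * (N_BINS - 1)), 0, N_BINS - 1)
    return " ".join(str(int(b)) for b in bins)

def text_to_action(text: str) -> np.ndarray:
    """Invert the encoding: integer tokens back to bin centers."""
    bins = np.array([int(tok) for tok in text.split()])
    return LOW + bins / (N_BINS - 1) * (HIGH - LOW)

action = np.array([0.12, -0.57, 0.0, 0.99, -1.0, 0.3, 1.0])  # e.g. a 7-DoF action
text = action_to_text(action)     # "559 215 500 994 0 649 999"
decoded = text_to_action(text)
assert np.allclose(decoded, action, atol=(HIGH - LOW) / (N_BINS - 1))
```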
|
MilkClouds/SimpleRPyC
- WebSocket-based RPC library for Python. Uses transparent proxy objects to interact with remote Python objects as if they were local, with convenient module patching for minimal code changes.
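The transparent-proxy idea in miniature (a local, in-process toy to show the mechanism, not SimpleRPyC's actual API, which runs this transport over WebSockets):

```python
class Counter:
    """A 'remote' object living on the server side."""
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
        return self.value

class ToyServer:
    """In-process stand-in for the WebSocket endpoint holding real objects."""
    def __init__(self):
        self._objects = {"counter": Counter()}
    def call(self, obj_id, method, *args):
        return getattr(self._objects[obj_id], method)(*args)

class Proxy:
    """Transparent proxy: attribute access is intercepted by __getattr__ and
    forwarded through the transport instead of resolving locally."""
    def __init__(self, transport, obj_id):
        self._transport, self._obj_id = transport, obj_id
    def __getattr__(self, name):
        def remote_call(*args):
            return self._transport.call(self._obj_id, name, *args)
        return remote_call

counter = Proxy(ToyServer(), "counter")
print(counter.add(5))  # 5 -- reads like a local call, executes "remotely"
print(counter.add(2))  # 7
```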
|
MilkClouds/lazyregistry
- A lightweight Python library for lazy-loading registries with namespace support and type safety.
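The lazy-loading pattern in a nutshell (a conceptual sketch; lazyregistry's real interface, with namespaces and typing, is richer): entries are registered as import paths and only imported on first lookup.

```python
from importlib import import_module

class LazyRegistry:
    """Names map to 'module:attr' import paths; the import runs only on
    first lookup, so registering heavy dependencies costs nothing up front."""
    def __init__(self):
        self._paths: dict[str, str] = {}
        self._cache: dict[str, object] = {}

    def register(self, name: str, path: str) -> None:
        self._paths[name] = path                 # no import happens here

    def __getitem__(self, name: str):
        if name not in self._cache:
            module_name, attr = self._paths[name].split(":")
            self._cache[name] = getattr(import_module(module_name), attr)
        return self._cache[name]

registry = LazyRegistry()
registry.register("path", "pathlib:Path")        # pathlib not imported yet
print(registry["path"]("/tmp"))                  # imported here, prints /tmp
```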
|
open-world-agents/MediaRef
- Pydantic media reference for images and video frames (with timestamp support) from data URIs,
HTTP URLs, file URIs, and local paths. Features lazy loading and optimized batch video decoding.
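A stripped-down version of the idea (field and method names here are illustrative, not MediaRef's actual schema): validation happens at construction, while I/O is deferred until the bytes are requested.

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel

class ImageRef(BaseModel):
    """Illustrative media reference: cheap to construct, lazy to load."""
    uri: str
    timestamp_s: Optional[float] = None   # for referencing a video frame

    def load(self) -> bytes:
        # Lazy: no I/O happens until this call.
        if self.uri.startswith(("http://", "https://", "data:", "file:")):
            raise NotImplementedError("non-path schemes omitted in this sketch")
        return Path(self.uri).read_bytes()

ref = ImageRef(uri="frame.png", timestamp_s=3.2)  # no file access yet
# data = ref.load()                                # bytes read only here
```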
|
MilkClouds/smon
- Real-time Slurm cluster monitoring tool with an interactive TUI built on Textual. Visualizes GPU/CPU/memory allocation across nodes with job-level drill-down.
|
Open World Agents
Core contributor and maintainer with 180+ merged PRs. Built a comprehensive multimodal desktop agent framework, including an optimized data collection tool (ocap), a standardized and efficient data format (OWAMcap), a dataset visualizer, multimedia data management and processing pipelines, agent training, Python packaging, and CI/CD infrastructure.
|
Competitions & Awards
QHack Coding Challenge (2023 and 2024)
- Ranked 4th of 793 teams in 2023 and 3rd of 618 teams in 2024.
- A contest on implementing quantum algorithms, quantum machine learning, quantum chemistry, and brain-teasing puzzles.
|
2023 Quantum Hackathon
- 1st place, Minister of Science and ICT Award.
- Topic: exploiting symmetry to solve variational quantum algorithms (quantum machine learning) efficiently.
|
NAVER CLOVA AI RUSH 2022 (July – Sept 2022)
- 3rd place in Landmark Detection (3,000,000 KRW)
- 2nd place in Shopping User Embedding Extraction & Classification (7,000,000 KRW)
|
Google Code Jam 2022
- Reached Round 3, placing 546th (awarded a T-shirt).
|
Website source code available on GitHub.