Suhwan Choi
I'm an undergraduate student majoring in Physics and Computer Science at Seoul National University. I'm currently a Principal Researcher
at Maum.ai, where I lead the autonomous robotics research division.
My main research interests are in approximating and imitating human behavior and intelligence
across multiple modalities, using end-to-end architectures and scalable training suites. I focus on
embodied AI, robotic navigation, vision-language models, and multimodal learning.
Email / CV / LinkedIn / GitHub / Blog
|
Research & Publications
I work on embodied AI, robotic navigation, and multimodal learning. My research focuses on scaling
vision-action pretraining, building commonsense-aware navigation systems, and improving
vision-language models. Some papers are highlighted.
|
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to
Embodied AI
Suhwan Choi*, Jaeyoon Jung*, Haebin Seong*, Minchan Kim, Minyeong Kim, Yongjun Cho,
Yoonshik Kim, Yubeen Park, Youngjae Yu†, Yunsung Lee†
Under Review
project page
Scaling vision-action pretraining on desktop data enables effective transfer to embodied AI tasks.
|
Revisiting Residual Connections: Orthogonal Updates for Stable and
Efficient Deep Networks
Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Youngjae Yu†
NeurIPS 2025
arXiv
Revisiting residual connections with orthogonal updates for more stable and efficient deep
networks.
|
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot
Interaction
Suhwan Choi*, Yongjun Cho*, Minchan Kim*, Jaeyoon Jung*, Myunchul Joe, Yubeen Park,
Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu†
ICRA 2025 (Outstanding Paper Award at NeurIPS 2024 Workshop, 3%)
project page
A commonsense-aware navigation system that enables intuitive human-robot interaction through
natural language understanding.
|
ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in
Vision-Language Models
Minchan Kim*, Minyeong Kim*, Junik Bae*, Suhwan Choi, Sungkyung Kim, Buru Chang†
ECCV 2024
arXiv
Exploiting semantic reconstruction to mitigate hallucinations in vision-language models.
|
Principal Researcher at Maum.ai (Feb 2024 – Present)
- Founded the autonomous robotics research division as its first researcher, leading strategic
decisions and growing the team to 10 researchers.
- Contributed as first author to the majority of research projects in robotic navigation and
embodied AI.
- Led CORE, a project to build a Slurm-based DGX cluster (96 H100 GPUs across 12 nodes). [Blog]
- Implemented a company-wide Notion workspace to improve productivity and streamline workflows. [Template]
|
Machine Learning Engineer Intern at Hyperconnect (July 2023 – Jan 2024)
- Worked on diffusion-based personalized profile image generation for real-world applications.
|
QHack Coding Challenge (2023 and 2024)
- Ranked 4th of 793 teams in 2023 and 3rd of 618 teams in 2024.
- A contest covering quantum algorithms, quantum machine learning, quantum chemistry, and
brain-teasing puzzles.
|
2023 Quantum Hackathon (2023)
- 1st place, Minister of Science and ICT Award
- Topic: Using symmetry to solve variational quantum algorithms (quantum machine learning)
efficiently.
|
NAVER CLOVA AI RUSH 2022 (July – Sept 2022)
- 3rd place on Landmark Detection (3,000,000 KRW)
- 2nd place on Shopping User Embedding Extraction, Classification (7,000,000 KRW)
|
Google Code Jam 2022 (2022)
- Reached Round 3, ranked 546th (awarded a T-shirt).
|
Open Source Contributions
|
Open World Agents
Built a comprehensive multimodal desktop agent framework, including an optimized data collection
tool, a standardized and efficient data format, multimedia data processing pipelines, dataset
management, agent training infrastructure, Python packaging, and CI/CD.
|
Website source code available on GitHub.
|