Huan Wang

Director at Salesforce Research

I am currently a Director at Salesforce Research in Palo Alto, CA. I received my Ph.D. in Computer Science from Yale University in 2013, where I was advised by Prof. Daniel Spielman. I was also mentored by Prof. John Wright at Columbia University. Prior to Yale, I was a member of the Multimedia Lab at the Chinese University of Hong Kong, supervised by Prof. Xiaoou Tang, Prof. Shuicheng Yan, and Prof. Jianzhuang Liu.

Open Source Projects

A collection of research projects and tools I've contributed to, spanning Large Language Models, AI Agents, Multimodal AI, and more.

APIGen

Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

LLM Agents

APIGen-MT

Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

LLM Multi-Turn

CodeGen

An Open Large Language Model for Code with Multi-Turn Program Synthesis

Code Generation LLM

CoDA

Coding LM via Diffusion Adaptation

Diffusion Code

xLAM

A Family of Large Action Models to Empower AI Agent Systems

LAM Agents

AgentLite

A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

Agents Framework

BOLAA

Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Benchmarking Agents

CRM Arena

Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

CRM Evaluation

MCP Eval

Automatic MCP-based Deep Evaluation for AI Agent Models

Evaluation MCP

Persona Bench

Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data

Privacy Benchmark

UserBench

An Interactive Gym Environment for User-Centric Agents

User-Centric Environment

UserRL

User-Centric Reinforcement Learning

RL User-Centric

LoCoBench

A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Long-Context Software Engineering

MobileAIBench

Benchmarking LLMs and LMMs for On-Device Use Cases

Mobile On-Device

LATTE

LeArning to Think wiTh Vision SpEcialists

Vision Multimodal

xGen-MM (BLIP3)

Multimodal Large Language Model

Multimodal BLIP3

Unicontrol

Unified Control for Multimodal Generation

Control Generation

WarpDrive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

RL GPU

CausalAI

Causal AI Research Tools and Frameworks

Causal AI

Merlion

A Machine Learning Library for Time Series

Time Series ML

Promptomatix

A Powerful Framework for LLM Prompt Optimization

Prompt Optimization

Diversity Empowers Intelligence

Integrating Expertise of Software Engineering Agents

Software Engineering Diversity

Hive

Harnessing Human Feedback for Instructional Visual Editing

Platform Research

Retroformer

Retrospective Large Language Agents with Policy Gradient Optimization

Retrospective Policy Gradient

DialogStudio

Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI

Dialog Conversational

Converse

A Flexible Framework for Building and Deploying Task-Oriented Chatbots

Conversational Framework

Ensemble Averages

Improving Model Selection and Boosting Performance in Domain Generalization

Ensemble Learning

Publication Highlights

The full publication list is available on Google Scholar.

Large Language Model (LLM)

APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay, by Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang*, Silvio Savarese*, Caiming Xiong*. NeurIPS Datasets and Benchmarks Track, 2025. [Data][Model], * co-corresponding authors.

APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets, by Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong. NeurIPS, 2024. [Data][Model]

xLAM: A Family of Large Action Models to Empower AI Agent Systems, by Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang*, Silvio Savarese*, Caiming Xiong*. Arxiv, 2024. [Github Repo], * co-corresponding authors.

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization, by Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. [Github Repo], * co-corresponding authors.

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. [Github Repo], * co-corresponding authors.

REX: Rapid Exploration and eXploitation for AI Agents, by Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. * co-corresponding authors.

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning, by Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Ming Zhu, Juntao Tan, Thai Hoang, Zuxin Liu, Liangwei Yang, Yihao Feng, Shirley Kokane, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong. Arxiv, 2024. [Github Repo]

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. Arxiv, 2022. [Github Repo]

AI Agent

AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese. Arxiv, 2024. [Github Repo]

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents, by Kexin Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong. Arxiv, 2024.

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments, by Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu. NAACL, 2025. [Github Repo]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases, by Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese. Arxiv, 2024.

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models, by Zhiwei Liu, Jielin Qiu, Shiyu Wang, Jianguo Zhang, Zuxin Liu, Roshan Ram, Haolin Chen, Weiran Yao, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong. Arxiv, 2025. [Github Repo]

ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs, by Shirley Kokane, Ming Zhu, Tulika Awalgaonkar, Jianguo Zhang, Thai Hoang, Akshara Prabhakar, Zuxin Liu, Tian Lan, Liangwei Yang, Juntao Tan, Rithesh Murthy, Weiran Yao, Zhiwei Liu, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese. Arxiv, 2024.

UserBench: An Interactive Gym Environment for User-Centric Agents, by Cheng Qian, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2025.

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering, by Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2025. [Github Repo]

LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback, by Thai Hoang, Kung-Hsiang Huang, Shirley Kokane, Jianguo Zhang, Zuxin Liu, Ming Zhu, Jake Grigsby, Tian Lan, Michael S Ryoo, Chien-Sheng Wu, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles. Arxiv, 2025. [Github Repo]

PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data, by Juntao Tan, Liangwei Yang, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Tulika Manoj Awalgaonkar, Jianguo Zhang, Weiran Yao, Ming Zhu, Shirley Kokane, Silvio Savarese, Huan Wang, Caiming Xiong, Shelby Heinecke. ACL Findings, 2025. [Github Repo]

LLM Reasoning

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, by Haolin Chen, Yihao Feng, Zuxin Liu, Weiran Yao, Akshara Prabhakar, Shelby Heinecke, Ricky Ho, Phil Mui, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2024.

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Rithesh Murthy, Liangwei Yang, Zuxin Liu, Tian Lan, Ming Zhu, Juntao Tan, Shirley Kokane, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong. SIG CoNLL, 2024.

LATTE: Learning to Think with Vision Specialists, by Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese. Arxiv, 2024.

Reinforcement Learning

On the Generalization Gap in Reparameterizable Reinforcement Learning, by Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher. ICML, 2019.

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning, by Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai. NeurIPS, 2021.

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games, by Yu Bai, Chi Jin, Huan Wang, and Caiming Xiong. NeurIPS, 2021.

WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU, by Tian Lan, Sunil Srinivasa, Huan Wang, Stephan Zheng. Arxiv, 2021. [Github Repo]

Uncertainty Estimation

Improved Online Conformal Prediction via Strongly Adaptive Online Learning, by Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Yu Bai. ICML, 2023.

Understanding the Under-Coverage Bias in Uncertainty Estimation, by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. NeurIPS, 2021.

Localized Calibration: Metrics and Recalibration, by Rachel Luo, Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai, Shengjia Zhao, Stefano Ermon. Arxiv, 2021.

Natural Language Processing

Unsupervised Paraphrasing with Pretrained Language Models, by Tong Niu, Semih Yavuz, Yingbo Zhou, Nitish Shirish Keskar, Huan Wang and Caiming Xiong. EMNLP, 2021.

BatchMixup: Improving Training by Interpolating Hidden States of the Entire Mini-batch, by Wenpeng Yin, Huan Wang, Jin Qu, Caiming Xiong. ACL Findings, 2021.

Neural Network and Deep Learning

Evaluating State-of-the-Art Classification Models Against Bayes Optimality, by Ryan Theisen, Huan Wang, Lav R Varshney, Caiming Xiong, and Richard Socher. NeurIPS, 2021.

Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization, by Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras. ICML, 2021.

Sparse Representation and Dictionary Learning

Exact Recovery of Sparsely-Used Dictionaries, by Daniel Spielman, Huan Wang, and John Wright. Best paper award of the 25th Conference on Learning Theory (COLT), June 2012.

Music

A researcher who can't write code is not a good singer - Original compositions and musical works

My Compositions (POP and NEW AGE)

最后我们还是变成了曾经讨厌的那个人

Watch on YouTube

好久不见

Listen

冷眼人间

Listen

流年

Listen

暖冬

Listen

月光回忆

Watch on YouTube
Listen

桎梏

Watch on YouTube
Listen

生活不止眼前的苟且

Watch on YouTube

追忆

Watch on YouTube
Instrumental

涟漪-A Ripple of Love

Watch on YouTube
MV (400K+ views), Song, Instruments

错过

Listen

我们-钢琴轻音乐

Listen

流年-钢琴轻音乐

Listen

MEMORY

Listen

LILIUM (remix)

Listen

WAITING IN DARKNESS

Listen

LIGHT MOOD

Listen

My Recordings / Covers

你是我的刘若英

Listen

剪爱

Listen

离开你以后

Listen

Everything I Do

Listen

Seasons in the Sun

Listen

Mandy

Listen

啦啦歌

Listen

YouTube Channel

Subscribe to my YouTube channel for more music content:

JoyousPrince (@YouTube)

NetEase Music Channel (网易音乐人)

Follow my NetEase Music artist page for streaming my compositions:

Huan Wang (@NetEase Music)