Hi! I am a third-year PhD student in the Natural Language Processing (NLP) group at the University of Washington (UW), advised by Profs. Mari Ostendorf and Noah A. Smith. I also work closely with Prof. Ranjay Krishna. I have worked as a student researcher at Google Research and Tencent AI Lab. My primary interests are large multimodal models and (Reinforcement) Learning from Human (AI) Feedback (RLHF).

Previously, I graduated from the University of Chicago with a B.S. in Mathematics, Computer Science, and Economics in 2021, where I was fortunate to be advised by Prof. Karen Livescu at the Toyota Technological Institute at Chicago (TTIC).

Publications

Most recent publications on Semantic Scholar and Google Scholar
* indicates equal contribution

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman
CVPR 2024
[paper] [project page]
TLDR: Using LLM-generated code + vision tools to generate high-quality multimodal chain-of-thought reasoning data for large multimodal model (LMM) training. The resulting models, PaLI-3-VPD (5B) and PaLI-X-VPD (55B), set a new SOTA on many existing vision-language tasks.

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Jiao Sun*, Deqing Fu*, Yushi Hu*, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
Preprint 2023
[paper]
TLDR: Using TIFA as the reward model for text-to-image generation models. Improves text-to-image faithfulness and image aesthetics with simple rejection-sampling fine-tuning.

Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-Image Generation
Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
ICLR 2024
[paper] [project page] [code & data]
TLDR: An improved and more reliable version of TIFA for text-to-image evaluation, based on Davidsonian semantics and scene graphs.

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
NeurIPS 2023 (Spotlight)
[paper] [project page] [code & data]
TLDR: The feedback (the "F") in current RLHF is an overall preference, which conveys limited information. We introduce Fine-Grained RLHF and train LMs with explicit feedback such as "sentence 1 is not factual" or "sentence 2 is toxic". We show that Fine-Grained RLHF is more effective and enables customizing LMs for specific needs.

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith
ICCV 2023
[paper] [project page] [code & data] [poster]
TLDR: Fine-grained and accurate evaluation of synthesized images using Image-to-Text Models (e.g., GPT-4, BLIP-2) and Large Language Models (e.g., GPT-3.5). More accurate than CLIP!

PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu*, Hang Hua*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
ICCV 2023
[paper] [project page] [Huggingface Checkpoint] [poster]
TLDR: A captioning model controlled by natural language instructions. A simple and effective visual front end for LLMs like GPT-3 and ChatGPT.

One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
ACL 2023
[paper] [project page] [checkpoint]
TLDR: Instruction-finetuned text embeddings. The SOTA embedding for retrieval, semantic similarity, etc. Open-source, and better than OpenAI embeddings!

Binding Language Models in Symbolic Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight)
[paper] [project page]
TLDR: Combining GPT-3 with Python and SQL. An early proposal of the concept behind Toolformer and ChatGPT plugins.

Unsupervised Learning of Hierarchical Conversation Structure
Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf
EMNLP 2022
[paper]
TLDR: Learning the common dialogue structure from a large collection of customer-service dialogues.

In-Context Learning for Few-Shot Dialogue State Tracking
Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf
EMNLP 2022
[paper] [code] [bibtex]
TLDR: The first paper to show that GPT-3 is surprisingly good at dialogue understanding tasks.

Acoustic Span Embeddings for Multilingual Query-by-Example Search
Yushi Hu, Shane Settle, Karen Livescu
IEEE Spoken Language Technology Workshop (SLT 2021)
[paper] [code] [bibtex] [slides]

Multilingual Jointly Trained Acoustic and Written Word Embeddings
Yushi Hu, Shane Settle, Karen Livescu
Interspeech 2020
[paper] [code] [bibtex] [slides] [video] [demo]

Freestanding Ferroelectric Bubble Domains
Saidur R Bakaul, Sergei Prokhorenko, Qi Zhang, Yousra Nahas, Yushi Hu, Amanda Petford-Long, Laurent Bellaiche, Nagarajan Valanoor
Advanced Materials, 2021
[paper]

Work Experience


  • Student Researcher, Google, Mountain View, CA, summer 2023
  • Research Intern, Tencent AI Lab (Seattle), Bellevue, WA, summer 2022
  • Research Assistant, Toyota Technological Institute at Chicago, Chicago, IL, 2019 - 2021
  • Machine Learning Engineer Intern, Learnable.ai, Boston, MA, summer 2019
  • Research Intern, Argonne National Laboratory, Lemont, IL, summer 2018