Hi! I am a fourth-year PhD student at the University of Washington (UW),
advised by Prof. Mari Ostendorf and Prof. Noah A. Smith.
I also closely collaborate with Prof. Ranjay Krishna.
I am a research scientist intern at Meta GenAI, building better LLaMAs.
Previously, I interned at the Allen Institute for AI (AI2), Google Research, and Tencent AI Lab.
My research primarily focuses on building multimodal models that can understand, reason, and generate across many modalities (text, image, video, ...).
I am also interested in building powerful multimodal agents with these models.
Before my PhD, I graduated from the University of Chicago in 2021 with a B.S. in Mathematics, Computer Science, and Economics.
There, I was fortunate to be advised by Prof. Karen Livescu at the Toyota Technological Institute at Chicago (TTIC).
Publications
Most recent publications on Semantic Scholar and Google Scholar
* indicates equal contribution
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Ranjay Krishna
NeurIPS 2024
[paper]
[code]
[project page]
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
ECCV 2024
[paper]
[project page]
[code]
[HF data]
Visual Program Distillation:
Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata,
Enming Luo, Ranjay Krishna, Ariel Fuxman
CVPR 2024 (Oral)
[paper]
[project page]
Fine-Grained Human Feedback Gives Better
Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj
Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
NeurIPS 2023 (Spotlight)
[paper]
[project page]
[code & data]
TIFA: Accurate and Interpretable
Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay
Krishna, Noah A. Smith
ICCV 2023
[paper]
[project page]
[code & data]
[poster]
PromptCap: Prompt-Guided Task-Aware Image
Captioning
Yushi Hu*, Hang Hua*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo
Luo
ICCV 2023
[paper]
[project page]
[Huggingface Checkpoint]
[poster]
Visual Sketchpad: Sketching as a Visual
Chain of Thought for Multimodal Language Models
Yushi Hu*, Weijia Shi*, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke
Zettlemoyer, Noah A. Smith, Ranjay Krishna
NeurIPS 2024
[paper]
[project page]
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du
NeurIPS 2024
[paper]
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan
Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna
ECCV 2024
[paper]
[project page]
[code]
[HF data]
TLDR: We introduce BLINK, a new benchmark for multimodal large language models (MLLMs) that focuses on core visual perception
abilities not found in other evaluations.
Training Language Models to Generate Text with Citations via Fine-grained Rewards
Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang
ACL 2024
[paper]
TLDR: Using fine-grained rewards to train LLMs to generate text with citations.
Visual Program Distillation:
Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata,
Enming Luo, Ranjay Krishna, Ariel Fuxman
CVPR 2024 (Oral)
[paper]
[project page]
TLDR: Using LLM-generated code + vision tools to generate high-quality multimodal
chain-of-thought reasoning data for large multimodal model (LMM) training.
The resulting models, PaLI-3-VPD (5B) and PaLI-X-VPD (55B), set a new SOTA on many existing vision-language tasks.
DreamSync: Aligning Text-to-Image Generation
with Image Understanding Feedback
Jiao Sun*, Deqing Fu*, Yushi Hu*, Su Wang, Royi Rassin, Da-Cheng Juan, Dana
Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus
Rashtchian
Preprint 2023
[paper]
TLDR: Using TIFA as the reward model for text-to-image generation models. Improves
text-to-image faithfulness and image aesthetics with simple rejection-sampling fine-tuning.
Davidsonian Scene Graph: Improving
Reliability in Fine-Grained Evaluation for Text-Image Generation
Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason
Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
ICLR 2024
[paper]
[project page]
[code & data]
TLDR: An improved and more reliable version of TIFA for text-to-image evaluation,
based on Davidsonian semantics and scene graphs.
Fine-Grained Human Feedback Gives Better
Rewards for Language Model Training
Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj
Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi
NeurIPS 2023 (Spotlight)
[paper]
[project page]
[code & data]
TLDR: The feedback (F) in current RLHF is an overall preference, which conveys limited information. We
introduce Fine-Grained RLHF and train LMs with explicit feedback like "sentence 1 is not factual" or "sentence 2 is toxic".
We show that Fine-Grained RLHF is more effective and enables customizing LMs for specific needs.
TIFA: Accurate and Interpretable
Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay
Krishna, Noah A. Smith
ICCV 2023
[paper]
[project page]
[code & data]
[poster]
TLDR: Fine-grained and accurate evaluation of synthesized images using Image-to-Text
Models (e.g. GPT-4, BLIP-2) and Large Language Models (e.g. GPT-3.5). More accurate than CLIP!
PromptCap: Prompt-Guided Task-Aware Image
Captioning
Yushi Hu*, Hang Hua*, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo
Luo
ICCV 2023
[paper]
[project page]
[Huggingface Checkpoint]
[poster]
TLDR: A captioning model that is controlled by natural language instructions. A simple
and effective visual frontend for LLMs like GPT-3 and ChatGPT.
One Embedder, Any Task:
Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari
Ostendorf, Wen-tau Yih, Noah A. Smith, Luke
Zettlemoyer, Tao Yu
ACL 2023
[paper]
[project page]
[checkpoint]
TLDR: Instruction-finetuned text embeddings. The SOTA embedding for retrieval,
semantic similarity, etc. Open-source, and better than OpenAI embeddings!
Binding Language Models in Symbolic
Languages
Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu,
Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023 (Spotlight)
[paper] [project page]
TLDR: Combining GPT-3 with Python and SQL. Proposes the concept of Toolformer and
ChatGPT plugins.
Unsupervised Learning of Hierarchical
Conversation Structure
Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf
EMNLP 2022
[paper]
TLDR: Learning the common dialogue structure from a huge amount of customer-service
dialogues.
In-Context Learning for Few-Shot Dialogue
State Tracking
Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari
Ostendorf
EMNLP 2022
[paper] [code] [bibtex]
TLDR: The first paper that shows GPT-3 is surprisingly good at dialogue understanding tasks.
Acoustic Span Embeddings for Multilingual
Query-by-Example Search
Yushi Hu, Shane Settle, Karen Livescu
IEEE Spoken Language Technology Workshop (SLT 2021)
[paper]
[code]
[bibtex]
[slides]
Multilingual Jointly Trained Acoustic and
Written Word Embeddings
Yushi Hu, Shane Settle, Karen Livescu
Interspeech 2020
[paper]
[code]
[bibtex]
[slides]
[video]
[demo]
Freestanding Ferroelectric Bubble Domains
Saidur R Bakaul, Sergei Prokhorenko, Qi Zhang, Yousra Nahas, Yushi Hu, Amanda
Petford-Long, Laurent Bellaiche, Nagarajan
Valanoor
Advanced Materials, 2021
[paper]
Work Experience
- Student Researcher, Allen Institute for AI, Seattle, WA, 2024
- Student Researcher, Google Research, Mountain View, CA, summer 2023
- Research Intern, Tencent AI Lab (Seattle), Bellevue, WA, summer 2022
- Research Assistant, Toyota Technological Institute at Chicago (TTIC), Chicago, IL, 2019 - 2021
- Research Intern, Argonne National Laboratory, Lemont, IL, summer 2018