Yufa Zhou 周宇发
CS PhD Student @ Duke

I am an incoming CS PhD student at Duke University.

I have a deep interest in AI, spanning its theoretical, empirical, and even philosophical aspects. My current research focuses on advancing the understanding of large language models (mechanisms, theory), improving optimization (acceleration, efficiency), and strengthening trustworthiness (safety, privacy, interpretability). I’m also broadly interested in post-training, multi-agent systems, reasoning, domain generalization, and alignment.

I am happy to chat about anything. Feel free to reach out.

Curriculum Vitae

Education
  • Duke University
    Ph.D. in Computer Science
    Aug. 2025 – Present
  • University of Pennsylvania
    M.S.E. in Scientific Computing
    Aug. 2023 – May 2025
  • Wuhan University
    B.E. in Engineering Mechanics
    Sep. 2019 – Jul. 2023
News
2025
  • Jun 26: One paper accepted to ICCV 2025.
  • Feb 27: Accepted the Ph.D. offer in Computer Science at Duke University.
  • Jan 22: One paper accepted to AISTATS 2025 and one accepted to ICLR 2025.
2024
  • Dec 09: Two papers accepted to AAAI 2025.
  • Oct 10: Four papers accepted to NeurIPS 2024 workshops.
Selected Publications
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Mingda Wan, Yufa Zhou (alphabetical order)

ICCV 2025

We provide a theoretical analysis showing that for diffusion models with Gaussian mixture data, the diffusion process preserves the mixture structure; we derive tight, component-independent bounds on Lipschitz constants and second moments, and establish error guarantees for diffusion solvers—offering deeper insights into the diffusion dynamics under common data distributions.
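
As a quick illustration of the mixture-preservation claim (a sketch in standard DDPM forward-process notation, not the paper's own derivation): pushing a Gaussian mixture through the forward noising process yields another Gaussian mixture, with each component's mean rescaled and covariance inflated by the noise.

    x_0 \sim \sum_{k=1}^{K} w_k \, \mathcal{N}(\mu_k, \Sigma_k), \qquad
    x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
    \;\Longrightarrow\;
    x_t \sim \sum_{k=1}^{K} w_k \, \mathcal{N}\big(\sqrt{\bar{\alpha}_t}\, \mu_k, \; \bar{\alpha}_t \Sigma_k + (1 - \bar{\alpha}_t) I\big)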

Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs

Yufa Zhou*, Shaobo Wang*, Xingyu Dong*, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang (* equal contribution)

arXiv 2025

We investigate whether post-training techniques such as SFT and RLVR can generalize to multi-agent systems, and introduce Recon—a 7B model trained on a curated dataset of economic reasoning problems—which achieves strong benchmark performance and exhibits emergent strategic generalization in multi-agent games.
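
For readers unfamiliar with RLVR (reinforcement learning with verifiable rewards), here is a minimal sketch of the reward idea; the function name and boxed-answer format are hypothetical stand-ins, not Recon's actual implementation:

    import re

    def verifiable_reward(completion: str, ground_truth: str) -> float:
        """Binary RLVR-style reward: 1.0 if the model's final boxed answer
        matches the known-correct answer, else 0.0. Hypothetical format."""
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        if match is None:
            return 0.0
        return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

    # e.g., verifiable_reward(r"... so the equilibrium price is \boxed{42}", "42") -> 1.0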

Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

AISTATS 2025

We demonstrate that a looped 23-layer ReLU-MLP can function as a universal programmable computer—revealing that simple neural network modules possess greater expressive power than previously thought and can perform complex tasks without relying on advanced architectures like Transformers.
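
A minimal sketch of the looped-MLP idea (random weights purely for illustration; the paper's 23-layer block is an explicit construction, not a trained model): the same fixed ReLU block is applied repeatedly to a state vector, so "program steps" come from iteration rather than from additional layers.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16                                   # width of the state vector
    W = 0.1 * rng.standard_normal((d, d))    # one fixed block, reused at every step
    b = 0.1 * rng.standard_normal(d)

    def looped_mlp(x: np.ndarray, steps: int) -> np.ndarray:
        """Iterate h <- relu(W h + b); looping a fixed block stands in for depth."""
        h = x
        for _ in range(steps):
            h = np.maximum(W @ h + b, 0.0)   # ReLU
        return h

    out = looped_mlp(rng.standard_normal(d), steps=10)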

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

ICLR 2025

We introduce a novel LLM weight pruning method that directly optimizes for approximating the non-linear attention matrix—with theoretical convergence guarantees—effectively reducing computational costs while maintaining model performance.
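
A toy sketch of the objective (hypothetical helper names, and a simple magnitude mask as a stand-in for the paper's actual optimization): a candidate pruning mask is judged by how well the pruned weights reproduce the full non-linear attention matrix, softmax included, rather than the linear products QK^T alone.

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # row-wise, numerically stable
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def attention_error(X, WQ, WK, mask):
        """Frobenius gap between the dense attention matrix and the one
        computed after masking (pruning) the query weights."""
        d = WQ.shape[1]
        A_full = softmax(X @ WQ @ (X @ WK).T / np.sqrt(d))
        A_pruned = softmax(X @ (WQ * mask) @ (X @ WK).T / np.sqrt(d))
        return np.linalg.norm(A_full - A_pruned)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 16))
    WQ, WK = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
    mask = (np.abs(WQ) > np.quantile(np.abs(WQ), 0.5)).astype(float)  # keep top half by magnitude
    print(attention_error(X, WQ, WK, mask))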

All publications
Academic Services
  • Conference Reviewer: ICLR 2025, NAACL 2025, IJCAI 2025, ACL 2025, EMNLP 2025.
  • Journal Reviewer: TKDE, TNNLS.