Yufa Zhou 周宇发
CS PhD Student @ Duke

I am a first-year CS PhD student at Duke University, advised by Prof. Anru Zhang.

I study the foundations of advanced AI models, focusing on how and why they work.

My current research centers on understanding how and why advanced AI models—such as large language models and diffusion models—work, and on leveraging this understanding to make them more accurate, efficient, and robust. I am deeply interested in discovering the first law of intelligence. More broadly, I am interested in AI across theoretical, empirical, and even philosophical dimensions, as well as in areas where stronger foundations may unlock exciting applications, from trustworthy AI to agentic intelligence and AI for science.

I am always open to discussions and collaborations. Feel free to reach out.

Curriculum Vitae

Education
  • Duke University
    Ph.D. in Computer Science
    Aug. 2025 – Present
  • University of Pennsylvania
    M.S.E. in Scientific Computing
    Aug. 2023 – May 2025
  • Wuhan University
    B.E. in Engineering Mechanics
    Sep. 2019 – Jul. 2023
News
2025
One paper accepted to NeurIPS 2025 and one paper accepted as an oral at a NeurIPS 2025 workshop
Sep 23
One paper accepted to ICCV 2025
Jun 26
Accepted the Ph.D. offer in Computer Science at Duke University
Feb 27
One paper accepted to AISTATS 2025 and one paper accepted to ICLR 2025
Jan 22
2024
Two papers accepted to AAAI 2025
Dec 09
Four papers accepted to NeurIPS 2024 workshops
Oct 10
Selected Publications
The Geometry of Reasoning: Flowing Logics in Representation Space

Yufa Zhou*, Yixiao Wang*, Xunjian Yin*, Shuyan Zhou, Anru R. Zhang (* equal contribution)

arXiv 2025

We study how LLMs “think” through their embeddings by introducing a geometric framework of reasoning flows, in which reasoning emerges as smooth trajectories in representation space whose velocity and curvature are governed by logical structure rather than surface semantics. Cross-topic and cross-language experiments validate the framework, opening a new lens for interpretability.

BibTeX Citation:
@article{zhou2025geometry,
  title   = {The Geometry of Reasoning: Flowing Logics in Representation Space},
  author  = {Zhou, Yufa and Wang, Yixiao and Yin, Xunjian and Zhou, Shuyan and Zhang, Anru R.},
  journal = {arXiv preprint arXiv:2510.09782},
  year    = {2025}
}
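To make the notion of a reasoning trajectory's velocity and curvature concrete, here is a minimal numpy sketch of one natural discretization: treat per-step hidden states as a discrete curve and take finite differences. The random embeddings, the step count, and the turning-angle curvature proxy are illustrative assumptions, not the definitions used in the paper.

# Illustrative sketch (not the paper's pipeline): treat per-step hidden states of a
# reasoning chain as a discrete curve in representation space and estimate its
# velocity and curvature with finite differences. The random embeddings below are
# placeholders for actual LLM hidden states.
import numpy as np

rng = np.random.default_rng(0)
steps, dim = 12, 768                                  # number of reasoning steps, embedding width
E = np.cumsum(rng.normal(size=(steps, dim)), axis=0)  # placeholder trajectory e_1, ..., e_T

V = np.diff(E, axis=0)                    # discrete velocity v_t = e_{t+1} - e_t
speed = np.linalg.norm(V, axis=1)         # ||v_t||: how far each reasoning step moves

# Discrete curvature proxy: turning angle between consecutive velocity vectors.
cos = np.sum(V[:-1] * V[1:], axis=1) / (speed[:-1] * speed[1:])
turning_angle = np.arccos(np.clip(cos, -1.0, 1.0))

print("step speeds:", np.round(speed, 2))
print("turning angles (rad):", np.round(turning_angle, 2))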
Why Do Transformers Fail to Forecast Time Series In-Context?

Yufa Zhou*, Yixiao Wang*, Surbhi Goel, Anru R. Zhang (* equal contribution)

NeurIPS 2025 Workshop: What Can('t) Transformers Do? Oral (3/68 ≈ 4.4%)

We analyze why Transformers fail at time-series forecasting through the lens of in-context learning theory. We prove that, under AR($p$) data, linear self-attention cannot outperform classical linear predictors and suffers a strict $O(1/n)$ excess-risk gap, and that chain-of-thought inference compounds errors exponentially, revealing fundamental representational limits of attention and offering principled insights.

BibTeX Citation:
@article{zhou2025tsf,
  title   = {Why Do Transformers Fail to Forecast Time Series In-Context?},
  author  = {Zhou, Yufa and Wang, Yixiao and Goel, Surbhi and Zhang, Anru R.},
  journal = {arXiv preprint arXiv:2510.09776},
  year    = {2025}
}
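As a small companion to the AR($p$) setting in this paper, the sketch below simulates an AR(2) sequence and fits the classical least-squares linear predictor that serves as the comparison point. The coefficients, noise level, and sequence length are arbitrary choices; the linear self-attention side of the analysis is not reproduced here.

# Small numerical illustration of the AR(p) forecasting setup: generate an AR(2)
# sequence and fit the classical least-squares linear predictor on lagged values.
# This is only the classical baseline, not the Transformer side of the comparison.
import numpy as np

rng = np.random.default_rng(1)
a = np.array([0.6, -0.2])                 # assumed AR(2) coefficients (stable)
p, n = len(a), 500
x = np.zeros(n)
for t in range(p, n):                     # x_t = a_1 x_{t-1} + a_2 x_{t-2} + eps_t
    x[t] = a @ x[t - p:t][::-1] + 0.1 * rng.normal()

# Build the lagged design matrix (columns x_{t-1}, x_{t-2}) and fit ordinary least squares.
X = np.column_stack([x[p - i - 1:n - i - 1] for i in range(p)])
y = x[p:]
a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("true coefficients:", a, " estimated:", np.round(a_hat, 3))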
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou* (α–β alphabetical order)

ICCV 2025

We provide a theoretical analysis showing that for diffusion models with Gaussian mixture data, the diffusion process preserves the mixture structure; we derive tight, component-independent bounds on Lipschitz constants and second moments, and establish error guarantees for diffusion solvers—offering deeper insights into the diffusion dynamics under common data distributions.

BibTeX Citation:
@inproceedings{liang2025unraveling,
  author    = {Liang, Yingyu and Sha, Zhizhou and Shi, Zhenmei and Song, Zhao and Wan, Mingda and Zhou, Yufa},
  title     = {Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {11436-11446}
}
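The closure property behind this analysis can be checked numerically: under the standard forward process $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, each Gaussian mixture component stays Gaussian with mean $\sqrt{\bar{\alpha}_t}\,\mu$ and covariance $\bar{\alpha}_t \Sigma + (1-\bar{\alpha}_t) I$. The sketch below is a Monte Carlo sanity check on one 2-D component with an arbitrary $\bar{\alpha}_t$; the paper's Lipschitz and second-moment bounds are not reproduced here.

# Monte Carlo sanity check of the Gaussian-mixture closure property under the standard
# forward diffusion x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps (one component shown).
# The component parameters and alpha-bar value are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([2.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
ab = 0.4                                   # alpha_bar_t at some timestep t

x0 = rng.multivariate_normal(mu, Sigma, size=200_000)
xt = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * rng.normal(size=x0.shape)

print("empirical mean :", np.round(xt.mean(axis=0), 3))
print("predicted mean :", np.round(np.sqrt(ab) * mu, 3))
print("empirical cov  :\n", np.round(np.cov(xt, rowvar=False), 3))
print("predicted cov  :\n", np.round(ab * Sigma + (1 - ab) * np.eye(2), 3))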
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

ICLR 2025

We introduce a novel LLM weight pruning method that directly optimizes for approximating the non-linear attention matrix—with theoretical convergence guarantees—effectively reducing computational costs while maintaining model performance.

BibTeX Citation:
@inproceedings{liang2025beyond,
  title     = {Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix},
  author    = {Yingyu Liang and Jiangxuan Long and Zhenmei Shi and Zhao Song and Yufa Zhou},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=sgbI8Pxwie}
}
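To give a flavor of what optimizing against the non-linear attention matrix (rather than the linear scores) can mean, here is a toy Python sketch that scores candidate entries of a key projection by how much zeroing them perturbs the post-softmax attention matrix, then prunes the least influential half. The brute-force scoring loop, the choice of pruning $W_k$, and the 50% sparsity level are illustrative assumptions; this is not the paper's algorithm and inherits none of its convergence guarantees.

# Toy illustration of pruning against the non-linear attention matrix: score each
# candidate weight entry by how much zeroing it perturbs softmax(X Wq (X Wk)^T / sqrt(d)),
# then drop the least influential entries. Not the paper's algorithm; illustration only.
import numpy as np

def attention(X, Wq, Wk):
    S = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    S -= S.max(axis=1, keepdims=True)              # numerically stable softmax
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
n, d = 8, 16
X, Wq, Wk = rng.normal(size=(n, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d))
A = attention(X, Wq, Wk)

# Impact of zeroing each entry of Wk, measured on the post-softmax attention matrix.
impact = np.zeros_like(Wk)
for i in range(d):
    for j in range(d):
        Wk_try = Wk.copy()
        Wk_try[i, j] = 0.0
        impact[i, j] = np.linalg.norm(A - attention(X, Wq, Wk_try))

# Prune the half of the entries whose removal changes the attention matrix the least.
threshold = np.quantile(impact, 0.5)
Wk_pruned = np.where(impact <= threshold, 0.0, Wk)
print("attention change after pruning:", np.linalg.norm(A - attention(X, Wq, Wk_pruned)))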
Mentees
Academic Services
  • Conference Reviewer: ICLR (2025, 2026), NAACL 2025, IJCAI 2025, ACL 2025, EMNLP 2025, AAAI 2026.
  • Journal Reviewer: TKDE, TNNLS.