Homepage - Yufa Zhou

Selected Publications (view all )

The Geometry of Reasoning: Flowing Logics in Representation Space

Yufa Zhou*, Yixiao Wang*, Xunjian Yin*, Shuyan Zhou, Anru R. Zhang(* equal contribution)

arXiv 2025

We study how LLMs “think” through their embeddings by introducing a geometric framework of reasoning flows, where reasoning emerges as smooth trajectories in representation space whose velocity and curvature are governed by logical structure rather than surface semantics, validated through cross-topic and cross-language experiments, opening a new lens for interpretability.

[Paper] [Code] [Dataset] [BibTeX]

The Geometry of Reasoning: Flowing Logics in Representation Space

Yufa Zhou*, Yixiao Wang*, Xunjian Yin*, Shuyan Zhou, Anru R. Zhang(* equal contribution)

arXiv 2025

[Paper] [Code] [Dataset] [BibTeX]

Why Do Transformers Fail to Forecast Time Series In-Context?

Yufa Zhou*, Yixiao Wang*, Surbhi Goel, Anru R. Zhang(* equal contribution)

NeurIPS 2025 Workshop: What Can('t) Transformers Do? Oral (3/68 ≈ 4.4%)

We analyze why Transformers fail in time-series forecasting through in-context learning theory, proving that, under AR($p$) data, linear self-attention cannot outperform classical linear predictors and suffers a strict $O(1/n)$ excess-risk gap, while chain-of-thought inference compounds errors exponentially—revealing fundamental representational limits of attention and offering principled insights.

[Paper] [Code] [BibTeX]

Why Do Transformers Fail to Forecast Time Series In-Context?

Yufa Zhou*, Yixiao Wang*, Surbhi Goel, Anru R. Zhang(* equal contribution)

NeurIPS 2025 Workshop: What Can('t) Transformers Do? Oral (3/68 ≈ 4.4%)

[Paper] [Code] [BibTeX]

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou*(α–β alphabetical order)

ICCV 2025

We provide a theoretical analysis showing that for diffusion models with Gaussian mixture data, the diffusion process preserves the mixture structure; we derive tight, component-independent bounds on Lipschitz constants and second moments, and establish error guarantees for diffusion solvers—offering deeper insights into the diffusion dynamics under common data distributions.

[Paper] [BibTeX]

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou*(α–β alphabetical order)

ICCV 2025

[Paper] [BibTeX]

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*(α–β alphabetical order)

ICLR 2025

We introduce a novel LLM weight pruning method that directly optimizes for approximating the non-linear attention matrix—with theoretical convergence guarantees—effectively reducing computational costs while maintaining model performance.

[Paper] [BibTeX]

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou*(α–β alphabetical order)

ICLR 2025

[Paper] [BibTeX]

Action required

Education

News

Selected Publications (view all )

The Geometry of Reasoning: Flowing Logics in Representation Space

The Geometry of Reasoning: Flowing Logics in Representation Space

Why Do Transformers Fail to Forecast Time Series In-Context?

Why Do Transformers Fail to Forecast Time Series In-Context?

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

All publications

Mentees

Academic Services

Action required

Education

News

Selected Publications (view all )

The Geometry of Reasoning: Flowing Logics in Representation Space

BibTeX Citation

The Geometry of Reasoning: Flowing Logics in Representation Space

BibTeX Citation

Why Do Transformers Fail to Forecast Time Series In-Context?

BibTeX Citation

Why Do Transformers Fail to Forecast Time Series In-Context?

BibTeX Citation

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

BibTeX Citation

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

BibTeX Citation

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

BibTeX Citation

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

BibTeX Citation

All publications

Mentees

Academic Services