2025

Automating Structural Engineering Workflows with Large Language Model Agents

Haoran Liang*, Yufa Zhou*, Mohammad Talebi-Kalaleh, Qipei Mei (* equal contribution)

arXiv 2025

We present MASSE, the first multi-agent system that automates structural engineering workflows by integrating reasoning, planning, and tool use to perform complex design and verification tasks—achieving training-free automation that cuts expert workload from hours to minutes and demonstrates tangible real-world impact.

BibTeX Citation
@article{liang2025masse,
  title   = {Automating Structural Engineering Workflows with Large Language Model Agents},
  author  = {Haoran Liang and Yufa Zhou and Mohammad Talebi-Kalaleh and Qipei Mei},
  journal = {arXiv preprint arXiv:2510.11004},
  year    = {2025}
}
The Geometry of Reasoning: Flowing Logics in Representation Space

Yufa Zhou*, Yixiao Wang*, Xunjian Yin*, Shuyan Zhou, Anru R. Zhang (* equal contribution)

arXiv 2025

We study how LLMs “think” through their embeddings by introducing a geometric framework of reasoning flows, where reasoning emerges as smooth trajectories in representation space whose velocity and curvature are governed by logical structure rather than surface semantics, validated through cross-topic and cross-language experiments, opening a new lens for interpretability.

BibTeX Citation
@article{zhou2025geometry,
  title   = {The Geometry of Reasoning: Flowing Logics in Representation Space},
  author  = {Zhou, Yufa and Wang, Yixiao and Yin, Xunjian and Zhou, Shuyan and Zhang, Anru R.},
  journal = {arXiv preprint arXiv:2510.09782},
  year    = {2025}
}
Why Do Transformers Fail to Forecast Time Series In-Context?

Yufa Zhou*, Yixiao Wang*, Surbhi Goel, Anru R. Zhang (* equal contribution)

NeurIPS 2025 Workshop: What Can('t) Transformers Do? Oral (3/68 ≈ 4.4%)

We analyze why Transformers fail in time-series forecasting through in-context learning theory, proving that, under AR($p$) data, linear self-attention cannot outperform classical linear predictors and suffers a strict $O(1/n)$ excess-risk gap, while chain-of-thought inference compounds errors exponentially—revealing fundamental representational limits of attention and offering principled insights.

BibTeX Citation
@article{zhou2025tsf,
  title   = {Why Do Transformers Fail to Forecast Time Series In-Context?},
  author  = {Zhou, Yufa and Wang, Yixiao and Goel, Surbhi and Zhang, Anru R.},
  journal = {arXiv preprint arXiv:2510.09776},
  year    = {2025}
}
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Zichen Wen, Shaobo Wang, Yufa Zhou, Junyuan Zhang, Qintong Zhang, Yifeng Gao, Zhaorun Chen, Bin Wang, Weijia Li, Conghui He, Linfeng Zhang

NeurIPS 2025

We propose EPIC, a progressive consistency distillation framework that mitigates the training difficulty of token compression in multi-modal LLMs by enforcing token- and layer-level consistency, achieving superior efficiency, robustness, and generalization across benchmarks.

BibTeX Citation
@inproceedings{wen2025efficient,
  title     = {Efficient Multi-modal Large Language Models via Progressive Consistency Distillation},
  author    = {Wen, Zichen and Wang, Shaobo and Zhou, Yufa and Zhang, Junyuan and Zhang, Qintong and Gao, Yifeng and Chen, Zhaorun and Wang, Bin and Li, Weijia and He, Conghui and Zhang, Linfeng},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=gZjPllL9jM}
}
Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Mingda Wan*, Yufa Zhou* (α–β alphabetical order)

ICCV 2025

We provide a theoretical analysis showing that for diffusion models with Gaussian mixture data, the diffusion process preserves the mixture structure; we derive tight, component-independent bounds on Lipschitz constants and second moments, and establish error guarantees for diffusion solvers—offering deeper insights into the diffusion dynamics under common data distributions.

BibTeX Citation
@inproceedings{liang2025unraveling,
  author    = {Liang, Yingyu and Sha, Zhizhou and Shi, Zhenmei and Song, Zhao and Wan, Mingda and Zhou, Yufa},
  title     = {Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {11436-11446}
}
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs

Yufa Zhou*, Shaobo Wang*, Xingyu Dong*, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang (* equal contribution)

arXiv 2025

We investigate whether post-training techniques such as SFT and RLVR can generalize to multi-agent systems, and introduce Recon—a 7B model trained on a curated dataset of economic reasoning problems—which achieves strong benchmark performance and exhibits emergent strategic generalization in multi-agent games.

BibTeX Citation
@article{zhou2025recon,
  title   = {Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs},
  author  = {Zhou, Yufa and Wang, Shaobo and Dong, Xingyu and Jin, Xiangqi and Chen, Yifang and Min, Yue and Yang, Kexin and Ren, Xingzhang and Liu, Dayiheng and Zhang, Linfeng},
  journal = {arXiv preprint arXiv:2506.00577},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.00577}
}
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge

Xuan Shen, Weize Ma, Yufa Zhou, Enhao Tang, Yanyue Xie, Zhengang Li, Yifan Gong, Quanyi Wang, Henghui Ding, Yiwei Wang, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu

arXiv 2025

We propose FastCar, a unified framework that accelerates auto-regressive video generation by exploiting temporal redundancy through a Temporal Attention Score for selective computation reuse, integrating with sparse attention and dynamic scheduling to enable real-time, high-resolution synthesis with over 2.1× speedup and minimal quality loss.

BibTeX Citation
@article{shen2025fastcar,
  title   = {FastCar: Cache attentive replay for fast auto-regressive video generation on the edge},
  author  = {Shen, Xuan and Ma, Weize and Zhou, Yufa and Tang, Enhao and Xie, Yanyue and Li, Zhengang and Gong, Yifan and Wang, Quanyi and Ding, Henghui and Wang, Yiwei and others},
  journal = {arXiv preprint arXiv:2505.14709},
  year    = {2025}
}
DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance

Xuan Shen*, Chenxia Han*, Yufa Zhou*, Yanyue Xie, Yifan Gong, Quanyi Wang, Yiwei Wang, Yanzhi Wang, Pu Zhao, Jiuxiang Gu (* equal contribution)

arXiv 2025

We propose DraftAttention, a method that accelerates video diffusion transformers by leveraging low-resolution pooled attention maps to enable dynamic sparse attention and hardware-efficient execution, achieving up to 1.75× speedup with minimal quality loss.

BibTeX Citation
@article{shen2025draftattention,
  title   = {DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance},
  author  = {Shen, Xuan and Han, Chenxia and Zhou, Yufa and Xie, Yanyue and Gong, Yifan and Wang, Quanyi and Wang, Yiwei and Wang, Yanzhi and Zhao, Pu and Gu, Jiuxiang},
  journal = {arXiv preprint arXiv:2505.14708},
  year    = {2025}
}
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

AISTATS 2025

We demonstrate that a looped 23-layer ReLU-MLP can function as a universal programmable computer—revealing that simple neural network modules possess greater expressive power than previously thought and can perform complex tasks without relying on advanced architectures like Transformers.

BibTeX Citation
@inproceedings{liang2025looped,
  title        = {Looped ReLU MLPs May Be All You Need as Practical Programmable Computers},
  author       = {Liang, Yingyu and Sha, Zhizhou and Shi, Zhenmei and Song, Zhao and Zhou, Yufa},
  booktitle    = {International Conference on Artificial Intelligence and Statistics},
  pages        = {2647--2655},
  year         = {2025},
  organization = {PMLR}
}
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang*, Jiangxuan Long*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

ICLR 2025

We introduce a novel LLM weight pruning method that directly optimizes for approximating the non-linear attention matrix—with theoretical convergence guarantees—effectively reducing computational costs while maintaining model performance.

BibTeX Citation
@inproceedings{liang2025beyond,
  title     = {Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix},
  author    = {Yingyu Liang and Jiangxuan Long and Zhenmei Shi and Zhao Song and Yufa Zhou},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=sgbI8Pxwie}
}
Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

AAAI 2025

We present a training-free structural pruning method using Newton’s method and compensation algorithms to efficiently compress decoder-only transformer models, achieving state-of-the-art performance with reduced memory usage and faster generation on GPUs.

BibTeX Citation
@inproceedings{shen2025numerical,
  title     = {Numerical pruning for efficient autoregressive models},
  author    = {Shen, Xuan and Song, Zhao and Zhou, Yufa and Chen, Bo and Liu, Jing and Zhang, Ruiyi and Rossi, Ryan A and Tan, Hao and Yu, Tong and Chen, Xiang and others},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {39},
  number    = {19},
  pages     = {20418--20426},
  year      = {2025}
}
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers

Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Yanyu Li, Yifan Gong, Kai Zhang, Hao Tan, Jason Kuen, Henghui Ding, Zhihao Shu, Wei Niu, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

AAAI 2025

We present LazyDiT, a framework that accelerates Diffusion Transformers by reusing computations from previous steps and dynamically skipping redundancies, achieving superior performance over existing methods like DDIM across multiple models and devices.

BibTeX Citation
@inproceedings{shen2025lazydit,
  title     = {LazyDiT: Lazy learning for the acceleration of diffusion transformers},
  author    = {Shen, Xuan and Song, Zhao and Zhou, Yufa and Chen, Bo and Li, Yanyu and Gong, Yifan and Zhang, Kai and Tan, Hao and Kuen, Jason and Ding, Henghui and others},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {39},
  number    = {19},
  pages     = {20409--20417},
  year      = {2025}
}

2024

Differentially Private Attention Computation

Yeqi Gao*, Zhao Song*, Xin Yang*, Yufa Zhou* (α–β alphabetical order)

NeurIPS 2024 Workshop: Safe Generative AI

We propose an efficient algorithm to approximate the attention matrix in Transformer-based large language models with differential privacy guarantees, addressing security and privacy concerns by preventing leakage of sensitive information during inference—building on advancements in fast attention computation and differentially private matrix publishing.

BibTeX Citation
@inproceedings{gao2024differentially,
  title     = {Differentially Private Attention Computation},
  author    = {Yeqi Gao and Zhao Song and Xin Yang and Yufa Zhou},
  booktitle = {NeurIPS Safe Generative AI Workshop 2024},
  year      = {2024},
  url       = {https://openreview.net/forum?id=dj70ulvXDo}
}
Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

Xiaoyu Li*, Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

arXiv 2024

We establish tight I/O complexity bounds for attention mechanisms in large language models across small and large cache sizes—confirming FlashAttention's optimality in large caches, improving algorithms for small caches, extending analysis to sparse attention, and offering insights for efficient LLM training and inference.

BibTeX Citation
@article{li2024fine,
  title   = {Fine-grained attention I/O complexity: Comprehensive analysis for backward passes},
  author  = {Li, Xiaoyu and Liang, Yingyu and Shi, Zhenmei and Song, Zhao and Zhou, Yufa},
  journal = {arXiv preprint arXiv:2410.09397},
  year    = {2024}
}
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Yingyu Liang*, Zhizhou Sha*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

NeurIPS 2024 Workshop: Optimization for Machine Learning

We prove that gradients in multi-layer transformer models can be computed in almost linear time $n^{1+o(1)}$ using a novel fast approximation method with polynomially small error, overcoming the quadratic complexity bottleneck of self-attention and enabling more efficient training and deployment of long-context language models with general loss functions and common sub-modules like residual connections, causal masks, and multi-head attention.

BibTeX Citation
@inproceedings{liang2024multilayer,
  title     = {Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time},
  author    = {Yingyu Liang and Zhizhou Sha and Zhenmei Shi and Zhao Song and Yufa Zhou},
  booktitle = {OPT 2024: Optimization for Machine Learning},
  year      = {2024},
  url       = {https://openreview.net/forum?id=1LJIPZ4SvS}
}
Differential Privacy of Cross-Attention with Provable Guarantee

Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

NeurIPS 2024 Workshop: Safe Generative AI

We present the first differential privacy (DP) data structure for cross-attention modules—securing sensitive information in key and value matrices across AI applications like retrieval-augmented generation and guided stable diffusion—with theoretical guarantees on privacy and efficiency, robustness to adaptive attacks, and potential to inspire future privacy designs in large generative models.

BibTeX Citation
@inproceedings{liang2024differential,
  title     = {Differential Privacy of Cross-Attention with Provable Guarantee},
  author    = {Yingyu Liang and Zhenmei Shi and Zhao Song and Yufa Zhou},
  booktitle = {NeurIPS Safe Generative AI Workshop 2024},
  year      = {2024},
  url       = {https://openreview.net/forum?id=GttuYQVARs}
}
Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

Yingyu Liang*, Zhenmei Shi*, Zhao Song*, Yufa Zhou* (α–β alphabetical order)

NeurIPS 2024 Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

We prove that, under bounded entries, the backward gradient of tensor attention can be computed in almost linear time—overcoming the $O(n^3)$ complexity barrier—and propose efficient methods to enable practical higher-order transformer training with tensor attention architectures.

BibTeX Citation
@article{liang2024tensor,
  title   = {Tensor attention training: Provably efficient learning of higher-order transformers},
  author  = {Liang, Yingyu and Shi, Zhenmei and Song, Zhao and Zhou, Yufa},
  journal = {arXiv preprint arXiv:2405.16411},
  year    = {2024}
}

2023

Multiscale Optimization of Additively Manufactured Graded Non-Stochastic and Stochastic Lattice Structures

Hui Liu, Lianxiong Chen, Yi Jiang, Dezhou Zhu, Yufa Zhou, Xinzhong Wang

Composite Structures 2023

We develop a multiscale optimization framework for graded lattice structures—both non-stochastic and stochastic—by modeling microstructures, optimizing macroscopic relative density, and reconstructing full-scale lattices, demonstrating mechanical advantages over traditional single-scale structures through analysis and experiments.

BibTeX Citation
@article{liu2023multiscale,
  title     = {Multiscale optimization of additively manufactured graded non-stochastic and stochastic lattice structures},
  author    = {Liu, Hui and Chen, Lianxiong and Jiang, Yi and Zhu, Dezhou and Zhou, Yufa and Wang, Xinzhong},
  journal   = {Composite Structures},
  volume    = {305},
  pages     = {116546},
  year      = {2023},
  publisher = {Elsevier}
}