2024

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

arXiv 2024

We introduce a novel LLM weight pruning method that directly optimizes for approximating the non-linear attention matrix—with theoretical convergence guarantees—effectively reducing computational costs while maintaining model performance.

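For orientation (our notation, not necessarily the paper's): whereas linear-approximation pruning compares weights directly, an objective that targets the attention matrix itself must go through the softmax. With inputs $X \in \mathbb{R}^{n \times d}$ and query/key weights $W_Q, W_K \in \mathbb{R}^{d \times d}$, a pruning objective of this flavor seeks a sparse $\widetilde{W}$ with

$$\mathrm{softmax}\!\left(\frac{X \widetilde{W} X^\top}{\sqrt{d}}\right) \;\approx\; \mathrm{softmax}\!\left(\frac{X W_Q W_K^\top X^\top}{\sqrt{d}}\right),$$

so the non-linearity is handled directly rather than through a linear proxy such as $\|\widetilde{W} - W_Q W_K^\top\|_F$.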

Differentially private attention computation

Yeqi Gao, Zhao Song, Xin Yang, Yufa Zhou (alphabetical order)

NeurIPS 2024 Workshop: Safe Generative AI

We propose an efficient algorithm to approximate the attention matrix in Transformer-based large language models with differential privacy guarantees, addressing security and privacy concerns by preventing leakage of sensitive information during inference—building on advancements in fast attention computation and differentially private matrix publishing.

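As background (the standard definition, not specific to this paper): a randomized mechanism $\mathcal{M}$ is $(\epsilon, \delta)$-differentially private if for every pair of neighboring inputs $D, D'$ and every measurable set $S$ of outputs,

$$\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$

Here the published object is the approximate attention matrix, so the guarantee limits how much any single record in the input can change what an observer of that matrix can infer.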

Looped ReLU MLPs may be all you need as practical programmable computers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

arXiv 2024

We demonstrate that a looped 23-layer ReLU-MLP can function as a universal programmable computer—revealing that simple neural network modules possess greater expressive power than previously thought and can perform complex tasks without relying on advanced architectures like Transformers.

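A toy illustration of why looped ReLU blocks can behave like programmable hardware (standard folklore identities, not the paper's construction): several arithmetic-logic primitives are exact shallow ReLU circuits,

$$\max(x, y) = y + \mathrm{ReLU}(x - y), \qquad \min(x, y) = x - \mathrm{ReLU}(x - y), \qquad |x| = \mathrm{ReLU}(x) + \mathrm{ReLU}(-x),$$

and looping a fixed MLP means iterating one such circuit with shared weights, which is what allows a constant-size block to emulate step-by-step program execution.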

Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

arXiv 2024

We establish tight I/O complexity bounds for attention mechanisms in large language models across small and large cache sizes—confirming FlashAttention's optimality in large caches, improving algorithms for small caches, extending analysis to sparse attention, and offering insights for efficient LLM training and inference.

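For context (the standard accounting from the FlashAttention line of work, in our notation): with sequence length $n$, head dimension $d$, and fast-memory (cache) size $M$, tiled attention performs

$$\Theta\!\left(\frac{n^2 d^2}{M}\right)$$

reads and writes to slow memory in the forward pass, versus $\Omega(n^2 + nd)$ when the full $n \times n$ attention matrix is materialized; the contribution summarized above is the analogous fine-grained picture for the backward pass across cache-size regimes.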

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

NeurIPS 2024 Workshop: Optimization for Machine Learning

We prove that gradients in multi-layer transformer models can be computed in almost linear time $n^{1+o(1)}$ using a novel fast approximation method with polynomially small error, overcoming the quadratic complexity bottleneck of self-attention and enabling more efficient training and deployment of long-context language models with general loss functions and common sub-modules like residual connections, causal masks, and multi-head attention.

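A rough sketch of how such almost-linear bounds typically arise (our paraphrase of the mechanism, under bounded-entry-type assumptions): instead of forming the $n \times n$ matrix $\exp(QK^\top/\sqrt{d})$ exactly, one uses a low-rank approximation

$$\exp\!\left(\frac{QK^\top}{\sqrt{d}}\right) \;\approx\; U V^\top, \qquad U, V \in \mathbb{R}^{n \times k}, \quad k = n^{o(1)},$$

so every matrix product appearing in the chain rule can be applied in $n^{1+o(1)}$ time with polynomially small error, and the approximation is then propagated through the layers, residual connections, causal masks, and attention heads.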

Differential Privacy of Cross-Attention with Provable Guarantee

Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

NeurIPS 2024 Workshop: Safe Generative AI

We present the first differential privacy (DP) data structure for cross-attention modules—securing sensitive information in key and value matrices across AI applications like retrieval-augmented generation and guided stable diffusion—with theoretical guarantees on privacy and efficiency, robustness to adaptive attacks, and potential to inspire future privacy designs in large generative models.

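For orientation (standard notation, not necessarily the paper's): cross-attention takes queries from one source and keys/values from another,

$$\mathrm{CrossAttn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right) V, \qquad Q = X_{\mathrm{query}} W_Q, \quad K = X_{\mathrm{data}} W_K, \quad V = X_{\mathrm{data}} W_V,$$

so in retrieval-augmented generation or guided diffusion the sensitive content lives in $K$ and $V$, which is precisely what the proposed DP data structure is meant to protect.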

Tensor attention training: Provably efficient learning of higher-order transformers

Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

NeurIPS 2024 Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning

We prove that, under bounded entries, the backward gradient of tensor attention can be computed in almost linear time—overcoming the $O(n^3)$ complexity barrier—and propose efficient methods to enable practical higher-order transformer training with tensor attention architectures.

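Roughly (our notation, loosely following the tensor-attention literature, with scalings omitted): tensor attention scores each query against pairs of keys, so the score matrix has shape $n \times n^2$,

$$A = \mathrm{softmax}\!\left(Q\,(K_1 \oslash K_2)^\top\right) \in \mathbb{R}^{n \times n^2}, \qquad K_1 \oslash K_2 \in \mathbb{R}^{n^2 \times d},$$

where $K_1 \oslash K_2$ stacks the entrywise products of all key pairs; materializing $A$ already costs $\Theta(n^3)$, which is the barrier the almost-linear-time gradient computation avoids under the bounded-entries assumption.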

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou (alphabetical order)

arXiv 2024

We provide a theoretical analysis showing that for diffusion models with Gaussian mixture data, the diffusion process preserves the mixture structure; we derive tight, component-independent bounds on Lipschitz constants and second moments, and establish error guarantees for diffusion solvers—offering deeper insights into the diffusion dynamics under common data distributions.

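A concrete instance of the structure-preservation claim (generic notation for the forward noising process $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$): convolving a Gaussian mixture with Gaussian noise keeps it a Gaussian mixture with the same number of components,

$$p_0 = \sum_i w_i\, \mathcal{N}(\mu_i, \Sigma_i) \;\Longrightarrow\; p_t = \sum_i w_i\, \mathcal{N}\!\left(\alpha_t \mu_i,\; \alpha_t^2 \Sigma_i + \sigma_t^2 I\right),$$

and it is on this closed-form family that the component-independent Lipschitz and second-moment bounds are established.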

2023

Multiscale optimization of additively manufactured graded non-stochastic and stochastic lattice structures

Hui Liu, Lianxiong Chen, Yi Jiang, Dezhou Zhu, Yufa Zhou, Xinzhong Wang

Composite Structures 2023

We develop a multiscale optimization framework for graded lattice structures—both non-stochastic and stochastic—by modeling microstructures, optimizing macroscopic relative density, and reconstructing full-scale lattices, demonstrating mechanical advantages over traditional single-scale structures through analysis and experiments.

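As generic background for this class of methods (a standard density-based formulation, not necessarily the exact model used in the paper): the macroscopic design variable is a relative-density field $\rho$, each lattice cell's effective stiffness $E^H(\rho)$ is obtained by homogenizing the microstructure, and the macro-scale problem is typically posed as compliance minimization under a volume budget,

$$\min_{\rho}\; c(\rho) = F^\top u(\rho) \quad \text{s.t.} \quad K\!\left(E^H(\rho)\right) u = F, \qquad \int_{\Omega} \rho \,\mathrm{d}\Omega \le V_{\max}, \qquad \rho_{\min} \le \rho \le 1,$$

after which the optimized density field is mapped back onto graded lattice cells to reconstruct the full-scale structure.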