Models and learning methods for natural language generation
Parallel Intersected Multi-scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo.
TLDR: We propose a simple module, Prime, that by simple stacking consistently outperforms the more complicated Transformer on major NMT datasets, reaching SOTA performance. We also find that when combining convolution and self-attention, their operations for learning interactions between tokens should be performed on the same features (see the sketch below).
[pdf] [code, scripts, and pretrained models]
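To make the second finding concrete, here is a minimal sketch (in PyTorch, not the released code) of a block in which one shared projection feeds both a depthwise convolution branch and a self-attention branch, so both learn token interactions on the same features; all module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedFeatureConvAttention(nn.Module):
    """Illustrative block: conv and attention act on the SAME projected features."""
    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        self.shared_proj = nn.Linear(d_model, d_model)  # one projection feeds both branches
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)  # depthwise conv

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        h = self.shared_proj(x)                                   # shared features
        attn_out, _ = self.attn(h, h, h)                          # global interactions
        conv_out = self.conv(h.transpose(1, 2)).transpose(1, 2)   # local interactions
        return x + attn_out + conv_out                            # residual combination
```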
Explicit Sparse Transformer
Guangxiang Zhao, Junyang Lin, Qi Zeng, Xuancheng Ren, Qi Su, Xu Sun.
TLDR: We propose to sparsify the attention weights in the Transformer according to their activations, keeping only the top-k scores. We also provide evidence that sparse attention (k = 8, roughly 1/4 of the typical NMT sequence length of 30) performs better, and does so without any local attention constraint.
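A minimal sketch of the top-k sparsification described above, assuming standard scaled dot-product attention; k = 8 follows the value in the TLDR, and everything else is an illustrative assumption rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, topk=8):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if topk < scores.shape[-1]:
        kth = scores.topk(topk, dim=-1).values[..., -1:]           # k-th largest score per query
        scores = scores.masked_fill(scores < kth, float("-inf"))   # drop everything below it
    return F.softmax(scores, dim=-1) @ v                           # attend with sparse weights
```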
Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations
Guangxiang Zhao*, Jingjing Xu* (Equal Contribution), Qi Zeng, Xuancheng Ren, Xu Sun.
TLDR: We build a multi-label text classification dataset with strong label correlations (the music styles are hidden in the review text) and propose a method that automatically learns and exploits these label correlations during training.
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu Sun, Hongxia Yang.
TLDR: We propose the idea of relation alignment, which aligns self-attention between the two modalities.
Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin.
TLDR: We find that the backward pass of layer normalization, rather than its forward normalization, is what makes it effective, and we propose a better normalization method.
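For reference, a minimal sketch of the standard layer normalization being analyzed; the paper's contribution concerns the gradients flowing through this computation and an improved variant, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Standard layer normalization over the feature dimension."""
    def __init__(self, d_model, eps=1e-5):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(d_model))
        self.bias = nn.Parameter(torch.zeros(d_model))
        self.eps = eps

    def forward(self, x):
        mu = x.mean(dim=-1, keepdim=True)                    # per-token mean
        var = x.var(dim=-1, keepdim=True, unbiased=False)    # per-token variance
        return self.gain * (x - mu) / torch.sqrt(var + self.eps) + self.bias
```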
Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
Fenglin Liu*, Xuancheng Ren* (Equal Contribution), Guangxiang Zhao, Xu Sun
TLDR: We identify a limitation in the information flow of the Transformer and propose an effective cross-view decoding method to address it (sketched below).
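A minimal sketch of the cross-view idea, assuming the simplest possible routing (decoder layer i attends to encoder layer i rather than only the final encoder layer); the actual routing strategies studied in the paper differ.

```python
import torch
import torch.nn as nn

class CrossViewDecoder(nn.Module):
    """Illustrative decoder where each layer attends to its own encoder 'view'."""
    def __init__(self, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, tgt, encoder_states):
        # encoder_states: list of per-layer encoder outputs, one per decoder layer
        for i, layer in enumerate(self.layers):
            tgt = layer(tgt, encoder_states[i])  # each decoder layer gets its own view
        return tgt
```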
Merit Student of Peking University, 2019
Reviewer for ICLR 2022, ACL 2021, and TNNLS.
Teaching assistant for "Foundations of Computer Science for Art Majors" and "Introduction to Natural Language Processing".