What deep learning currently cannot do well in NLP
Parallel Intersected Multi-scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangcheng Luo.
TLDR: Propose a simple module that consistently outperforms self-attention and the Transformer on the main NMT datasets with SoTA performance; find that a shared projection is essential for combining convolution and self-attention (see the sketch below).
[pdf] [code, scripts, and pretrained models]
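A minimal sketch of the parallel combination described above, assuming a self-attention branch and a depthwise-convolution branch that read from and write to the same shared projections; layer sizes, kernel size, and the additive fusion are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ParallelAttentionConv(nn.Module):
    """Illustrative parallel self-attention + convolution block.

    Both branches use the SAME input projection and the SAME output
    projection (the "shared projection" idea); all hyperparameters
    here are assumptions for this sketch.
    """

    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)    # shared input projection
        self.out_proj = nn.Linear(d_model, d_model)   # shared output projection
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # depthwise conv captures local context in parallel with global attention
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        h = self.in_proj(x)
        attn_out, _ = self.attn(h, h, h)               # global dependencies
        conv_out = self.conv(h.transpose(1, 2)).transpose(1, 2)  # local dependencies
        return self.out_proj(attn_out + conv_out)      # fuse in the shared space
```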
Explicit Sparse Transformer
Guangxiang Zhao, Junyang Lin, Qi Zeng, Xuancheng Ren, Qi Su, Xu Sun.
TLDR: Propose to sparsify the attention weights in the Transformer according to their activations, keeping only the largest ones; give evidence that sparse attention (top-8, roughly 1/4 of the typical NMT sequence length of ~30) works better; the sparsification needs no local-attention constraint.
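A minimal sketch of top-k attention sparsification in this spirit, assuming k is a fixed hyperparameter (e.g. 8); the masking value and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=8):
    """Keep only the k_top largest attention scores per query and
    renormalize; everything below the k-th score is masked out before
    the softmax. Shapes: q, k, v are (batch, seq, d). k_top=8 is an
    assumed default for this sketch."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)    # (batch, seq_q, seq_k)
    k_top = min(k_top, scores.size(-1))                       # guard short sequences
    kth = torch.topk(scores, k_top, dim=-1).values[..., -1:]  # k-th largest per query
    masked = scores.masked_fill(scores < kth, float("-inf"))  # drop the rest
    return F.softmax(masked, dim=-1) @ v
```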
Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations
Guangxiang Zhao*, Jingjing Xu* (Equal Contribution), Qi Zeng, Xuancheng Ren, Xu Sun.
TLDR: Build a multi-label text classification dataset (music styles are implicit in the review text) with strong label correlations; propose a method that automatically learns and exploits label correlations during training.
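One illustrative way to exploit label correlations, not necessarily the paper's exact mechanism: refine independent per-label scores with a learned label-to-label interaction matrix. All names and sizes below are placeholders.

```python
import torch
import torch.nn as nn

class CorrelationAwareClassifier(nn.Module):
    """Hypothetical sketch: per-label logits are refined by a learned
    label-correlation matrix, so predicting one style can raise or lower
    the scores of correlated styles. Dimensions are placeholders."""

    def __init__(self, hidden_size=768, num_labels=16):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, num_labels)
        # learned label-to-label interaction, initialized near identity
        self.label_corr = nn.Parameter(torch.eye(num_labels))

    def forward(self, text_repr):              # text_repr: (batch, hidden_size)
        logits = self.scorer(text_repr)        # independent per-label scores
        refined = logits @ self.label_corr     # mix in correlated labels
        return torch.sigmoid(refined)          # multi-label probabilities
```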
Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin.
TLDR: Find that the backward pass (gradients) through layer normalization is essential and propose an improved normalization method.
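To make the "backward pass matters" finding concrete, one can compare standard LayerNorm against a variant whose mean and standard deviation are detached from the graph, so the forward output is identical but no gradient flows through the statistics. The variant below is an illustrative probe, not the paper's proposed method.

```python
import torch
import torch.nn as nn

class DetachedStatsNorm(nn.Module):
    """LayerNorm-like module with detached statistics: same forward
    values as LayerNorm, but gradients do not flow through mean/std.
    Comparing it to nn.LayerNorm isolates the backward contribution."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True).detach()                 # no grad through mean
        std = x.std(-1, keepdim=True, unbiased=False).detach()   # no grad through std
        return self.gain * (x - mean) / (std + self.eps) + self.bias
```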
Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
Fenglin Liu*, Xuancheng Ren* (Equal Contribution), Guangxiang Zhao, Xu Sun.
TLDR: Identify a limitation in the information flow of the Transformer and propose an effective layer-wise cross-view decoding method to address it.
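A rough sketch of the cross-view idea, assuming each decoder layer attends to its own weighted combination of all encoder layers instead of only the top encoder output; the softmax-weighted mixing below is an illustrative assumption, not the paper's exact routing scheme.

```python
import torch
import torch.nn as nn

class CrossViewSource(nn.Module):
    """Illustrative: build, for each decoder layer, its own view of all
    encoder layers, rather than feeding every decoder layer the final
    encoder output. The learned mixing weights are an assumption."""

    def __init__(self, num_enc_layers=6, num_dec_layers=6):
        super().__init__()
        # one mixing weight per (decoder layer, encoder layer) pair
        self.mix = nn.Parameter(torch.zeros(num_dec_layers, num_enc_layers))

    def forward(self, enc_states):
        # enc_states: list of per-layer encoder outputs, each (batch, src_len, hidden)
        stacked = torch.stack(enc_states, dim=0)          # (L_enc, batch, src, hidden)
        weights = torch.softmax(self.mix, dim=-1)         # (L_dec, L_enc)
        # memory[i] is the source representation decoder layer i cross-attends to
        return torch.einsum("de,ebsh->dbsh", weights, stacked)
```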
Merit Student of Peking University, 2019
Sub-reviewer for ACL 2019 and EMNLP 2020
Teaching assistant for "Foundations of Computer Science for Arts Majors"