
Commit

Automated update on 2024-09-20
Furyton authored and github-actions[bot] committed Sep 20, 2024
1 parent a29a067 commit 560cb7c
Showing 6 changed files with 12 additions and 5 deletions.
@@ -3,4 +3,5 @@ The Expressive Power of Tuning Only the Normalization Layers,2023-07-12,https://
ResiDual: Transformer with Dual Residual Connections,2023-04-28,http://arxiv.org/abs/2304.14802,Shufang Xie; Huishuai Zhang; Junliang Guo; Xu Tan; Jiang Bian; Hany Hassan Awadalla; Arul Menezes; Tao Qin; Rui Yan
"DeepNet: Scaling Transformers to 1,000 Layers",2022-03-01,http://arxiv.org/abs/2203.00555,Hongyu Wang; Shuming Ma; Li Dong; Shaohan Huang; Dongdong Zhang; Furu Wei
On Layer Normalization in the Transformer Architecture,2020-06-29,http://arxiv.org/abs/2002.04745,Ruibin Xiong; Yunchang Yang; Di He; Kai Zheng; Shuxin Zheng; Chen Xing; Huishuai Zhang; Yanyan Lan; Liwei Wang; Tie-Yan Liu
On the Role of Attention Masks and LayerNorm in Transformers,2024-05-29,http://arxiv.org/abs/2405.18781,Xinyi Wu; Amir Ajorlou; Yifei Wang; Stefanie Jegelka; Ali Jadbabaie
"Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm",2024-09-19,https://arxiv.org/pdf/2409.12951,Akshat Gupta; Atahan Ozdemir; Gopala Anumanchipalli
3 changes: 2 additions & 1 deletion papers/mechanistic-engineering/papers.csv
@@ -42,4 +42,5 @@ Modularity in Transformers: Investigating Neuron Separability & Specialization,2
Extracting Paragraphs from LLM Token Activations,2024-09-10,http://arxiv.org/abs/2409.06328,Nicholas Pochinkov; Angelo Benoit; Lovkush Agarwal; Zainab Ali Majid; Lucile Ter-Minassian
Explaining Datasets in Words: Statistical Models with Natural Language Parameters,2024-09-13,http://arxiv.org/abs/2409.08466,Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt
Optimal ablation for interpretability,2024-09-16,http://arxiv.org/abs/2409.09951,Maximilian Li; Lucas Janson
Self-Attention Limits Working Memory Capacity of Transformer-Based Models,2024-09-16,http://arxiv.org/abs/2409.10715,Dongyu Gong; Hantao Zhang
"Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models",2024-09-19,https://arxiv.org/pdf/2409.12435,Xinyu Zhou; Delong Chen; Samuel Cahyawijaya; Xufeng Duan; Zhenguang G. Cai
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/chain-of-thought/papers.csv
@@ -9,4 +9,5 @@ Iteration Head: A Mechanistic Study of Chain-of-Thought,2024-06-04,http://arxiv.
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning,2024-06-20,http://arxiv.org/abs/2406.14197,Franz Nowak; Anej Svete; Alexandra Butoi; Ryan Cotterell
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods,2024-08-25,http://arxiv.org/abs/2408.14511,Xinyang Hu; Fengzhuo Zhang; Siyu Chen; Zhuoran Yang
"Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning",2024-07-01,http://arxiv.org/abs/2407.01687,Akshara Prabhakar; Thomas L. Griffiths; R. Thomas McCoy
"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
"Small Language Models are Equation Reasoners",2024-09-19,https://arxiv.org/pdf/2409.12393,Bumjun Kim; Kunha Lee; Juyeon Kim; Sangam Lee
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/in-context-learning/papers.csv
@@ -74,4 +74,5 @@ Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mec
Polynomial Regression as a Task for Understanding In-context Learning Through Finetuning and Alignment,2024-07-27,http://arxiv.org/abs/2407.19346,Max Wilcoxson; Morten Svendgård; Ria Doshi; Dylan Davis; Reya Vir; Anant Sahai
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context,2024-07-24,https://klusowski.princeton.edu/sites/g/files/toruqf5901/files/documents/li2024one.pdf,Zihao Li; Yuan Cao; Cheng Gao; Yihan He; Han Liu; Jason M. Klusowski; Jianqing Fan; Mengdi Wang
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs,2024-09-06,http://arxiv.org/abs/2409.04318,Aliakbar Nafar; Kristen Brent Venable; Parisa Kordjamshidi
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers,2024-09-10,http://arxiv.org/abs/2409.10559,Siyu Chen; Heejune Sheen; Tianhao Wang; Zhuoran Yang
"Provable In-Context Learning of Linear Systems and Linear Elliptic PDEs with Transformers",2024-09-18,https://arxiv.org/pdf/2409.12293,Frank Cole; Yulong Lu; Riley O'Neill; Tianhao Zhang
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/learning/papers.csv
@@ -39,4 +39,5 @@ Unforgettable Generalization in Language Models,2024-09-03,http://arxiv.org/abs/
The Many Faces of Optimal Weak-to-Strong Learning,2024-08-30,http://arxiv.org/abs/2408.17148,Mikael Møller Høgsgaard; Kasper Green Larsen; Markus Engelund Mathiasen
On the Empirical Complexity of Reasoning and Planning in LLMs,2024-04-17,http://arxiv.org/abs/2404.11041,Liwei Kang; Zirui Zhao; David Hsu; Wee Sun Lee
Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics,2024-09-15,http://arxiv.org/abs/2409.09626,Yi Ren; Danica J. Sutherland
"Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems",2024-08-29,http://arxiv.org/abs/2408.16293,Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu
"Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems",2024-08-29,http://arxiv.org/abs/2408.16293,Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu
"Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels",2024-09-19,https://arxiv.org/pdf/2409.12425,Chaoqun Liu; Qin Chao; Wenxuan Zhang; Xiaobao Wu; Boyang Li; Anh Tuan Luu; Lidong Bing
2 changes: 2 additions & 0 deletions papers/training-paradigms/papers.csv
@@ -1,3 +1,5 @@
Title,Date,Url,Author
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget,2024-04-30,http://arxiv.org/abs/2404.19319,Minh Duc Bui; Fabian David Schmidt; Goran Glavaš; Katharina von der Wense
Why are Adaptive Methods Good for Attention Models?,2020-10-23,http://arxiv.org/abs/1912.03194,Jingzhao Zhang; Sai Praneeth Karimireddy; Andreas Veit; Seungyeon Kim; Sashank J. Reddi; Sanjiv Kumar; Suvrit Sra

"Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models",2024-09-19,https://arxiv.org/pdf/2409.12512,Jun Rao; Xuebo Liu; Zepeng Lin; Liang Ding; Jing Li; Dacheng Tao
