
Commit

Automated update on 2024-09-20
Furyton authored and github-actions[bot] committed Sep 20, 2024
1 parent a29a067 commit 560cb7c
Showing 6 changed files with 12 additions and 5 deletions.
@@ -3,4 +3,5 @@ The Expressive Power of Tuning Only the Normalization Layers,2023-07-12,https://
ResiDual: Transformer with Dual Residual Connections,2023-04-28,http://arxiv.org/abs/2304.14802,Shufang Xie; Huishuai Zhang; Junliang Guo; Xu Tan; Jiang Bian; Hany Hassan Awadalla; Arul Menezes; Tao Qin; Rui Yan
"DeepNet: Scaling Transformers to 1,000 Layers",2022-03-01,http://arxiv.org/abs/2203.00555,Hongyu Wang; Shuming Ma; Li Dong; Shaohan Huang; Dongdong Zhang; Furu Wei
On Layer Normalization in the Transformer Architecture,2020-06-29,http://arxiv.org/abs/2002.04745,Ruibin Xiong; Yunchang Yang; Di He; Kai Zheng; Shuxin Zheng; Chen Xing; Huishuai Zhang; Yanyan Lan; Liwei Wang; Tie-Yan Liu
On the Role of Attention Masks and LayerNorm in Transformers,2024-05-29,http://arxiv.org/abs/2405.18781,Xinyi Wu; Amir Ajorlou; Yifei Wang; Stefanie Jegelka; Ali Jadbabaie
"Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm",2024-09-19,https://arxiv.org/pdf/2409.12951,Akshat Gupta; Atahan Ozdemir; Gopala Anumanchipalli
3 changes: 2 additions & 1 deletion papers/mechanistic-engineering/papers.csv
@@ -42,4 +42,5 @@ Modularity in Transformers: Investigating Neuron Separability & Specialization,2
Extracting Paragraphs from LLM Token Activations,2024-09-10,http://arxiv.org/abs/2409.06328,Nicholas Pochinkov; Angelo Benoit; Lovkush Agarwal; Zainab Ali Majid; Lucile Ter-Minassian
Explaining Datasets in Words: Statistical Models with Natural Language Parameters,2024-09-13,http://arxiv.org/abs/2409.08466,Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt
Optimal ablation for interpretability,2024-09-16,http://arxiv.org/abs/2409.09951,Maximilian Li; Lucas Janson
Self-Attention Limits Working Memory Capacity of Transformer-Based Models,2024-09-16,http://arxiv.org/abs/2409.10715,Dongyu Gong; Hantao Zhang
"Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models",2024-09-19,https://arxiv.org/pdf/2409.12435,Xinyu Zhou; Delong Chen; Samuel Cahyawijaya; Xufeng Duan; Zhenguang G. Cai
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/chain-of-thought/papers.csv
@@ -9,4 +9,5 @@ Iteration Head: A Mechanistic Study of Chain-of-Thought,2024-06-04,http://arxiv.
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning,2024-06-20,http://arxiv.org/abs/2406.14197,Franz Nowak; Anej Svete; Alexandra Butoi; Ryan Cotterell
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods,2024-08-25,http://arxiv.org/abs/2408.14511,Xinyang Hu; Fengzhuo Zhang; Siyu Chen; Zhuoran Yang
"Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning",2024-07-01,http://arxiv.org/abs/2407.01687,Akshara Prabhakar; Thomas L. Griffiths; R. Thomas McCoy
"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
"Small Language Models are Equation Reasoners",2024-09-19,https://arxiv.org/pdf/2409.12393,Bumjun Kim; Kunha Lee; Juyeon Kim; Sangam Lee
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/in-context-learning/papers.csv
@@ -74,4 +74,5 @@ Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mec
Polynomial Regression as a Task for Understanding In-context Learning Through Finetuning and Alignment,2024-07-27,http://arxiv.org/abs/2407.19346,Max Wilcoxson; Morten Svendgård; Ria Doshi; Dylan Davis; Reya Vir; Anant Sahai
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context,2024-07-24,https://klusowski.princeton.edu/sites/g/files/toruqf5901/files/documents/li2024one.pdf,Zihao Li; Yuan Cao; Cheng Gao; Yihan He; Han Liu; Jason M. Klusowski; Jianqing Fan; Mengdi Wang
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs,2024-09-06,http://arxiv.org/abs/2409.04318,Aliakbar Nafar; Kristen Brent Venable; Parisa Kordjamshidi
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers,2024-09-10,http://arxiv.org/abs/2409.10559,Siyu Chen; Heejune Sheen; Tianhao Wang; Zhuoran Yang
"Provable In-Context Learning of Linear Systems and Linear Elliptic PDEs with Transformers",2024-09-18,https://arxiv.org/pdf/2409.12293,Frank Cole; Yulong Lu; Riley O'Neill; Tianhao Zhang
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/learning/papers.csv
@@ -39,4 +39,5 @@ Unforgettable Generalization in Language Models,2024-09-03,http://arxiv.org/abs/
The Many Faces of Optimal Weak-to-Strong Learning,2024-08-30,http://arxiv.org/abs/2408.17148,Mikael Møller Høgsgaard; Kasper Green Larsen; Markus Engelund Mathiasen
On the Empirical Complexity of Reasoning and Planning in LLMs,2024-04-17,http://arxiv.org/abs/2404.11041,Liwei Kang; Zirui Zhao; David Hsu; Wee Sun Lee
Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics,2024-09-15,http://arxiv.org/abs/2409.09626,Yi Ren; Danica J. Sutherland
"Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems",2024-08-29,http://arxiv.org/abs/2408.16293,Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu
"Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems",2024-08-29,http://arxiv.org/abs/2408.16293,Tian Ye; Zicheng Xu; Yuanzhi Li; Zeyuan Allen-Zhu
"Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels",2024-09-19,https://arxiv.org/pdf/2409.12425,Chaoqun Liu; Qin Chao; Wenxuan Zhang; Xiaobao Wu; Boyang Li; Anh Tuan Luu; Lidong Bing
2 changes: 2 additions & 0 deletions papers/training-paradigms/papers.csv
@@ -1,3 +1,5 @@
Title,Date,Url,Author
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget,2024-04-30,http://arxiv.org/abs/2404.19319,Minh Duc Bui; Fabian David Schmidt; Goran Glavaš; Katharina von der Wense
Why are Adaptive Methods Good for Attention Models?,2020-10-23,http://arxiv.org/abs/1912.03194,Jingzhao Zhang; Sai Praneeth Karimireddy; Andreas Veit; Seungyeon Kim; Sashank J. Reddi; Sanjiv Kumar; Suvrit Sra

"Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models",2024-09-19,https://arxiv.org/pdf/2409.12512,Jun Rao; Xuebo Liu; Zepeng Lin; Liang Ding; Jing Li; Dacheng Tao
