Automated update on 2024-09-25
Furyton authored and github-actions[bot] committed Sep 25, 2024
1 parent a29a067 commit e9a6de4
Showing 6 changed files with 12 additions and 6 deletions.
3 changes: 2 additions & 1 deletion papers/mechanistic-engineering/papers.csv
@@ -42,4 +42,5 @@ Modularity in Transformers: Investigating Neuron Separability & Specialization,2
Extracting Paragraphs from LLM Token Activations,2024-09-10,http://arxiv.org/abs/2409.06328,Nicholas Pochinkov; Angelo Benoit; Lovkush Agarwal; Zainab Ali Majid; Lucile Ter-Minassian
Explaining Datasets in Words: Statistical Models with Natural Language Parameters,2024-09-13,http://arxiv.org/abs/2409.08466,Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt
Optimal ablation for interpretability,2024-09-16,http://arxiv.org/abs/2409.09951,Maximilian Li; Lucas Janson
Self-Attention Limits Working Memory Capacity of Transformer-Based Models,2024-09-16,http://arxiv.org/abs/2409.10715,Dongyu Gong; Hantao Zhang
"Training Neural Networks for Modularity aids Interpretability",2024-09-24,https://arxiv.org/pdf/2409.15747,Satvik Golechha; Dylan Cope; Nandi Schoots
3 changes: 2 additions & 1 deletion papers/miscellanea/papers.csv
@@ -74,4 +74,5 @@ Modeling Language Tokens as Functionals of Semantic Fields,2024-05-02,http://ope
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations,2024-07-10,http://openreview.net/pdf?id=qyilOnIRHI,Yize Zhao; Tina Behnia; Vala Vakilian; Christos Thrampoulidis
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts,2024-09-02,http://arxiv.org/abs/2409.00879,Youngseog Chung; Dhruv Malik; Jeff Schneider; Yuanzhi Li; Aarti Singh
Reframing Data Value for Large Language Models Through the Lens of Plausibility,2024-08-30,http://arxiv.org/abs/2409.00284,Mohamad Rida Rammal; Ruida Zhou; Suhas Diggavi
A Controlled Study on Long Context Extension and Generalization in LLMs,2024-09-18,http://arxiv.org/abs/2409.12181,Yi Lu; Jing Nathan Yan; Songlin Yang; Justin T. Chiu; Siyu Ren; Fei Yuan; Wenting Zhao; Zhiyong Wu; Alexander M. Rush
"Cognitive phantoms in LLMs through the lens of latent variables",2024-09-06,https://arxiv.org/pdf/2409.15324,Sanne Peereboom; Inga Schwabe; Bennett Kleinberg
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/hallucination/papers.csv
@@ -6,4 +6,5 @@ Calibrated Language Models Must Hallucinate,2023-11-24,http://arxiv.org/abs/2311
The Curious Case of Hallucinatory Unanswerability: Finding Truths in the Hidden States of Over-Confident Large Language Models,2023-10-18,http://arxiv.org/abs/2310.11877,Aviv Slobodkin; Omer Goldman; Avi Caciularu; Ido Dagan; Shauli Ravfogel
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?,2024-05-09,http://arxiv.org/abs/2405.05904,Zorik Gekhman; Gal Yona; Roee Aharoni; Matan Eyal; Amir Feder; Roi Reichart; Jonathan Herzig
Estimating the Hallucination Rate of Generative AI,2024-06-11,http://arxiv.org/abs/2406.07457,Andrew Jesson; Nicolas Beltran-Velez; Quentin Chu; Sweta Karlekar; Jannik Kossen; Yarin Gal; John P. Cunningham; David Blei
Shared Imagination: LLMs Hallucinate Alike,2024-07-23,http://arxiv.org/abs/2407.16604,Yilun Zhou; Caiming Xiong; Silvio Savarese; Chien-Sheng Wu
"Cognitive phantoms in LLMs through the lens of latent variables",2024-09-06,https://arxiv.org/pdf/2409.15324,Sanne Peereboom; Inga Schwabe; Bennett Kleinberg
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/in-context-learning/papers.csv
@@ -74,4 +74,5 @@ Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mec
Polynomial Regression as a Task for Understanding In-context Learning Through Finetuning and Alignment,2024-07-27,http://arxiv.org/abs/2407.19346,Max Wilcoxson; Morten Svendgård; Ria Doshi; Dylan Davis; Reya Vir; Anant Sahai
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context,2024-07-24,https://klusowski.princeton.edu/sites/g/files/toruqf5901/files/documents/li2024one.pdf,Zihao Li; Yuan Cao; Cheng Gao; Yihan He; Han Liu; Jason M. Klusowski; Jianqing Fan; Mengdi Wang
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs,2024-09-06,http://arxiv.org/abs/2409.04318,Aliakbar Nafar; Kristen Brent Venable; Parisa Kordjamshidi
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers,2024-09-10,http://arxiv.org/abs/2409.10559,Siyu Chen; Heejune Sheen; Tianhao Wang; Zhuoran Yang
"In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models",2024-09-23,https://arxiv.org/pdf/2409.15454,Pengrui Han; Peiyang Song; Haofei Yu; Jiaxuan You
Original file line number Diff line number Diff line change
@@ -53,4 +53,5 @@ Attention is a smoothed cubic spline,2024-08-19,http://arxiv.org/abs/2408.09624,
Transformers As Approximations of Solomonoff Induction,2024-08-22,http://arxiv.org/abs/2408.12065,Nathan Young; Michael Witbrock
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations,2024-08-27,http://arxiv.org/abs/2408.15417,Yize Zhao; Tina Behnia; Vala Vakilian; Christos Thrampoulidis
A Law of Next-Token Prediction in Large Language Models,2024-08-24,http://arxiv.org/abs/2408.13442,Hangfeng He; Weijie J. Su
"Physics of Language Models: Part 1, Learning Hierarchical Language Structures",2024-06-02,http://arxiv.org/abs/2305.13673,Zeyuan Allen-Zhu; Yuanzhi Li
"Self-attention as an attractor network: transient memories without backpropagation",2024-09-24,https://arxiv.org/pdf/2409.16112,Francesco D'Amico; Matteo Negri
Original file line number Diff line number Diff line change
@@ -21,4 +21,5 @@ When can transformers compositionally generalize in-context?,2024-07-17,http://a
When Can Transformers Count to n?,2024-07-21,http://arxiv.org/abs/2407.15160,Gilad Yehudai; Haim Kaplan; Asma Ghandeharioun; Mor Geva; Amir Globerson
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers,2024-08-10,http://arxiv.org/abs/2408.05506,MohammadReza Ebrahimi; Sunny Panchal; Roland Memisevic
One-layer transformers fail to solve the induction heads task,2024-08-26,http://arxiv.org/abs/2408.14332,Clayton Sanford; Daniel Hsu; Matus Telgarsky
Self-Attention Limits Working Memory Capacity of Transformer-Based Models,2024-09-16,http://arxiv.org/abs/2409.10715,Dongyu Gong; Hantao Zhang
"In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models",2024-09-23,https://arxiv.org/pdf/2409.15454,Pengrui Han; Peiyang Song; Haofei Yu; Jiaxuan You
