Commit

update papers last week
Furyton committed Sep 17, 2024
1 parent dca0894 commit 6c3c10d
Showing 7 changed files with 23 additions and 10 deletions.
13 changes: 9 additions & 4 deletions generate_readme.py
@@ -1,3 +1,4 @@
+# encoding: utf-8
 import os
 import csv
 import json
@@ -25,7 +26,7 @@
 def generate_table_of_content(category_info):
     header = "Table of Content\n====================\n<!--ts-->\n"
     header += (
-        "- [Awesome Transformers LM Analytics ](#awesome-transformers-lm-analytics-)\n"
+        "- [Awesome Language Model Analysis](#awesome-language-model-analysis-)\n"
         "- [Table of Content](#table-of-content)\n"
     )
     footer = "<!--te-->\n"
@@ -56,7 +57,7 @@ def gen_entry(name):
 
 
 def generate_section_template(category_info):
-    header_template = "## **{}**\n\n**[`^ back to top ^`](#awesome-transformers-lm-analytics-)**\n\n{}"
+    header_template = "## **{}**\n\n**[`^ back to top ^`](#awesome-language-model-analysis-)**\n\n{}"
     body_template = """
 <details open>
 <summary><em>paper list (click to fold / unfold)</em></summary>
@@ -138,10 +139,14 @@ def get_section_list(topic):
 
     # read as dict, the first line is the header
 
-    with open(p, "r") as f:
+    with open(p, "r", encoding="utf-8") as f:
         reader = csv.DictReader(f)
         # sort by date
-        reader = sorted(reader, key=lambda x: x["Date"], reverse=True)
+        try:
+            reader = sorted(reader, key=lambda x: x["Date"], reverse=True)
+        except Exception as e:
+            print(f"Error reading {p}: {e}")
+            return [], []
         # sanity check of each row
         for row in reader:
             assert len(row.keys()) == 4, f"topic: {topic}, row: {row}"
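For reference, a minimal standalone sketch of the patched reading logic (the path and function name here are hypothetical; it assumes a "Date" column holding ISO dates, YYYY-MM-DD, so lexicographic sorting orders rows newest-first):

import csv

def read_papers(path):
    # explicit utf-8 so titles with non-ASCII characters (e.g. "≃", "é") parse correctly
    with open(path, "r", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    try:
        # newest first; a missing or malformed "Date" field lands in the except branch
        rows.sort(key=lambda r: r["Date"], reverse=True)
    except Exception as e:
        # mirror the patch: report the bad file instead of crashing the README build
        print(f"Error reading {path}: {e}")
        return []
    return rows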
3 changes: 2 additions & 1 deletion papers/architectural-effectivity/linear-attention/papers.csv
@@ -4,4 +4,5 @@ Transformers are SSMs: Generalized Models and Efficient Algorithms Through Struc
 Just read twice: closing the recall gap for recurrent language models,2024-07-07,http://arxiv.org/abs/2407.05483,Simran Arora; Aman Timalsina; Aaryan Singhal; Benjamin Spector; Sabri Eyuboglu; Xinyi Zhao; Ashish Rao; Atri Rudra; Christopher Ré
 Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models,2024-08-19,http://arxiv.org/abs/2408.10189,Aviv Bick; Kevin Y. Li; Eric P. Xing; J. Zico Kolter; Albert Gu
 Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations,2024-08-20,http://arxiv.org/abs/2408.10920,Róbert Csordás; Christopher Potts; Christopher D. Manning; Atticus Geiger
-"Theory, Analysis, and Best Practices for Sigmoid Self-Attention",2024-09-06,http://arxiv.org/abs/2409.04431,Jason Ramapuram; Federico Danieli; Eeshan Dhekane; Floris Weers; Dan Busbridge; Pierre Ablin; Tatiana Likhomanenko; Jagrit Digani; Zijin Gu; Amitis Shidani; Russ Webb
+"Theory, Analysis, and Best Practices for Sigmoid Self-Attention",2024-09-06,http://arxiv.org/abs/2409.04431,Jason Ramapuram; Federico Danieli; Eeshan Dhekane; Floris Weers; Dan Busbridge; Pierre Ablin; Tatiana Likhomanenko; Jagrit Digani; Zijin Gu; Amitis Shidani; Russ Webb
+"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
5 changes: 4 additions & 1 deletion papers/mechanistic-engineering/papers.csv
@@ -38,4 +38,7 @@ The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insight
 A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models,2024-08-16,http://arxiv.org/abs/2408.08590,Geonhee Kim; Marco Valentino; André Freitas
 Transformer Circuit Faithfulness Metrics are not Robust,2024-07-11,http://arxiv.org/abs/2407.08734,Joseph Miller; Bilal Chughtai; William Saunders
 LLM Circuit Analyses Are Consistent Across Training and Scale,2024-07-15,http://arxiv.org/abs/2407.10827,Curt Tigges; Michael Hanna; Qinan Yu; Stella Biderman
-Modularity in Transformers: Investigating Neuron Separability & Specialization,2024-08-30,http://arxiv.org/abs/2408.17324,Nicholas Pochinkov; Thomas Jones; Mohammed Rashidur Rahman
+Modularity in Transformers: Investigating Neuron Separability & Specialization,2024-08-30,http://arxiv.org/abs/2408.17324,Nicholas Pochinkov; Thomas Jones; Mohammed Rashidur Rahman
+Extracting Paragraphs from LLM Token Activations,2024-09-10,http://arxiv.org/abs/2409.06328,Nicholas Pochinkov; Angelo Benoit; Lovkush Agarwal; Zainab Ali Majid; Lucile Ter-Minassian
+Explaining Datasets in Words: Statistical Models with Natural Language Parameters,2024-09-13,http://arxiv.org/abs/2409.08466,Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt
+Optimal ablation for interpretability,2024-09-16,http://arxiv.org/abs/2409.09951,Maximilian Li; Lucas Janson
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/chain-of-thought/papers.csv
@@ -8,4 +8,5 @@ The Expressive Power of Transformers with Chain of Thought,2023-10-13,https://op
 Iteration Head: A Mechanistic Study of Chain-of-Thought,2024-06-04,http://arxiv.org/abs/2406.02128,Vivien Cabannes; Charles Arnal; Wassim Bouaziz; Alice Yang; Francois Charton; Julia Kempe
 On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning,2024-06-20,http://arxiv.org/abs/2406.14197,Franz Nowak; Anej Svete; Alexandra Butoi; Ryan Cotterell
 Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods,2024-08-25,http://arxiv.org/abs/2408.14511,Xinyang Hu; Fengzhuo Zhang; Siyu Chen; Zhuoran Yang
-"Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning",2024-07-01,http://arxiv.org/abs/2407.01687,Akshara Prabhakar; Thomas L. Griffiths; R. Thomas McCoy
+"Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning",2024-07-01,http://arxiv.org/abs/2407.01687,Akshara Prabhakar; Thomas L. Griffiths; R. Thomas McCoy
+"Autoregressive + Chain of Thought (CoT) ≃ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer",2024-09-14,http://arxiv.org/abs/2409.09239,Xiang Zhang; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/hallucination/papers.csv
@@ -6,4 +6,5 @@ Calibrated Language Models Must Hallucinate,2023-11-24,http://arxiv.org/abs/2311
 The Curious Case of Hallucinatory Unanswerablity: Finding Truths in the Hidden States of Over-Confident Large Language Models,2023-10-18,http://arxiv.org/abs/2310.11877,Aviv Slobodkin; Omer Goldman; Avi Caciularu; Ido Dagan; Shauli Ravfogel
 Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?,2024-05-09,http://arxiv.org/abs/2405.05904,Zorik Gekhman; Gal Yona; Roee Aharoni; Matan Eyal; Amir Feder; Roi Reichart; Jonathan Herzig
 Estimating the Hallucination Rate of Generative AI,2024-06-11,http://arxiv.org/abs/2406.07457,Andrew Jesson; Nicolas Beltran-Velez; Quentin Chu; Sweta Karlekar; Jannik Kossen; Yarin Gal; John P. Cunningham; David Blei
-Shared Imagination: LLMs Hallucinate Alike,2024-07-23,http://arxiv.org/abs/2407.16604,Yilun Zhou; Caiming Xiong; Silvio Savarese; Chien-Sheng Wu
+Shared Imagination: LLMs Hallucinate Alike,2024-07-23,http://arxiv.org/abs/2407.16604,Yilun Zhou; Caiming Xiong; Silvio Savarese; Chien-Sheng Wu
+"LLMs Will Always Hallucinate, and We Need to Live With This",2024-09-09,http://arxiv.org/abs/2409.05746,Sourav Banerjee; Ayushi Agarwal; Saloni Singla
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/learning/papers.csv
@@ -37,4 +37,5 @@ On the Generalization of Preference Learning with DPO,2024-08-06,http://arxiv.or
 Reasoning in Large Language Models: A Geometric Perspective,2024-07-02,http://arxiv.org/abs/2407.02678,Romain Cosentino; Sarath Shekkizhar
 Unforgettable Generalization in Language Models,2024-09-03,http://arxiv.org/abs/2409.02228,Eric Zhang; Leshem Chosen; Jacob Andreas
 The Many Faces of Optimal Weak-to-Strong Learning,2024-08-30,http://arxiv.org/abs/2408.17148,Mikael Møller Høgsgaard; Kasper Green Larsen; Markus Engelund Mathiasen
-On the Empirical Complexity of Reasoning and Planning in LLMs,2024-04-17,http://arxiv.org/abs/2404.11041,Liwei Kang; Zirui Zhao; David Hsu; Wee Sun Lee
+On the Empirical Complexity of Reasoning and Planning in LLMs,2024-04-17,http://arxiv.org/abs/2404.11041,Liwei Kang; Zirui Zhao; David Hsu; Wee Sun Lee
+Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics,2024-09-15,http://arxiv.org/abs/2409.09626,Yi Ren; Danica J. Sutherland
3 changes: 2 additions & 1 deletion papers/phenomena-of-interest/training-dynamics/papers.csv
@@ -29,4 +29,5 @@ Learning Dynamics of LLM Finetuning,2024-07-15,http://arxiv.org/abs/2407.10490,Y
 Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective,2024-07-24,http://arxiv.org/abs/2407.17120,Jingren Liu; Zhong Ji; YunLong Yu; Jiale Cao; Yanwei Pang; Jungong Han; Xuelong Li
 Global Convergence in Training Large-Scale Transformers,2024-08,https://klusowski.princeton.edu/sites/g/files/toruqf5901/files/documents/gao2024global.pdf,Cheng Gao; Yuan Cao; Zihao Li; Yihan He; Mengdi Wang; Han Liu; Jason M. Klusowski; Jianqing Fan
 On the Convergence of Encoder-only Shallow Transformers,2024-08,https://proceedings.neurips.cc/paper_files/paper/2023/file/a3cf318fbeec1126da21e9185ae9908c-Paper-Conference.pdf,Yongtao Wu; Fanghui Liu; Grigorios G Chrysos; Volkan Cevher
-"The AdEMAMix Optimizer: Better, Faster, Older",2024-09-05,http://arxiv.org/abs/2409.03137,Matteo Pagliardini; Pierre Ablin; David Grangier
+"The AdEMAMix Optimizer: Better, Faster, Older",2024-09-05,http://arxiv.org/abs/2409.03137,Matteo Pagliardini; Pierre Ablin; David Grangier
+Optimization Hyper-parameter Laws for Large Language Models,2024-09-07,http://arxiv.org/abs/2409.04777,Xingyu Xie; Kuangyu Ding; Shuicheng Yan; Kim-Chuan Toh; Tianwen Wei
