[Update] Automated update on 2024-09-24 #17
Retrieved 50 papers from Scholar Inbox
Mechanistic Engineering / Probing / Interpretability
"Evaluating Synthetic Activations composed of SAE Latents in GPT-2",2024-09-23,"https://arxiv.org/pdf/2409.15019",Giorgi Giglemiani; Nora Petrova; Chatrik Singh Mangat; Jett Janiak; Stefan Heimersheim
Knowledge / Memory Mechanisms
"Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu
Learning / Generalization / Reasoning / Weak to Strong Generalization
"Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu
In-Context Learning
"Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár
Miscellanea
"Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár
Layer-normalization
"You can remove GPT2s LayerNorm by fine-tuning",2024-09-06,"https://arxiv.org/pdf/2409.13710",Stefan Heimersheim
Scaling Laws / Emergent Abilities / Grokking / etc.
"Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling",2024-09-23,"https://arxiv.org/pdf/2409.15156",Lechao Xiao
Chain-of-Thought
"Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch",2024-09-21,"https://arxiv.org/pdf/2409.13972",Jinman Zhao; Xueyan Zhang; Xingyu Yue; Weizhe Chen; Zifan Qian; Ruiyu Wang
All Digest Papers From Scholar Inbox
"You Only Use Reactive Attention Slice For Long Context Retrieval",2024-09-03,"https://arxiv.org/pdf/2409.13695",Yun Joon Soh; Hanxian Huang; Yuandong Tian; Jishen Zhao
"Routing in Sparsely-gated Language Models responds to Context",2024-09-21,"https://arxiv.org/pdf/2409.14107",Stefan Arnold; Marian Fietta; Dilara Yesilbas
"A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders",2024-09-22,"https://arxiv.org/pdf/2409.14507",David Chanin; James Wilken-Smith; Tomáš Dulka; Hardik Bhatnagar; Joseph Bloom
"Towards Building Efficient Sentence BERT Models using Layer Pruning",2024-09-21,"https://arxiv.org/pdf/2409.14168",Anushka Shelke; Riya Savant; Raviraj Joshi
"Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu
"Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers",2024-09-21,"https://arxiv.org/pdf/2409.14097",Soniya Vijayakumar; Josef van Genabith; Simon Ostermann
"Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis",2024-09-21,"https://arxiv.org/pdf/2409.14144",Zeping Yu; Sophia Ananiadou
"Inference-Friendly Models With MixAttention",2024-09-23,"https://arxiv.org/pdf/2409.15012",Shashank Rajput; Ying Sheng; Sean Owen; Vitaliy Chiley
"Loop-Residual Neural Networks for Iterative Refinement",2024-09-21,"https://arxiv.org/pdf/2409.14199",Kei-Sing Ng; Qingchen Wang
"Normalized Narrow Jump To Conclusions: Normalized Narrow Shortcuts for Parameter Efficient Early Exit Transformer Prediction",2024-09-21,"https://arxiv.org/pdf/2409.14091",Amrit Diggavi Seshadri
"Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár
"Consistency for Large Neural Networks",2024-09-21,"https://arxiv.org/pdf/2409.14123",Haoran Zhan; Yingcun Xia
"Instruction Following without Instruction Tuning",2024-09-22,"https://arxiv.org/pdf/2409.14254",John Hewitt; Nelson F. Liu; Percy Liang; Christopher D. Manning
"From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks",2024-09-23,"https://arxiv.org/pdf/2409.14623",Clémentine C. J. Dominé; Nicolas Anguita; Alexandra M. Proca; Lukas Braun; Daniel Kunin; Pedro A. M. Mediano; Andrew M. Saxe
"You can remove GPT2s LayerNorm by fine-tuning",2024-09-06,"https://arxiv.org/pdf/2409.13710",Stefan Heimersheim
"Prompt Baking",2024-09-04,"https://arxiv.org/pdf/2409.13697",Aman Bhargava; Cameron Witkowski; Alexander Detkov; Matt Thomson
"Context-Aware Membership Inference Attacks against Pre-trained Large Language Models",2024-09-11,"https://arxiv.org/pdf/2409.13745",Hongyan Chang; Ali Shahin Shamsabadi; Kleomenis Katevas; Hamed Haddadi; Reza Shokri
"Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling",2024-09-23,"https://arxiv.org/pdf/2409.15156",Lechao Xiao
"Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch",2024-09-21,"https://arxiv.org/pdf/2409.13972",Jinman Zhao; Xueyan Zhang; Xingyu Yue; Weizhe Chen; Zifan Qian; Ruiyu Wang
"Evaluating Synthetic Activations composed of SAE Latents in GPT-2",2024-09-23,"https://arxiv.org/pdf/2409.15019",Giorgi Giglemiani; Nora Petrova; Chatrik Singh Mangat; Jett Janiak; Stefan Heimersheim
"Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs",2024-09-20,"https://arxiv.org/pdf/2409.13852",Julia Watson; Sophia Lee; Barend Beekhuizen; Suzanne Stevenson
"Eliciting Instruction-tuned Code Language Models Capabilities to Utilize Auxiliary Function for Code Generation",2024-09-21,"https://arxiv.org/pdf/2409.13928",Seonghyeon Lee; Suyeon Kim; Joonwon Jang; Heejae Chon; Dongha Lee; Hwanjo Yu
"Backtracking Improves Generation Safety",2024-09-22,"https://arxiv.org/pdf/2409.14586",Yiming Zhang; Jianfeng Chi; Hailey Nguyen; Kartikeya Upasani; Daniel M. Bikel; Jason Weston; Eric Michael Smith
"Direct Judgement Preference Optimization",2024-09-23,"https://arxiv.org/pdf/2409.14664",Peifeng Wang; Austin Xu; Yilun Zhou; Caiming Xiong; Shafiq Joty
"Peer-to-Peer Learning Dynamics of Wide Neural Networks",2024-09-23,"https://arxiv.org/pdf/2409.15267",Shreyas Chaudhari; Srinivasa Pranav; Emile Anand; José M. F. Moura
"TracrBench: Generating Interpretability Testbeds with Large Language Models",2024-09-07,"https://arxiv.org/pdf/2409.13714",Hannes Thurnherr; Jérémy Scheurer
"Uncovering Latent Chain of Thought Vectors in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14026",Jason Zhang; Scott Viteri
"EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models",2024-09-22,"https://arxiv.org/pdf/2409.14595",Hossein Rajabzadeh; Aref Jafari; Aman Sharma; Benyamin Jami; Hyock Ju Kwon; Ali Ghodsi; Boxing Chen; Mehdi Rezagholizadeh
"Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape",2024-09-22,"https://arxiv.org/pdf/2409.14396",Tao Li; Zhengbao He; Yujun Li; Yasheng Wang; Lifeng Shang; Xiaolin Huang
"Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling",2024-09-23,"https://arxiv.org/pdf/2409.14683",Benjamin Clavié; Antoine Chaffin; Griffin Adams
"Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science",2024-09-23,"https://arxiv.org/pdf/2409.14673",Taihang Wang; Xiaoman Xu; Yimin Wang; Ye Jiang
"Target-Aware Language Modeling via Granular Data Sampling",2024-09-23,"https://arxiv.org/pdf/2409.14705",Ernie Chang; Pin-Jie Lin; Yang Li; Changsheng Zhao; Daeil Kim; Rastislav Rabatin; Zechun Liu; Yangyang Shi; Vikas Chandra
"Boosting Healthcare LLMs Through Retrieved Context",2024-09-23,"https://arxiv.org/pdf/2409.15127",Jordi Bayarri-Planas; Ashwin Kumar Gururajan; Dario Garcia-Gasulla
"LLM for Everyone: Representing the Underrepresented in Large Language Models",2024-09-20,"https://arxiv.org/pdf/2409.13897",Samuel Cahyawijaya
"Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping",2024-09-23,"https://arxiv.org/pdf/2409.15241",Guanhua Wang; Chengming Zhang; Zheyu Shen; Ang Li; Olatunji Ruwase
"Creative Writers Attitudes on Writing as Training Data for Large Language Models",2024-09-22,"https://arxiv.org/pdf/2409.14281",Katy Ilonka Gero; Meera Desai; Carly Schnitzler; Nayun Eom; Jack Cushman; Elena L. Glassman
"Sketched Lanczos uncertainty score: a low-memory summary of the Fisher information",2024-09-23,"https://arxiv.org/pdf/2409.15008",Marco Miani; Lorenzo Beretta; Søren Hauberg
"Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task",2024-09-23,"https://arxiv.org/pdf/2409.15051",Gaëtan Caillaut; Raheel Qader; Mariam Nakhlé; Jingshu Liu; Jean-Gabriel Barthélemy
"Measuring Copyright Risks of Large Language Model via Partial Information Probing",2024-09-20,"https://arxiv.org/pdf/2409.13831",Weijie Zhao; Huajie Shao; Zhaozhuo Xu; Suzhen Duan; Denghui Zhang
"One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs",2024-09-21,"https://arxiv.org/pdf/2409.13959",Krzysztof Olejniczak; Xingyue Huang; İsmail İlkan Ceylan; Mikhail Galkin
"Persistent Backdoor Attacks in Continual Learning",2024-09-20,"https://arxiv.org/pdf/2409.13864",Zhen Guo; Abhinav Kumar; Reza Tourani
"Knowing When to Ask -- Bridging Large Language Models and Data",2024-09-10,"https://arxiv.org/pdf/2409.13741",Prashanth Radhakrishnan; Jennifer Chen; Bo Xu; Prem Ramaswami; Hannah Pho; Adriana Olmos; James Manyika; R. V. Guha
"OmniBench: Towards The Future of Universal Omni-Language Models",2024-09-23,"https://arxiv.org/pdf/2409.15272",Yizhi Li; Ge Zhang; Yinghao Ma; Ruibin Yuan; Kang Zhu; Hangyu Guo; Yiming Liang; Jiaheng Liu; Jian Yang; Siwei Wu; Xingwei Qu; Jinjie Shi; Xinyue Zhang; Zhenzhu Yang; Xiangzhou Wang; Zhaoxiang Zhang; Zachary Liu; Emmanouil Benetos; Wenhao Huang; Chenghua Lin
"zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning",2024-09-23,"https://arxiv.org/pdf/2409.14644",Zixiang Xian; Chenhui Cui; Rubing Huang; Chunrong Fang; Zhenyu Chen
"Do Large Language Models Need a Content Delivery Network?",2024-09-16,"https://arxiv.org/pdf/2409.13761",Yihua Cheng; Kuntai Du; Jiayi Yao; Junchen Jiang
"Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm",2024-09-21,"https://arxiv.org/pdf/2409.14119",Jaehan Kim; Minkyoo Song; Seung Ho Na; Seungwon Shin
"Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting",2024-09-23,"https://arxiv.org/pdf/2409.14637",Humza Wajid Hameed; Geraldin Nanfack; Eugene Belilovsky
"Order of Magnitude Speedups for LLM Membership Inference",2024-09-22,"https://arxiv.org/pdf/2409.14513",Martin Bertran; Rongting Zhang; Aaron Roth
"Perfect Gradient Inversion in Federated Learning: A New Paradigm from the Hidden Subset Sum Problem",2024-09-22,"https://arxiv.org/pdf/2409.14260",Qiongxiu Li; Lixia Luo; Agnese Gini; Changlong Ji; Zhanhao Hu; Xiao Li; Chengfang Fang; Jie Shi; Xiaolin Hu
"RNR: Teaching Large Language Models to Follow Roles and Rules",2024-09-10,"https://arxiv.org/pdf/2409.13733",Kuan Wang; Alexander Bukharin; Haoming Jiang; Qingyu Yin; Zhengyang Wang; Tuo Zhao; Jingbo Shang; Chao Zhang; Bing Yin; Xian Li; Jianshu Chen; Shiyang Li