
[Update] Automated update on 2024-09-24 #17

Closed
wants to merge 1 commit

Conversation

@Furyton Furyton (Owner) commented Sep 24, 2024

Retrieved 50 papers from Scholar Inbox

Mechanistic Engineering / Probing / Interpretability

"Evaluating Synthetic Activations composed of SAE Latents in GPT-2",2024-09-23,"https://arxiv.org/pdf/2409.15019",Giorgi Giglemiani; Nora Petrova; Chatrik Singh Mangat; Jett Janiak; Stefan Heimersheim

Knowledge / Memory Mechanisms

"Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu

Learning / Generalization / Reasoning / Weak to Strong Generalization

"Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu

In-Context Learning

"Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár

Miscellanea

"Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár

Layer-normalization

"You can remove GPT2s LayerNorm by fine-tuning",2024-09-06,"https://arxiv.org/pdf/2409.13710",Stefan Heimersheim

Scaling Laws / Emergent Abilities / Grokking / etc.

"Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling",2024-09-23,"https://arxiv.org/pdf/2409.15156",Lechao Xiao

Chain-of-Thought

"Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch",2024-09-21,"https://arxiv.org/pdf/2409.13972",Jinman Zhao; Xueyan Zhang; Xingyu Yue; Weizhe Chen; Zifan Qian; Ruiyu Wang


All Digest Papers From Scholar Inbox

  • "You Only Use Reactive Attention Slice For Long Context Retrieval",2024-09-03,"https://arxiv.org/pdf/2409.13695",Yun Joon Soh; Hanxian Huang; Yuandong Tian; Jishen Zhao

  • "Routing in Sparsely-gated Language Models responds to Context",2024-09-21,"https://arxiv.org/pdf/2409.14107",Stefan Arnold; Marian Fietta; Dilara Yesilbas

  • "A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders",2024-09-22,"https://arxiv.org/pdf/2409.14507",David Chanin; James Wilken-Smith; Tomáš Dulka; Hardik Bhatnagar; Joseph Bloom

  • "Towards Building Efficient Sentence BERT Models using Layer Pruning",2024-09-21,"https://arxiv.org/pdf/2409.14168",Anushka Shelke; Riya Savant; Raviraj Joshi

  • "Co-occurrence is not Factual Association in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14057",Xiao Zhang; Miao Li; Ji Wu

  • "Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers",2024-09-21,"https://arxiv.org/pdf/2409.14097",Soniya Vijayakumar; Josef van Genabith; Simon Ostermann

  • "Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis",2024-09-21,"https://arxiv.org/pdf/2409.14144",Zeping Yu; Sophia Ananiadou

  • "Inference-Friendly Models With MixAttention",2024-09-23,"https://arxiv.org/pdf/2409.15012",Shashank Rajput; Ying Sheng; Sean Owen; Vitaliy Chiley

  • "Loop-Residual Neural Networks for Iterative Refinement",2024-09-21,"https://arxiv.org/pdf/2409.14199",Kei-Sing Ng; Qingchen Wang

  • "Normalized Narrow Jump To Conclusions: Normalized Narrow Shortcuts for Parameter Efficient Early Exit Transformer Prediction",2024-09-21,"https://arxiv.org/pdf/2409.14091",Amrit Diggavi Seshadri

  • "Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts",2024-09-10,"https://arxiv.org/pdf/2409.13728",Anna Mészáros; Szilvia Ujváry; Wieland Brendel; Patrik Reizinger; Ferenc Huszár

  • "Consistency for Large Neural Networks",2024-09-21,"https://arxiv.org/pdf/2409.14123",Haoran Zhan; Yingcun Xia

  • "Instruction Following without Instruction Tuning",2024-09-22,"https://arxiv.org/pdf/2409.14254",John Hewitt; Nelson F. Liu; Percy Liang; Christopher D. Manning

  • "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks",2024-09-23,"https://arxiv.org/pdf/2409.14623",Clémentine C. J. Dominé; Nicolas Anguita; Alexandra M. Proca; Lukas Braun; Daniel Kunin; Pedro A. M. Mediano; Andrew M. Saxe

  • "You can remove GPT2s LayerNorm by fine-tuning",2024-09-06,"https://arxiv.org/pdf/2409.13710",Stefan Heimersheim

  • "Prompt Baking",2024-09-04,"https://arxiv.org/pdf/2409.13697",Aman Bhargava; Cameron Witkowski; Alexander Detkov; Matt Thomson

  • "Context-Aware Membership Inference Attacks against Pre-trained Large Language Models",2024-09-11,"https://arxiv.org/pdf/2409.13745",Hongyan Chang; Ali Shahin Shamsabadi; Kleomenis Katevas; Hamed Haddadi; Reza Shokri

  • "Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling",2024-09-23,"https://arxiv.org/pdf/2409.15156",Lechao Xiao

  • "Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch",2024-09-21,"https://arxiv.org/pdf/2409.13972",Jinman Zhao; Xueyan Zhang; Xingyu Yue; Weizhe Chen; Zifan Qian; Ruiyu Wang

  • "Evaluating Synthetic Activations composed of SAE Latents in GPT-2",2024-09-23,"https://arxiv.org/pdf/2409.15019",Giorgi Giglemiani; Nora Petrova; Chatrik Singh Mangat; Jett Janiak; Stefan Heimersheim

  • "Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs",2024-09-20,"https://arxiv.org/pdf/2409.13852",Julia Watson; Sophia Lee; Barend Beekhuizen; Suzanne Stevenson

  • "Eliciting Instruction-tuned Code Language Models Capabilities to Utilize Auxiliary Function for Code Generation",2024-09-21,"https://arxiv.org/pdf/2409.13928",Seonghyeon Lee; Suyeon Kim; Joonwon Jang; Heejae Chon; Dongha Lee; Hwanjo Yu

  • "Backtracking Improves Generation Safety",2024-09-22,"https://arxiv.org/pdf/2409.14586",Yiming Zhang; Jianfeng Chi; Hailey Nguyen; Kartikeya Upasani; Daniel M. Bikel; Jason Weston; Eric Michael Smith

  • "Direct Judgement Preference Optimization",2024-09-23,"https://arxiv.org/pdf/2409.14664",Peifeng Wang; Austin Xu; Yilun Zhou; Caiming Xiong; Shafiq Joty

  • "Peer-to-Peer Learning Dynamics of Wide Neural Networks",2024-09-23,"https://arxiv.org/pdf/2409.15267",Shreyas Chaudhari; Srinivasa Pranav; Emile Anand; José M. F. Moura

  • "TracrBench: Generating Interpretability Testbeds with Large Language Models",2024-09-07,"https://arxiv.org/pdf/2409.13714",Hannes Thurnherr; Jérémy Scheurer

  • "Uncovering Latent Chain of Thought Vectors in Language Models",2024-09-21,"https://arxiv.org/pdf/2409.14026",Jason Zhang; Scott Viteri

  • "EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models",2024-09-22,"https://arxiv.org/pdf/2409.14595",Hossein Rajabzadeh; Aref Jafari; Aman Sharma; Benyamin Jami; Hyock Ju Kwon; Ali Ghodsi; Boxing Chen; Mehdi Rezagholizadeh

  • "Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape",2024-09-22,"https://arxiv.org/pdf/2409.14396",Tao Li; Zhengbao He; Yujun Li; Yasheng Wang; Lifeng Shang; Xiaolin Huang

  • "Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling",2024-09-23,"https://arxiv.org/pdf/2409.14683",Benjamin Clavié; Antoine Chaffin; Griffin Adams

  • "Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science",2024-09-23,"https://arxiv.org/pdf/2409.14673",Taihang Wang; Xiaoman Xu; Yimin Wang; Ye Jiang

  • "Target-Aware Language Modeling via Granular Data Sampling",2024-09-23,"https://arxiv.org/pdf/2409.14705",Ernie Chang; Pin-Jie Lin; Yang Li; Changsheng Zhao; Daeil Kim; Rastislav Rabatin; Zechun Liu; Yangyang Shi; Vikas Chandra

  • "Boosting Healthcare LLMs Through Retrieved Context",2024-09-23,"https://arxiv.org/pdf/2409.15127",Jordi Bayarri-Planas; Ashwin Kumar Gururajan; Dario Garcia-Gasulla

  • "LLM for Everyone: Representing the Underrepresented in Large Language Models",2024-09-20,"https://arxiv.org/pdf/2409.13897",Samuel Cahyawijaya

  • "Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping",2024-09-23,"https://arxiv.org/pdf/2409.15241",Guanhua Wang; Chengming Zhang; Zheyu Shen; Ang Li; Olatunji Ruwase

  • "Creative Writers Attitudes on Writing as Training Data for Large Language Models",2024-09-22,"https://arxiv.org/pdf/2409.14281",Katy Ilonka Gero; Meera Desai; Carly Schnitzler; Nayun Eom; Jack Cushman; Elena L. Glassman

  • "Sketched Lanczos uncertainty score: a low-memory summary of the Fisher information",2024-09-23,"https://arxiv.org/pdf/2409.15008",Marco Miani; Lorenzo Beretta; Søren Hauberg

  • "Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task",2024-09-23,"https://arxiv.org/pdf/2409.15051",Gaëtan Caillaut; Raheel Qader; Mariam Nakhlé; Jingshu Liu; Jean-Gabriel Barthélemy

  • "Measuring Copyright Risks of Large Language Model via Partial Information Probing",2024-09-20,"https://arxiv.org/pdf/2409.13831",Weijie Zhao; Huajie Shao; Zhaozhuo Xu; Suzhen Duan; Denghui Zhang

  • "One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs",2024-09-21,"https://arxiv.org/pdf/2409.13959",Krzysztof Olejniczak; Xingyue Huang; İsmail İlkan Ceylan; Mikhail Galkin

  • "Persistent Backdoor Attacks in Continual Learning",2024-09-20,"https://arxiv.org/pdf/2409.13864",Zhen Guo; Abhinav Kumar; Reza Tourani

  • "Knowing When to Ask -- Bridging Large Language Models and Data",2024-09-10,"https://arxiv.org/pdf/2409.13741",Prashanth Radhakrishnan; Jennifer Chen; Bo Xu; Prem Ramaswami; Hannah Pho; Adriana Olmos; James Manyika; R. V. Guha

  • "OmniBench: Towards The Future of Universal Omni-Language Models",2024-09-23,"https://arxiv.org/pdf/2409.15272",Yizhi Li; Ge Zhang; Yinghao Ma; Ruibin Yuan; Kang Zhu; Hangyu Guo; Yiming Liang; Jiaheng Liu; Jian Yang; Siwei Wu; Xingwei Qu; Jinjie Shi; Xinyue Zhang; Zhenzhu Yang; Xiangzhou Wang; Zhaoxiang Zhang; Zachary Liu; Emmanouil Benetos; Wenhao Huang; Chenghua Lin

  • "zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning",2024-09-23,"https://arxiv.org/pdf/2409.14644",Zixiang Xian; Chenhui Cui; Rubing Huang; Chunrong Fang; Zhenyu Chen

  • "Do Large Language Models Need a Content Delivery Network?",2024-09-16,"https://arxiv.org/pdf/2409.13761",Yihua Cheng; Kuntai Du; Jiayi Yao; Junchen Jiang

  • "Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm",2024-09-21,"https://arxiv.org/pdf/2409.14119",Jaehan Kim; Minkyoo Song; Seung Ho Na; Seungwon Shin

  • "Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting",2024-09-23,"https://arxiv.org/pdf/2409.14637",Humza Wajid Hameed; Geraldin Nanfack; Eugene Belilovsky

  • "Order of Magnitude Speedups for LLM Membership Inference",2024-09-22,"https://arxiv.org/pdf/2409.14513",Martin Bertran; Rongting Zhang; Aaron Roth

  • "Perfect Gradient Inversion in Federated Learning: A New Paradigm from the Hidden Subset Sum Problem",2024-09-22,"https://arxiv.org/pdf/2409.14260",Qiongxiu Li; Lixia Luo; Agnese Gini; Changlong Ji; Zhanhao Hu; Xiao Li; Chengfang Fang; Jie Shi; Xiaolin Hu

  • "RNR: Teaching Large Language Models to Follow Roles and Rules",2024-09-10,"https://arxiv.org/pdf/2409.13733",Kuan Wang; Alexander Bukharin; Haoming Jiang; Qingyu Yin; Zhengyang Wang; Tuo Zhao; Jingbo Shang; Chao Zhang; Bing Yin; Xian Li; Jianshu Chen; Shiyang Li

@Furyton Furyton closed this Oct 11, 2024
@Furyton Furyton deleted the automated-update-1727161996 branch October 11, 2024 11:45