[Update] Automated update on 2024-09-18 #13
Closed
Retrieved 50 papers from Scholar Inbox
In-Context Learning
"Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers",2024-09-09,"https://arxiv.org/pdf/2409.10559",Siyu Chen; Heejune Sheen; Tianhao Wang; Zhuoran Yang
Other Phenomena / Discoveries
"Norm of Mean Contextualized Embeddings Determines their Variance",2024-09-17,"https://arxiv.org/pdf/2409.11253",Hiroaki Yamagiwa; Hidetoshi Shimodaira
Knowledge / Memory Mechanisms
"Self-Attention Limits Working Memory Capacity of Transformer-Based Models",2024-09-16,"https://arxiv.org/pdf/2409.10715",Dongyu Gong; Hantao Zhang
What Can Transformer Do? / Properties of Transformer
"Adaptive Large Language Models By Layerwise Attention Shortcuts",2024-09-17,"https://arxiv.org/pdf/2409.10870",Prateek Verma; Mert Pilanci
What Can Transformer Not Do? / Limitation of Transformer
"Self-Attention Limits Working Memory Capacity of Transformer-Based Models",2024-09-16,"https://arxiv.org/pdf/2409.10715",Dongyu Gong; Hantao Zhang
All Digest Papers From Scholar Inbox
"Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers",2024-09-09,"https://arxiv.org/pdf/2409.10559",Siyu Chen; Heejune Sheen; Tianhao Wang; Zhuoran Yang
"Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models",2024-09-17,"https://arxiv.org/pdf/2409.11136",Orion Weller; Benjamin Van Durme; Dawn Lawrie; Ashwin Paranjape; Yuhao Zhang; Jack Hessel
"Semformer: Transformer Language Models with Semantic Planning",2024-09-17,"https://arxiv.org/pdf/2409.11143",Yongjing Yin; Junran Ding; Kai Song; Yue Zhang
"Linear Recency Bias During Training Improves Transformers' Fit to Reading Times",2024-09-17,"https://arxiv.org/pdf/2409.11250",Christian Clark; Byung-Doh Oh; William Schuler
"Norm of Mean Contextualized Embeddings Determines their Variance",2024-09-17,"https://arxiv.org/pdf/2409.11253",Hiroaki Yamagiwa; Hidetoshi Shimodaira
"Kolmogorov-Arnold Transformer",2024-09-16,"https://arxiv.org/pdf/2409.10594",Xingyi Yang; Xinchao Wang
"Adaptive Large Language Models By Layerwise Attention Shortcuts",2024-09-17,"https://arxiv.org/pdf/2409.10870",Prateek Verma; Mert Pilanci
"Propulsion: Steering LLM with Tiny Fine-Tuning",2024-09-17,"https://arxiv.org/pdf/2409.10927",Md Kowsher; Nusrat Jahan Prottasha; Prakash Bhat
"Improving the Efficiency of Visually Augmented Language Models",2024-09-17,"https://arxiv.org/pdf/2409.11148",Paula Ontalvilla; Aitor Ormazabal; Gorka Azkune
"Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style",2024-09-17,"https://arxiv.org/pdf/2409.10955",Yuepei Li; Kang Zhou; Qiao Qiao; Bach Nguyen; Qing Wang; Qi Li
"Self-Attention Limits Working Memory Capacity of Transformer-Based Models",2024-09-16,"https://arxiv.org/pdf/2409.10715",Dongyu Gong; Hantao Zhang
"SOAP: Improving and Stabilizing Shampoo using Adam",2024-09-17,"https://arxiv.org/pdf/2409.11321",Nikhil Vyas; Depen Morwani; Rosie Zhao; Itai Shapira; David Brandfonbrener; Lucas Janson; Sham Kakade
"CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios",2024-09-16,"https://arxiv.org/pdf/2409.10593",Luning Wang; Shiyao Li; Xuefei Ning; Zhihang Yuan; Shengen Yan; Guohao Dai; Yu Wang
"Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective",2024-09-03,"https://arxiv.org/pdf/2409.10555",Jianqiao Wangni
"A close pair of orbiters embedded in a gaseous disk: the repulsive effect",2024-09-16,"https://arxiv.org/pdf/2409.10751",F. J. Sanchez-Salcedo; F. S. Masset; S. Cornejo
"KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models",2024-09-17,"https://arxiv.org/pdf/2409.11057",Bo Lv; Quan Zhou; Xuanang Ding; Yan Wang; Zeming Ma
"Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Watermarking",2024-09-14,"https://arxiv.org/pdf/2409.10570",Cong Kong; Rui Xu; Weixi Chen; Jiawei Chen; Zhaoxia Yin
"ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood",2024-09-14,"https://arxiv.org/pdf/2409.10571",Ruoyu Wang; Jiachen Sun; Shaowei Hua; Quan Fang
"Improving Multi-candidate Speculative Decoding",2024-09-16,"https://arxiv.org/pdf/2409.10644",Xiaofan Lu; Yixiao Zeng; Feiyang Ma; Zixu Yu; Marco Levorato
"Selective algorithm processing of subset sum distributions",2024-09-17,"https://arxiv.org/pdf/2409.11076",Nick Dawes
"Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning",2024-09-17,"https://arxiv.org/pdf/2409.11147",Yukang Lin; Bingchen Zhong; Shuoran Jiang; Joanna Siebert; Qingcai Chen
"Fairness in Survival Analysis with Distributionally Robust Optimization",2024-08-31,"https://arxiv.org/pdf/2409.10538",Shu Hu; George H. Chen
"Implicit Reasoning in Deep Time Series Forecasting",2024-09-17,"https://arxiv.org/pdf/2409.10840",Willa Potosnak; Cristian Challu; Mononito Goswami; Michał Wiliński; Nina Żukowska
"Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering",2024-09-17,"https://arxiv.org/pdf/2409.10790",Qingru Zhang; Xiaodong Yu; Chandan Singh; Xiaodong Liu; Liyuan Liu; Jianfeng Gao; Tuo Zhao; Dan Roth; Hao Cheng
"Query Learning of Advice and Nominal Automata",2024-09-17,"https://arxiv.org/pdf/2409.10822",Kevin Zhou
"From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space",2024-08-30,"https://arxiv.org/pdf/2409.10528",Andrew Hamara; Pablo Rivas
"THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models",2024-09-17,"https://arxiv.org/pdf/2409.11353",Mengfei Liang; Archish Arun; Zekun Wu; Cristian Munoz; Jonathan Lutch; Emre Kazim; Adriano Koshiyama; Philip Treleaven
"Says Who? Effective Zero-Shot Annotation of Focalization",2024-09-17,"https://arxiv.org/pdf/2409.11390",Rebecca M. M. Hicke; Yuri Bizzoni; Pascale Feldkamp; Ross Deans Kristensen-McLachlan
"Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models",2024-09-17,"https://arxiv.org/pdf/2409.11233",Bishwash Khanal; Jeffery M. Capone
"Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5",2024-09-17,"https://arxiv.org/pdf/2409.11282",Marcel Lamott; Muhammad Armaghan Shakir
"Clustering with Non-adaptive Subset Queries",2024-09-17,"https://arxiv.org/pdf/2409.10908",Hadley Black; Euiwoong Lee; Arya Mazumdar; Barna Saha
"A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B",2024-09-17,"https://arxiv.org/pdf/2409.11055",Jemin Lee; Sihyeong Park; Jinse Kwon; Jihun Oh; Yongin Kwon
"Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations",2024-09-17,"https://arxiv.org/pdf/2409.11304",Hussam Al Daas; Grey Ballard; Laura Grigori; Suraj Kumar; Kathryn Rouse; Mathieu Verite
"A Best-of-Both Approach to Improve Match Predictions and Reciprocal Recommendations for Job Search",2024-09-17,"https://arxiv.org/pdf/2409.10992",Shuhei Goda; Yudai Hayashi; Yuta Saito
"Generalized Measures of Anticipation and Responsivity in Online Language Processing",2024-09-16,"https://arxiv.org/pdf/2409.10728",Mario Giulianelli; Andreas Opedal; Ryan Cotterell
"Relative Representations: Topological and Geometric Perspectives",2024-09-17,"https://arxiv.org/pdf/2409.10967",Alejandro García-Castellanos; Giovanni Luca Marchetti; Danica Kragic; Martina Scolamiero
"RoMath: A Mathematical Reasoning Benchmark in Romanian",2024-09-17,"https://arxiv.org/pdf/2409.11074",Adrian Cosma; Ana-Maria Bucur; Emilian Radoi
"The Complexity of Maximizing the MST-ratio",2024-09-17,"https://arxiv.org/pdf/2409.11079",Afrouz Jabal Ameli; Faezeh Motiei; Morteza Saghafian
"Boolean Functions with Small Approximate Spectral Norm",2024-09-16,"https://arxiv.org/pdf/2409.10634",Tsun-Ming Cheung; Hamed Hatami; Rosie Zhao; Itai Zilberstein
"Physics-Informed Neural Networks with Trust-Region Sequential Quadratic Programming",2024-09-17,"https://arxiv.org/pdf/2409.10777",Xiaoran Cheng; Sen Na
"DeFi Arbitrage in Hedged Liquidity Tokens",2024-09-17,"https://arxiv.org/pdf/2409.11339",Maxim Bichuch; Zachary Feinstein
"Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity",2024-09-17,"https://arxiv.org/pdf/2409.10773",Site Bai; Brian Bullins
"GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval",2024-09-17,"https://arxiv.org/pdf/2409.10909",Wonduk Seo; Haojie Zhang; Yueyang Zhang; Changhao Zhang; Songyao Duan; Lixin Su; Daiting Shi; Jiashu Zhao; Dawei Yin
"CLIP Adaptation by Intra-modal Overlap Reduction",2024-09-17,"https://arxiv.org/pdf/2409.11338",Alexey Kravets; Vinay Namboodiri
"Trajectory-Oriented Control Using Gradient Descent: An Unconventional Approach",2024-09-16,"https://arxiv.org/pdf/2409.10662",Ramin Esmzad; Hamidreza Modares
"Detection Made Easy: Potentials of Large Language Models for Solidity Vulnerabilities",2024-09-15,"https://arxiv.org/pdf/2409.10574",Md Tauseef Alam; Raju Halder; Abyayananda Maiti
"Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models",2024-09-17,"https://arxiv.org/pdf/2409.10999",Potsawee Manakul; Guangzhi Sun; Warit Sirichotedumrong; Kasima Tharnpipitchai; Kunat Pipatanakul
"On the number of prime factors with a given multiplicity over h-free and h-full numbers",2024-09-17,"https://arxiv.org/pdf/2409.11275",Sourabhashis Das; Wentang Kuo; Yu-Ru Liu
"Elementary symmetric partitions",2024-09-17,"https://arxiv.org/pdf/2409.11268",Cristina Ballantine; George Beck; Mircea Merca; Bruce Sagan
"Online Combinatorial Allocations and Auctions with Few Samples",2024-09-17,"https://arxiv.org/pdf/2409.11091",Paul Dütting; Thomas Kesselheim; Brendan Lucier; Rebecca Reiffenhäuser; Sahil Singla