Here are some of the most important machine learning and AI research papers from August 19 to 25, 2024. The selection spans automated agent design, model compression, Bayesian optimization, tabular data modeling, prompting strategies, retrieval-augmented generation, and decoding techniques, and each paper offers concrete methods with practical implications for building and deploying AI systems.

Automated Design of Agentic Systems

  • Author(s): Shengran Hu, Cong Lu, Jeff Clune

Overview of the proposed algorithm, Meta Agent Search, and examples of discovered agents.

The paper “Automated Design of Agentic Systems” presents a new research area focused on automatically creating powerful agentic systems that use foundation models as components. Machine learning has historically progressed from hand-designed components to learned ones, and this research aims to extend that automation to the design of agentic systems themselves. The study introduces Automated Design of Agentic Systems (ADAS), which seeks to invent new building blocks or combine existing ones in innovative ways. A key idea is a meta-agent that programs new agents in code, exploiting the Turing-complete nature of programming languages to express diverse agentic systems, including novel prompts and control flows. The paper proposes an algorithm called Meta Agent Search, in which the meta-agent iteratively creates and refines agents informed by an archive of past discoveries. Experiments across coding, science, and math domains demonstrate that these automatically designed agents outperform traditional hand-designed ones, and the findings highlight their robustness and adaptability across domains, suggesting a promising direction for developing advanced agentic systems.
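To make the core loop concrete, here is a minimal sketch of a Meta Agent Search-style procedure. The interface is hypothetical: `meta_agent_propose` stands in for the LLM-based meta-agent that writes new agent code, and `evaluate_agent` stands in for running a candidate on a benchmark; neither reflects the authors’ actual implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    code: str     # source code of a proposed agent
    score: float  # benchmark performance of that agent

def meta_agent_propose(archive):
    # Stand-in for the meta-agent: an LLM call that writes new agent code,
    # prompted with the archive of previously discovered agents.
    return f"# agent variant {len(archive)}"

def evaluate_agent(code):
    # Stand-in for executing the agent on a validation task suite.
    return random.random()

def meta_agent_search(iterations):
    archive = []
    for _ in range(iterations):
        code = meta_agent_propose(archive)      # program a new agent in code
        score = evaluate_agent(code)            # measure it on held-out tasks
        archive.append(Candidate(code, score))  # past discoveries inform later proposals
    return sorted(archive, key=lambda c: c.score, reverse=True)

best = meta_agent_search(iterations=10)[0]
print(best.code, best.score)
```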

LLM Pruning and Distillation in Practice: The Minitron Approach

  • Author(s): Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov

LLM Pruning and Distillation in Practice

The paper “LLM Pruning and Distillation in Practice: The Minitron Approach” details a process for compressing large language models, specifically Llama 3.1 8B and Mistral NeMo 12B, into smaller, more efficient models with 4B and 8B parameters. This is achieved through two pruning strategies: depth pruning and joint hidden/attention/MLP (width) pruning. The study evaluates these strategies using common benchmarks from the LM Evaluation Harness. The models are further refined using the NeMo Aligner and tested in instruct-tuned versions. This results in a 4B model derived from Llama 3.1 8B and a high-performing Mistral-NeMo-Minitron-8B model from Mistral NeMo 12B. The research highlights the benefits of fine-tuning teacher models on the distillation dataset, even without access to the original data. The paper also announces the open-sourcing of the base model weights on Hugging Face with a permissive license, allowing broader access and collaboration. This work provides valuable insights into model compression techniques, making large language models more accessible and efficient for practical applications.
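The distillation half of this recipe follows a familiar pattern: the pruned student is trained to match the teacher’s softened output distribution. Below is a generic knowledge-distillation loss in PyTorch for illustration; it is the standard KL formulation, not the exact Minitron training recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize
    # KL(teacher || student); the T^2 factor restores gradient scale.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy shapes: (batch * sequence positions, vocabulary size)
student = torch.randn(8, 32000, requires_grad=True)
teacher = torch.randn(8, 32000)
loss = distillation_loss(student, teacher)
loss.backward()  # in practice only the pruned student is updated
```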

The Vizier Gaussian Process Bandit Algorithm

  • Author(s): Xingyou Song, Qiuyi Zhang, Chansoo Lee, Emily Fertig, Tzu-Kuo Huang, Lior Belenki, Greg Kochanski, Setareh Ariafar, Srinivas Vasudevan, Sagi Perel, Daniel Golovin

Key components of the Google Vizier Bayesian optimization algorithm.

The paper titled “The Vizier Gaussian Process Bandit Algorithm” discusses the advancements and implementation details of the algorithm used by Google Vizier, a service that has performed millions of optimizations and significantly accelerated research and production systems at Google, demonstrating the effectiveness of Bayesian optimization at scale. Over the years, the algorithm has undergone substantial improvements, informed by extensive research and user feedback. The technical report provides insights into the design choices and implementation specifics of the current default algorithm offered by Open Source Vizier. Through experiments on standardized benchmarks, the paper demonstrates the algorithm’s robustness and versatility, comparing it favorably against established industry baselines across multiple practical applications. These results reinforce the Vizier Gaussian Process bandit algorithm’s value as a reliable, adaptable tool for optimization tasks in a wide range of fields.
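For readers new to the underlying idea, here is a generic Gaussian process bandit loop using scikit-learn: fit a GP surrogate, score candidates with an upper-confidence-bound acquisition, evaluate the best one, and repeat. This is only the textbook skeleton; Vizier’s production algorithm layers many refinements on top that this sketch makes no attempt to reproduce.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Toy black-box function to maximize; Vizier would run a real trial here.
    return -np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 2))              # initial random design
y = np.array([objective(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(256, 2))     # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    ucb = mu + 2.0 * sd                         # upper confidence bound
    x_next = cand[np.argmax(ucb)]               # most promising candidate
    X = np.vstack([X, x_next])                  # evaluate it, refit next round
    y = np.append(y, objective(x_next))

print("best found:", X[np.argmax(y)], "score:", y.max())
```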

Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution

  • Author(s): Yucheng Ruan, Xiang Lan, Jingying Ma, Yizhi Dong, Kai He, Mengling Feng

The structure of the survey paper. It includes three main parts: foundations of tabular data, tabular data modeling techniques, and the evolution of language modeling on tabular data.

The paper “Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution” addresses the challenges and advancements in modeling tabular data, a common data type characterized by heterogeneous feature types and complex structure. The research highlights the potential for improved predictive performance and robustness in tabular data analysis, which is crucial for many applications. Influenced by recent progress in natural language processing, especially transformer architectures, new methods have been developed for tabular data modeling. Early methods pre-trained transformers from scratch and faced scalability challenges; later work adapted pre-trained language models like BERT, requiring less data and delivering better performance. The arrival of large language models such as GPT and LLaMA has further transformed the field, enabling more sophisticated applications with minimal fine-tuning. The paper provides a comprehensive review of language modeling techniques for tabular data, covering data structures, key datasets, modeling techniques, and the evolution of language models, and it identifies open challenges and future research directions for tabular data analysis.
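A recurring primitive in this line of work is serializing a table row into natural language so a pre-trained language model can consume it. The snippet below shows that pattern with an invented template and invented column names; it illustrates the general idea rather than any specific method from the survey.

```python
def serialize_row(row, target):
    # Turn every non-target cell into a "column is value" clause,
    # then ask the language model for the target column.
    parts = [f"{col} is {val}" for col, val in row.items() if col != target]
    return f"{', '.join(parts)}. What is {target}?"

row = {"age": 42, "occupation": "engineer", "hours_per_week": 50}
prompt = serialize_row(row, target="income_bracket")
print(prompt)
# age is 42, occupation is engineer, hours_per_week is 50. What is income_bracket?
```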

Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information

  • Author(s): Ming Jiang, Tingting Huang, Biao Guo, Yao Lu, Feng Zhang

An overview of the ATF method: (a) extracting information from the question, analyzing it, and identifying irrelevant information; (b) filtering the irrelevant information out of the question.

The paper “Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information” explores how to improve the performance of large language models when their inputs contain irrelevant information, which often leads to inaccurate outputs. The study introduces a prompting strategy, ATF, that reduces the influence of such content by guiding the model’s focus toward the relevant information before it answers, improving its decision-making. The method is evaluated across various tasks and datasets, demonstrating that it preserves the accuracy and reliability of language models even when irrelevant information is present. The findings suggest that strategic prompting is a practical way to make large language models more robust in real-world applications.
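The figure’s two-stage structure suggests a straightforward implementation pattern: one prompt to flag irrelevant sentences, a second to answer with those sentences excluded. The sketch below follows that spirit only; `llm` is a placeholder for any completion client, and the prompt wording is my illustration, not the paper’s exact prompts.

```python
def llm(prompt):
    # Placeholder: plug in your model client here (e.g., an API call).
    raise NotImplementedError

def analyze_then_filter(question):
    # Stage (a): ask the model to flag content irrelevant to the question.
    irrelevant = llm(
        "List the sentences in the following problem that are NOT needed "
        f"to answer it:\n\n{question}"
    )
    # Stage (b): answer with the flagged content explicitly set aside.
    return llm(
        f"These sentences are irrelevant and should be ignored:\n{irrelevant}\n\n"
        f"Now solve the problem:\n{question}"
    )
```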

Graph Retrieval-Augmented Generation: A Survey

  • Author(s): Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, Siliang Tang

Comparison between Direct LLM, RAG, and GraphRAG.

The paper “Graph Retrieval-Augmented Generation: A Survey” explores the advancements in Retrieval-Augmented Generation (RAG) and its application in improving the outputs of Large Language Models (LLMs). RAG addresses issues like hallucinations, a lack of domain-specific knowledge, and outdated information by referencing external knowledge bases. However, the complex relationships between entities in databases pose challenges for RAG systems. To tackle this, GraphRAG utilizes structural information across entities to enable more precise retrieval and context-aware responses. This survey provides a comprehensive overview of GraphRAG methodologies, detailing the workflow that includes graph-based indexing, graph-guided retrieval, and graph-enhanced generation. It outlines the core technologies and training methods at each stage, examines downstream tasks, application domains, evaluation methodologies, and industrial use cases. The paper also identifies future research directions to inspire further exploration and advancement in the field. This systematic review highlights the potential of GraphRAG to enhance the capabilities of LLMs, making it a valuable resource for researchers and practitioners interested in leveraging graph structures for improved language model performance.
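As a toy illustration of the graph-guided retrieval stage, the snippet below matches query entities against a small knowledge graph, expands their one-hop neighborhood, and linearizes the resulting edges into facts for the generator’s prompt. The graph, the naive substring matching, and the triple format are all illustrative assumptions, not a method from the survey.

```python
import networkx as nx

# Toy knowledge graph; a real GraphRAG system would index a large one.
G = nx.DiGraph()
G.add_edge("Marie Curie", "Radium", relation="discovered")
G.add_edge("Marie Curie", "Nobel Prize", relation="won")
G.add_edge("Radium", "Radioactivity", relation="exhibits")

def retrieve_subgraph(query, hops=1):
    # Naive entity linking: any node name that appears in the query.
    seeds = [n for n in G.nodes if n.lower() in query.lower()]
    facts = set()
    for seed in seeds:
        # Pull the local neighborhood around each matched entity.
        ego = nx.ego_graph(G, seed, radius=hops, undirected=True)
        for u, v, data in ego.edges(data=True):
            facts.add(f"{u} {data['relation']} {v}")
    return sorted(facts)  # triples to prepend to the LLM prompt

print(retrieve_subgraph("What did Marie Curie discover?"))
# ['Marie Curie discovered Radium', 'Marie Curie won Nobel Prize']
```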

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

  • Author(s): Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Beidi Chen

Throughput vs. latency for TinyLlama-1.1B speculating LLaMA-2-7B-32K at different prefill lengths: (a) throughput of autoregressive decoding and SD against per-token latency at prefill length 8000; (b) throughput ratio of SD to autoregressive decoding across latency budgets, showing that SD improves throughput for sequences longer than 1024.

The paper “MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding” addresses the challenge of serving long-context requests in applications like chatbots and document analysis, where large language models (LLMs) must balance low latency against high throughput. Speculative decoding (SD) is a common method for reducing latency, but it has traditionally been considered effective only at small batch sizes. MagicDec demonstrates that SD can also deliver significant speedups in high-throughput settings for moderate-to-long sequences. By analyzing how bottlenecks shift with batch size and sequence length, the authors identify the KV cache, which grows with both, as the dominant cost for long sequences, and address it with draft models that use a sparse KV cache and an intelligent drafting strategy whose speedup improves as batch size increases. The method achieves up to a 2x speedup for LLaMA-2-7B-32K and 1.84x for LLaMA-3.1-8B at batch sizes from 32 to 256 on 8 NVIDIA A100 GPUs, demonstrating broad applicability for improving throughput and reducing latency without losing accuracy.
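For context, here is the generic speculative decoding loop that MagicDec builds on, in its simplest greedy-verification form: the draft model proposes several tokens cheaply, and the target model verifies them all in a single forward pass. `draft_next` and `target_logits` are hypothetical stand-ins for the small and large models; MagicDec’s actual contributions (sparse-KV draft models and batch-aware drafting) are not reproduced here.

```python
import torch

def draft_next(tokens):
    # Placeholder: one greedy token from the small draft model.
    raise NotImplementedError

def target_logits(tokens):
    # Placeholder: logits for every position from the large target model.
    raise NotImplementedError

def speculative_step(tokens, gamma=4):
    # 1) Draft gamma tokens cheaply, one at a time, with the small model.
    draft = tokens.clone()
    for _ in range(gamma):
        draft = torch.cat([draft, torch.tensor([draft_next(draft)])])
    # 2) Verify all drafted tokens with ONE forward pass of the target.
    preds = target_logits(draft).argmax(dim=-1)  # target's greedy choices
    # 3) Accept the longest draft prefix the target agrees with.
    n = tokens.numel()
    accepted = n
    for i in range(n, draft.numel()):
        if draft[i] != preds[i - 1]:  # target's prediction for position i
            break
        accepted = i + 1
    # 4) Append the target's own next token after the accepted prefix,
    #    so each step yields between 1 and gamma + 1 new tokens.
    return torch.cat([draft[:accepted], preds[accepted - 1].unsqueeze(0)])
```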

Controllable Text Generation for Large Language Models: A Survey

  • Author(s): Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li


The paper “Controllable Text Generation for Large Language Models: A Survey” examines the development of techniques that allow for controlled text generation in large language models (LLMs). While LLMs generate high-quality text, they face challenges in meeting complex real-world requirements, such as avoiding misleading or inappropriate content and catering to specific user needs, for example mimicking particular writing styles or producing text with poetic qualities. Controllable Text Generation (CTG) techniques address these needs by ensuring that generated outputs meet predefined control conditions, such as safety, sentiment, thematic consistency, and linguistic style, while maintaining high standards of fluency and diversity. The paper categorizes CTG tasks into content control and attribute control, discussing methods like model retraining, fine-tuning, reinforcement learning, prompt engineering, latent space manipulation, and decoding-time intervention. It evaluates each method’s strengths and weaknesses, reviews CTG evaluation methods, and summarizes applications across domains. The paper also highlights open challenges, such as maintaining fluency under control and improving practicality, offering guidance for future research and development in this area.
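Of the method families listed above, decoding-time intervention is the simplest to show compactly: adjust the logits before sampling so the output respects a constraint. The sketch below is a generic illustration of that family, with token boosting for soft attribute control and token banning as a hard safety constraint; it is not a specific method from the survey.

```python
import torch

def controlled_sample(logits, boost=None, ban=None):
    logits = logits.clone()
    for tok, bonus in (boost or {}).items():
        logits[tok] += bonus           # soft control: steer toward an attribute
    for tok in (ban or set()):
        logits[tok] = float("-inf")    # hard control: this token can never appear
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, 1))

# Hypothetical token ids: gently boost a sentiment-bearing word, ban an unsafe one.
next_token = controlled_sample(torch.randn(32000), boost={1234: 2.0}, ban={666})
```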

Challenges and Responses in the Practice of Large Language Models

  • Author(s): Hongyin Zhu


The paper “Challenges and Responses in the Practice of Large Language Models” explores the practical issues encountered when deploying large language models (LLMs) and the strategies developed to address these challenges. As LLMs become more prevalent in various applications, they face significant hurdles such as high computational costs, data privacy concerns, and the potential for generating biased or harmful content. The paper discusses how these challenges impact the scalability and ethical use of LLMs in real-world scenarios. It also examines the responses from the research community and industry to mitigate these issues, including advancements in model efficiency, techniques for ensuring data privacy, and methods for reducing bias in generated outputs. By analyzing current practices and solutions, the paper provides a comprehensive overview of the state of LLM deployment, highlighting both the progress made and the areas that require further attention. This work serves as a valuable resource for researchers and practitioners aiming to optimize the use of LLMs while addressing the associated ethical and technical challenges.

PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars

  • Author(s): Sumanth Prabhu

A high-level overview of PEDAL (Prompts based on Exemplar Diversity Aggregated using an LLM).

The paper “PEDAL: Enhancing Greedy Decoding with Large Language Models Using Diverse Exemplars” introduces a novel approach to improving text generation in large language models (LLMs). Traditional self-ensembling techniques like self-consistency have shown significant accuracy improvements but come with high inference costs due to generating numerous output tokens. These techniques also rely on an effective answer extraction process to consolidate multiple outputs. Recent research indicates that diverse exemplars in prompts can enhance the diversity of LLM outputs. PEDAL (Prompts based on Exemplar Diversity Aggregated Using LLMs) is a hybrid self-ensembling method that integrates diverse exemplar-based prompts with LLM-based aggregation to boost performance. The paper demonstrates that PEDAL achieves higher accuracy than greedy decoding methods while maintaining lower inference costs compared to self-consistency approaches. Experiments conducted on the SVAMP and ARC datasets validate these findings, highlighting PEDAL’s effectiveness in balancing accuracy and efficiency. This approach offers a promising direction for optimizing text generation in LLMs, making it a valuable contribution to the field of natural language processing.
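The overall flow is easy to sketch: several prompts that differ only in their exemplars are each greedy-decoded, and the candidate answers are then aggregated by the LLM itself instead of by answer extraction and majority voting. In the sketch below, `llm_greedy` is a placeholder client and the prompt wording is illustrative, not taken from the paper.

```python
import random

def llm_greedy(prompt):
    # Placeholder: a temperature-0 (greedy) completion call to an LLM.
    raise NotImplementedError

def pedal(question, exemplar_pool, k=3, shots_per_prompt=2, seed=0):
    rng = random.Random(seed)
    candidates = []
    for _ in range(k):
        # Output diversity comes from varying the exemplars,
        # not from sampling temperature as in self-consistency.
        shots = "\n\n".join(rng.sample(exemplar_pool, shots_per_prompt))
        candidates.append(llm_greedy(f"{shots}\n\nQ: {question}\nA:"))
    # Aggregate with the LLM instead of majority voting over parsed answers.
    listed = "\n".join(f"- {c}" for c in candidates)
    return llm_greedy(
        f"Question: {question}\nCandidate answers:\n{listed}\n"
        "Return the single most consistent final answer:"
    )
```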