Google’s Infini-attention Gives LLMs Infinite Context Length

by Dhruv Kudalkar
April 18, 2024
Reading Time: 5 mins read

Google Researchers introduced a novel approach for scaling Large Language Models (LLMs) to process infinitely long text inputs. They developed Infini-attention, a technique that configures LLMs to extend their context window while keeping memory and computational requirements constant.

Highlights:

  • Researchers at Google introduced Infini-attention, a novel approach to give LLMs infinite context length.
  • Researchers at Google claim that models using Infini-attention can sustain quality across a context window of one million tokens.
  • Results demonstrate that Infini-Transformers can efficiently process extremely long input sequences with bounded memory.

What is a Context Window?

The context window is an important term in the field of LLMs, referring to the number of words or tokens that a model considers at any given time when processing text. It determines the extent of the model’s understanding and influences its ability to generate meaningful responses.

If a conversation exceeds the context length, tokens from earlier parts of the conversation may be disregarded, which in turn degrades the model’s performance and effectiveness. Every model is designed with a specified context window that represents its optimal operating scope.
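
To make that truncation behaviour concrete, here is a minimal, hypothetical sketch (not code from Google or FavTutor) of the common approach of keeping only the most recent tokens once a conversation outgrows the window:

```python
# Hypothetical illustration: when the token count exceeds the context
# window, the oldest tokens are dropped, so the model loses the start
# of the conversation.
def fit_to_context(token_ids: list[int], context_window: int) -> list[int]:
    """Keep only the most recent `context_window` tokens."""
    if len(token_ids) <= context_window:
        return token_ids
    return token_ids[-context_window:]  # earliest tokens are discarded

history = list(range(10))              # pretend token ids 0..9
print(fit_to_context(history, 6))      # -> [4, 5, 6, 7, 8, 9]
```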

Expanding context length has emerged as a significant focus for enhancing model performance and gaining a competitive edge. Researchers at Google claim that models equipped with Infini-attention can sustain quality across a context window of one million tokens without requiring extra memory.

Google’s Infini-attention Methodology

Compressive Memory

Infini-attention incorporates a compressive memory into the standard attention mechanism, combining both masked local attention and long-term linear attention in a single Transformer block. It reuses the key, value, and query states from the dot-product attention computation for long-term memory consolidation and retrieval.

The compressive memory is parameterized with an associative matrix, and the memory update and retrieval process is cast as a linear attention mechanism.
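
As a rough illustration of this idea, the sketch below implements an associative-matrix memory with a linear-attention style update and read-out in NumPy. The `elu(x) + 1` feature map and the variable names follow the linear-attention literature and are assumptions here, not code from the paper:

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a non-negative feature map commonly used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def memory_retrieve(M, z, Q):
    """Read from the associative memory: roughly sigma(Q) M / (sigma(Q) z)."""
    sQ = feature_map(Q)                  # (N, d_k) query features
    denom = (sQ @ z)[:, None] + 1e-6     # (N, 1) normalization term
    return (sQ @ M) / denom              # (N, d_v) values retrieved from memory

def memory_update(M, z, K, V):
    """Fold a segment's key/value bindings into the fixed-size memory."""
    sK = feature_map(K)                  # (N, d_k) key features
    M = M + sK.T @ V                     # (d_k, d_v) associative matrix
    z = z + sK.sum(axis=0)               # (d_k,) running normalizer
    return M, z
```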

Attention Layer

In Infini-Transformers, the attention layer maintains both global compressive and local fine-grained states. The local attention context is computed within each input segment, while the compressive memory stores and retrieves the entire context history.

The final contextual output is an aggregation of the long-term memory-retrieved values and the local attention contexts.
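
The paper describes this aggregation as a learned gate per head that blends the memory read-out with the local attention output; the sketch below shows that blend, with the exact tensor layout being our assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine(A_mem, A_local, beta):
    """Blend long-term (memory-retrieved) and local attention outputs.

    A_mem, A_local: (N, d_v) outputs for one head on one segment.
    beta:           learned scalar gate for this head.
    """
    g = sigmoid(beta)  # g near 0 -> mostly local, near 1 -> mostly memory
    return g * A_mem + (1.0 - g) * A_local
```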


Infini-Transformers process extremely long inputs in a streaming fashion, enabling them to scale to infinitely long contexts with bounded memory and compute resources. The approach introduces minimal changes to the standard scaled dot-product attention and supports plug-and-play continual pre-training and long-context adaptation.

The image below shows a comparison between Google’s Infini-Transformer and Transformer-XL. Like Transformer-XL, Infini-Transformer operates on a sequence of segments, computing standard causal dot-product attention within each segment.

This attention computation is localized within the segment’s N tokens (where N represents the segment length). 

(Figure: Infini-Transformer vs. Transformer-XL segment processing)

Unlike local attention, which discards previous segment attention states, Infini-Transformers reuse these states to maintain a comprehensive context history, achieved through a compressive memory approach. 

Infini-Transformer retains the entire context history, whereas Transformer-XL discards old contexts because it caches the KV states for the last segment only. Each attention layer in Infini-Transformers therefore integrates both global compressive and local fine-grained states, defining an efficient attention mechanism called Infini-attention.
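
Putting these pieces together, here is a hedged end-to-end sketch of streaming over segments while a single bounded memory carries the full context history. It reuses the `memory_retrieve`, `memory_update`, and `combine` helpers from the sketches above, which are our assumptions rather than the paper's code:

```python
import numpy as np

def local_attention(Q, K, V):
    # Standard causal dot-product attention within one segment.
    N, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    scores = np.where(np.triu(np.ones((N, N), dtype=bool), k=1), -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def infini_attention_stream(segments, d_k, d_v, beta=0.0):
    """Process (Q, K, V) triples segment by segment with bounded memory."""
    M = np.zeros((d_k, d_v))   # associative memory, fixed size
    z = np.zeros(d_k)          # normalization term, fixed size
    outputs = []
    for Q, K, V in segments:
        A_mem = memory_retrieve(M, z, Q)   # read the compressed old context
        A_loc = local_attention(Q, K, V)   # fine-grained context in the segment
        outputs.append(combine(A_mem, A_loc, beta))
        M, z = memory_update(M, z, K, V)   # fold this segment into memory
    return outputs
```

Because `M` and `z` have fixed shapes that do not grow with the number of segments, the memory cost stays constant no matter how long the input stream is, which is the bounded-memory property the paper emphasizes.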

Experiments Conducted

The effectiveness of Infini-Transformers was demonstrated through experiments on various tasks involving extremely long input sequences. The experiments conducted are as follows:

  • Long-context language modeling: Small Infini-Transformer models were trained and evaluated on the PG19 and Arxiv-math benchmarks. The models outperformed Transformer-XL and Memorizing Transformers while using significantly fewer memory parameters.
  • 1M passkey retrieval benchmark: A 1B LLM with Infini-attention was continually pre-trained on 4K length inputs and fine-tuned on the passkey retrieval task. The model successfully solved the task with up to 1M context length after fine-tuning on only 5K length inputs.
  • 500K length book summarization (BookSum): An 8B LLM with Infini-attention was continually pre-trained with 8K input length and fine-tuned on the BookSum task. The model outperformed the previous best results and set a new state of the art on BookSum by processing the entire text of the books.

Results

In the long-context language modeling experiments, Infini-Transformers achieved better perplexity scores than Transformer-XL and Memorizing Transformers while using 114x fewer memory parameters. Increasing the training sequence length to 100K resulted in even lower perplexity scores.

For the 1M passkey retrieval benchmark, Infini-Transformers solved the task with up to 1M context length after fine-tuning on only 5K length inputs, demonstrating their ability to extrapolate to much longer input lengths than seen during training.

In the 500K length book summarization task, Infini-Transformers outperformed previous state-of-the-art models and achieved better ROUGE scores as more text from the books was provided as input.

The results demonstrate that Infini-Transformers can efficiently process extremely long input sequences with bounded memory and computation, making them a promising approach for scaling LLMs to infinitely long context windows. Infini-attention allows for easy adaptation of existing LLMs to long-context tasks through continual pre-training and fine-tuning.

Conclusion

Google’s introduction of Infini-attention within Infini-Transformers presents a groundbreaking approach for scaling LLMs to process infinitely long text inputs.
