
Attention Engine Optimization: It’s all About the R in RAG

  • Writer: Marianne Calilhanna
  • 4 min read

Guest blog post by Rich Dominelli, Senior Systems Architect, DCL


As AI becomes increasingly integrated into all industries, organizations are striving to ensure that the insights generated by AI systems are reliable, accurate, and grounded in truth. Achieving trustworthy AI requires not only robust algorithms but also careful attention to the quality and structure of the information fed into them. One of the most important factors in this equation is the context size of large language models (LLMs).


Context Size of LLMs

In simplest terms, the context size of an LLM is how much data a model can actively use when answering a prompt. This is a combination of the conversational history plus any supplemental information, such as uploaded documents or tool results like web search. The size is typically measured in tokens, with a word being equal to about 1.33 tokens. A 100-word phrase will take about 133 tokens, because some words are broken up and punctuation adds to the total. The earliest LLMs had small context windows of 2,048 tokens, but today's models boast context sizes of up to 2 million tokens.
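
If you want to see how that word-to-token ratio plays out on your own text, a quick way is to run it through a tokenizer. Here is a minimal sketch using OpenAI's open-source tiktoken library (my choice of tokenizer is an assumption; any tokenizer will do, and the exact ratio varies with vocabulary and punctuation):

```python
# Minimal sketch: count tokens vs. words with the tiktoken tokenizer.
# Assumes `pip install tiktoken`; the ~1.33 tokens-per-word figure is approximate.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Context size is measured in tokens, not words."
tokens = enc.encode(text)
words = text.split()

print(f"{len(words)} words -> {len(tokens)} tokens "
      f"(~{len(tokens) / len(words):.2f} tokens per word)")
```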


AI-generated image of an anthropomorphized LLM ingesting structured content and XML

 

It is increasingly evident that the more of the context window you consume, the poorer the accuracy of the LLM's response (see Context Rot or Context Management for more explanation). Accuracy can decline to less than 70% once the context window exceeds 32,000 tokens, and the decline can begin even earlier; it appears that even though an LLM has a large amount of memory, its attention can focus on only so much at once.


A Simple Example

I took the list of names from the website 1000 Random Names (https://1000names.com/), pasted it into a text document, and confirmed that it does indeed contain 1,000 names (one per line). Next, I went to ChatGPT and pasted the first 100 names into the chat input box with the prompt “How many names are in this list?” The results follow:

 

Names In Prompt | LLM Result | Comment
100 names       | 100        |
200 names       | 200        |
300 names       | 280        |
500 names       | 500        | Trying the names first and then the prompt gave me an accurate count just once.
500 names       | 490        |
1000 names      | 980        |

LLMs are notoriously bad at math (see the numerous failures at counting the Rs in “strawberry”), but modern LLMs generally handle simple counting well, and the fall-off starts once the data exceeds a certain size. This accuracy degradation is also known as context rot.

 

Context Rot and RAG

The biggest area where context rot will impact the common use cases of LLMs is RAG. RAG (retrieval-augmented generation) essentially works like this (a minimal sketch in code follows the steps):

 

  1. The user enters a prompt.

  2. The system queries a back-end system to find content fragments related to that prompt.

  3. Those content fragments and the user’s original prompt are assembled into a larger prompt and fed to an LLM for answering.

  4. The LLM analyzes and summarizes the fragments and responds to the user.
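
To make these steps concrete, here is a minimal sketch of the flow in Python. The embed(), search_index(), and call_llm() helpers are hypothetical placeholders for whatever embedding model, vector store, and LLM API your stack actually uses:

```python
# Minimal sketch of the four RAG steps above (not a production implementation).
def answer_with_rag(user_prompt: str, top_k: int = 5) -> str:
    # Step 1: the user enters a prompt (user_prompt).

    # Step 2: query the back end for content fragments related to the prompt.
    query_vector = embed(user_prompt)                    # hypothetical embedding call
    fragments = search_index(query_vector, limit=top_k)  # hypothetical store query, ranked by relevance

    # Step 3: assemble the fragments and the original prompt into a larger prompt.
    context = "\n\n".join(f"[{i + 1}] {frag.text}" for i, frag in enumerate(fragments))
    full_prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_prompt}"
    )

    # Step 4: the LLM analyzes and summarizes the fragments and responds to the user.
    return call_llm(full_prompt)                         # hypothetical LLM API call
```

Everything that lands in full_prompt counts against the context window, which is why the quality of step 2 matters so much.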

 

The success and accuracy of your RAG system depend entirely on how well your back end can find those content fragments (step 2) and rank them by relevance: the narrower and higher quality the back-end search results are, the better the information the LLM has to work with and the less likely it is that context rot will affect your answer.

 

The earliest RAG systems relied solely on semantic search, matching based on computed meaning. These systems would create an embedded version of the user’s query and compare it to items in the corpus using similarity measures, with mixed results. As corpora grew larger, the number of hits grew to the point where ranking and filtering were needed to narrow the returned information.
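
As a rough illustration of that embed-and-compare approach, here is a minimal sketch. It assumes a hypothetical embed() function that returns a fixed-length vector from whatever sentence-embedding model you use, and a corpus that has already been chunked into text fragments:

```python
# Minimal sketch of embedding-based semantic search using cosine similarity.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query: str, corpus: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    query_vec = embed(query)                             # hypothetical embedding call
    scored = [(chunk, cosine_similarity(query_vec, embed(chunk))) for chunk in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:top_k]
```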

 

In the world of computing this is not a new problem; it is most analogous to the work done for web search, but it applies equally to effective searching of any large corpus of documents. Semantic search has become part of the toolkit (but only part); it coexists with keyword and entity extraction, traditional data elements, and more traditional ranking and filtering.
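
One simple way to picture that coexistence is a hybrid ranker that blends a keyword-overlap score with the semantic similarity from the previous sketch. The 50/50 weighting below is purely illustrative, not a recommendation, and embed() remains a hypothetical placeholder:

```python
# Minimal sketch of hybrid ranking: blend keyword overlap with semantic similarity.
# Reuses cosine_similarity() and the hypothetical embed() from the earlier sketch.
def keyword_score(query: str, chunk: str) -> float:
    # Fraction of query terms that appear in the chunk (a crude keyword signal).
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk.lower().split())
    return len(query_terms & chunk_terms) / max(len(query_terms), 1)

def hybrid_rank(query: str, corpus: list[str], top_k: int = 5) -> list[str]:
    query_vec = embed(query)                                    # hypothetical embedding call
    scored = []
    for chunk in corpus:
        semantic = cosine_similarity(query_vec, embed(chunk))
        keyword = keyword_score(query, chunk)
        scored.append((0.5 * semantic + 0.5 * keyword, chunk))  # illustrative 50/50 blend
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```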

 

Prepping content for semantic search relies on effective chunking of that content: breaking it into meaningful pieces. Augmenting that content for truly effective searching relies on the traditional roles of data analysis, entity extraction, and metadata extraction. This means that in addition to chunking the information contained within a document, extracting its structured information is a worthwhile and valuable step: it allows you to tune search results and focus the attention of the LLM in much the same way that SEO used to drive the most meaningful web pages to the top of search results.
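
As a minimal sketch of what that preparation might look like, the following breaks a document into paragraph-level chunks and attaches extracted metadata to each one. The extract_entities() helper is a hypothetical stand-in for your own entity- and metadata-extraction step:

```python
# Minimal sketch: chunk a document and attach metadata so the search layer
# can rank and filter on more than raw text.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str                                # e.g., file name or document ID
    entities: list[str] = field(default_factory=list)

def chunk_document(doc_text: str, source: str) -> list[Chunk]:
    chunks = []
    for paragraph in doc_text.split("\n\n"):   # one chunk per paragraph
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        chunks.append(Chunk(text=paragraph,
                            source=source,
                            entities=extract_entities(paragraph)))  # hypothetical extractor
    return chunks
```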

 

Structured Content is the Reliable Foundation

Ultimately, the quality of AI-driven search and insights depends not just on the model itself, but also on the structure and integrity of the content it retrieves. Well-organized, structured content provides a reliable foundation, allowing LLMs to focus on the most relevant information and produce responses that are both accurate and meaningful. By treating content structure as a strategic asset, organizations can build systems that are not only powerful but trustworthy, ensuring that AI delivers insights grounded in truth rather than noise.


Join DCL's Rich Dominelli and David Turner to dive deeper into the topic of trustworthy AI on December 17th at the DCL Learning Series webinar:


Trustworthy AI: Optimizing Content for Large Language Models

Wed, Dec 17, 2025 12:00 PM - 1:00 PM EST

Trustworthy AI is the strategic target for every organization striving to turn information into intelligence. Achieving this goal starts with well-structured, high-quality content that provides a reliable foundation for reasoning and response. This DCL Learning Series webinar unpacks how context management impacts LLM performance and explores tactics for preparing content that supports precise, reliable responses in AI-powered systems.


