Boosting RAG Performance through Glossary Integration ― A LlamaIndex Hands-On Tutorial

Daniel Klitzke
ITNEXT · Mar 20, 2024

Engineering RAG systems to behave robustly in real industry use cases is a real challenge. Techniques like query rewriting can come in handy, and novel evaluation tools can help identify common failure modes. (Image created by the author using Spotlight)

tl;dr

The user of a RAG system is often not familiar with the specific terminology that is best suited for querying your database. Rewriting queries using a glossary can substantially improve RAG retrieval performance. This tutorial shows how to easily achieve that using LlamaIndex.

Introduction

In real use cases, custom terminology might cause your RAG system to fail. In this tutorial, I show you a solution that lets you integrate custom terminology using a glossary or a similar data source. And the best part: it works completely without fine-tuning and can run locally!

So, before we start with the hands-on part, how does it work? In my experience, a RAG system can mainly fail in two stages:

  1. The retrieval of relevant documents.
  2. The information extraction and rephrasing from the retrieved context.

I have found that in both stages, injecting additional information can significantly improve the RAG system’s performance. This could be information such as:

  • A simple table of contents explaining the chronological order of information.
  • Explanation of terms, e.g., a glossary.
  • Index structures, explaining where to find certain information.

So, in other words, mechanisms that help humans find and understand information can also be beneficial for machine learning models.

Note that a quite similar approach is to use knowledge graphs along with LLMs; however, I find that it doesn’t always have to be a fully-fledged knowledge graph solution. Simpler, use-case-tailored solutions might be just as effective.

Improving RAG Using Glossary Integration: LlamaIndex Implementation

In this section, I want to show you how you can inject your custom knowledge into a RAG system for optimizing retrieval and generation in your use case. After demonstrating how to improve both steps when building a RAG system with LlamaIndex, I will conduct a short qualitative evaluation to see if it was all worth it in the first place ;-)

I will demonstrate all this using the documentation of our open-source data curation tool, Renumics Spotlight.

Full code of this tutorial is on GitHub

A Typical Failure Mode

Before we dive into how we can improve our system by integrating a glossary, let’s first discuss what a failure might look like. Let’s assume a typical question a user might ask our system is:

I want to look at my audio data. Which possibilities does Spotlight offer?

This will lead to the answer:

Based on the provided context information, Spotlight offers two possibilities to view audio data: the `audio()` function and the `spectrogram()` function. The `audio()` function adds an audio viewer to the Spotlight inspector widget, while the `spectrogram()` function adds an audio spectrogram viewer to the Spotlight inspector widget. Both functions support a single column of type `spotlight.Audio` with an optional second column of type `spotlight.Window`.

However, this is actually very imprecise, especially considering that I was not really asking about the API but more about general concepts or UI elements I could use. After all, Spotlight is an interactive data exploration tool. Upon reviewing the error, I can see that the context contains only API information, so the model essentially had no choice here. So the question now is:

How do we fix this error???

Optimizing Retrieval and Generation

To summarize plainly and simply, we will do two things to address this:

  1. Rewriting the query so it is more suitable for retrieving relevant information.
  2. Providing the generative model with additional information to align the terminology more closely when formulating its answer.

The solution for rewriting the query is called Query Augmentation. Specifically, the user might not be familiar with the terms used in your documentation and software. So, how about rewriting the query so it adheres more closely to your terminology and includes more terms that can lead to potentially relevant matches in the database?

For the second part, improving generation, we will keep it simple and just insert glossary information in the rewriting prompt to provide additional context.

Using LlamaIndex, both improvements can be easily realized using a CustomQueryEngine:

import json
from typing import Optional
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.response_synthesizers import BaseSynthesizer
from llama_index.core import get_response_synthesizer
from llama_index.core.llms import LLM


class NewQueryEngine(CustomQueryEngine):
    """Custom query engine that rewrites the query using a glossary."""

    glossary: Optional[dict]
    llm: LLM
    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer

    def __init__(self, glossary_path=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Load the glossary (term/explanation pairs) if a path is given
        self.glossary = {}
        if glossary_path:
            with open(glossary_path) as f:
                self.glossary = json.load(f)

    def custom_query(self, query_str: str):
        # Turn the glossary into a plain-text list of terms and explanations
        glossary_explanation = "\n".join(
            f'{item.get("term")}, Explanation: {item.get("explanation")}'
            for item in self.glossary.get("terms", [])
        )

        # Create a prompt for the LLM to rewrite the query, including the glossary explanations
        prompt = (
            f"Given the glossary terms and their explanations:\n{glossary_explanation}\n"
            f"Please rewrite the following search query to be more effective: '{query_str}'\n\n"
            "Keep the question format while enriching the query with important terms "
            "while preserving the original intent. Do not exceed 100 words for the query "
            "and do only output the query itself."
        )

        # Let the same LLM that is used for generation rewrite the query
        rewritten_query = self.llm.complete(prompt).text
        print("Rewritten Query String:", rewritten_query)

        # Retrieve documents based on the rewritten query
        nodes = self.retriever.retrieve(rewritten_query)

        # Synthesize a response based on the retrieved documents,
        # injecting the glossary as additional context
        new_prompt = (
            f"Question: {query_str}\n\n"
            f"Glossary to incorporate (not part of question):\n{glossary_explanation}"
        )
        response_obj = self.response_synthesizer.synthesize(new_prompt, nodes)
        return response_obj

For the full code see the GitHub link provided earlier. The relevant file is called improved_rag.ipynb.

The CustomQueryEngine class functions as follows: It is provided with a glossary JSON file that essentially contains pairs of terms and explanations. When a query is received, it prompts the LLM, which it also uses for generation, to rewrite the query to be more effective, given the glossary. After retrieval, it injects the glossary into the prompt asking the LLM to formulate an answer.
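
To give you an idea of how this engine is wired up, here is a minimal usage sketch. It assumes you have already built a vector index (called `index` below) and configured a (local) LLM via LlamaIndex’s `Settings`; the file name `glossary.json` is a placeholder, see the notebook for the actual setup.

from llama_index.core import Settings

# Build the retriever and response synthesizer the engine needs
# (index and Settings.llm are assumed to be set up already, as in the notebook)
retriever = index.as_retriever(similarity_top_k=5)
response_synthesizer = get_response_synthesizer()

query_engine = NewQueryEngine(
    glossary_path="glossary.json",  # placeholder path to the glossary JSON file
    llm=Settings.llm,
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

response = query_engine.query(
    "I want to look at my audio data. Which possibilities does Spotlight offer?"
)
print(response)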

By applying this to our use case, our query is transformed as follows:

From:

I want to look at my audio data. Which possibilities does Spotlight offer?

to:

Which widgets in Spotlight are suitable for analyzing audio data? Specifically, what functionalities does it offer for inspecting and handling audio data points?

Suddenly, it includes wording such as “widgets” and “inspecting and handling data points,” which it extracts from the glossary information that, for example, contains:

[
  {
    "term": "Widget",
    "explanation": "A view or plot that can be integrated within the UI of Renumics Spotlight, such as histograms, scatter plots, or similarity maps."
  },
  {
    "term": "Inspector Widget",
    "explanation": "A component that allows detailed examination and editing of the features of individual data points, supporting multiple data types and views."
  }
]
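
Note that the `custom_query` method above reads these entries from a top-level "terms" key (`self.glossary.get("terms", [])`), so the glossary file as a whole is assumed to wrap them like this (sketched as the Python structure the code expects, not copied verbatim from the repository):

glossary = {
    "terms": [
        {"term": "Widget", "explanation": "A view or plot that can be integrated within the UI of Renumics Spotlight, ..."},
        {"term": "Inspector Widget", "explanation": "A component that allows detailed examination and editing of individual data points, ..."},
    ]
}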

As you can easily guess, this will surface many more relevant pages from the documentation than the original query, because the system now phrases the request in Spotlight’s own terms, where viewing and rendering different data points is all about widgets.

For the second step, as mentioned before, we will also insert the glossary into the context. However, note that in many cases, I found that optimizing the retrieval step was more crucial and that inserting too much additional information into the context is not always beneficial, depending on your model.

With both steps combined, the model will formulate the following answer:

Spotlight offers two possibilities for viewing audio data: an Audio Player and a Spectrogram viewer. The Audio Player supports annotating event windows in the data, while the Spectrogram viewer displays the audio data in the frequency domain with the option to customize the frequency and amplitude scales. Both options can be found in the Inspector Widget of Spotlight.

An answer that is much better suited to the question’s real intent and also closer to Spotlight’s terminology. I feel that for such a low effort (no knowledge graph to build, just a few lines of code) this is quite an improvement.

Another cool thing about this approach is that it is easy for the user to understand and easy to check for correctness. For example, you could simply show the rewritten query to the user and ask whether that is what they meant. You could also, of course, implement more complex query patterns here, for instance decomposing the query into multiple queries, as sketched below. For our docs, it could be a natural fit to query different sections with specific new queries, e.g., first search for fitting API functions, then search for matching tutorials.
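
Such a decomposition step could look roughly like the following hypothetical helper (not part of the tutorial code): ask the LLM for a handful of sub-queries, retrieve for each of them, and merge the results before synthesis.

def retrieve_with_decomposition(llm, retriever, query_str, max_subqueries=3):
    # Ask the LLM to split the question into short, self-contained search queries
    prompt = (
        f"Decompose the following question into at most {max_subqueries} short, "
        f"self-contained search queries, one per line:\n'{query_str}'"
    )
    sub_queries = [q.strip() for q in llm.complete(prompt).text.splitlines() if q.strip()]

    # Retrieve for each sub-query and de-duplicate the nodes by their id
    nodes_by_id = {}
    for sub_query in sub_queries[:max_subqueries]:
        for node_with_score in retriever.retrieve(sub_query):
            nodes_by_id[node_with_score.node.node_id] = node_with_score
    return list(nodes_by_id.values())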

Evaluating the System

But until now, you have only seen one example. So the question is: is it a silver bullet that suddenly improves everything? For this tutorial, I only conducted a brief evaluation, and what I found was the following:

  1. It definitely helps by incorporating the proper terminology into the query.
  2. Thus, it significantly aids the retrieval part.
  3. The additional context when formulating an answer can sometimes distract the model, so be cautious here!
  4. It can sometimes also lead to the query being rewritten in a way that pulls in less relevant information, so consider implementing additional checks here!

So, it definitely helps, especially when the user is not familiar with the terms they should search for, or when the original question would not retrieve relevant data on its own. However, it needs some additional checks here and there!
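
One simple check along these lines, again just a sketch rather than something from the tutorial code, is to retrieve with both the original and the rewritten query and keep whichever result set scores higher:

def retrieve_with_fallback(retriever, original_query, rewritten_query):
    # Retrieve with both queries
    original_nodes = retriever.retrieve(original_query)
    rewritten_nodes = retriever.retrieve(rewritten_query)

    def top_score(nodes):
        # NodeWithScore.score can be None, so treat missing scores as 0.0
        return max((n.score or 0.0) for n in nodes) if nodes else 0.0

    # Fall back to the original query if the rewrite retrieves weaker matches
    if top_score(rewritten_nodes) >= top_score(original_nodes):
        return rewritten_nodes
    return original_nodes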

You can interactively browse the full evaluation results by executing the error_analysis notebook in the code. The code also contains a quantitative analysis using ragas.
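
For reference, such a ragas run can look roughly like the following; the exact column names and available metrics depend on your ragas version, so treat this as a sketch and see the notebook in the repository for the actual code.

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# questions, answers, and contexts are collected by running the query engine
# over a small set of evaluation questions
eval_dataset = Dataset.from_dict({
    "question": questions,
    "answer": answers,
    "contexts": contexts,  # one list of retrieved text chunks per question
})

scores = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy])
print(scores)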

Conclusion

Query rewriting can significantly improve your RAG system’s performance. In general, I believe that many techniques common in search use cases still hold their value in designing robust RAG systems. It doesn’t mean that you have to incorporate all of them from the start. It’s better to start with a vanilla system and identify failure modes using tools such as ragas or Spotlight. Then, make informed decisions on your changes and evaluate them against the original state.

If you want to know more or have a specific use case you want to discuss, feel free to contact me on LinkedIn.
