OutSystems, OpenAI Embeddings and Qdrant Vector Database — Find Similar

Stefan Weber
Published in ITNEXT
9 min read · Aug 5, 2023

In this first article of a two-article series, we’ll explore how you can use OpenAI embeddings and the Qdrant Vector Database to search text by meaning rather than by exact wording in your OutSystems application.

This article, as well as the upcoming one, includes a sample QnA application that I have published on Forge. I will explain the implementation details using this sample. See links below.

Part 1 — “Find Similar” (this article)

In this article you will learn the basics of vector embeddings and vector databases. Plus, it'll show you how to use pre-built Forge components to create these vector embeddings, save them in Qdrant Vector Database, and execute similarity queries.

Part 2 — “Answer Right”

In the second part, we’re expanding our application. We’re combining our similarity search with OpenAI’s completions. We’re using the search results from our QnA application as context for an OpenAI prompt. This helps OpenAI answer users’ questions based only on the answers we’ve collected. This is a common way to use generative AI along with a reliable information source to avoid incorrect answers.

What are Vector Embeddings

Embeddings are numerical representations (vectors) that encode several types of information, such as text, images, audio, and video. They are generated by a trained machine learning model, which allows them to capture the semantic meaning of the input data.

At the time of writing, the OpenAI embeddings model text-embedding-ada-002 creates vectors with 1536 dimensions, meaning each embedding is an array of 1536 floating-point numbers. The key advantage of embeddings is that items that are close together in vector space are also semantically similar. This means that if two vectors are close to each other, the data they represent share a similar meaning or context.
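To make "close together in vector space" a bit more concrete, here is a minimal Python sketch of cosine similarity, one common way to measure how close two vectors are. The three-dimensional vectors and their names are made up for illustration only; real embeddings from text-embedding-ada-002 have 1536 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example (not real embeddings).
parking = [0.9, 0.1, 0.0]
car_park = [0.8, 0.2, 0.1]
baggage = [0.1, 0.0, 0.9]

# Vectors pointing in a similar direction get a higher value.
print(cosine_similarity(parking, car_park))
print(cosine_similarity(parking, baggage))
```

The first pair points in nearly the same direction and therefore scores much higher than the second pair.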

A new type of database, the vector database, has emerged to store and retrieve vectors efficiently, particularly for similarity queries. Unlike traditional databases, which may not be optimized for handling high-dimensional numerical data like vectors, vector databases are designed to excel in these tasks.

Qdrant Vector Database

Qdrant is both a vector database and a vector similarity search engine, created and maintained by Qdrant Solutions. The project is openly available on GitHub under the permissive Apache 2.0 license. In addition, Qdrant Solutions provides a fully managed Qdrant Database Cluster cloud service.

Vector databases like Qdrant play an essential role in combination with large language models to enhance the efficiency and effectiveness of natural language processing tasks. The combination of vector databases and language models empowers applications like information retrieval, recommendation systems, sentiment analysis and semantic search.

A brief note on the sample application

In the sample application, I am directly using the OpenAI API endpoints for vector embeddings. You might have noticed that OutSystems recently released an Azure OpenAI component on the Forge Marketplace, which also provides support for creating vector embeddings. However, since not everyone has access to an Azure tenant and the process for obtaining Azure OpenAI access might still require an application at the time of writing, I opted to use the non-Azure endpoints instead.

Anyhow, if you have access to Azure OpenAI services you can exchange the embeddings part in the demo application with the official and supported component.

Besides OpenAI you can use any other service for creating vector embeddings.

Prerequisites

Before you can use the demo application you need to perform the following tasks.

Download Sample Application

Go to OutSystems Forge and download the sample application.

The sample application depends on the following Forge components:

OpenAI Embeddings is a small connector that implements only the Embeddings endpoint of OpenAI. It is used to create vector embeddings for questions and search terms in the sample application.

Qdrant Vector Database connects with self-hosted or Qdrant Solutions cloud service instances of a Qdrant database cluster. It provides server actions to list and create collections of vector embeddings, save vector embeddings (Points) and query them.

Register an OpenAI Account

Visit the OpenAI website and sign up for an account. OpenAI is a commercial offering, and you will be charged per usage. With registration you will get some free credits which are more than sufficient for experimenting.

After signing up go to View API Keys in your profile menu and create a new API Key. Make sure to copy the key when it gets displayed on the screen. The key is needed to authorize your requests to the OpenAI API.

Create a Qdrant Cluster

Qdrant Solutions offers a free tier of their Qdrant vector database cloud service. At https://cloud.qdrant.io you can sign up with your GitHub or Google Account.

After signing up follow the wizard to create your free tier cluster. Copy the Cluster URL and your API key.

Configure Qdrant Module

Open OutSystems Service Center of the environment where you installed the Qdrant Vector Database Forge component.

Under the Modules menu locate the qdrant_IS module and open it. Select the Integrations tab and under the Consumed REST APIs section set the qdrant URL to the Qdrant cluster URL you copied.

Configure Site Properties

Lastly, you need to configure some site properties in the semantic search demo application module VectorEmbeddingsDemo:

  • Set OpenAIKey to the API Key you created in your OpenAI account profile
  • Set QdrantAPIKey to the API key you got when creating your Qdrant cluster
  • Change the QdrantCollectionName property to your liking or leave the default.

Sample application walkthrough

With all prerequisites done you can now open the sample application.

I have added some sample data taken from the Munich Airport FAQs. Make sure that you have completed all the prerequisites above, then click on the Bootstrap Sample Data button and wait until the question-answer pairs are displayed on the screen.

Sample application with bootstrapped question-answer pairs

While adding the sample data, the application creates vector embeddings behind the scenes for all questions (not answers) and adds them to your Qdrant Cluster collection.

Now try your first search. Enter the search term “I am severely disabled. What do I need to know about this?” and click the Search button.

Search results with similarity score

The application takes the input search term and generates vector embeddings based on it. Subsequently, it conducts a similarity search within your Qdrant Cluster collection. Qdrant provides results along with a score ranging from 0.1 to 1.0, reflecting the proximity between the embeddings of the entered query and the question embeddings stored in the Cluster collection.

Please note that your Qdrant Collection does not store the text of your questions but only their embeddings. It's up to your application to match Qdrant results to the Question-Answer pairs stored in your application's database.

Feel free to add additional question-answer pairs or create your own knowledge base from scratch. Try out different search terms and see how results and scoring change.

Adding a Question-Answer Pair

Open Service Studio and the sample application module. In the Logic tab open the Articles_SaveArticle server action.

Save Article Server Action

The Articles_SaveArticle server action first adds or updates an article in the database.

Then it calls the OpenAI_CreateEmbeddings server action of the OpenAI Embeddings Forge component. Embeddings are created for the question of our Question-Answer pair.

CreateEmbeddings Properties

OpenAI_CreateEmbeddings takes an API Key from the configured OpenAIKey site property and an OpenAI model that is suitable for creating embeddings. At the time of writing, you can only use the text-embedding-ada-002 model. Lastly, OpenAI_CreateEmbeddings takes an array of texts and returns an array of generated embeddings, one per text. In our case the array contains a single entry, the question.
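For readers who want to see what such a call looks like outside of OutSystems, here is a rough Python sketch of a request to the OpenAI Embeddings REST endpoint. It only illustrates the three inputs described above (API key, model, array of texts). The function names are my own, and I am not claiming this is how the Forge component is implemented internally.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embeddings_request(api_key, texts, model="text-embedding-ada-002"):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_embeddings(api_key, texts):
    """Send the request and return one embedding per input text, in input order."""
    request = build_embeddings_request(api_key, texts)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each entry in "data" carries the index of the input text it belongs to.
    ordered = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

Calling create_embeddings("your-api-key", ["Where can I park?"]) would return a list with a single vector, which corresponds to taking the first result of the array in the server action.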

Upon success the embeddings are then written to your Qdrant Vector Database cluster using the Qdrant_UpsertPoints server action from the Qdrant Vector Database Forge component.

Qdrant_UpsertPoints creates or updates vector embeddings.

Upsert Qdrant Points Properties

ApiKey and CollectionName are retrieved from the configured site properties.

A point in Qdrant represents vector embeddings and is identified by a unique identifier. We use LocalArticleId which is either set to a new UUID (for new questions) or an existing Question-Answer pair UUID (for existing questions).

The Vector property is set to the result (first result of the array) of the OpenAI_CreateEmbeddings server action.

Lastly, you can optionally add some Payload data: a MetadataId and one or more keywords. MetadataId can be used both for filtering query results and as a grouping identifier. Likewise, keywords can be used to filter query results.
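As a rough illustration of what a point looks like, the following Python sketch builds the JSON body that Qdrant's REST API accepts for upserting points (a PUT to /collections/{collection}/points). The payload key names metadata_id and keywords are assumptions on my side. The Forge component may serialize them under different names.

```python
import json
import uuid


def build_upsert_body(article_id, vector, metadata_id=None, keywords=None):
    """Build the request body for upserting a single point.

    article_id: UUID string that also identifies the Question-Answer pair.
    vector:     the embedding of the question.
    The payload keys below are hypothetical names chosen for this example.
    """
    payload = {}
    if metadata_id is not None:
        payload["metadata_id"] = metadata_id
    if keywords:
        payload["keywords"] = list(keywords)
    return {"points": [{"id": article_id, "vector": vector, "payload": payload}]}


# A new question gets a fresh UUID; an existing one reuses its stored UUID.
article_id = str(uuid.uuid4())
toy_vector = [0.01, -0.02, 0.03]  # shortened; a real vector has 1536 entries
print(json.dumps(build_upsert_body(article_id, toy_vector, "faq", ["parking"]), indent=2))
```

Because the point id is the same UUID as the Question-Answer pair, sending the same id again overwrites the stored vector instead of creating a duplicate.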

Note the ValidateCollection server action at the top of the Articles_SaveArticle flow, which checks if the Qdrant collection exists and, if not, creates it.

ValidateCollection Server Action

When creating a Qdrant collection you need to specify the dimensions of the vector embeddings you want to store. The OpenAI text-embedding-ada-002 model returns vectors with 1536 dimensions. If you want to use another embeddings service, check its documentation to see how many dimensions are returned.

You must also set the Distance property, which specifies how similarity is measured in that collection. Qdrant supports Cosine, Dot and Euclid distances. More on that can be found in the Qdrant documentation.
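To show how the two settings fit together, here is a small Python sketch of the JSON body that Qdrant's REST API expects when creating a collection (a PUT to /collections/{collection}). The function name and the default of Cosine are my own choices for this example and not necessarily what the sample application uses.

```python
import json

# The three distance names mentioned above, spelled as Qdrant expects them.
SUPPORTED_DISTANCES = {"Cosine", "Dot", "Euclid"}


def build_create_collection_body(dimensions, distance="Cosine"):
    """Build the request body for creating a collection with a single vector type."""
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    if distance not in SUPPORTED_DISTANCES:
        raise ValueError(f"unsupported distance: {distance}")
    return {"vectors": {"size": dimensions, "distance": distance}}


# 1536 matches the output size of text-embedding-ada-002.
print(json.dumps(build_create_collection_body(1536), indent=2))
```

The size must match the embeddings model exactly, because Qdrant rejects vectors whose length differs from the size configured for the collection.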

Querying Question-Answer Pairs

Next, open the Articles_SearchArticles server action in the Vector Embeddings demo application. This server action takes a single SearchTerm as input parameter. It first performs a query against your Qdrant Cluster collection and then matches the results with the Question-Answer pairs stored in your application database.

Search Question Answer Pairs

This server action flow has two main branches:

If no search term is given, then it just returns all Question-Answer pairs from the database.

If a search term is given:

  • The search term is transformed to vector embeddings.
  • The Qdrant Cluster collection is then queried with the generated vector embeddings using Qdrant_SearchPoints from the Qdrant Vector Database Forge component.

In the sample application Qdrant_SearchPoints is configured to return a maximum of 6 articles. You can also add a score threshold to only return records with a score greater than the value provided (ranging from 0.1 to 1.0).
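The limit and the optional score threshold map directly to fields of Qdrant's REST search request (a POST to /collections/{collection}/points/search). The Python sketch below builds such a request without sending it. The function name and the example cluster URL are placeholders I made up.

```python
import json
import urllib.request


def build_search_request(cluster_url, api_key, collection, vector,
                         limit=6, score_threshold=None):
    """Build (but do not send) a similarity search request for a collection."""
    body = {"vector": vector, "limit": limit}
    if score_threshold is not None:
        # Only results with a score above this value are returned.
        body["score_threshold"] = score_threshold
    url = f"{cluster_url.rstrip('/')}/collections/{collection}/points/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the vector would be the embedding of the search term.
request = build_search_request(
    "https://example.invalid", "your-qdrant-api-key", "articles",
    [0.01, -0.02, 0.03], limit=6, score_threshold=0.8,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Qdrant answers with a JSON document whose result array contains one entry per hit, each with the point id and its score.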

If this leads to no results, the server action exits and returns an empty result list.

If there is a result, the whole result is stored in a local variable for later use.

  • For each result entry the Id — which corresponds to a Question-Answer pair id in the database — is used to build a safe SQL IN Filter.
  • That filter is used in the advanced SQL statement to get all corresponding Question-Answer pairs from the database.
  • Lastly, the database results are merged with the Qdrant results (combining the score with the Question-Answer pairs), ordered by score, and the result list is returned.
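The three steps above can be sketched in plain Python with an in-memory SQLite database. The table name, column names, ids and rows are invented for this example. The sample application does the same thing with an OutSystems entity and an advanced SQL statement.

```python
import sqlite3


def fetch_scored_articles(connection, qdrant_hits):
    """Load the articles referenced by the Qdrant hits and attach their scores."""
    if not qdrant_hits:
        return []
    scores = {hit["id"]: hit["score"] for hit in qdrant_hits}
    # One "?" per id keeps the IN filter safe from SQL injection.
    placeholders = ", ".join("?" for _ in scores)
    rows = connection.execute(
        f"SELECT id, question, answer FROM article WHERE id IN ({placeholders})",
        list(scores),
    ).fetchall()
    merged = [
        {"id": row[0], "question": row[1], "answer": row[2], "score": scores[row[0]]}
        for row in rows
    ]
    # Highest score first, as in the result list of the demo screen.
    merged.sort(key=lambda item: item["score"], reverse=True)
    return merged


# Hypothetical table and rows, standing in for the application database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE article (id TEXT PRIMARY KEY, question TEXT, answer TEXT)")
connection.executemany(
    "INSERT INTO article VALUES (?, ?, ?)",
    [("a1", "Question A", "Answer A"),
     ("a2", "Question B", "Answer B"),
     ("a3", "Question C", "Answer C")],
)

# Hits shaped like Qdrant search results (id and score only).
hits = [{"id": "a2", "score": 0.81}, {"id": "a1", "score": 0.93}]
results = fetch_scored_articles(connection, hits)
for item in results:
    print(item["score"], item["question"])
```

The database does not know anything about scores, so the merge step is what restores the ranking that Qdrant computed.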

Articles_SearchArticles is used in the GetArticles data action in the Demo screen to retrieve all articles.

With OutSystems, Vector Embeddings and a Qdrant Vector Database it is easy to add a semantic similarity text search capability to an application.

Thank you for reading. I hope you liked it and that I have explained the important parts well. Let me know if not 😊

If you have difficulties getting up and running, please use the OutSystems Forum to get help. Suggestions on how to improve this article are very welcome. Send me a message via my OutSystems Profile or respond directly here on Medium.

If you like my articles, please leave some claps. Follow me and subscribe to receive a notification whenever I publish a new article. Happy Low Coding!

Digital Craftsman, OutSystems MVP, AWS Community Builder and Senior Director at Telelink Business Services