OutSystems, OpenAI Embeddings and Qdrant Vector Database — Find Similar

Published in ITNEXT · 9 min read · Aug 5, 2023

In this first article of a two-article series, we’ll explore how you can use OpenAI embeddings and the Qdrant Vector Database to search text for similar meanings in your OutSystems application.

This article, as well as the upcoming one, includes a sample QnA application that I have published on Forge. I will explain the implementation details using this sample. See links below.

Part 1 — “Find Similar” (this article)

This article covers the basics of vector embeddings and vector databases. It also shows you how to use pre-built Forge components to create vector embeddings, save them in a Qdrant Vector Database, and run similarity queries.

Part 2 — “Answer Right”

OutSystems, OpenAI Embeddings and Qdrant Vector Database — Answer Right

In this article we combine Qdrant semantic similarity search with OpenAI Chat Completions to generate tailored answers.

lcnc.blog

In the second part, we expand the application by combining the similarity search with OpenAI’s completions. The search results from our QnA application serve as context for an OpenAI prompt, so that OpenAI answers users’ questions based only on the answers we have collected. This is a common way to pair generative AI with a reliable information source and reduce incorrect answers.

What are Vector Embeddings

Embeddings are numerical representations (vectors) used to encode several types of information, such as text, images, audio, and video files. These representations are generated by a trained model (for text, a language model), which allows them to capture the semantic meaning of the input data.

At the time of writing, the OpenAI embeddings model text-embedding-ada-002 creates vectors with 1536 dimensions, meaning each embedding is an array of 1536 floating-point numbers. The key advantage of embeddings is that items that are close together in vector space are also semantically similar. In other words, if two vectors are numerically similar, the data they represent shares a similar meaning or context.
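To make "numerically similar" concrete, here is a minimal sketch of cosine similarity, one of the distance measures Qdrant supports. The three-dimensional vectors and their names are made up for illustration; real text-embedding-ada-002 vectors have 1536 dimensions, but the arithmetic is the same.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (hypothetical values, not real model output).
parking = [0.9, 0.1, 0.0]
car_park = [0.8, 0.2, 0.1]
baggage = [0.0, 0.3, 0.9]

print(cosine_similarity(parking, car_park))  # high: similar meaning
print(cosine_similarity(parking, baggage))   # low: different meaning
```

A vector database performs this kind of comparison for you, efficiently, across many stored vectors.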

A new type of database, the vector database, has emerged to store and retrieve vectors efficiently, particularly for similarity queries. Traditional databases are generally not optimized for high-dimensional numerical data like vectors; vector databases are designed specifically for these tasks.

Qdrant Vector Database

Qdrant is both a vector database and a vector similarity search engine, created and maintained by Qdrant Solutions. The project is openly available on GitHub under the permissive Apache 2.0 license. In addition, Qdrant Solutions provides a fully managed Qdrant Database Cluster cloud service.

Vector Search Database | Qdrant Cloud

Managed cloud solution of the Qdrant vector search engine. Cloud-native vector database for high performant vector…

cloud.qdrant.io

GitHub - qdrant/qdrant: Qdrant - Vector Database for the next generation of AI applications. Also…

Qdrant - Vector Database for the next generation of AI applications. Also available in the cloud…

github.com

Vector databases like Qdrant are commonly combined with large language models for natural language processing tasks. Together they power applications such as information retrieval, recommendation systems, sentiment analysis, and semantic search.

A brief note on the sample application

In the sample application, I use the OpenAI API endpoints for vector embeddings directly. You might have noticed that OutSystems recently released an Azure OpenAI component on the Forge Marketplace, which also supports creating vector embeddings. However, not everyone has access to an Azure tenant, and at the time of writing, getting access to Azure OpenAI may still require an application, so I opted for the non-Azure endpoints instead.

If you do have access to Azure OpenAI services, you can replace the embeddings part of the demo application with the official, supported component.

Azure OpenAI Connector

The Azure OpenAI Connector enables developers to seamlessly connect and leverage the advanced artificial intelligence…

www.outsystems.com

Besides OpenAI, you can use any other service that creates vector embeddings.

Prerequisites

Before you can use the demo application, you need to complete the following tasks.

Download Sample Application

Go to OutSystems Forge and download the sample application.

Vector Embeddings Demo

Demo application on how to use OpenAI embeddings and Qdrant Vector Database to add semantic similarity search…

www.outsystems.com

The sample application depends on the following Forge components.

OpenAI Embeddings is a small connector that implements only the Embeddings endpoint of OpenAI. It is used to create vector embeddings for questions and search terms in the sample application.

OpenAI Embeddings

API Connector for the embeddings endpoint of OpenAI

www.outsystems.com

Qdrant Vector Database connects to self-hosted Qdrant database clusters or to instances of the Qdrant Solutions cloud service. It provides server actions to list and create collections of vector embeddings, save vector embeddings (Points), and query them.

Qdrant Vector Database

Connector for Qdrant vector database. Vector databases efficiently store and manage high-dimensional vectors and are…

www.outsystems.com

Register an OpenAI Account

Visit the OpenAI website and sign up for an account. OpenAI is a commercial offering, and you are charged based on usage. At the time of writing, new accounts receive some free credits, which are more than sufficient for experimenting.

After signing up, go to View API Keys in your profile menu and create a new API key. Make sure to copy the key while it is displayed on the screen. You need it to authorize your requests to the OpenAI API.

Create a Qdrant Cluster

Qdrant Solutions offers a free tier of its Qdrant vector database cloud service. At https://cloud.qdrant.io you can sign up with your GitHub or Google account.

After signing up, follow the wizard to create your free tier cluster. Copy the Cluster URL and your API key.

Configure Qdrant Module

Open Service Center in the OutSystems environment where you installed the Qdrant Vector Database Forge component.

Under the Modules menu, locate the qdrant_IS module and open it. Select the Integrations tab and, under the Consumed REST APIs section, set the URL of the qdrant REST API to the Qdrant cluster URL you copied.
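Under the hood, the connector talks to Qdrant's REST API, which authenticates requests with an `api-key` header. The sketch below only builds, and does not send, a request that lists the collections in your cluster. The cluster URL and key are hypothetical placeholders, and the helper name `qdrant_request` is my own, not part of the Forge component.

```python
import urllib.request
from typing import Optional

# Hypothetical placeholders: replace with the Cluster URL and API key from Qdrant Cloud.
QDRANT_URL = "https://your-cluster-id.cloud.qdrant.io:6333"
QDRANT_API_KEY = "your-qdrant-api-key"


def qdrant_request(base_url: str, api_key: str, method: str, path: str,
                   body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the Qdrant REST API."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method=method,
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


# GET /collections lists the collections in the cluster.
list_collections = qdrant_request(QDRANT_URL, QDRANT_API_KEY, "GET", "/collections")
# Sending it is up to you, e.g. urllib.request.urlopen(list_collections).
```

If a request like this fails with an authorization error, double-check the API key; if it cannot connect, double-check the cluster URL you configured.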

Configure Site Properties

Lastly, configure the following site properties in VectorEmbeddingsDemo, the module of the semantic search demo application:

Set OpenAIKey to the API Key you created in your OpenAI account profile
Set QdrantAPIKey to the API key you got when creating your Qdrant cluster
Set QdrantCollectionName to a name of your choice, or keep the default.

Sample application walkthrough

With all prerequisites done you can now open the sample application.

I have added some sample data taken from the Munich Airport FAQs. Make sure that you have completed all the prerequisites above, then click the Bootstrap Sample Data button and wait until the Question-Answer pairs are displayed on the screen.

Sample application with bootstrapped question-answer pairs

While adding the sample data, the application creates vector embeddings for all questions (not answers) behind the scenes and adds them to your Qdrant cluster collection.

Now try your first search. Enter the search term “I am severely disabled. What do I need to know about this?” and click the Search button.

The application takes the entered search term and generates vector embeddings from it. It then runs a similarity search against your Qdrant cluster collection. Qdrant returns the results together with a score that reflects how close the embedding of the entered query is to each question embedding stored in the collection. With Cosine distance, the score is the cosine similarity, so values closer to 1.0 indicate a closer match.

Please note that your Qdrant collection does not store the text of your questions, only their embeddings. It is up to your application to match Qdrant results to the Question-Answer pairs stored in your application’s database.

Feel free to add additional question-answer pairs or create your own knowledge base from scratch. Try out different search terms and see how results and scoring change.

Adding a Question-Answer Pair

In Service Studio, open the sample application module. In the Logic tab, open the Articles_SaveArticle server action.

The Articles_SaveArticle server action first adds or updates an article in the database.

Then it calls the OpenAI_CreateEmbeddings server action of the OpenAI Embeddings Forge component. Embeddings are created for the question of our Question-Answer pair.

OpenAI_CreateEmbeddings takes three inputs: an API key, taken from the configured OpenAIKey site property; an OpenAI model suitable for creating embeddings (at the time of writing, text-embedding-ada-002 is the only one you can use); and an array of texts, which results in an array of generated embeddings. In our case, the array contains just the question.
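For readers who want to see what such a connector sends, here is a sketch of the underlying HTTP call to OpenAI's embeddings endpoint, built but not sent. It assumes the endpoint shape at the time of writing (a JSON body with `model` and `input`, a Bearer token, and a response whose `data` items carry `index` and `embedding`). The helper names are hypothetical, and the sample response is made up and truncated to three numbers.

```python
import json
import urllib.request
from typing import Any

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embeddings_request(api_key: str, texts: list[str],
                             model: str = "text-embedding-ada-002") -> urllib.request.Request:
    """Build (but do not send) a POST request to the OpenAI embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


def parse_embeddings(response: dict[str, Any]) -> list[list[float]]:
    """Return one embedding per input text, ordered like the input array."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request("sk-your-key", ["Where can I park my car?"])

# Made-up response with a truncated vector, for illustration only.
sample_response = {"data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0091, 0.0154]}]}
# The first embedding is the one that becomes the Qdrant point's vector.
question_vector = parse_embeddings(sample_response)[0]
```

Sorting by `index` keeps each embedding aligned with its input text, which matters once you send more than one text per request.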

If the call succeeds, the embeddings are written to your Qdrant Vector Database cluster using the Qdrant_UpsertPoints server action from the Qdrant Vector Database Forge component.

Qdrant_UpsertPoints creates or updates vector embeddings.

ApiKey and CollectionName are retrieved from the configured site properties.

A point in Qdrant holds a vector embedding and is identified by a unique ID. We use LocalArticleId, which is either set to a new UUID (for new questions) or to the UUID of an existing Question-Answer pair (for existing questions).

The Vector property is set to the first embedding in the array returned by the OpenAI_CreateEmbeddings server action.

Finally, you can optionally add Payload data: a MetadataId and one or more keywords. MetadataId can be used both for filtering query results and as a grouping identifier. Likewise, keywords can be used to filter query results.
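In terms of Qdrant's REST API, an upsert like this corresponds to a `PUT /collections/{collection_name}/points` request whose body lists points with an `id`, a `vector`, and an optional `payload`. The sketch below builds that body for a single point. The payload key names `metadata_id` and `keywords` and the helper names are assumptions for illustration; the Forge component may use different names.

```python
import json
import uuid
from typing import Any, Optional


def build_upsert_body(point_id: str, vector: list[float],
                      metadata_id: Optional[str] = None,
                      keywords: Optional[list[str]] = None) -> dict[str, Any]:
    """Body for PUT /collections/{collection_name}/points (one point)."""
    point: dict[str, Any] = {"id": point_id, "vector": vector}
    payload: dict[str, Any] = {}
    if metadata_id is not None:
        payload["metadata_id"] = metadata_id  # hypothetical payload key name
    if keywords:
        payload["keywords"] = keywords        # hypothetical payload key name
    if payload:
        point["payload"] = payload
    return {"points": [point]}


def upsert_path(collection_name: str) -> str:
    # wait=true asks Qdrant to apply the change before responding.
    return f"/collections/{collection_name}/points?wait=true"


# A new question gets a fresh UUID; reusing an existing UUID overwrites that point.
article_id = str(uuid.uuid4())
body = build_upsert_body(article_id, [0.0023, -0.0091, 0.0154], keywords=["parking"])
print(upsert_path("qna"), json.dumps(body))
```

Because the point ID is the article's own UUID, saving an edited question simply replaces its old embedding instead of creating a duplicate.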

Note the ValidateCollection server action at the top of the Articles_SaveArticle flow, which checks whether the Qdrant collection exists and creates it if it does not.

When creating a Qdrant collection, you need to specify the dimensions of the vector embeddings you want to store. The text-embedding-ada-002 model returns vectors with 1536 dimensions. If you want to use another embeddings service, check its documentation for the number of dimensions it returns.

You must also set the Distance property, which specifies how similarity queries are performed in that collection. Qdrant supports Cosine, Dot, and Euclid distance. See the Qdrant documentation for details.

Search - Qdrant

Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector…

qdrant.tech
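The ValidateCollection idea can be sketched against Qdrant's REST API: read the `GET /collections` response, and if the collection is missing, create it with `PUT /collections/{collection_name}` and a body that sets the vector `size` and `distance`. This is a minimal sketch, not the Forge component's code; the helper names are hypothetical, and it only plans the request rather than sending it.

```python
from typing import Any, Optional

# The three distance functions named above, spelled as the Qdrant REST API expects.
SUPPORTED_DISTANCES = {"Cosine", "Dot", "Euclid"}


def build_create_collection_body(size: int = 1536, distance: str = "Cosine") -> dict[str, Any]:
    """Body for PUT /collections/{collection_name}."""
    if size <= 0:
        raise ValueError("size must be the positive number of embedding dimensions")
    if distance not in SUPPORTED_DISTANCES:
        raise ValueError(f"unsupported distance: {distance}")
    return {"vectors": {"size": size, "distance": distance}}


def collection_exists(list_response: dict[str, Any], name: str) -> bool:
    """Check a GET /collections response for a collection with the given name."""
    return any(c["name"] == name for c in list_response["result"]["collections"])


def validate_collection(list_response: dict[str, Any],
                        name: str) -> Optional[tuple[str, str, dict[str, Any]]]:
    """Return None if the collection exists, else the (method, path, body) that would create it."""
    if collection_exists(list_response, name):
        return None
    return ("PUT", f"/collections/{name}", build_create_collection_body())


# Made-up GET /collections response containing one collection.
sample_list = {"result": {"collections": [{"name": "airport_faq"}]}, "status": "ok"}
print(validate_collection(sample_list, "qna"))
```

The size must match your embeddings model exactly; Qdrant rejects vectors whose dimension differs from the collection's configured size.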

Querying Question-Answer Pairs

Next, open the Articles_SearchArticles server action in the Vector Embeddings Demo application. This server action takes a single SearchTerm input parameter. It queries your Qdrant cluster collection and matches the results with the Question-Answer pairs stored in your application database.

This server action flow has two main branches:

If no search term is given, then it just returns all Question-Answer pairs from the database.

If a search term is given:

The search term is transformed to vector embeddings.
The Qdrant cluster collection is then queried with the generated vector embeddings, using Qdrant_SearchPoints from the Qdrant Vector Database Forge component.

In the sample application, Qdrant_SearchPoints is configured to return a maximum of 6 articles. You can also set a score threshold so that only records with a score above the given value are returned.

If the search returns no results, the server action exits and returns an empty result list.

If there are results, the whole result is stored in a local variable for later use.
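At the REST level, a search like this corresponds to `POST /collections/{collection_name}/points/search` with the query `vector`, a `limit`, and an optional `score_threshold`; less similar hits are not returned. The sketch below builds that body and reduces a response to id and score pairs. The helper names are hypothetical, the 0.8 threshold is an arbitrary example value, and the sample response is made up.

```python
from typing import Any, Optional


def search_path(collection_name: str) -> str:
    return f"/collections/{collection_name}/points/search"


def build_search_body(vector: list[float], limit: int = 6,
                      score_threshold: Optional[float] = None) -> dict[str, Any]:
    """Body for POST /collections/{collection_name}/points/search."""
    body: dict[str, Any] = {"vector": vector, "limit": limit}
    if score_threshold is not None:
        # Hits that are less similar than this threshold are left out.
        body["score_threshold"] = score_threshold
    return body


def hits_from_response(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only the id and score of each hit in a search response."""
    return [{"id": hit["id"], "score": hit["score"]} for hit in response["result"]]


body = build_search_body([0.0023, -0.0091, 0.0154], score_threshold=0.8)

# Made-up search response with two hits.
sample_response = {
    "result": [
        {"id": "6f1c8e2a-0000-4000-8000-000000000001", "version": 3, "score": 0.91},
        {"id": "6f1c8e2a-0000-4000-8000-000000000002", "version": 5, "score": 0.84},
    ],
    "status": "ok",
}
hits = hits_from_response(sample_response)
```

The ids returned here are the article UUIDs used when upserting, which is what lets the application look the matching Question-Answer pairs up in its own database.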

For each result entry, the Id, which corresponds to a Question-Answer pair ID in the database, is used to build a safe SQL IN filter.
That filter is used in the Advanced SQL statement to get all corresponding Question-Answer pairs from the database.
Finally, the database results are merged with the Qdrant results (combining each score with its Question-Answer pair), ordered by score, and the result list is returned.
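The filter-and-merge steps above can be sketched outside OutSystems. Here Python's built-in sqlite3 module stands in for the Advanced SQL query: one bound `?` placeholder per ID keeps the IN filter safe from SQL injection. The `article` table, its columns, the short IDs standing in for UUIDs, and the sample rows and scores are all hypothetical.

```python
import sqlite3
from typing import Any


def merge_search_results(conn: sqlite3.Connection,
                         hits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Load the Question-Answer pairs for Qdrant hits and attach each hit's score."""
    if not hits:
        return []  # nothing found in Qdrant: return an empty list
    ids = [str(hit["id"]) for hit in hits]
    # One "?" per id: values are bound as parameters, never pasted into the SQL text.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, question, answer FROM article WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    merged = []
    for hit in hits:
        row = by_id.get(str(hit["id"]))
        if row is None:
            continue  # a point whose article no longer exists in the database
        merged.append({"id": row[0], "question": row[1], "answer": row[2], "score": hit["score"]})
    merged.sort(key=lambda item: item["score"], reverse=True)  # most similar first
    return merged


# Hypothetical in-memory table standing in for the application's article entity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id TEXT PRIMARY KEY, question TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO article VALUES (?, ?, ?)",
    [("a1", "Where can I park my car?", "(parking answer)"),
     ("a2", "Can I bring liquids?", "(liquids answer)")],
)
hits = [{"id": "a2", "score": 0.79}, {"id": "a1", "score": 0.91}, {"id": "a3", "score": 0.85}]
results = merge_search_results(conn, hits)
```

Skipping hits without a matching row is one possible design choice; it tolerates points left behind in Qdrant after an article was deleted from the database.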

Articles_SearchArticles is used by the GetArticles data action on the Demo screen to retrieve the articles.

With OutSystems, vector embeddings, and a Qdrant Vector Database, it is easy to add semantic similarity text search to an application.

Thank you for reading. I hope you liked it and that I have explained the important parts well. Let me know if not 😊

If you have difficulty getting up and running, please use the OutSystems Forum to get help. Suggestions on how to improve this article are very welcome. Send me a message via my OutSystems profile or respond directly here on Medium.

If you like my articles, please leave some claps. Follow me and subscribe to receive a notification whenever I publish a new article. Happy Low Coding!


Written by Stefan Weber