Ontotext Marketing Gets a Boost from Knowledge Graph Powered LLMs


This is an abbreviated and updated version of a presentation from Ontotext’s Knowledge Graph Forum 2023 by Krasimira Bozhanova, Solutions Architect at Ontotext

Motivated by our marketing team’s aim to simplify content discovery on our website, we initiated the Ontotext Knowledge Graph (OTKG) project. Our years-long experience in aiding organizations to gain maximum benefits from their content taught us the necessity of more than just a rudimentary search interface.

Early on, we grasped the potential of our emerging project. We envisioned harnessing the power of our products to elevate our entire content publishing process, thereby facilitating in-depth knowledge exploration. Eventually, the project grew into an expansive knowledge graph containing all the marketing knowledge we’ve generated, ultimately benefiting the whole organization.

What is OTKG?

OTKG models information about Ontotext, combined with content produced by different teams inside the organization. We started with our marketing content and quickly expanded that to also integrate a set of workflows for data and content management. Our goal is to generate a knowledge space where information is easy to find, reuse, and fuel knowledge-driven insights.

The project involves different departments and leverages our products and capabilities by combining them with our marketing team’s expertise. It also allows us to put the new features of our products into a real-world use case right away. In doing so, we become early adopters, gather instant feedback from our internal users, and use it to improve our products further. Working in a cross-functional team also enables short, value-driven iterative cycles. It allows us to quickly prototype and assess the potential of innovative ideas and to focus on the features that have the strongest impact on users.

From knowledge graph building to value

To enhance the discoverability of information about Ontotext, we aimed to unlock the value embedded in our content and make it readily accessible.

Our standard methodology for such projects is to start by defining competency questions that would help us understand what we need to model in our graph. Then we build a knowledge graph consisting of custom ontologies (in our case, an extension of schema.org), and custom taxonomies. We store this in GraphDB by leveraging standard tooling for knowledge graph management. We also use Ontotext Refine for transforming structured and semi-structured content to RDF.
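To give a flavor of what this looks like in practice, here is a rough sketch of describing a blog post with schema.org terms plus links into custom SKOS taxonomies. The IRIs, property choices, and values below are illustrative assumptions, not the actual OTKG ontology:

```python
# A minimal sketch of modeling a blog post with schema.org plus custom
# taxonomy links, using rdflib. The topic IRIs and literal values are
# illustrative assumptions, not the actual OTKG ontology.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS, XSD

SCHEMA = Namespace("https://schema.org/")
TOPIC = Namespace("https://data.ontotext.com/topic/")  # assumed taxonomy base IRI

g = Graph()
g.bind("schema", SCHEMA)
g.bind("skos", SKOS)

post = URIRef("https://www.ontotext.com/blog/graphdb-on-aws/")  # illustrative
g.add((post, RDF.type, SCHEMA.BlogPosting))
g.add((post, SCHEMA.headline, Literal("GraphDB is now available on the AWS Marketplace")))
g.add((post, SCHEMA.datePublished, Literal("2023-05-10", datatype=XSD.date)))
g.add((post, SCHEMA.about, TOPIC["cloud-providers"]))  # link into the taxonomy
g.add((post, SCHEMA.about, TOPIC["graphdb"]))

# The taxonomy concepts themselves are plain SKOS
g.add((TOPIC["cloud-providers"], RDF.type, SKOS.Concept))
g.add((TOPIC["cloud-providers"], SKOS.prefLabel, Literal("Cloud Providers", lang="en")))

print(g.serialize(format="turtle"))
```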

Through Ontotext Metadata Studio (OMDS), we then apply semantic content enrichment using text analysis based on our marketing vocabularies. This allows us to classify our content according to the refined model and the entities of interest. We expose this classified content through flexible semantic faceted search with the help of metaphacts’ knowledge graph platform metaphactory.
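OMDS performs the actual text analysis; purely to illustrate the idea of vocabulary-driven tagging, here is a naive stand-in that matches taxonomy labels (including alternative labels such as “AWS”) against the text:

```python
# A naive stand-in for vocabulary-driven tagging: match taxonomy labels and
# alternative labels against the text. OMDS's real text analysis is far more
# sophisticated; this only illustrates the idea of graph-based mentions.
import re

# label -> concept IRI (the IRIs are illustrative)
VOCABULARY = {
    "Amazon Web Services": "https://data.ontotext.com/topic/aws",
    "AWS": "https://data.ontotext.com/topic/aws",
    "GraphDB": "https://data.ontotext.com/topic/graphdb",
}

def tag(text: str) -> dict[str, int]:
    """Return concept IRIs mentioned in the text with a naive frequency score."""
    mentions: dict[str, int] = {}
    for label, concept in VOCABULARY.items():
        hits = len(re.findall(rf"\b{re.escape(label)}\b", text))
        if hits:
            mentions[concept] = mentions.get(concept, 0) + hits
    return mentions

print(tag("GraphDB is now available on the AWS Marketplace ..."))
```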

These steps help pave the way to integrate the knowledge graph with large language models (LLMs) and provide state-of-the-art knowledge discovery and exploration.

To make this as valuable as possible, the system needs to work with the latest content, so we have built workflows to maintain the data in the graph in sync with what is published on our website.
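Conceptually, such a sync workflow boils down to pulling newly published content and upserting it into the graph. A minimal sketch, assuming a JSON feed of posts and GraphDB’s standard RDF4J-style endpoints (the feed URL, repository name, and feed fields are made up for illustration):

```python
# A sketch of a sync job: pull recently published posts from a website feed
# and upsert them into GraphDB via SPARQL UPDATE. The endpoint path follows
# the RDF4J-style REST API that GraphDB exposes; the feed URL, repository
# name, and feed fields are assumptions for illustration.
import requests

FEED_URL = "https://www.ontotext.com/feed.json"                         # assumed
GRAPHDB_UPDATE = "http://localhost:7200/repositories/otkg/statements"   # assumed repo

def upsert_post(post: dict) -> None:
    # Note: a real job would escape literals properly; this is a sketch.
    update = f"""
    PREFIX schema: <https://schema.org/>
    DELETE {{ <{post['url']}> schema:headline ?old }}
    INSERT {{ <{post['url']}> a schema:BlogPosting ;
                              schema:headline "{post['title']}" ;
                              schema:datePublished "{post['published']}" }}
    WHERE  {{ OPTIONAL {{ <{post['url']}> schema:headline ?old }} }}
    """
    resp = requests.post(GRAPHDB_UPDATE, data={"update": update}, timeout=30)
    resp.raise_for_status()

for post in requests.get(FEED_URL, timeout=30).json()["items"]:
    upsert_post(post)
```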

Moreover, our goal was to build this system in such a way that it can be easily adapted to other organizations that might want the same content management benefits.

Where does AI fit into this?

Now let’s see how we integrated knowledge graphs with AI on each of these layers.

During the knowledge graph building and semantic enrichment processes, we aim to boost the discoverability of our content, both for search engines and our platform. We achieve this by quality tagging during content publishing via graph-based entity linking. In this way, we benefit from better SEO and semantic-driven content discovery.
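One tangible payoff of graph-based tagging is that the same entities can be emitted as schema.org JSON-LD on the published page, which is what search engines pick up. A sketch with illustrative values:

```python
# A sketch of emitting schema.org JSON-LD for a published page from its
# graph-based tags, so search engines see the same entities the graph knows
# about. Values are illustrative, not the actual OTKG output.
import json

post = {
    "url": "https://www.ontotext.com/blog/graphdb-on-aws/",
    "title": "GraphDB is now available on the AWS Marketplace",
    "tags": ["Amazon Web Services", "GraphDB", "Cloud Providers"],
}

json_ld = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "@id": post["url"],
    "headline": post["title"],
    "about": [{"@type": "Thing", "name": tag} for tag in post["tags"]],
}

# This goes into the page as <script type="application/ld+json">...</script>
print(json.dumps(json_ld, indent=2))
```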

For knowledge discovery, we have to identify relevant information sources efficiently. We also have to provide users with tailored navigation and relevant recommendations for building up their knowledge.

Finally, when it comes to question-answering and insights, we aim to expose a natural language interface to consume the information in our knowledge graph in a user-friendly way. The knowledge graph also provides a foundation for trend analysis and knowledge-driven insights.

The behind-the-scenes interface

Let’s see how this works. OMDS shows the collections of all the documents we currently have, such as blog posts, events, news articles, etc. This is what our marketing team uses to enrich new content with semantic metadata before publishing it.

Let’s open a document about making GraphDB available on the Amazon Web Services Marketplace. On the right side, we see the text of the document, and on the left, all the mentions that have been automatically discovered by the OMDS Tagger.

This is graph-based tagging, so the mentions are not just keywords. They are the entities we have modeled, and the graph’s connectedness is instrumental in classifying our content with them. We also see that they are ordered by relevance and, not surprisingly, Amazon Web Services is the most relevant concept, followed by GraphDB.

Now, let’s say a member of the marketing team wants to understand why the document has been classified like that. We can open the annotation about cloud providers and click the Highlight button in the top right corner to see an explanation of the hints in the text that led to this classification. For “Amazon Web Services”, the highlight shows that it was assigned because of the “AWS” mention in the text and because our graph contains a relationship between cloud providers and AWS. This is a direct application of the inference logic in our RDF database, which enriches the mentions.
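Under the hood, such an explanation is a graph lookup: the concept matched in the text (“AWS”) rolls up to broader concepts such as Cloud Providers. Roughly, assuming SKOS-style taxonomy relations (the IRIs and repository name are illustrative, not the actual OTKG schema):

```python
# A sketch of the explanation lookup: given the concept matched in the text
# ("AWS"), ask the graph which broader concepts it rolls up to. The IRIs and
# the use of skos:broader are illustrative assumptions.
import requests

GRAPHDB_QUERY = "http://localhost:7200/repositories/otkg"  # assumed repository

query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?broader ?label WHERE {
  <https://data.ontotext.com/topic/aws> skos:broader+ ?broader .
  ?broader skos:prefLabel ?label .
}
"""

resp = requests.get(
    GRAPHDB_QUERY,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["label"]["value"])  # e.g. "Cloud Providers"
```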

The end-user interface

Let’s see how users can explore everything described above.

As you can see from the screenshot above, our content is at the center and there are various filters on the left. We have standard filters, based on metadata, such as document type, publication date, or author. But we also have a variety of additional semantic filters based on our custom vocabularies and the OMDS classifications. Expanding these, we will see the whole tree for each custom vocabulary. All this allows us to better filter, navigate, and find content we might be interested in.

For example, if we filter only blog posts in the Resource Type, choose System Operations from Capabilities and Cloud Providers in Topics, and add the search term “cloud”, we will see the relevant matches on the right. The first result is the blog post about GraphDB on the cloud that we already saw in the OMDS interface, followed by other related results.
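Behind that filter combination sits a structured query: blog posts tagged with both selected concepts whose text matches “cloud”. In OTKG the facets are served via metaphactory and GraphDB’s full-text search; the sketch below uses plain SPARQL and illustrative IRIs to show the idea:

```python
# A sketch of the faceted query behind "blog posts + System Operations +
# Cloud Providers + 'cloud'". The real facets are served via metaphactory and
# GraphDB's full-text search; the IRIs here are illustrative.
import requests

GRAPHDB_QUERY = "http://localhost:7200/repositories/otkg"

query = """
PREFIX schema: <https://schema.org/>
SELECT ?post ?headline WHERE {
  ?post a schema:BlogPosting ;
        schema:headline ?headline ;
        schema:about <https://data.ontotext.com/capability/system-operations> ,
                     <https://data.ontotext.com/topic/cloud-providers> .
  FILTER(CONTAINS(LCASE(?headline), "cloud"))   # stand-in for real full-text search
}
ORDER BY ?headline
"""

resp = requests.get(GRAPHDB_QUERY, params={"query": query},
                    headers={"Accept": "application/sparql-results+json"}, timeout=30)
for row in resp.json()["results"]["bindings"]:
    print(row["headline"]["value"])
```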

If we open this post, we will see three of the mentions we have assigned via OMDS. We can also explore the mentions as nodes and click on different topics to read their definitions in the knowledge graph. We can also see trends and analytics based on the other concepts in our knowledge graph that correlate most strongly with the mention we are interested in.

Thanks to these mentions, we can also provide recommendations for relevant content. It is worth noting that these are graph-based recommendations that don’t rely on content similarity alone.
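A simple way to picture graph-based recommendations is to rank other documents by how many tagged concepts they share with the current one, which pure text similarity would not give you. A sketch with illustrative IRIs:

```python
# A sketch of graph-based recommendations: rank other documents by the number
# of taxonomy concepts they share with the current one. IRIs are illustrative;
# a real ranking could also weight concepts by relevance.
import requests

GRAPHDB_QUERY = "http://localhost:7200/repositories/otkg"

query = """
PREFIX schema: <https://schema.org/>
SELECT ?other (COUNT(?concept) AS ?shared) WHERE {
  <https://www.ontotext.com/blog/graphdb-on-aws/> schema:about ?concept .
  ?other schema:about ?concept .
  FILTER(?other != <https://www.ontotext.com/blog/graphdb-on-aws/>)
}
GROUP BY ?other
ORDER BY DESC(?shared)
LIMIT 5
"""

resp = requests.get(GRAPHDB_QUERY, params={"query": query},
                    headers={"Accept": "application/sparql-results+json"}, timeout=30)
for row in resp.json()["results"]["bindings"]:
    print(row["other"]["value"], row["shared"]["value"])
```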

The chat interface

Faceted search is the classic way to query our content. Let’s now switch to the fancier one — our natural language querying interface, which can be combined with the filters.

To do that, we have integrated ChatGPT with our knowledge graph by enhancing the standard retrieval augmented generation (RAG) pattern. As described in our previous blog post, where we examine the various approaches to Graph RAG, we can use the GraphDB ChatGPT Retrieval Plugin Connector to easily transform our knowledge graph into embeddings and identify the information we need to provide to the LLM to answer the user’s question.
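The connector takes care of this automatically; conceptually, it comes down to embedding the textual content of the documents in the graph and keeping the vectors for similarity search. A rough sketch (the embedding model and data layout are assumptions for illustration):

```python
# A conceptual sketch of what the retrieval step builds on: embed each
# document's text and keep the vectors for similarity search. In OTKG the
# GraphDB ChatGPT Retrieval Plugin Connector handles this and pushes the
# embeddings into the vector database; the model name and data layout here
# are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

documents = [
    {"url": "https://www.ontotext.com/blog/graphdb-on-aws/",
     "text": "GraphDB is now available on the AWS Marketplace ..."},
    # ... every published document synced from the knowledge graph
]

resp = client.embeddings.create(
    model="text-embedding-3-small",              # assumed embedding model
    input=[d["text"] for d in documents],
)
for doc, item in zip(documents, resp.data):
    doc["vector"] = item.embedding               # stored alongside the metadata
```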

The diagram above shows how we achieve the chat functionality with our products.

Starting from the left, the user selects their content filters and sends their question to the system. The request is consumed by GraphDB, which stores all content enrichment metadata produced by OMDS. The user question and filters are then sent to a simple chat application. It retrieves the most relevant grounding context for the question from a vector database (in this case, Weaviate).

The vector database is synchronized with the content in GraphDB through a connector enabling dynamic synchronization. This means that when new information arrives in GraphDB, it is automatically updated in the vector database, so questions can be answered using new information straight away. The grounding context, together with the question, is sent to the LLM for answering. After receiving the response from the LLM, we enrich it with semantic metadata, and the response served to the user is contextualized with appropriate follow-up suggestions.
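Put together, the question-answering step looks roughly like the sketch below: retrieve the most similar documents, build a grounded prompt from them, and ask the LLM. In OTKG the similarity lookup goes to Weaviate, kept in sync with GraphDB; here an in-memory cosine similarity over the vectors from the previous sketch stands in for it, and the chat model name is an assumption:

```python
# A sketch of the grounded question-answering step: retrieve the most similar
# documents, build a prompt from that grounding context, and ask the LLM.
# In OTKG the similarity lookup goes to Weaviate (kept in sync with GraphDB);
# an in-memory cosine similarity over the vectors from the previous sketch
# stands in for it here, and the model name is an assumption.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(resp.data[0].embedding)

def answer(question: str, documents: list[dict], top_k: int = 3) -> str:
    # In the real system the user's selected filters (e.g. publication date)
    # first narrow down which documents qualify as grounding context.
    q = embed(question)
    scored = sorted(
        documents,
        key=lambda d: float(np.dot(q, d["vector"]))
        / (np.linalg.norm(q) * np.linalg.norm(d["vector"])),
        reverse=True,
    )[:top_k]
    context = "\n\n".join(f"Source: {d['url']}\n{d['text']}" for d in scored)
    chat = client.chat.completions.create(
        model="gpt-4o-mini",                     # assumed model name
        messages=[
            {"role": "system",
             "content": "Answer only from the provided sources and cite their URLs."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content

# e.g. answer("What are the main benefits of using GraphDB on AWS?", documents)
```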

Asking a question

Let’s see an example. We can filter our content by publication date and focus on the last 3 years. Then, we can ask a question such as, “What are the main benefits of using GraphDB on AWS?”

The chat interface returns an answer and, besides standard information, it also provides insights, such as that it simplifies procurement.

It is worth noting that we can limit the hallucinations of the LLM by grounding it with reliable context. The links in the answer point to the sources of the generated answer, which allows users to trace back the information and read further if they are interested in more details.

If we scroll further down, we can see a tree of the classification mentions assigned to the response. We provide related follow-up reading materials that help the user get up to speed with the topic they are interested in. By combining all these techniques, we can place the user’s interest in the context of our knowledge graph by showing the information that is relevant to them.

LLMs and knowledge graphs

Let’s now zoom out and see how our products benefit from the power of AI and how this synergy helped us achieve our goal.

  • For quality tagging, OMDS enables knowledge graph aware text analysis over all of our content.
  • For knowledge discovery, the GraphDB ChatGPT Retrieval Plugin Connector generates vector embeddings and allows us to efficiently identify information sources. In addition, GraphDB Connector integrations with full-text search and vector databases deliver graph-driven recommendations.
  • For question-answering and insights, we use LLMs, and the graph connectedness and inference help us identify new knowledge, trends, co-occurrences, user needs, and so on.

Our experience with this project shows once again how LLMs and knowledge graphs integrate naturally into a better package.

In OTKG, we have applied a set of techniques that address the major challenges of using LLMs in a production system:

  • We enhance LLMs with our own structured and unstructured information to address the fact that they are limited to public knowledge.
  • We provide them with reliable grounding context to counter the problem of hallucinations and untrustworthiness.
  • We show inline links to the sources of the generated answer to address the lack of explainability and traceability.

To wrap it up

Using LLMs in a production system is not as straightforward as it may initially appear. However, LLMs and knowledge graphs together can bring more value than the sum of their parts. They truly complement each other, leading to exciting new capabilities. Finally, the combination of knowledge graphs and semantic metadata enhances LLMs for better content discovery, understanding, question-answering, and insights.

Krasimira Bozhanova, Solutions Architect at Ontotext

Originally published at https://www.ontotext.com on March 13, 2024.

Ontotext

Ontotext is a global leader in enterprise knowledge graph technology and semantic database engines.