A bird’s-eye view of where to start when building a knowledge graph solution to help your business excel in a data-driven market
How do you think about your data? Do you view it as isolated bits of information or as a network of interconnected pieces?
When we think about things, people and places, it’s obvious how one thought leads to another. Yet when we consider the data items that represent these things, our legacy systems have led us to the limited understanding that data is made of discrete objects that hardly interrelate. Often, data locked away in silos leads to siloed thinking.
Case in point: think of any product spreadsheet you use at work. Is the spreadsheet you send to your partners related to the product features table on your website? Is it related to the data your colleagues send to Google Merchant Feed? In terms of the information it carries, it most certainly is related to both.
Yet are these easily comparable and connectable? Can you edit them in one single place? Can you perform analytics over them without first transforming all the formats this same information lives in? Probably not. And it is from that disconnected state that the knowledge graph journey starts.
The Connected Data Mindset Behind Knowledge Graphs
A knowledge graph is an expression of the idea that everything is semantically connected one way or another. That’s why it is not a product you can just get out of the box and start using. Neither is it a specific task (or a set of tasks) that you complete once, put on the shelf and forget about.
A knowledge graph is a process that you gradually weave into your enterprise information lifecycle. When approached with such an understanding, the efforts you put into it turn into scalable repeatable ways of using and sharing your knowledge.
With that in mind, we at Ontotext, together with our clients, approach the knowledge graph endeavor not as a one-off project that solves a specific need. Instead, we view it as building a set of ongoing repeatable and iterative activities that will support how you connect your data pieces long term. This approach entails designing and setting up processes for a future-proof enterprise system, ready to deal with any data-driven project.
Such an approach does come with operational challenges. However, in the long run, it pays off: your data becomes well connected and governed, which allows it to readily provide actionable insights. In addition, by leveraging the semantic structure and relationships in the graph, it is possible to create reusable workflows.
This standardizes your data processing across different teams and departments, reducing manual effort and improving data quality. As your knowledge graph grows and adapts to your rapidly changing business needs, it can be used to develop automated processes, making it easier to add new data sources and maintain the integrity of your data.
A Journey of a Thousand Data Pieces Begins with a Single Step
Knowledge graph projects are complex. They call for viewing the challenges they are being built to meet as an ecosystem. However, behind this complexity lies the simple premise of any technology: Does it enable me to know more and do my business better and, if so, how?
The straightforward answer to this question is that knowledge graphs put data into context to provide a smart framework for data integration, unification, analytics and sharing. So if you are looking for insights from your data or want to do analytics over it, they enable you to get quick, accurate and valuable contextualized information.
“A knowledge graph is not a one-off engineering project. Building a KG requires collaboration between functional domain experts, data engineers, data modelers and key sponsors. It also combines technology, strategy and organizational aspects; focusing only on technology leads to a high risk of a KG’s failure.” (Gartner®, Building Knowledge Graphs, Sumit Pal, 7 July 2022)
But no matter what levels of complexity you need to handle, you don’t have to deal with everything all at once. You don’t have to try and build a monolithic structure that will solve all your data challenges from the start. We all know that’s not feasible.
The beauty of knowledge graph technology is that it allows you to start small. You can do a pilot use case to test your knowledge graph capabilities and then incrementally and at your own pace build on what you already have.
One of the main advantages of having a knowledge graph is that it can evolve together with your growing enterprise data and changing business needs. The same applies to its project timelines and backlogs. A knowledge graph can start as a small use case in one department, and (as everyone else will want to reap the benefits) it can further blossom into a full-blown data and knowledge management undertaking.
Walking the Knowledge Graph Talk in Three Steps
From what we’ve seen over the years, your knowledge graph project will most likely take you across three pillar processes, namely: definition, discovery and design, and deployment.
Step 1: Definition of Scope [Why You Need A Knowledge Graph]
In the first step of the journey, you decide why you need a knowledge graph. To set up such a future-proof architecture, it is important to have a clear understanding of your organization’s data needs and use cases and to involve various stakeholders in the process. This ensures that your knowledge graph is aligned with your business goals and that it is adopted and used effectively by all those who will rely on it.
By agreeing upon their needs and expectations from the project, stakeholders clarify (sometimes for themselves as well) what the purpose of building a knowledge graph is and what the benefits to the business would be. This includes establishing ownership and responsibilities for building and maintaining the knowledge graph as well as encouraging a connected data mindset and a culture of collaboration across teams.
This step has nothing to do with data crunching, data wrangling, data integration or whatever other techy talk you have heard. It’s about being able to ask the questions you want to ask of your data and get insights that feed into your business.
The best practice is to start by agreeing upon a list of questions that your newly built knowledge graph will answer. We call them competency questions. These are the questions that the various stakeholders want answered in order to give the business better actionable insights. For example: “Which companies, part of the EU, bring the highest revenue for my business?” or “Based on the results of an ongoing lead generation campaign, what are the best leads to work on?”
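To make the idea of a competency question more tangible, here is a minimal sketch of how the first example question could be answered once the relevant facts are connected. It uses plain Python tuples as a stand-in for a real graph database, and every company name, predicate and revenue figure in it is invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples.
# All names and numbers below are made up for illustration.
TRIPLES = [
    ("AcmeGmbH", "locatedIn", "Germany"),
    ("BorealisAB", "locatedIn", "Sweden"),
    ("CascadeInc", "locatedIn", "USA"),
    ("Germany", "memberOf", "EU"),
    ("Sweden", "memberOf", "EU"),
    ("AcmeGmbH", "revenue", 120_000),
    ("BorealisAB", "revenue", 340_000),
    ("CascadeInc", "revenue", 900_000),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def eu_companies_by_revenue():
    """Competency question: which EU-based companies bring the most revenue?"""
    totals = defaultdict(int)
    for s, p, o in TRIPLES:
        if p != "locatedIn":
            continue
        # Follow the second hop: company -> country -> EU membership.
        if "EU" in objects(o, "memberOf"):
            totals[s] += sum(objects(s, "revenue"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(eu_companies_by_revenue())
```

The point of the sketch is that the answer depends on following a chain of relationships (company to country, country to EU) rather than on looking up a single table. In a production setting, a question like this would more likely be expressed in a graph query language than in hand-written loops.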
Having discovered the use case requirements for the project, you then want to find out what data you already have.
Step 2: Discovery and Design [What You Need To Have In Your Knowledge Graph]
In this step, you carefully analyze what data sources are available in your organization and whether there are any external data sources that can be of use for your project. This is also the step where you identify relevant ontologies and taxonomies that are specific to your domain or find external ones you can reuse. In this way, your data can be organized and connected without having to make a new ontology from scratch.
At this point, you also evaluate the potential benefits and costs of integrating each data source, or any given ontology or taxonomy. This is where you might consider factors such as the relevance of the data to your project, the cost and effort required to integrate it, its availability and quality, and so on.
Finally, all your stakeholders make a decision about which sources and data schemas to use and prioritize them based on the project’s requirements from the previous step. This process is a team effort where different teams brainstorm and arrive at a consensus. As we already said, building a knowledge graph is an incremental process and limiting the initial scope of the project is important so it doesn’t bog down quickly.
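As a rough illustration of how such a prioritization could be made explicit, the sketch below scores candidate sources on the factors mentioned above (relevance, quality, availability and integration effort). The 1 to 5 scales, the scoring formula and the example sources are all assumptions made for this example, not a prescribed method; in practice the weighting is something the stakeholders would agree on together.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    relevance: int      # 1 (low) to 5 (high)
    quality: int        # 1 (low) to 5 (high)
    availability: int   # 1 (hard to obtain) to 5 (readily available)
    effort: int         # 1 (cheap to integrate) to 5 (expensive)

    def score(self) -> float:
        # Assumed heuristic: total benefit divided by integration effort.
        return (self.relevance + self.quality + self.availability) / self.effort


def prioritize(sources):
    """Return sources ordered from most to least attractive."""
    return sorted(sources, key=lambda s: s.score(), reverse=True)


# Hypothetical candidates with made-up ratings.
candidates = [
    DataSource("CRM export", relevance=5, quality=3, availability=5, effort=2),
    DataSource("Product spreadsheet", relevance=4, quality=2, availability=5, effort=1),
    DataSource("External industry taxonomy", relevance=3, quality=5, availability=4, effort=4),
]

for source in prioritize(candidates):
    print(f"{source.name}: {source.score():.2f}")
```

Even a crude ranking like this helps keep the initial scope small: the top one or two sources go into the pilot, and the rest wait for a later iteration.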
Once the sources and the relevant ontologies and taxonomies are defined, data gets integrated into the knowledge graph. For that to happen, all data needs to be cleaned and structured. This alone might require text analysis pipelines, ETL tools, mapping editors, and so on. So, you’ll need to allocate time and resources to decide on the best pipelines and services that you will build to populate and refresh the knowledge graph.
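The cleaning and mapping step can be pictured with a small sketch. It takes a messy tabular export, normalizes the values and turns each row into subject-predicate-object statements that a graph could ingest. The CSV content, the `product:` and `category:` prefixes and the predicate names are hypothetical; a real pipeline would typically rely on dedicated ETL or mapping tools rather than a hand-written script.

```python
import csv
import io

# Hypothetical raw export; note the inconsistent casing and stray spaces.
RAW_CSV = """sku,name,category
 A-100 ,Trail Shoe,footwear
a-200,Rain Jacket , Outerwear
,Missing Sku,footwear
"""


def clean(value):
    """Trim whitespace; return None for empty cells."""
    value = value.strip()
    return value or None


def rows_to_triples(text):
    """Map each identifiable CSV row to (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        sku = clean(row["sku"])
        if sku is None:
            continue  # skip rows that cannot be identified
        subject = f"product:{sku.upper()}"
        name = clean(row["name"])
        category = clean(row["category"])
        if name:
            triples.append((subject, "name", name))
        if category:
            # Normalize category labels so that variants map to one node.
            triples.append((subject, "category", f"category:{category.lower()}"))
    return triples


for triple in rows_to_triples(RAW_CSV):
    print(triple)
```

Normalizing identifiers at this stage is what later allows the same product or category coming from different sources to land on the same node in the graph.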
Step 3: Deployment [How Will You Build Your Knowledge Graph]
This is the stage where your enterprise knowledge graph will be available for use in a production environment. At this point, most of the work has been done. Still, to ensure the quality of the data and to keep the knowledge graph up-to-date, it’s important to develop a comprehensive data governance and management strategy. This includes things like data quality checks, versioning, auditing, data security and monitoring.
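A data quality check of the kind mentioned above can be as simple as verifying that every entity carries the properties you expect. The following sketch assumes a hypothetical rule (each product has exactly one name and at least one category) and runs it over a handful of made-up triples. In production, such constraints would more commonly be declared in a dedicated constraint language such as SHACL than coded by hand.

```python
# Hypothetical rule set: predicate -> (minimum count, maximum count or None).
REQUIRED = {"name": (1, 1), "category": (1, None)}


def check_quality(triples, required=REQUIRED):
    """Return a list of human-readable issues found in the triples."""
    counts = {}
    for s, p, _ in triples:
        counts.setdefault(s, {}).setdefault(p, 0)
        counts[s][p] += 1

    issues = []
    for subject in sorted(counts):
        for predicate, (lo, hi) in required.items():
            n = counts[subject].get(predicate, 0)
            if n < lo:
                issues.append(f"{subject}: missing '{predicate}'")
            elif hi is not None and n > hi:
                issues.append(f"{subject}: too many '{predicate}' values ({n})")
    return issues


# Made-up sample data: the second product has two names and no category.
graph = [
    ("product:A-100", "name", "Trail Shoe"),
    ("product:A-100", "category", "category:footwear"),
    ("product:A-200", "name", "Rain Jacket"),
    ("product:A-200", "name", "Rain Coat"),
]

for issue in check_quality(graph):
    print(issue)
```

Running checks like this on every refresh, and logging the results, is one simple way to combine the quality, auditing and monitoring aspects of a governance strategy.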
In the Deployment stage, you can add interfaces for applications on top of the knowledge graph to further improve the accessibility and usability of your data for end users. This will allow them to extract insights, share and reuse knowledge and make data-driven decisions. The knowledge graph can also enable personalized recommendations or power other applications and services. Or you can build a semantic search layer on top of your website or your CRM system to feed your business with actionable insights.
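To give a feel for how a graph can power recommendations, here is a deliberately simple sketch that suggests products sharing properties with the one a user is looking at. The products, predicates and the ranking rule (count of shared property values) are assumptions made for illustration only; real recommendation services are considerably more sophisticated.

```python
# Hypothetical product facts as (subject, predicate, object) triples.
TRIPLES = [
    ("product:A-100", "category", "category:footwear"),
    ("product:A-100", "usedFor", "activity:hiking"),
    ("product:A-200", "category", "category:outerwear"),
    ("product:A-200", "usedFor", "activity:hiking"),
    ("product:A-300", "category", "category:footwear"),
    ("product:A-300", "usedFor", "activity:running"),
    ("product:A-400", "category", "category:accessories"),
    ("product:A-400", "usedFor", "activity:cycling"),
    ("product:A-500", "category", "category:footwear"),
    ("product:A-500", "usedFor", "activity:hiking"),
]


def features(subject):
    """Return the set of (predicate, object) pairs describing `subject`."""
    return {(p, o) for s, p, o in TRIPLES if s == subject}


def recommend(subject):
    """Rank other products by how many property values they share."""
    mine = features(subject)
    scores = {}
    for s, p, o in TRIPLES:
        if s != subject and (p, o) in mine:
            scores[s] = scores.get(s, 0) + 1
    # Highest overlap first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend("product:A-100"))
```

Because the recommendation is derived from explicit relationships, it is easy to explain to an end user why an item was suggested, for example "also footwear" or "also used for hiking".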
The deployment process can also be iterative. In other words, as the knowledge graph evolves along with your business requirements, new features and data can be added as needed.
Epilogue: Trust the Knowledge Graph Process
Don’t let your knowledge graph project be a one-off. Lay your data foundations properly to avoid costly complications and ensure your business processes remain agile. The best place to start is to discover exactly what you want from your knowledge graph.
Then you need to think about the actions and the actors that will take you there: what does it take to deliver on your vision? Such an approach may take longer to set up than building a knowledge graph as a one-off project. But in the long run, it allows you to reap the benefits of data that is well structured and future-proof and that can be used to tackle any upcoming data challenges.