How Knowledge Graphs Power Data Mesh and Data Fabric

This is an abbreviated and updated version of a presentation from Ontotext’s Knowledge Graph Forum 2023 by Sumit Pal, Strategic Technology Director at Ontotext.

The data ecosystem today is crowded with dazzling buzzwords, all fighting for investment dollars. A survey in 2021 found that a data company was being funded every 45 minutes. Data ecosystems have become jungles, and in spite of all the technology, data teams are struggling to create a modern data experience. This is a “Datastrophe”.

Managed correctly, data is an asset that can be turned into actionable insights; left unmanaged, it can become uranium, causing massive damage to enterprises through rampant data and security breaches.

Drowning in Data, Thirsting for Context

We’ve heard the saying, “Data, data everywhere. Not a drop of insight.” As more data accumulates, context gets diluted and lost. Most organizations today languish at the information layer of the DIKW pyramid, unable to make the leap to the knowledge and wisdom layers, where the real value of data lies.

Bad Data Tax

One of the reasons for this is what we call the “bad data tax”, and it is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are fundamental to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity”.

“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.”

Sir Tim Berners-Lee

Challenges in the Enterprise

The mind map below shows some of the major pain points in mid-sized to large organizations.

In most enterprises, data teams lack a data map and a data asset inventory, and are often unaware of data that exists across the organization, along with its profile, quality, and associated metadata. Teams can’t access the data they need to build their business use cases. Data duplication and copying across business silos prevent organizations from linking this data across different systems to get an integrated view. Most resort to manual reconciliation, which results in downstream challenges and reporting errors.

Data Lakes, Data Catalogs, and Findability

Organizations approach data lakes as cheap storage. They move data into data lakes, creating yet another copy, the mantra being: “Let’s move the data to a data lake and then we will figure out what to do with it.” This results in a huge findability challenge. A McKinsey survey a few years back found that data personas in most organizations spend 30% of their time finding data. The proliferation of data catalogs across vendors creates metadata silos and does little to connect the data semantically.

Most organizations take a myopic view of their data ecosystems and think of transactional and analytical workloads as isolated systems.

Additionally, there’s a lack of semantics for the data. For data to be useful, it needs semantics, not more tools and more data. If organizations want a competitive advantage, their data needs to be semantically aware and semantically connected. The problem is not the data silos themselves, but the disconnect they cause. As data proliferates, it loses consistency, its meaning morphs, and downstream challenges follow.

Enough about the challenges; let’s see how to solve them.

Knowledge Graphs and Semantic Metadata

Knowledge graphs (KGs) are the key to:

  • Advanced Data Architecture & Models like Data Fabric, Data Mesh
  • Unified Data Access
  • Semantic Data Integration

These fundamental capabilities enable KGs to bridge the chasm between information and knowledge in the DIKW pyramid. A knowledge graph serves as a hub for data, metadata, and content, making existing data interlinked and contextualized. Today’s ecosystem doesn’t need plain metadata; it needs semantic metadata.

For example, a product data tag is basic metadata. Product tags carry barcodes made up of numbers and symbols. However, these are undecipherable until connected to reference data that makes them machine-readable and context-aware.

Semantic metadata opens the door to interconnecting data meaningfully, forging new experiences for data exploration and discovery, avoiding ambiguity, and improving findability.

New Approaches to Data Management

Over the last few years, we’ve heard about data fabric and data mesh.

Customers often ask, “Do I need a data fabric or a data mesh?” But this is the wrong question: these are not competing but complementary paradigms, with a symbiotic relationship that localizes data ownership at the domain level and creates reusable data products across the organization.

While data mesh takes a bottom-up approach, data fabric is top-down. The coalescence of data fabric and data mesh, powered by semantic knowledge graphs, can significantly reduce ETL, data movement, and data copying, and eliminate redundancies.

Data Fabric

Most organizations struggle to stitch together data from disparate sources in a coherent and useful manner. Data fabric aims to make it easy to link, consolidate, and meaningfully describe data. It does that by combining several data management techniques such as semantic data integration, data orchestration, semantically driven data pipelines, semantic data catalogs, and automation.

By providing a consolidated user experience and access to data, data fabric helps organizations manage their data, regardless of the form or the location where it’s stored. It removes friction and mitigates cost as there’s no need to make copies of the data or pay for its storage or movement.

Data fabric doesn’t involve any centralization into a data lake or a data warehouse. Instead, it sources data from its current location, backed by service-level agreements (SLAs) with each of the business units. It thus delegates responsibility for datasets to teams closer to where the data is produced, and utilizes ML/AI to create a semantic approach to accessing it.

By bringing data and metadata together, a data fabric powered by KGs translates disparate data systems into useful organizational knowledge. As data changes, the metadata gets updated dynamically, simplifying data ingestion, access, and storage.

Data Mesh

Unlike data lakehouses and cloud data warehouses, data mesh doesn’t differentiate between analytical and transactional systems. It marries organizational patterns with technology and architectural approaches.

By promoting data autonomy, it enables users to make domain-related decisions. They no longer have to rely on centralized data engineering or IT teams to provide data access, a common obstacle in most organizations. Instead, data mesh distributes data ownership and responsibilities, reduces dependencies across services, and thereby delivers more value at velocity.

The following diagram provides a high-level picture of the data mesh.

The centralized services in a data mesh approach power data sharing, reusability, and interoperability, and relieve domain teams from performing repeated data ingestion, processing, and storage steps. Each domain builds data products with a localized data catalog and domain-specific business information. The data product is the core concept of data mesh: it encapsulates code, data, infrastructure, and metadata, and is created and offered to consumers as a self-service.

Data mesh requires data contracts between producers and consumers, ensuring SLAs as data flows within the organization. However, this requires a culture shift in how data teams work. Knowledge graphs ensure that data contracts are standardized, semantically correct, and aligned.

Integrating knowledge graphs with data mesh leads to the emergence of a “semantic data mesh”. It provides data with context and meaning across different business units within an organization, promoting data discoverability, interoperability, augmentation, enrichment, and explainability with AI and ML.

Major Takeaways

Data management is not just about managing data. It is about managing the knowledge that is inherent in the different business units and departments of an organization. It’s about managing data with context to deliver actionable insights. For most organizations, context and semantics are the missing pieces of the puzzle.

Data fabric and data mesh offer a new way of looking at data management for enterprises. They are evolutionary architectures, which are still a work in progress. Knowledge graphs can help unite data fabric and data mesh. Without knowledge graphs and semantics, both of these architectures will fail.

By empowering semantics to connect data between users, systems, and applications consistently, unambiguously, and confidently, knowledge graphs ensure data quality, helping organizations cross the chasm from information to wisdom.

Sumit Pal, Strategic Technology Director at Ontotext

Originally published at https://www.ontotext.com on April 10, 2024.
