The Gold Standard — The Key to Information Extraction and Data Quality Control

Ontotext
7 min read · May 27, 2021

How a human-curated body of data is used in AI to train algorithms for search, extraction and classification, and to measure their accuracy

We often want computers to do the tasks we give them the way people do. But, as Ontotext’s CEO Atanas Kiryakov often says, we forget that nobody nurtures an Artificial Intelligence (AI) system for 7 years while it learns how to walk, talk, count, read, write and, even more importantly, how not to touch hot stoves or tilt full glasses, and how to ask: “Can I get 3 more cookies?” Nor do AI systems go through 12 grades of formal education to learn the basic concepts of all the sciences and the most important geographical objects, people, companies and so on.

Without all this background knowledge, computers need a machine-readable point of reference that represents “the ground truth” before they can perform like humans. And this is where the Gold Standard comes in.

Originally, the Gold Standard was a monetary system that required countries to fix the value of their currencies to a certain amount of gold, aiming to replace unreliable human control with a fixed measurement that everyone could use. In much the same way, in the context of AI systems, the Gold Standard refers to a set of data that has been manually prepared or verified and that represents “the objective truth” as closely as possible.

One of the main uses of the Gold Standard is to train AI systems to identify the patterns in various types of data with the help of machine learning (ML) algorithms. In other words, by giving them plenty of data to learn from, we can “teach” AI systems to automatically identify such patterns. We can also use the Gold Standard to evaluate the performance of such algorithms.

What do AI systems need to learn?

To recognize patterns accurately, AI systems need to be fed unambiguous data; otherwise, they can’t make sense of it. And this is a challenge, as today’s data comes in huge volumes and from various sources, each covering different aspects of the same real-world phenomena or using different terms for relatively similar things.

Let’s take as an example some data about company transactions and let’s say that we are interested in who buys whom and for how much. Our first data source says that on June 13, 2016 Microsoft bought LinkedIn, while the second says that Microsoft bought LinkedIn for $26.2 billion. Here, the first source doesn’t specify the sum and the second makes no mention of the date. But following real-world business logic, if we have Microsoft and LinkedIn in both data sources, it follows that the two records refer to the same transaction and should be merged into one.

However, this is not always so straightforward. Consider an example in which our first data source says that Microsoft invested $240 million in Facebook, while the second says that on October 24, 2007 Microsoft invested in Facebook. Here, we cannot merge the two records solely because Microsoft and Facebook appear in both sources: while an acquisition is usually a one-time affair, a company can invest in another company more than once.

So, to help AI systems make sense of all this ambiguity, we use data linking techniques. These identify, match and merge data records that refer to the same or similar entities across multiple datasets, and they also flag entities that seem to be the same or similar but are not.

In the above case of merging information about companies from different data sources, data linking helps us encode the real-world business logic into data linking rules. But before we can implement these rules on any larger scale, we have to test their validity. Simply put, we need to be able to measure and evaluate our results against clearly set criteria.
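To make this concrete, here is a minimal sketch of what such a rule could look like in Python. The record format, field names and helper functions are invented for illustration; they are not Ontotext’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transaction:
    buyer: str
    target: str
    kind: str                     # "acquisition" or "investment"
    date: Optional[str] = None    # ISO date, if the source provides one
    amount_usd: Optional[float] = None

def can_merge(a: Transaction, b: Transaction) -> bool:
    """Decide whether two records describe the same real-world transaction."""
    if (a.buyer, a.target, a.kind) != (b.buyer, b.target, b.kind):
        return False
    if a.kind == "acquisition":
        # An acquisition is usually a one-time affair, so matching the
        # (buyer, target) pair is enough even if one record lacks the date.
        return True
    # A company can invest in another more than once, so investment
    # records are merged only when both carry the same date.
    return a.date is not None and a.date == b.date

def merge(a: Transaction, b: Transaction) -> Transaction:
    """Combine two matching records, filling in each other's gaps."""
    return Transaction(a.buyer, a.target, a.kind,
                       a.date or b.date, a.amount_usd or b.amount_usd)

src1 = Transaction("Microsoft", "LinkedIn", "acquisition", date="2016-06-13")
src2 = Transaction("Microsoft", "LinkedIn", "acquisition", amount_usd=26.2e9)
if can_merge(src1, src2):
    print(merge(src1, src2))  # one record with both the date and the amount
```

Note how the rule itself carries the business logic: the two investment records from the earlier example would not be merged, because only one of them has a date.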

How does the Gold Standard help data linking?

To evaluate the performance of our data linking algorithms and calibrate them for higher accuracy, we need a set of Gold Standard data that has been manually verified. This verification has to come from at least two independent raters, so that we avoid any biased points of view. Even more important, there should be a sufficient level of agreement between the human experts, as this is the only way to prove that the task is well defined and that there are clear guidelines for rating, coding or annotation. Once we are satisfied with the results, we can train our AI system to automatically match other data.
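Inter-rater agreement of this kind is commonly quantified with statistics such as Cohen’s kappa. A minimal sketch using scikit-learn, with invented labels from two hypothetical raters:

```python
from sklearn.metrics import cohen_kappa_score

# Labels that two independent raters assigned to the same ten record
# pairs (invented data for illustration).
rater_a = ["match", "match", "no_match", "match", "no_match",
           "match", "no_match", "no_match", "match", "match"]
rater_b = ["match", "match", "no_match", "no_match", "no_match",
           "match", "no_match", "match", "match", "match"]

# Cohen's kappa corrects raw agreement for the agreement expected by
# chance: 1.0 means perfect agreement, 0.0 means chance-level agreement.
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # scores above ~0.8 are often considered reliable
```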

Now, let’s consider some specific examples!

In the EU-funded research project CIMA (Intelligent Matching and Linking of Organization Data from Different Sources), the aim was to link and harmonize company data from various sources.

One of the tasks connected to the project was to classify companies by industry sector. Ontotext used DBpedia, a structured version of Wikipedia, as a Gold Standard. This made sense because the thousands of organizations in this giant dataset had already been classified into industry sectors by more than one human contributor. Wikipedia is not only the biggest encyclopedia; its contributors also adhere to a strict editorial process, so each industry tag is usually assigned by one person and reviewed by another. For the purposes of this task, Ontotext organized the data into an industry taxonomy, where organizations were classified according to their attributes and relationships from DBpedia as well as their text descriptions. Then ML algorithms were trained to classify other organizations under the 32 top-level industry sectors, e.g., Automotive, Healthcare and Retail.
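The article does not spell out the exact features or model, but a common baseline for the text-description part of such a task is a bag-of-words classifier. A minimal sketch using scikit-learn, with a tiny invented sample standing in for the DBpedia-derived Gold Standard:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Gold Standard examples: company descriptions with human-verified
# industry sectors (invented sample; the real set had thousands of
# organizations and 32 top-level sectors).
descriptions = [
    "Manufactures electric vehicles and batteries",
    "Operates a chain of hospitals and clinics",
    "Runs an online marketplace for consumer goods",
    "Designs engines and chassis for passenger cars",
]
sectors = ["Automotive", "Healthcare", "Retail", "Automotive"]

# TF-IDF features over the descriptions, then a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(descriptions, sectors)

# Classify a new, unseen organization by its text description.
print(model.predict(["Operates a network of hospitals and clinics"]))
```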

Another task was to make it possible to find companies similar to a particular company (e.g., big tech companies like Google, Facebook and Apple) as well as to specify the degree of similarity between them. For this purpose, Ontotext prepared a Gold Standard dataset of thousands of company pairs, each with a specific degree of similarity: very similar, little similar, has some similarity, has no similarity. This sample set was then used for fine-tuning the ML algorithms even further.
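One plausible way to check a similarity model against such graded pairs, though the article does not describe the method used, is rank correlation: does the model order the pairs the way the human raters did? A sketch with invented pairs and a stand-in scoring function:

```python
from scipy.stats import spearmanr

# Gold Standard company pairs with human-assigned similarity grades,
# mapped to numbers (invented sample; labels follow the article's scale).
grades = {"very similar": 3, "has some similarity": 2,
          "little similar": 1, "has no similarity": 0}
gold_pairs = [("Google", "Facebook", "very similar"),
              ("Google", "Walmart", "little similar"),
              ("Apple", "Samsung", "very similar"),
              ("Facebook", "ExxonMobil", "has no similarity")]

def model_similarity(a: str, b: str) -> float:
    """Stand-in for a real model, e.g. cosine similarity of embeddings."""
    canned = {("Google", "Facebook"): 0.91, ("Google", "Walmart"): 0.35,
              ("Apple", "Samsung"): 0.84, ("Facebook", "ExxonMobil"): 0.07}
    return canned[(a, b)]

human_scores = [grades[g] for _, _, g in gold_pairs]
model_scores = [model_similarity(a, b) for a, b, _ in gold_pairs]

# Rank correlation: does the model order the pairs like the raters did?
rho, _ = spearmanr(human_scores, model_scores)
print(f"Spearman rho = {rho:.2f}")
```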

How does the Gold Standard help Information Extraction from text?

Gold Standards are also needed to create training and evaluation sets for AI systems when we want to extract information from free text. In natural language processing (NLP) and computational linguistics, the Gold Standard typically takes the form of a corpus of text or a set of documents annotated or tagged with the desired results of the analysis, be it part-of-speech tags, syntactic parses, concepts or relationships.

When “reading” unstructured text, AI systems first need to transform it into machine-readable sets of facts. This happens through the process of semantic annotation, where documents are tagged with relevant concepts and enriched with metadata, i.e., references that link the content to concepts described in a knowledge graph.
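Conceptually, the result of semantic annotation can be pictured as mention spans tied to knowledge-graph identifiers. A minimal illustration (the DBpedia URIs are used here simply as an example of such identifiers):

```python
# A semantically annotated sentence: each mention carries its character
# span in the text and a knowledge-graph identifier (DBpedia URIs here).
text = "Microsoft bought LinkedIn on June 13, 2016."
annotations = [
    {"span": (0, 9), "mention": "Microsoft",
     "concept": "http://dbpedia.org/resource/Microsoft"},
    {"span": (17, 25), "mention": "LinkedIn",
     "concept": "http://dbpedia.org/resource/LinkedIn"},
]

for a in annotations:
    start, end = a["span"]
    # The span anchors the metadata to the exact place in the text.
    assert text[start:end] == a["mention"]
```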

But here again, ambiguity is a stumbling block. For example, while most of the time people can easily tell from the context whether Apple refers to Apple Inc. (the technology company) or Apple Records (the record label founded by the Beatles in 1968), AI systems lack the background knowledge to make such a distinction.

To help AI systems “tell” whether “Apple” refers to Apple Inc. or Apple Records, we again need to prepare a Gold Standard set of documents, where the accuracy of the results has been verified by more than one person. In the same way as with data linking, we have to adjust our ML algorithms by giving them plenty of documents to learn from.
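As a toy illustration of how a model can learn this distinction from context words, here is a sketch of a small disambiguation classifier. The training snippets and labels are invented; a real Gold Standard set would contain many verified documents:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Gold Standard snippets in which humans verified which "Apple" is meant
# (a tiny invented sample for illustration).
contexts = [
    "Apple released a new iPhone at its Cupertino event",
    "Apple shares rose after strong Mac and iPad sales",
    "The Beatles released the single on the Apple label",
    "Apple signed the band and pressed their new record",
]
labels = ["Apple_Inc", "Apple_Inc", "Apple_Records", "Apple_Records"]

# The surrounding words, not the ambiguous name itself, carry the signal.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(contexts, labels)

print(model.predict(["Apple reported strong iPhone sales"]))  # likely Apple_Inc
```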

Once developed and trained, these algorithms become the building blocks of systems that can automatically interpret data. And by gathering detailed structured data from unstructured texts, they can enable the automation of tasks such as smart content classification, relevant content recommendation, intelligent semantic search, mining for patterns and trends, and many more.

Gold Standard takeaways

Trustworthy sample sets are indispensable for training the machine learning algorithms used in various AI systems for pattern recognition. If these sample sets are not high-quality, clean and representative, we cannot hope to train the algorithms to produce useful results.

Beyond training, each and every AI system, be it based on symbolic rules or statistical models, has to be evaluated, in the same way children take exams to graduate. Evaluation as well requires a trusted reference, which is again the Gold Standard. Evaluation is for AI systems what quality assurance (QA) is for software systems.

Atanas Kiryakov, CEO at Ontotext

That is why it is vital to manually verify that the rules we have set represent the “objective truth” as closely as possible and that they are interpreted consistently and unambiguously.
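In practice, such an evaluation usually boils down to comparing the system’s output with the Gold Standard and reporting measures such as precision, recall and F1. A minimal sketch with invented facts:

```python
# Evaluating an extraction system against the Gold Standard: precision,
# recall and F1 over sets of extracted facts (invented examples).
gold = {("Microsoft", "acquired", "LinkedIn"),
        ("Microsoft", "invested_in", "Facebook"),
        ("Google", "acquired", "YouTube")}
predicted = {("Microsoft", "acquired", "LinkedIn"),
             ("Google", "acquired", "YouTube"),
             ("Apple", "acquired", "LinkedIn")}  # one spurious fact

true_positives = len(gold & predicted)
precision = true_positives / len(predicted)  # how much of the output is right
recall = true_positives / len(gold)          # how much of the truth was found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.67 f1=0.67
```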

Creating a Gold Standard set is a laborious and time-consuming process. But it lays the groundwork for many tailor-made solutions that work with content and data from multiple sources and solve complex business challenges.

Gergana Petkova, Marketing Content Manager

Originally published at https://www.ontotext.com on May 27, 2021.

