If you are a data science type of person, knowledge graphs are really cool.
But for the vast majority of human beings out there that aren't, knowledge graphs probably sound like so much technobabble.
However, there are actually a number of real-world (and not-so-real-world) examples of knowledge graphs that you could use today for things that might prove very useful.
Most modern content management systems make use of resources that are identified by unique URLs, have some kind of modifiable type, have properties that are associated with those URL nodes, and can be ordered as lists. Put another way, applications such as Drupal were some of the first formal knowledge graphs, even though it can be argued that this particular design was not wholly intentional.
Knowledge graphs lend themselves well to content management systems, especially once you recognize that the publishing paradigm underlying both CMSs and RESTful systems is pretty much the same. Put an image and some HTML text into URL references and text literals, assign the document a type such as an article, an announcement, an advertisement, an actor, an automobile (and that's just touching the "A's"), and you have the means to determine how such a document gets rendered generically based on type.
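The type-driven rendering described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the URLs, property names, and renderer functions are all invented), not a real CMS API: each resource is a node keyed by its URL, carrying a type and arbitrary properties, and rendering is dispatched generically by type rather than hard-coded per document.

```python
# A tiny graph of typed, URL-identified resources (all values hypothetical).
graph = {
    "https://example.com/doc/42": {
        "type": "article",
        "title": "Knowledge Graphs in the Wild",
        "body": "<p>Some HTML text...</p>",
    },
    "https://example.com/doc/43": {
        "type": "announcement",
        "title": "Site Maintenance",
        "body": "<p>Downtime on Friday.</p>",
    },
}

# Renderers are looked up by type, so new document types need no new plumbing.
def render_article(node):
    return f"<article><h1>{node['title']}</h1>{node['body']}</article>"

def render_announcement(node):
    return f"<aside><strong>{node['title']}</strong>{node['body']}</aside>"

RENDERERS = {"article": render_article, "announcement": render_announcement}

def render(url):
    node = graph[url]
    return RENDERERS[node["type"]](node)

print(render("https://example.com/doc/42"))
```

Adding a new type (an advertisement, an automobile) means registering one renderer, not rewriting the pipeline, which is exactly the generic-rendering payoff the paragraph describes.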
Semantic wikis have been around for a while, but they are beginning to come into their own as a useful product category. Let's say that you wanted to create a graph that contained all of the relevant parts of a novel - the characters, the scenes, plot tropes, MacGuffins, and so forth. An editable semantic wiki, where properties determine types that in turn allow for selections from lists or type-ahead, means that you can build up an entire, complex world, editing properties dynamically as you create them.
Or perhaps you want to track marketing campaigns, aligning ads, actors, characters, products, locations, and so forth - again, this is a perfect tool for a context management system or an editable semantic wiki. I even know one person who built an entire knowledge graph to track the different variables used in the production, purchase, and consumption of wines.
Regardless, such context management systems are the logical next step for content management systems and are especially useful when the person entering content is not a professional ontologist. So long as there is a core underpinning of relevant context-free modeling (OWL or SHACL, for instance), such relatively small-scale knowledge graphs are likely to end up becoming very widely used, with ontology "starter sets" giving users a head start in building out everything from Lord of the Rings to a scandal-ridden political administration.
One other facet that's important with such CxM systems is that they may encode other narrative formats. A literal may still be semantically meaningful markup - HTML, XML, Markdown, or similar hypertext-aware content - and while some links may be made to external documents, as often as not CxM systems are fairly self-contained, though readily explorable by either customers or admins.
The semantic catalog is the context management system's bigger brother. The nodes in the graph still refer to different types of objects, but those objects are typically going to be products of some sort. Semantic catalogs typically have some properties that can be treated as text or narrative HTML (such as product descriptions or warranty information), but they also tend to have more in the way of numeric, unit-based, currency-related, temporal, and structural metadata. Oftentimes, such catalogs are tied into a classification system or taxonomy that may be heavily inter-related (genres in a book or media catalog, taxa in a biological research catalog, roles in an HR system).
Semantic catalogs also tend to differ from CxMs because they are usually populated not by user forms but by the ingestion of data coming from databases, spreadsheets, CSV files, and other data sources. Consequently, such catalogs tend to be a montage of ontologies that largely reflect the data sources, and the relationships that do exist are at their lowest level operational in nature (used to drive applications) and only secondarily conformant with a single homogeneous "one ring to rule them all" ontology. Semantic catalogs also tend to be more transactional in nature than CxM systems.
Not all semantic catalogs are focused on eCommerce, but many of them are. If you were a company that sold coffee products, for instance, supply chain considerations, transactional management, ID management, and similar concerns would come into play, even if part of the knowledge graph is associated with things like coffee blend origins, coffee notes, and related products.
Semantic catalogs are often good for capturing pairings (a dark roast coffee might go better with a cranberry tart, for instance), and you are more likely to see what I call contracts - objects that connect other entities together as fourth-normal-form structures. Pairings, like the one just given, are one type of example, where you have what amounts to many-to-many relationships. For instance, cranberry tarts might also go with certain seasonal blends in addition to purely dark roast coffees, while such coffees might also go well with orange scones.
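The "contract" idea above - reifying a many-to-many relationship as its own node - can be sketched like this. The pairing entities and their properties are invented for illustration; the point is that because each pairing is an object rather than a bare edge, it can carry its own metadata (season, source, confidence) and be traversed from either side:

```python
# Each pairing is a node in its own right, not just an edge - a
# fourth-normal-form "contract" connecting foods and drinks.
# All entity names and properties here are illustrative.
pairings = [
    {"id": "pairing-1", "food": "cranberry tart", "drink": "dark roast",   "season": "any"},
    {"id": "pairing-2", "food": "cranberry tart", "drink": "winter blend", "season": "winter"},
    {"id": "pairing-3", "food": "orange scone",   "drink": "dark roast",   "season": "any"},
]

def goes_with(drink):
    """All foods paired with a given drink, via the pairing nodes."""
    return sorted({p["food"] for p in pairings if p["drink"] == drink})

def paired_drinks(food):
    """The reverse traversal: all drinks paired with a given food."""
    return sorted({p["drink"] for p in pairings if p["food"] == food})

print(goes_with("dark roast"))
print(paired_drinks("cranberry tart"))
```

In a real semantic catalog these pairing nodes would be triples with their own IRIs, but the structural idea is the same: the many-to-many relationship gets a first-class identity.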
Semantic catalogs tend to focus on product descriptions and narratives, but data catalogs, while having a similar name, are actually a somewhat different beast altogether. A semantic catalog is usually a centralized repository of information, where the information, for the most part, is coerced into something that, at a fairly low level, looks like an ontology. A data catalog, on the other hand, is a mechanism that lets you connect multiple data sources together and query them as if they were one large database. In effect, data catalogs keep track of where data exists (both as relational tables and virtual views) and, in essence, create a proxy representation of that data so that it appears to be one centralized repository.
In this model, the data catalog doesn't actually hold the data. Instead, it holds representations of the schemas associated with that data and then abstracts any queries such that, from the calling person's standpoint, the data looks like one giant database. In this approach, the data catalog typically translates everything into triples in the background, then returns the triples in some other representation - relational active records, CSVs, data cubes, data frames, JSON, XML and so forth. A data hub, on the other hand, does much the same thing, but it also basically translates from one format to another through a centralized data model that may be either a human-derived ontology or a machine-language oriented transformation.
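A toy version of that proxy trick looks something like the following. This is a deliberately simplified sketch - the source names, column mappings, and `customer:` subject scheme are all invented - but it shows the essential move: the catalog holds only schema mappings, and rows from heterogeneous sources are translated into triples on demand so they can be queried as one logical graph.

```python
# Two "remote" sources with different schemas (contents hypothetical).
crm_rows = [{"cust_id": 1, "full_name": "Ada"}]
billing_rows = [{"customer": 1, "balance": 250}]

# The catalog stores no data - only, per source, a key column and a
# mapping from that source's columns to shared predicate names.
catalog = {
    "crm":     (crm_rows,     {"key": "cust_id",  "full_name": "name"}),
    "billing": (billing_rows, {"key": "customer", "balance": "balance"}),
}

def as_triples():
    """Yield (subject, predicate, object) across all registered sources."""
    for source, (rows, mapping) in catalog.items():
        key_col = mapping["key"]
        for row in rows:
            subject = f"customer:{row[key_col]}"
            for col, predicate in mapping.items():
                if col != "key":
                    yield (subject, predicate, row[col])

# Query as if it were one database: everything known about customer 1,
# regardless of which system actually holds each fact.
print({p: o for s, p, o in as_triples() if s == "customer:1"})
```

A production data catalog would rewrite queries and push them down to the sources rather than materializing triples, but the abstraction - one virtual graph over many physical stores - is the same.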
These two types of knowledge graphs may not necessarily even be obvious as knowledge graphs. Data hubs tend to work behind the scenes, in effect reducing the complexity of data integration dramatically while still keeping such a system from becoming a bottleneck within an organization. On the flip side, they can push the problem of generating a consolidated enterprise model down the road a bit, and in many cases this comes at the cost of letting legacy systems remain in active use even when they'd be better off being retired.
Data catalogs, in general, can help to resuscitate swampy data lakes where data was consolidated but without a lot of thought about dealing with heterogeneous data models and master data management (identity management) issues. They can provide an abstraction layer, making data available in a wide variety of forms and mapped to different perspectives and views, while maintaining the integrity of the original systems.
Compliance is one of those problem domains that may seem easy on the surface, but that can quickly turn into a morass of complexity. One reason for this is the fact that when you are dealing with compliance you are typically dealing with rules that are applicable only in certain jurisdictions, for certain types of products built with specific components or ingredients, encoded in a number of different ways.
Another reason for the pain has to do with the fact that it is usually not enough to determine that a particular product or activity is not in compliance, but rather that a given user needs to understand why something isn't compliant. This makes machine learning approaches to compliance testing insufficient: you can build a training set indicating whether a given scenario is compliant or not, but as regulatory environments change, any model that's built needs to be constantly recalibrated.
Knowledge graphs, on the other hand, turn out to be great vehicles for managing compliance. Entities, even abstract entities such as regulations and rules, can be defined to be applicable to certain scopes, jurisdictions, and other features. Rules can also be encoded both for human consumption and for machine consumption. Categorical features can also handle discontinuous functions such as taxes that jump from 6% to 8% when the amount reaches a cut-off threshold, something that can give machine learning models conniption fits.
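That threshold example can be made concrete with rules encoded as data rather than learned weights. Everything here is hypothetical (the jurisdiction, rates, and citation strings are invented), but it shows the two properties the paragraph claims: the step function is handled exactly, and the matched rule carries its own citation, so the system can say *why* a rate applied.

```python
# Rules as data: scoped to a jurisdiction, with explicit thresholds and
# a citation for explainability. All values are made up for illustration.
tax_rules = [
    {"jurisdiction": "Stateville", "threshold": 0,    "rate_pct": 6,
     "citation": "Rev. Code 12-1"},
    {"jurisdiction": "Stateville", "threshold": 1000, "rate_pct": 8,
     "citation": "Rev. Code 12-2"},
]

def applicable_rule(jurisdiction, amount):
    """Pick the highest threshold at or below the amount - a step function."""
    candidates = [r for r in tax_rules
                  if r["jurisdiction"] == jurisdiction and amount >= r["threshold"]]
    return max(candidates, key=lambda r: r["threshold"])

def tax(jurisdiction, amount):
    rule = applicable_rule(jurisdiction, amount)
    # Integer currency units avoid float rounding; returning the citation
    # lets the caller explain the decision, not just report it.
    return amount * rule["rate_pct"] // 100, rule["citation"]

print(tax("Stateville", 500))    # below the cut-off: 6% applies
print(tax("Stateville", 1500))   # above the cut-off: jumps to 8%
```

When the regulatory environment changes, you edit the rule rows, not retrain a model, which is the core advantage over the machine learning approach described above.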
Another aspect of such graphs is that they make it possible to simulate the changes that would occur if either products or regulatory environments change, typically by changing parameters in a query. This holds especially true when you have interdependencies where changing one element will change another element, which can in turn change a third.
There have been a number of smart contract implementations over the years, each focusing on different aspects of contracts, with some more successful than others. What is worth noting with contracts, however, is that they represent a fusion of concepts, from the need to define and certify entities as valid to the requirements to indicate and establish rules for initiating processes.
All of these things can be facilitated by knowledge graphs, albeit usually mediated by other technologies. For instance, it is possible to create a distributed ledger in a knowledge graph by making such graphs immutable (meaning that once an assertion is created, it can't be destroyed). It is also possible to adapt an existing blockchain-type ledger so that the ledger contains the relevant keys (pointers) that are then used to reference the appropriate metadata processes.
Process management in a knowledge graph may be done by using PROV-O, the W3C provenance ontology, which has been used both for managing the provenance of resources and for identifying (and enacting) state transitions. This means that when a particular condition is completed (such as the delivery of a product), changes in the status of this product can be picked up by a specific query, which in turn precipitates other actions. While semantic contracts have not yet been widely adopted, I expect that they will become more commonplace over the next few years.
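The query-precipitates-action pattern can be sketched over a bare triple set. The `prov:wasGeneratedBy` predicate is a real PROV-O term; the `ex:status` predicate, the order entities, and the invoice handler are invented for illustration:

```python
# A small set of triples describing order state and provenance.
# prov:wasGeneratedBy is from the W3C PROV-O vocabulary; the ex: terms
# are hypothetical.
triples = {
    ("ex:order-7", "ex:status", "ex:Delivered"),
    ("ex:order-7", "prov:wasGeneratedBy", "ex:shipment-12"),
    ("ex:order-8", "ex:status", "ex:InTransit"),
}

def query(predicate, obj):
    """All subjects matching one (predicate, object) pattern."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

def on_delivered(order):
    # A downstream action precipitated by the status change.
    return f"invoice issued for {order}"

# Poll (or subscribe) for the completion condition; each match
# triggers the next step in the process.
actions = [on_delivered(order) for order in query("ex:status", "ex:Delivered")]
print(actions)
```

In practice the pattern match would be a SPARQL query against a live store, but the shape of the mechanism - a condition query whose results drive subsequent actions - is what's shown here.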
Distributed gaming has had both successes and failures over the last decades, but as the complexity of such games increases, the use of relational databases to manage multiplayer environments is running into significant limitations. I expect that knowledge bases, which are built from the ground up with federation in mind, will increasingly become the tool of choice for game developers, especially as GraphQL and similar technologies make the intricacies of knowledge bases transparent to the typical developer and even more so to the end-user.
This may actually prove to be a major boon for educators as well. As more educational services move online in the post-COVID period, the complexity of such systems - MOOCs originally built around relational databases - is pushing developers and investors to seek databases that are more optimized to handle live updates of models and, in general, more flexible than traditional SQL-based systems. Again, this segment is currently fairly small, but it is also growing rapidly as the need for building out dynamic, metadata-driven tools begins to predominate.
It should be emphasized that knowledge graphs are just another kind of index, a way of storing information with three (or sometimes four) index keys making it possible to retrieve content based on fairly complex joins. This makes it possible to associate metadata with anything - games, students, classes, assignments, images, videos, audio, and so forth. This means that systems can more intelligently use the metadata about properties and classes to drive applications in ways that would have been unthinkable in a relational database world, while at the same time providing ways of tracking how that data is being consumed both in systems and out in the wider world.
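The "three index keys" observation can be made concrete with a miniature triple store that keeps the same assertions under three key orders - subject-predicate-object, predicate-object-subject, and object-subject-predicate - so that each lookup pattern hits a purpose-built index. This is a simplified sketch of the idea (real stores add more orderings and persistence); the example entities are invented:

```python
from collections import defaultdict

# The same triples, indexed three ways, so any access pattern is direct.
spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

def add(s, p, o):
    spo[s][p].add(o)
    pos[p][o].add(s)
    osp[o][s].add(p)

# Metadata about arbitrary things: videos, quizzes, classes...
add("video-9", "about", "algebra")
add("video-9", "assignedTo", "class-3b")
add("quiz-2", "about", "algebra")

# Each question uses the index built for it:
print(sorted(pos["about"]["algebra"]))   # everything that is about algebra
print(sorted(spo["video-9"]))            # every property asserted on video-9
```

Joins across patterns (everything about algebra that is also assigned to class-3b, say) are then set intersections over these index lookups, which is what makes the "fairly complex joins" in the paragraph cheap.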
This is actually one of my favorite knowledge graph applications. You can think of a personal knowledge graph as being a data purse. Imagine having a knowledge graph that you can use to store and tag documents, capture phone numbers, insurance numbers, or transaction confirmation numbers, keep track of your favorite sports teams, write novels, or keep your health records, all in one box. That's a personal knowledge graph (PKG).
In essence, you can add new modules to the PKG by loading ontologies that work within specific domains. If you want to be able to track various components in your wines, you can load in an oenophile ontology (call it an o-package). Want to keep notes from your school classes? A PKG can help you do that as well, cross-referencing content quietly in the background even as you work. Writing a novel? A novel o-package will let you keep track of your characters and settings, when and how they interacted, and what story arc they are currently a part of.
A PKG can not only give you the classes (or complex types) you need, but can also build the interfaces you require dynamically, without anything custom, and you can add new classes and properties that let you expand it without breaking continuity. Such PKGs could be cloud-based or local, depending upon your needs, and with one you can take the data produced and convert it into any number of different formats transparently, or import data from others using similar ontologies, mapping only the differences along the edges.
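The dynamic-interface idea can be illustrated with a small sketch: given class definitions from a loaded o-package, the PKG derives an editing form rather than requiring custom code per class. The "novel o-package" classes and the widget names below are entirely hypothetical:

```python
# Class definitions as they might arrive from a loaded o-package.
# A property whose range is another class becomes a pick-list, not free text.
ontology = {
    "Character": {"name": "string", "role": "string", "firstScene": "Scene"},
    "Scene": {"title": "string", "setting": "string"},
}

def build_form(class_name):
    """Derive input fields directly from the class's declared properties."""
    fields = []
    for prop, range_ in ontology[class_name].items():
        if range_ in ontology:
            # Object property: offer a selection over existing instances.
            widget = f"select<{range_}>"
        else:
            widget = "text"
        fields.append((prop, widget))
    return fields

# Adding a new class later just extends the ontology - the interface
# follows automatically, with no custom form code.
print(build_form("Character"))
```

This is the sense in which the ontology, not the application, is the source of truth: the UI is a projection of the model, so extending the model can't break continuity with existing data.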
Finally, such a knowledge graph may be at the heart of a true data purse that will let you control who has access to your data without having to spend a lot of time managing the control process. This has profound implications, since it gives you true data security - in essence, someone can only get specific information if they are in your keychain for the knowledge graph, or perhaps a particular marketing firm may only be able to ask pre-approved yes/no questions, and even then only after requesting this information first. What this does as well is let you build an audit trail of what information external systems have about you, as well as how current that information is.
On the other side of the graph spectrum are large-scale graph analytics systems. These will normally tend to be property graphs rather than knowledge graphs (though over time this distinction will fade) that are specifically designed to perform complex network analytics. What is significant here is that a large number of innovations (including machine learning and neural networks) can in fact be expressed as graphs. These are often used to determine the characteristics of connectedness.
For instance, consider supply-chain systems, where you are moving goods used in production around the globe, and you want to know, at any given time, where your goods are, what delays are in various parts of the supply matrix, and how those delays are going to impact your ability to meet production quotas. This is one of those cases where you may actually have a knowledge graph and a property graph working side by side, one essentially managing the dynamic distribution of factors, the other maintaining the more long-term metadata. By running these systems in parallel, you're able to create a synthesized view that incorporates both richness of content and decent performance.
The same thing actually applies in areas such as medicine, where you use the property graphs to keep track of dynamically changing (transactional) information while keeping relevant metadata in more readily accessible knowledge graphs. As with the blockchain example above, what you get from such a hybrid approach is lightweight speed in the property graph (with data possibly generated from the base knowledge graph) coupled with the richness of content that knowledge graphs provide.
This also spills over into the AI realm, in which graph embeddings and graph neural nets allow you not only to see how information is clustered but also what the relationships are between the clustered data points, helping broach the explainability issue that is currently such a headache in this space.
On the business front, Customer 360 (and similar kinds of) knowledge graphs are used to track the connectedness of the various aspects of customer transactions in a process. The idea here is that you want to be able to see all aspects of the customer interaction in order to get better insights into who prospective customers are likely to be.
These kinds of systems are usually fairly easy to visualize and have applicability across a wide range of domains. They can also be used to connect such systems with related systems - match keys with a health system, for example, and you can basically track the correlative impact of health on purchases. Customer 360 KG systems will likely become a mainstay of most corporate marketing departments, ways of both seeing who customers are (and what drives them) and ways for those same customers to find products, services, and related information. Not surprisingly, Customer 360s are becoming especially popular among health care and health insurance companies, as they can tie into (or potentially replace) electronic health record systems.
A similar type of application is HR 360, which can tie recruiting, benefits, employee career tracking, job listings, and so forth into a single integrated package. Supply chain management is yet another variation, where tracing and tracking packages, products, and shipping containers as they make their way through the supply chain becomes feasible despite the wide variety of transport systems and shipping protocols.
The knowledge graph is, at its core, a better way of organizing certain kinds of information, and as such the potential for knowledge graphs is vast. We may be on the cusp of an era where knowledge graphs provide intelligence not only in centralized hubs but also along the periphery in edge-based computing: personal knowledge graphs on your mobile devices, embedded knowledge graphs in IoT devices, regional or domain-specific distributed knowledge graphs, even, increasingly, knowledge graphs embedded within the very media that we consume. I fully expect that the next decade will see knowledge graphs becoming near-universal.
Kurt is the founder and CEO of Semantical, LLC, a consulting company focusing on enterprise data hubs, metadata management, semantics, and NoSQL systems. He has developed large scale information and data governance strategies for Fortune 500 companies in the health care/insurance sector, media and entertainment, publishing, financial services and logistics arenas, as well as for government agencies in the defense and insurance sector (including the Affordable Care Act). Kurt holds a Bachelor of Science in Physics from the University of Illinois at Urbana–Champaign.