Of Superheroes, Hypergraphs and the Intricacy of Roles

Kurt Cagle 19/07/2021

In my previous post in which I discussed names, I also led in with the fact that I am a writer.

Significantly, I did not really talk much about that particular assertion, because is in fact comes with its own rabbit hole quite apart from that associated with names and naming. Specifically, this assertion is all about roles.

Ontologists, especially neophyte data modelers, often get caught up in the definition of classes, wanting to treat everything as a class. However, there are two types of things that don't actually fit cleanly into traditional distinctions of class: roles and categorizations. I'm not going to keep the focus of this article on roles, preferring to treat categorizations separately, though they have a lot of overlap.

In describing a person, it's worth dividing this particular task up into those things that describe the physical body and those things that describe what that person does. The first can be thought of as characteristics: height, weight, date of birth (and possibly death), skin color, physical gender, scars and marks, hair color, eye color, and so forth. Note that all of these things change over time, so blocks of characteristics may all be described at given intervals, with some kind of timestamp indicator of when these characteristics were assessed.

In a purely relational database, these personal relationships would be grouped together in a cluster of related content, (a separate table row) with each row having its own specific timestamps. The foreign key for the row would be a reference to the person in question.

In a semantic triple store (a hypergraph), this relationship gets turned around a bit. A block of characteristics describes the individual - the "parent" of this block is that person. In UML, this would be considered a composition but SQL can't in fact differentiate between a composition (a characteristic that is intrinsic to its parent) and an association (a relationship between two distinct entities). That's why SQL treats both as second normal form relationships.

A semantic graph, on the other hand, is actually what's called a hypergraph. This is a point that a surprising number of people even in the semantic world don't realize. In a normal, directed graph, if you have an assertion where the subject and the predicate are the same, then you can have only one object for that pair (keeping in mind that the arrow of the edge in the directed graph always goes from subject to object, not the other way around). This is in fact exactly what is described in SQL - you can't have the same property for a given row point to different foreign keys.

In a hypergraph, on the other hand, there is no such limit - the same subject and predicate can point to multiple objects without a problem. This means that you CAN in fact differentiate a composition from an association in the graph, rather than through an arbitrary convention. The downside to this, however, is that while all graphs are hypergraphs, not all hypergraphs are graphs. in the strictest sense of the word. Put another way, the moment that you have a property for a given subject point to more than one object, you cannot represent that hypergraph in a relational database.

Ironically, this is one of the reasons that compositions tend to be underutilized in programming - they are a pain to serialize in SQL. Compositions are easier to manage in both XML and JSON, but in the case of JSON, this is only because properties can take arrays as arguments (XML uses sequences and is actually much better oriented for working with them). An array is a data structure - which means that an array with a single value is NOT the same thing structurally as a singleton entity, which can be a real problem when dealing with serialized JSON versions of RDF.

Given all that, the reason that hypergraphs are so relevant to roles is that a role is intrinsically a hypergraph-type relationship. For instance, consider a given person. That person may be simultaneously a novelist and a screenwriter (and could even be a producer (in the US, the UK still uses show runner for the person who is largely responsible for the overall story of several episodes of a show). Just to make things even more exciting, it's even possible for that same person to be an actor at the same time as well.

What makes roles so problematic is that they are not just descriptive. A screenwriter writes a draft of a story for a production. An author writes a book. They may describe the same basic story, but beyond the essence of the story, the two may be dramatically different works (J.R.R. Tolkien wrote the Lord of the Rings series, but there have been multiple adaptations of these books by different screenwriters over the years). This means that the information that a role provides will likely vary from role to role.

In point of fact, a role is an example of a more abstract construct which I like to call a binding. All bindings have a beginning and an end, and they typically connect two or more entities together. Contracts are examples of bindings, but so are marriages, job roles, and similar entities. In the case of a script binding, it would look something like this (using the Templeton notation I discussed in my previous article):

Where is the role here? It's actually implied and contextual. For instance, relative to a given production you can determine all of the scriptwriters after the fact:

In other words, a role, semantically, is generally surfaced, rather than explicitly stated. This makes sense, of course: over time, a production may have any number of script writers. One of the biggest areas of mistakes that data modelers make usually tends to be in forgetting that anything that has a temporal component should be treated as having more than one value. This is especially true with people who came to modeling through SQL, because SQL is not a hypergraph and as such can only have many-to-one relationships, not one-to-many relations.

By the way, there's one assertion format in Templeton I didn't cover in the previous article:

The expression (<=skos:prefLabel) means that the indicated predicate script:hasWorkingTitle should be seen as a subproperty of skos:prefLabel. More subtly,

indicates that the predicate script:hasAuthor is a subproperty of entity:hasPerson.

Note that this is a way of getting around a seemingly pesky problem. The statement:

is a statement that indicates that there is an implicit property entity:hasPerson that script:hasAuthor is a sub-property of, and by extension, that Class:_Author is a subclass of Class:_Person. However, with the binding approach, there is no such explicit formal class as Class:_Author. We don't need it ! We can surface it by inference, but given that a person can be a scriptwriter for multiple productions and a production can have multiple scriptwriters, ultimately, what we think of as role is usually just a transaction between two distinct kinds of entities.

While discussing media, this odd nature of roles can be seen in discussion of characters. I personally love to discuss comic book characters to illustrate the complex nature of roles, because, despite the seeming childishness of the topic, characters are among some of the hardest relationships to model well. The reason for this is that superheroes and supervillains are characters in stories, and periodically those stories are redefined to keep the characters seemingly relevant over time.

I find it useful in modeling to recognize that there is a distinction between a Person and what I refer to as a Mantle. A person, within a given narrative universe, is born, lives a life, and ultimately dies, A mantle is a role that the person takes on, with the possibility that, within that narrative universe, multiple people may take on the mantle over time. For instance, the character of Iron Man was passed from one person to another.

Notice here that I've slipped in the concept of narrative universe here. A narrative universe, or Continuity for brevity, is a story telling device that says that people within a given continuity have a consistent history. In comics, this device became increasingly used to help to deal with inconsistencies that would emerge over time between different writing teams and books. The character Tony Stark from Marvel comics featured the same "person" in different continuities, though each had their own distinct back stories and histories.

I use both Tony Stark and Iron Man here because Tony Stark wore the mantle of Iron Man in most continuities, but not all Iron Man characters were Tony Stark. When data modeling, its important to look for edge cases like this because they can often point to weaknesses in the model.

Note also that while characters are tied into the media of the story they are from, a distinction needs to be made between between the same character in multiple continuities and the character at different ages (temporal points) within the same continuity. Obi Wan Kenobi was played by multiple actors over several different presentations, but there are only a small number of continuities in the Star Wars Universe, with the canonical continuity remaining remarkably stable.

Finally comes the complications due to different actors performing different aspects of the same characters in productions. James Earl Jones and David Prowse performed the voice and movement respectively of Anakin Skywalker wearing the mantle of Darth Vader in the first three movies, with Hayden Christensen playing Anakin before he donned the cape and helmet, Jake Lloyd played him as a child, and Sebastian Shaw played him as the dying Skywalker in Return of the Jedi. If you define a character as being the analog of a person in a narrative world, then the true relationship can get to be complex:

This is a complex definition, and it is also still somewhat skeletal (I haven't even begun digging into organizational associations yet, though that's coming to a theatre near you soon). It can be broken down into a graphviz (which can actually be created fairly handily from Templeton) as something like the following:

and rendered as the following:

We can use this as a template to discuss Tony Stark and Iron Man in the Marvel Cinematic Universe (MCU):

This may seem like modeling overkill, and in many cases, it probably is (especially if the people being modeled are limited to only one particular continuum). However, the lessons to be learned here are that this is not THAT much overkill. People move about from one organization to the next, take on jobs (and hence roles) and shed them, and especially if what you are building is a knowledge base, one of the central questions that you have to ask when working out modeling is "What will the world look like for this individual in three years, or ten years?" The biggest reason that knowledge graphs fail is not because they are over-modeled, but because they are hideously under-modeled, primarily because building knowledge graphs means looking beyond the moment.

What's more important, if we take out the continuum and the actor/production connections, this model actually collapses very nicely into a simple job role model, which is another good test of a model. A good model should gracefully collapse to a simpler form when things like continuum are held constant. The reason that Characteristics in the above model is treated as a separate entity is because the characteristics of a person changes with time. Note that this characteristic model ties into the character (or person) rather than the role, though it's also possible to assign characteristic attributes that are specific to the role itself (the President of the United States can sign executive orders, for instance, but once out of that role, the former president cannot do the same thing).

I have recently finished watching the first Season of Disney/Marvel's Loki. Beyond being a fun series to watch, Loki is a fantastic deconstruction of the whole notion of characters, mantles, variants and roles within intellectual works. It also gets into ideas about whether, in fact, we live in a multiverse (The Many Worlds Theory) or instead whether quantum mechanics implies that the universe simply creates properties when it needs to, but the cost of these properties is quantum weirdness.

Next up in the queue - organizations, and modeling instances vs sets.

Share this article

Leave your comments

Post comment as a guest

Comments

Comments

No comments found

Kurt Cagle

Tech Expert

Kurt is the founder and CEO of Semantical, LLC, a consulting company focusing on enterprise data hubs, metadata management, semantics, and NoSQL systems. He has developed large scale information and data governance strategies for Fortune 500 companies in the health care/insurance sector, media and entertainment, publishing, financial services and logistics arenas, as well as for government agencies in the defense and insurance sector (including the Affordable Care Act). Kurt holds a Bachelor of Science in Physics from the University of Illinois at Urbana–Champaign.

Of Superheroes, Hypergraphs and the Intricacy of Roles