After the success of my previous lengthy post about combining NLP techniques and graphs, I have prepared another exhaustive post for you. We will cover several topics. We will begin by importing data into Neo4j via the WikiData API. By the time we are done, we will have scraped most of the LOTR information available on WikiData. Next, we will perform an exploratory data analysis and show how to populate missing values based on a few hypotheses. To top it off, we will run a couple of graph algorithms and prepare some beautiful visualizations.
Make sure you have some popcorn nearby and get ready for some in-depth graph analysis.
Agenda
- Import WikiData into Neo4j
- Basic graph exploration
- Populate missing values
- Some more graph exploration
- Weakly connected component
- Betweenness centrality
- Graph visualizations with Bloom (cool part!)
Graph Schema
You can always visualize the graph schema in Neo4j Browser with the db.schema.visualization() procedure. It is a convenient procedure that automatically captures the schema of the stored graph.
P.S. Run it only after you have imported the graph.
CALL db.schema.visualization()
Results
We have been using simple graph schemas for quite some time now. I am delighted to say that this time it is a bit more complicated. We have a social network of characters with familial ties like SPOUSE, SIBLING, and HAS_FATHER, and even not-so-familial relationships like ENEMY. We also know some additional information about the characters, such as their country, race, and the groups they are members of.
WikiData import
As mentioned, we will fetch the data from the WikiData API with the help of the apoc.load.jsonParams procedure. If you didn’t know, APOC provides excellent support for importing data into Neo4j. Besides the ability to fetch data from any REST API, it also features integrations with other databases, such as MongoDB, and with relational databases via JDBC drivers.
P.S. You should check out the Neosemantics library if you work a lot with RDF data.
We will start by importing all the races in the LOTR world. I have to admit I am a total noob when it comes to SPARQL, so I won’t be explaining the syntax in depth. If you need a basic introduction to querying WikiData, I suggest this tutorial on YouTube. Basically, all the races in the LOTR world are instances of the Middle-earth races entity with id Q989255. To get all instances of a specific entity, we use the following SPARQL clause:
?item wdt:P31 wd:Q989255
This can be translated as “We would like to fetch an item, which is an instance of (wdt:P31) the entity with id Q989255.” After we have downloaded the data with APOC, we store the results in Neo4j.
```cypher
// Prepare a SPARQL query
WITH 'SELECT ?item ?itemLabel WHERE{ ?item wdt:P31 wd:Q989255 . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } }' AS sparql
// Make a request to WikiData
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
// Unwind results to rows
UNWIND value['results']['bindings'] as row
// Prepare data
WITH row['itemLabel']['value'] as race,
     row['item']['value'] as url,
     split(row['item']['value'],'/')[-1] as id
// Store to Neo4j
CREATE (r:Race)
SET r.race = race,
    r.url = url,
    r.id = id
```
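If you want to see what the APOC call sends and receives, here is a minimal Python sketch of the same steps, run offline. It assumes the standard SPARQL 1.1 JSON results layout (results.bindings) that the query above unwinds. quote_plus is only a stand-in for apoc.text.urlencode, whose escaping may differ slightly, and the sample response with its Q123 id is made up for illustration.

```python
from urllib.parse import quote_plus

WDQS_SPARQL = "https://query.wikidata.org/sparql?query="

# Same SPARQL text as in the Cypher query above.
RACES_QUERY = (
    "SELECT ?item ?itemLabel WHERE{ ?item wdt:P31 wd:Q989255 . "
    'SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } }'
)


def build_request_url(sparql: str) -> str:
    """URL-encode the SPARQL text and append it to the endpoint."""
    return WDQS_SPARQL + quote_plus(sparql)


def bindings_to_rows(response: dict) -> list:
    """Mirror the UNWIND + WITH steps: one row per binding, id taken from the URL tail."""
    rows = []
    for binding in response["results"]["bindings"]:
        url = binding["item"]["value"]
        rows.append({
            "race": binding["itemLabel"]["value"],
            "url": url,
            "id": url.split("/")[-1],  # same as split(url, '/')[-1] in Cypher
        })
    return rows


# Hand-made response in the SPARQL JSON results shape; Q123 is a placeholder id.
sample_response = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q123"},
                "itemLabel": {"type": "literal", "value": "Example race"},
            }
        ]
    },
}

print(build_request_url(RACES_QUERY))
print(bindings_to_rows(sample_response))
```

The sketch deliberately stops short of the HTTP call, so it runs without network access; the Accept header used in the Cypher query would be sent along with the real request.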
That was easy. The next step is to fetch the characters that are an instance of a given Middle-earth race. The SPARQL syntax is almost identical to the previous query, except this time, we iterate over each race and find the characters that are a part of it.
```cypher
// Iterate over each race in the graph
MATCH (r:Race)
// Prepare a SPARQL query
WITH 'SELECT ?item ?itemLabel WHERE { ?item wdt:P31 wd:' + r.id + ' . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } }' AS sparql, r
// Make a request to WikiData
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
UNWIND value['results']['bindings'] as row
WITH row['itemLabel']['value'] as name,
     row['item']['value'] as url,
     split(row['item']['value'],'/')[-1] as id,
     r
// Store to Neo4j
CREATE (c:Character)
SET c.name = name,
    c.url = url,
    c.id = id
CREATE (c)-[:BELONG_TO]->(r)
```
Did you know that there are at least 700 characters in Middle-earth? I would never have guessed that so many of them are documented on WikiData. Our first exploratory Cypher query will count them by race.
MATCH (r:Race) RETURN r.race as race, size((r)<-[:BELONG_TO]-()) as members ORDER BY members DESC LIMIT 10
Results
race | members |
---|---|
“men in Tolkien’s legendarium” | 344 |
“Hobbit” | 150 |
“Middle-earth elf” | 83 |
“dwarves in Tolkien’s legendarium” | 52 |
“Valar” | 16 |
“half-elven” | 12 |
“Maiar” | 10 |
“Orcs in Tolkien’s legendarium” | 10 |
“Ent” | 5 |
The Fellowship of the Ring is a somewhat representative sample of the races in Middle-earth. Most of the characters are either men or hobbits, with a couple of elves and dwarves strolling by. This is the first time I have heard of the Valar and Maiar races, though.
Now it is time to enrich the graph with information about the characters’ gender, country, and manner of death. The SPARQL query will be a bit different from before. This time we will select a WikiData entity directly by its unique id and optionally fetch some of its properties. We can filter a specific item by its id using the following SPARQL clause:
filter (?item = wd:' + r.id + ')
Similar to the Cypher query language, SPARQL differentiates between required and optional patterns, much like MATCH and OPTIONAL MATCH in Cypher. When we want to return multiple properties of an entity, it is best to wrap each property in its own OPTIONAL block. This way, we still get a result for the entity even when some of the properties are missing. Without the OPTIONAL blocks, we would only get results for entities where all three properties exist. This is identical to the behavior in Cypher.
OPTIONAL{ ?item wdt:P21 [rdfs:label ?gender] . filter (lang(?gender)="en") }
The wdt:P21 indicates we are interested in the gender property. We also specify that we want the English label of the value instead of its WikiData id. The easiest way to find the desired property id is to inspect the item on the WikiData web page and hover over a property name.
Another way is to use the WikiData query editor, which has a great autocomplete function that you can trigger with CTRL+T.
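Because the per-character SPARQL queries are assembled by string concatenation, it can help to see that assembly in isolation. The following Python sketch rebuilds the shape of the query in the next step from a map of variable names to property ids (P21 gender, P27 country, P1196 manner of death, as used in this post). The helper names are hypothetical, and the functions only build a string; they do not call WikiData.

```python
# Property ids used in this post; the keys become SPARQL variable names.
CHARACTER_PROPERTIES = {"gender": "P21", "country": "P27", "death": "P1196"}


def optional_label(var: str, prop: str) -> str:
    """One OPTIONAL block that fetches the English label of a property value."""
    return (f"OPTIONAL{{ ?item wdt:{prop} [rdfs:label ?{var}] . "
            f'filter (lang(?{var})="en") }}')


def character_query(item_id: str, properties: dict) -> str:
    """Select one entity by id and optionally fetch the label of each property."""
    optionals = " ".join(optional_label(var, prop) for var, prop in properties.items())
    return ("SELECT * WHERE{ ?item rdfs:label ?name . "
            f'filter (?item = wd:{item_id}) filter (lang(?name) = "en" ) . '
            f"{optionals}}}")


print(character_query("Q123", CHARACTER_PROPERTIES))  # Q123 is a placeholder id
```

Adding another property is then a one-line change to the dictionary, and each property stays in its own OPTIONAL block, so a missing value never drops the whole row.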
To store the results in Neo4j, we will use the FOREACH trick. Some of our results will contain null values, and MERGE cannot match or create a node on a null property value, so we have to wrap the MERGE statement in a FOREACH statement, which supports conditional execution. Check the Tips and tricks blog post by Michael Hunger for more information.
```cypher
// Iterate over characters
MATCH (r:Character)
// Prepare a SPARQL query
WITH 'SELECT * WHERE{ ?item rdfs:label ?name . filter (?item = wd:' + r.id + ') ' +
     'filter (lang(?name) = "en" ) . ' +
     'OPTIONAL{ ?item wdt:P21 [rdfs:label ?gender] . filter (lang(?gender)="en") } ' +
     'OPTIONAL{ ?item wdt:P27 [rdfs:label ?country] . filter (lang(?country)="en") } ' +
     'OPTIONAL{ ?item wdt:P1196 [rdfs:label ?death] . filter (lang(?death)="en") }}' AS sparql, r
// Make a request to WikiData
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
UNWIND value['results']['bindings'] as row
SET r.gender = row['gender']['value'],
    r.manner_of_death = row['death']['value']
// Execute FOREACH statement
FOREACH(ignoreme in case when row['country'] is not null then [1] else [] end |
  MERGE (c:Country{name:row['country']['value']})
  MERGE (r)-[:IN_COUNTRY]->(c))
```
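The FOREACH idiom may be easier to see outside of Cypher. Below is a rough Python analogy, not Neo4j code: the CASE expression yields a one-element list when the value exists and an empty list otherwise, so the loop body runs once or not at all, and a set stands in for MERGE because adding the same triple twice changes nothing. The triple-set graph, the function name, and the rows are made up for illustration.

```python
from typing import Optional, Set, Tuple

# Toy stand-in for the graph: a set of (source, relationship, target) triples.
Triples = Set[Tuple[str, str, str]]


def foreach_merge(graph: Triples, source: str, rel: str, value: Optional[str]) -> None:
    """Analogue of FOREACH(x IN CASE WHEN value IS NOT NULL THEN [1] ELSE [] END | MERGE ...).

    The list has one element when the value exists and none when it is null,
    so the body runs once or not at all; adding to a set is idempotent, like MERGE.
    """
    for _ in ([1] if value is not None else []):
        graph.add((source, rel, value))


graph: Triples = set()
foreach_merge(graph, "Frodo Baggins", "IN_COUNTRY", "Shire")
foreach_merge(graph, "Frodo Baggins", "IN_COUNTRY", "Shire")  # no duplicate
foreach_merge(graph, "Gollum", "IN_COUNTRY", None)            # skipped
print(graph)
```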
We are connecting additional information to our graph bit by bit and slowly transforming it into a knowledge graph. Let’s first look at the manner of death property.
MATCH (n:Character) WHERE exists (n.manner_of_death) RETURN n.manner_of_death as manner_of_death, count(*) as count
Results
manner_of_death | count |
---|---|
“accident” | 1 |
“homicide” | 3 |
“death in battle” | 1 |
Nothing of interest. This is obviously not the Game of Thrones series. Let’s also inspect the results of the country property.
MATCH (c:Country) RETURN c.name as country, size((c)<-[:IN_COUNTRY]-()) as members ORDER BY members DESC LIMIT 10
Results
country | members |
---|---|
“Gondor” | 70 |
“Shire” | 48 |
“Númenor” | 34 |
“Rohan” | 34 |
“Arthedain” | 17 |
“Arnor” | 8 |
“Doriath” | 6 |
“Reunited Kingdom” | 3 |
“Gondolin” | 3 |
“Lothlórien” | 3 |
We have the country information for 236 characters. We could make some hypotheses and try to populate the missing country values. Let’s assume that if two persons are siblings, they belong to the same country. This makes a lot of sense. To do this, we first have to import the familial ties from WikiData. Specifically, we will fetch the father, mother, relative, sibling, and spouse connections.
```cypher
// Iterate over characters
MATCH (r:Character)
WITH 'SELECT * WHERE{ ?item rdfs:label ?name . filter (?item = wd:' + r.id + ') ' +
     'filter (lang(?name) = "en" ) . ' +
     'OPTIONAL{ ?item wdt:P22 ?father } ' +
     'OPTIONAL{ ?item wdt:P25 ?mother } ' +
     'OPTIONAL{ ?item wdt:P1038 ?relative } ' +
     'OPTIONAL{ ?item wdt:P3373 ?sibling } ' +
     'OPTIONAL{ ?item wdt:P26 ?spouse }}' AS sparql, r
// Make a request to WikiData
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
UNWIND value['results']['bindings'] as row
FOREACH(ignoreme in case when row['mother'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['mother']['value']})
  MERGE (r)-[:HAS_MOTHER]->(c))
FOREACH(ignoreme in case when row['father'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['father']['value']})
  MERGE (r)-[:HAS_FATHER]->(c))
FOREACH(ignoreme in case when row['relative'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['relative']['value']})
  MERGE (r)-[:HAS_RELATIVE]-(c))
FOREACH(ignoreme in case when row['sibling'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['sibling']['value']})
  MERGE (r)-[:SIBLING]-(c))
FOREACH(ignoreme in case when row['spouse'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['spouse']['value']})
  MERGE (r)-[:SPOUSE]-(c))
```
Before we begin filling in missing values, let’s check for promiscuity in Middle-earth. The first query will search for characters with multiple spouses.
MATCH p=()-[:SPOUSE]-()-[:SPOUSE]-() RETURN p LIMIT 10
Results
We found a single character with two spouses: Finwë, the first King of the Noldor. We can also check whether anyone has children with multiple partners.
MATCH (c:Character)<-[:HAS_FATHER|HAS_MOTHER]-()-[:HAS_FATHER|HAS_MOTHER]->(other) WITH c, collect(distinct other) as others WHERE size(others) > 1 MATCH p=(c)<-[:HAS_FATHER|HAS_MOTHER]-()-[:HAS_FATHER|HAS_MOTHER]->() RETURN p
Results
So it seems that Finwë has four children with Indis and a single child with Míriel. On the other hand, it is quite weird that Beren has two fathers. I guess Adanel has some explaining to do. We would probably find more death and promiscuity in the GoT world.
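The same co-parent check can be sketched in a few lines of Python: group parents by child, then collect, for every parent, the other parents they share children with. The data is a reduced, hypothetical set (placeholder child ids and fewer children than WikiData lists) shaped after the Finwë example above.

```python
from collections import defaultdict

# Made-up (child, parent) pairs standing in for HAS_FATHER / HAS_MOTHER edges;
# the child ids are placeholders, the parents are the ones named above.
PARENT_EDGES = [
    ("child_1", "Finwë"), ("child_1", "Míriel"),
    ("child_2", "Finwë"), ("child_2", "Indis"),
    ("child_3", "Finwë"), ("child_3", "Indis"),
]


def parents_with_multiple_partners(edges):
    """Return {parent: co-parents} for parents who have children with 2+ partners."""
    parents_of = defaultdict(set)
    for child, parent in edges:
        parents_of[child].add(parent)
    co_parents = defaultdict(set)
    for parents in parents_of.values():
        for parent in parents:
            co_parents[parent] |= parents - {parent}
    return {p: others for p, others in co_parents.items() if len(others) > 1}


print(parents_with_multiple_partners(PARENT_EDGES))
```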
Populate missing values
Now that we know that Middle-earth characters abstain from promiscuity, let’s populate the missing country values. Remember, our hypothesis was:
If two characters are siblings, they belong to the same country.
Before we fill the missing values for countries, let’s populate the missing values for siblings. We will assume that if two characters have the same mother or father, they are siblings. Let’s look at some sibling candidates.
MATCH p=(a:Character)-[:HAS_FATHER|:HAS_MOTHER]->()<-[:HAS_FATHER|:HAS_MOTHER]-(b:Character) WHERE NOT (a)-[:SIBLING]-(b) RETURN p LIMIT 5
Results
Adamanta Chubb has at least six children. Only two of them are marked as siblings. Because all of them are siblings by definition, we will fill in the missing connections.
MATCH p=(a:Character)-[:HAS_FATHER|:HAS_MOTHER]->()<-[:HAS_FATHER|:HAS_MOTHER]-(b:Character) WHERE NOT (a)-[:SIBLING]-(b) MERGE (a)-[:SIBLING]-(b)
The query added 118 missing relationships. I need to learn how to update the WikiData knowledge graph and add the absent connections in bulk. Now we can fill in the missing country values for siblings. We will match all characters with the filled country information and search for their siblings that don’t have the country information. I love how easy it is to express this pattern with the cypher query language.
MATCH (country)<-[:IN_COUNTRY]-(s:Character)-[:SIBLING]-(t:Character) WHERE NOT (t)-[:IN_COUNTRY]->() MERGE (t)-[:IN_COUNTRY]->(country)
The query added 49 missing country relationships. We could quickly come up with more hypotheses to fill in the missing values. You can try adding some other missing properties yourself.
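To make the two hypotheses concrete, here is a small Python sketch of both steps on placeholder data: characters who share a parent become siblings, and a sibling without a country inherits one from a sibling who has it. It illustrates the logic and does not replace the Cypher queries. One difference: this version keeps a single country when siblings disagree, whereas the Cypher query may connect such a character to several countries.

```python
from collections import defaultdict
from itertools import combinations


def infer_siblings(parent_edges):
    """Hypothesis 1: characters who share a father or a mother are siblings."""
    children_of = defaultdict(set)
    for child, parent in parent_edges:
        children_of[parent].add(child)
    pairs = set()
    for children in children_of.values():
        pairs.update(combinations(sorted(children), 2))
    return pairs


def fill_countries(sibling_pairs, country):
    """Hypothesis 2: a sibling without a country inherits a sibling's country.

    Only countries known before the update are read, so values are not chained
    across several hops; when siblings disagree, the first pair in sorted order wins.
    """
    filled = dict(country)
    for a, b in sorted(sibling_pairs):
        for known, missing in ((a, b), (b, a)):
            if known in country and missing not in country:
                filled.setdefault(missing, country[known])
    return filled


# Placeholder characters: a, b and c share parent p; d is the only child of q.
edges = [("a", "p"), ("b", "p"), ("c", "p"), ("d", "q")]
siblings = infer_siblings(edges)
print(sorted(siblings))
print(fill_countries(siblings, {"a": "Shire"}))
```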
We still have more information to add to our graph. In this query, we will add the occupation, language, groups, events, and positions of the characters. The SPARQL query follows the same pattern as before: we iterate over each character and fetch additional properties.
```cypher
MATCH (r:Character)
WHERE exists (r.id)
WITH 'SELECT * WHERE{ ?item rdfs:label ?name . filter (?item = wd:' + r.id + ') ' +
     'filter (lang(?name) = "en" ) . ' +
     'OPTIONAL { ?item wdt:P106 [rdfs:label ?occupation ] . filter (lang(?occupation) = "en" ). } ' +
     'OPTIONAL { ?item wdt:P103 [rdfs:label ?language ] . filter (lang(?language) = "en" ) . } ' +
     'OPTIONAL { ?item wdt:P463 [rdfs:label ?member_of ] . filter (lang(?member_of) = "en" ). } ' +
     'OPTIONAL { ?item wdt:P1344 [rdfs:label ?participant ] . filter (lang(?participant) = "en") . } ' +
     'OPTIONAL { ?item wdt:P39 [rdfs:label ?position ] . filter (lang(?position) = "en") . }}' AS sparql, r
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
UNWIND value['results']['bindings'] as row
FOREACH(ignoreme in case when row['language'] is not null then [1] else [] end |
  MERGE (c:Language{name:row['language']['value']})
  MERGE (r)-[:HAS_LANGUAGE]->(c))
FOREACH(ignoreme in case when row['occupation'] is not null then [1] else [] end |
  MERGE (c:Occupation{name:row['occupation']['value']})
  MERGE (r)-[:HAS_OCCUPATION]->(c))
FOREACH(ignoreme in case when row['member_of'] is not null then [1] else [] end |
  MERGE (c:Group{name:row['member_of']['value']})
  MERGE (r)-[:MEMBER_OF]->(c))
FOREACH(ignoreme in case when row['participant'] is not null then [1] else [] end |
  MERGE (c:Event{name:row['participant']['value']})
  MERGE (r)-[:PARTICIPATED]->(c))
SET r.position = row['position']['value']
```
Let’s investigate the results of the groups and the occupation of the characters.
MATCH (n:Group)<-[:MEMBER_OF]-(c) OPTIONAL MATCH (c)-[:HAS_OCCUPATION]->(o) RETURN n.name as group, count(*) as size, collect(c.name)[..3] as members, collect(distinct o.name)[..3] as occupations ORDER BY size DESC
Results
group | size | members | occupations |
---|---|---|---|
“Thorin and Company” | 14 | [“Óin”, “Dori”, “Ori”] | [“diarist”, “swordsman”] |
“Fellowship of the Ring” | 8 | [“Frodo Baggins”, “Peregrin Took”, “Gandalf”] | [“swordsman”, “archer”] |
“White Council” | 2 | [“Elrond”, “Gandalf”] | [] |
“Rangers of Ithilien” | 2 | [“Damrod”, “Madril”] | [] |
“Union of Maedhros” | 2 | [“Halmir”, “Haldir”] | [] |
“Wise” | 2 | [“Adanel”, “Andreth”] | [] |
“Istari” | 1 | [“Gandalf”] | [] |
“White Company” | 1 | [“Beregond”] | [] |
It was at this moment that I realized that the characters from The Hobbit are included as well. Balin was the diarist of Thorin and Company. For some reason, I was expecting Bilbo Baggins to be the diarist. Obviously, there can be only one archer in the Fellowship of the Ring, and that is Legolas. Gandalf seems to be involved in a couple of groups.
We will execute one more WikiData API call. This time we will fetch the enemies and the items the characters own.
```cypher
MATCH (r:Character)
WHERE exists (r.id)
WITH 'SELECT * WHERE { ?item rdfs:label ?name . filter (?item = wd:' + r.id + ') ' +
     'filter (lang(?name) = "en" ) . ' +
     'OPTIONAL{ ?item wdt:P1830 [rdfs:label ?owner ] . filter (lang(?owner) = "en" ). } ' +
     'OPTIONAL{ ?item wdt:P7047 ?enemy }}' AS sparql, r
CALL apoc.load.jsonParams(
  "https://query.wikidata.org/sparql?query=" + apoc.text.urlencode(sparql),
  { Accept: "application/sparql-results+json"},
  null)
YIELD value
WITH value, r
WHERE value['results']['bindings'] <> []
UNWIND value['results']['bindings'] as row
FOREACH(ignoreme in case when row['owner'] is not null then [1] else [] end |
  MERGE (c:Item{name:row['owner']['value']})
  MERGE (r)-[:OWNS_ITEM]->(c))
FOREACH(ignoreme in case when row['enemy'] is not null then [1] else [] end |
  MERGE (c:Character{url:row['enemy']['value']})
  MERGE (r)-[:ENEMY]->(c))
```
Finally, we have finished importing our graph. Let’s look at how many enemies there are among direct family members.
MATCH p=(a)-[:SPOUSE|SIBLING|HAS_FATHER|HAS_MOTHER]-(b) WHERE (a)-[:ENEMY]-(b) RETURN p
Results
It looks like Morgoth and Manwë are brothers and enemies. This is the first time I have heard of the two, but the LOTR fandom site states that Morgoth was the first Dark Lord. Let’s look at how many enemies there are among second-degree relatives.
MATCH p=(a)-[:SPOUSE|SIBLING|HAS_FATHER|HAS_MOTHER*..2]-(b) WHERE (a)-[:ENEMY]-(b) RETURN p
Results
There are not many enemies among second-degree relatives. We can observe that Varda has taken her husband’s side and is also an enemy of Morgoth. This is an example of a stable triangle, or triad. The triangle consists of one positive relationship (SPOUSE) and two negative ones (ENEMY). In social network analysis, triangles are used to measure the cohesiveness and structural stability of a network.
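Structural balance theory makes the stable triangle precise: if positive ties count as +1 and negative ties as -1, a triad is balanced when the product of its three signs is positive. The Python sketch below checks every triangle in a small signed graph built from the Varda, Manwë, and Morgoth example. Treating SPOUSE as positive and ENEMY as negative is our assumption, and the sibling tie between Manwë and Morgoth is left out, as in the triangle described above.

```python
from itertools import combinations
from math import prod

POSITIVE, NEGATIVE = 1, -1


def is_balanced(signs):
    """Structural balance: a triad is balanced when the product of its signs is positive."""
    return prod(signs) > 0


def signed_triads(signed_edges):
    """Yield every triangle of an undirected signed graph with its three edge signs.

    signed_edges maps frozenset({u, v}) to +1 or -1.
    """
    nodes = sorted({node for edge in signed_edges for node in edge})
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(side in signed_edges for side in sides):
            yield (a, b, c), [signed_edges[side] for side in sides]


# The triangle discussed above: SPOUSE counts as positive, ENEMY as negative.
edges = {
    frozenset(("Varda", "Manwë")): POSITIVE,
    frozenset(("Manwë", "Morgoth")): NEGATIVE,
    frozenset(("Varda", "Morgoth")): NEGATIVE,
}
for triad, signs in signed_triads(edges):
    print(triad, signs, "balanced" if is_balanced(signs) else "unbalanced")
```

A triangle with two positive ties and one negative tie (for example, two spouses where only one is an enemy of a third character) would come out unbalanced.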
Graph data science
If you have read any of my previous blog posts, you know that I just have to include some example use cases of graph algorithms from the Graph Data Science library. If you need a quick refresher on how the GDS library works and what is happening behind the scenes, I suggest you read my previous blog post.
We will start by projecting the family network. We load all the characters and the familial relationships like SPOUSE, SIBLING, HAS_FATHER, and HAS_MOTHER between them.
CALL gds.graph.create('family','Character', ['SPOUSE','SIBLING','HAS_FATHER','HAS_MOTHER'])
Weakly connected component
The weakly connected component algorithm is used to find islands, or disconnected components, within a network. In the example visualization, there are two connected components: the first is composed of Michael, Mark, and Doug, while the second consists of Alice, Charles, and Bridget.
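A weakly connected component search ignores relationship direction, so a classic way to sketch it is with a union-find (disjoint-set) structure. The Python below is a minimal illustration on the example above. The edges are invented to reproduce the two components, and we are not claiming this is how GDS implements the algorithm.

```python
def weakly_connected_components(nodes, edges):
    """Union-find: edge direction is ignored, and every edge merges two sets."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


# Made-up edges that reproduce the two components described above.
people = ["Michael", "Mark", "Doug", "Alice", "Charles", "Bridget"]
links = [("Michael", "Mark"), ("Doug", "Mark"), ("Alice", "Charles"), ("Charles", "Bridget")]
for component in weakly_connected_components(people, links):
    print(sorted(component))
```

A character without any familial link never appears in an edge, so it stays in a component of size one; this is exactly the situation the stats below reveal.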
In our case, we will use the weakly connected component algorithm to find islands within the family network. All members of the same family component are related to each other somehow. The relation could be a cousin of the sister-in-law’s grandmother or something more direct, like a sibling. To get a rough feel for the results, we will run the stats mode of the algorithm.
CALL gds.wcc.stats('family') YIELD componentCount, componentDistribution RETURN componentCount as components, componentDistribution.p75 as p75, componentDistribution.p90 as p90, apoc.math.round(componentDistribution.mean,2) as mean, componentDistribution.max as max
Results
components | p75 | p90 | mean | max |
---|---|---|---|---|
145 | 1 | 3 | 4.83 | 328 |
There are 145 islands in our graph. At least 75% of the components contain only a single character. This means that around 110 (75% * 145) characters don’t have any familial link described in the graph; if they had even a single connection, their component would have at least two members. The largest component has 328 members. That must be one happy family. Let’s write back the results and analyze the family components further.
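The interpretation above is simple arithmetic on the stats output, and the Python sketch below spells it out. It uses nearest-rank percentiles as an assumption; GDS builds its distribution from a histogram, so its percentile values may differ slightly on other data. The toy list of sizes is made up, while 145 and 4.83 come from the table above. Multiplying the mean component size by the number of components gives roughly 700 nodes, which lines up with the character count we saw earlier.

```python
import math
from statistics import mean


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile: the smallest value covering at least q% of the items."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def component_stats(sizes):
    """Summarize component sizes like the stats-mode columns used above."""
    ordered = sorted(sizes)
    return {
        "components": len(ordered),
        "p75": nearest_rank(ordered, 75),
        "p90": nearest_rank(ordered, 90),
        "mean": round(mean(ordered), 2),
        "max": ordered[-1],
    }


# Back-of-the-envelope checks using the numbers reported by gds.wcc.stats.
components, mean_size = 145, 4.83
min_singletons = math.ceil(0.75 * components)      # p75 == 1
approx_characters = round(mean_size * components)  # sum of all component sizes
print(min_singletons, approx_characters)
print(component_stats([1, 1, 1, 1, 1, 1, 2, 3, 10]))  # toy sizes, not real data
```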
CALL gds.wcc.write('family', {writeProperty:'familyComponent'})
We will start by looking at the five biggest family components. The first thing we are interested in is which races are present in the family trees. We’ll also add a few members of each family to the results to get a better feel for the data.
MATCH (c:Character) OPTIONAL MATCH (c)-[:BELONG_TO]->(race) WITH c.familyComponent as familyComponent, count(*) as size, collect(c.name) as members, collect(distinct race.race) as family_race ORDER BY size DESC LIMIT 5 RETURN familyComponent, size, members[..3] as random_members, family_race
Results
familyComponent | size | random_members | family_race |
---|---|---|---|
169 | 328 | [“Galadriel”, “Fingolfin”, “Amras”] | [“Middle-earth elf”, “Maiar”, “men in Tolkien’s legendarium”, “half-elven”] |
3 | 139 | [“Donnamira Took”, “Isengar Took”, “Fortinbras Took I”] | [“Hobbit”] |
252 | 29 | [“Thorin II”, “Gimli”, “Balin”] | [“dwarves in Tolkien’s legendarium”] |
374 | 21 | [“Cirion”, “Eradan”, “Belegorn”] | [“men in Tolkien’s legendarium”] |
153 | 6 | [“Aulë”, “Oromë”, “Tulkas”] | [“Valar”] |
As mentioned, the largest family has 328 members of various races, ranging from elves to men and even Maiar. It appears that elven and human lives are quite intertwined in Middle-earth. Also their legs. There is a reason why the half-elven race exists. Other races, like hobbits and dwarves, stick more to their own kind.
Let’s examine the interracial marriages in the largest community.
```cypher
MATCH (c:Character)
// id of the largest family component from the previous query; yours may differ
WHERE c.familyComponent = 169
MATCH p=(race)<-[:BELONG_TO]-(c)-[:SPOUSE]-(other)-[:BELONG_TO]->(other_race)
// Keep interracial couples only and list each couple once
WHERE race <> other_race AND id(c) > id(other)
RETURN c.name as spouse_1, race.race as race_1, other.name as spouse_2, other_race.race as race_2
```
Results
spouse_1 | race_1 | spouse_2 | race_2 |
---|---|---|---|
“Melian” | “Maiar” | “Thingol” | “Middle-earth elf” |
“Beren Erchamion” | “men in Tolkien’s legendarium” | “Lúthien” | “Middle-earth elf” |
“Tuor” | “men in Tolkien’s legendarium” | “Idril” | “Middle-earth elf” |
“Arwen” | “half-elven” | “Aragorn” | “men in Tolkien’s legendarium” |
“Elrond” | “half-elven” | “Celebrían” | “Middle-earth elf” |
“Dior Eluchíl” | “half-elven” | “Nimloth” | “Middle-earth elf” |
First of all, I didn’t know that Elrond was a half-elf. It seems like the human and elven “alliance” is as old as time itself. I was mainly expecting to see Arwen and Aragorn, as I remember them from the movies. It would be interesting to learn how far back the half-elves go. Let’s find the half-elves with the longest recorded line of descendants, measured in generations.
MATCH (c:Character) WHERE (c)-[:BELONG_TO]->(:Race{race:'half-elven'}) MATCH p=(c)<-[:HAS_FATHER|HAS_MOTHER*..20]-(end) WHERE NOT (end)<-[:HAS_FATHER|:HAS_MOTHER]-() WITH c, max(length(p)) as descendants ORDER BY descendants DESC LIMIT 5 RETURN c.name as character, descendants
Results
character | descendants |
---|---|
“Dior Eluchíl” | 11 |
“Eärendil” | 10 |
“Elwing” | 10 |
“Elros” | 9 |
“Elrond” | 2 |
It seems like Dior Eluchíl is the oldest recorded half-elf. I checked the results on the LOTR fandom site, and it seems we are correct: Dior Eluchíl was born in the First Age, in the year 470. A couple of other half-elves were born within 50 years of Dior.
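The descendants query above measures the longest chain of descendants (generations), not the number of descendants. Here is a Python sketch of the same idea: a memoized depth-first search over child links, assuming the family tree has no cycles. Unlike the Cypher query, it does not stop at 20 hops, and it returns 0 for characters without children instead of dropping them. The family in the example is a placeholder.

```python
from collections import defaultdict
from functools import lru_cache


def generations_below(parent_edges):
    """Return a function giving the longest chain of descendants below a character.

    parent_edges holds (child, parent) pairs, i.e. HAS_FATHER / HAS_MOTHER edges.
    Assumes the family tree has no cycles (nobody is their own ancestor).
    """
    children = defaultdict(list)
    for child, parent in parent_edges:
        children[parent].append(child)

    @lru_cache(maxsize=None)
    def longest(name):
        kids = children.get(name, [])
        return 0 if not kids else 1 + max(longest(kid) for kid in kids)

    return longest


# Placeholder family: a -> b -> c -> d, plus a second child e of a.
edges = [("b", "a"), ("e", "a"), ("c", "b"), ("d", "c")]
longest = generations_below(edges)
print({name: longest(name) for name in "abcde"})
```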
Betweenness centrality
We will also take a look at the betweenness centrality algorithm. It is used to find bridge nodes between different communities. In the example visualization, Captain America has the highest betweenness centrality score because he is the main bridge in the network, connecting the left-hand side of the graph to the right-hand side. The second bridge in the network is Beast: all information exchanged between the central part and the right-hand side of the graph has to pass through him.
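For intuition, here is a minimal pure-Python version of Brandes’ algorithm, the standard exact method for betweenness centrality on unweighted graphs. We make no claim that it matches the GDS implementation or its score scaling. The example graph is made up: two triangles joined through a single bridge node X, which plays the role of Captain America in the example above.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted graph given as {node: set_of_neighbours}.

    Dependencies are accumulated from every source node, so on an undirected graph
    each unordered pair is counted twice; halve the scores for the other convention.
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Forward phase: BFS counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


def undirected(edges):
    """Build a symmetric adjacency map from a list of pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Made-up graph: two triangles joined through a single bridge node "X".
graph = undirected([
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "X"), ("X", "b1"),
])
print(sorted(betweenness(graph).items(), key=lambda item: -item[1]))
```

X lies on every shortest path between the two triangles, so it scores highest, followed by the two nodes that connect each triangle to X.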
We will look for the bridge characters in the largest family network. My guess would be that spouses in an interracial marriage will come out on top. This is because all the communication between the races flows through them. We’ve seen that there are only six interracial marriages, so probably some of them will come out on top.
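```cypher
// nodeQuery restricts the projection to the largest family component (169).
// relationshipQuery matches familial relationships of all characters in both
// directions, so validateRelationships:false is needed to skip relationships
// whose endpoints lie outside that component instead of failing.
CALL gds.alpha.betweenness.stream({
  nodeQuery: "MATCH (n:Character) WHERE n.familyComponent = 169 RETURN id(n) as id",
  relationshipQuery: "MATCH (s:Character)-[:HAS_FATHER|HAS_MOTHER|SPOUSE|SIBLING]-(t:Character) RETURN id(s) as source, id(t) as target",
  validateRelationships: false
})
YIELD nodeId, centrality
RETURN gds.util.asNode(nodeId).name as character, centrality
ORDER BY centrality DESC
LIMIT 10
```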
Results
character | centrality |
---|---|
“Arwen” | 45400.0 |
“Aragorn” | 44884.0 |
“Arathorn II” | 43524.0 |
“Arador” | 43240.0 |
“Argonui” | 42952.0 |
“Arathorn I” | 42660.0 |
“Arassuil” | 42364.0 |
“Arahad II” | 42064.0 |
“Aravorn” | 41760.0 |
“Elrond” | 41644.77380952381 |
It is interesting to see that Arwen and Aragorn come out on top. I’m not exactly sure why, but I keep thinking of them as a modern Romeo and Juliet who formed an alliance between men and half-elves with their marriage. I have no idea how J. R. R. Tolkien’s system for generating names worked, but it seems a bit biased towards names starting with an A.
Neo4j Bloom
So far, we have done the data analysis and gained a few insights. Now it is time to impress our coworkers with a practical application of the graph. Neo4j Bloom is part of the Neo4j for Graph Data Science ecosystem. It is a tool primarily designed for exploring the graph, even for users with little to no Cypher knowledge. Check out the Bloom-ing marvellous post by Lju Lazarevic to learn about the latest features.
Neo4j Bloom comes pre-installed with the Neo4j Desktop package. I have written a blog post on how to get started with it. Once you have opened Neo4j Bloom, create your first perspective.
Click the Generate button to automatically generate the graph perspective. Once you have created the perspective, hover over it and click Use Perspective.
Near-natural language search
Welcome to Neo4j Bloom. Users with no Cypher knowledge can use the near-natural language search to explore the graph. Let’s start by typing ENEMY in the search bar.
Bloom automatically provides us with a pattern that might be related to our search query. If we click on it, we will get the following visualization.
Yes, I know. All the nodes in your network are colored blue. We’ll get to that in a second. The visualization clearly shows two clusters, or communities. On the left side of the graph, we have the good guys fighting against Sauron. This is from the LOTR series. On the right side, we have the good guys fighting against Morgoth. Since Morgoth was the Dark Lord of the First Age, these characters come from The Silmarillion.
If you want to change the color of the nodes, first click on the Character label, then select the Rule-based tab and enter your color rule. In our case, we colored all the female characters red.
With the near-natural language search, we can also define more precise graph patterns. For example, let’s say we would like to inspect the enemies of mister Frodo Baggins.
Bloom automatically completes the pattern we are looking for. If we click on it, we get a visualization of Frodo’s enemies.
Search phrase: Shortest path
The search phrase mechanism allows us to add custom search functionality to Neo4j Bloom. We begin by defining what the search phrase should look like. We can add parameters to the search query with the $ sign. Auto-complete support for the search phrase parameters comes out of the box, which is really lovely. We then enter the desired Cypher query, and we are good to go. We will use the following Cypher query to find the shortest path of familial ties between any two characters.
MATCH (s:Character) WHERE s.name = $source MATCH (t:Character) WHERE t.name = $target MATCH p=shortestPath((s)-[:HAS_FATHER|HAS_MOTHER|SIBLING|SPOUSE*..25]-(t)) RETURN p
The filled-out search phrase will look like this:
Now we can execute our new search phrase in the search box. As mentioned, the application helps us with auto-complete.
We get the following result for the shortest path of familial ties between Frodo Baggins and Samwise Gamgee. They are related, but nine hops apart: Frodo’s cousin has a son who is the grandfather of the husband of Sam’s daughter. I hope I didn’t mess it up.
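Under the hood, an unweighted shortest path like the one in this search phrase is a breadth-first search. The Python sketch below finds one shortest path with the same 25-hop cap. The adjacency map and the intermediate relatives are placeholders, not the actual chain between Frodo and Sam.

```python
from collections import deque


def shortest_family_path(adjacency, source, target, max_hops=25):
    """Breadth-first search over undirected family ties, stopping after max_hops.

    Returns one shortest path as a list of names, or None if none is found.
    """
    previous = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                frontier.append((neighbour, hops + 1))
    return None


# Placeholder chain; the intermediate relatives are hypothetical.
family = {
    "Frodo Baggins": {"relative_1"},
    "relative_1": {"Frodo Baggins", "relative_2"},
    "relative_2": {"relative_1", "Samwise Gamgee"},
    "Samwise Gamgee": {"relative_2"},
}
print(shortest_family_path(family, "Frodo Baggins", "Samwise Gamgee"))
```

Because BFS visits characters in order of distance, the first time it reaches the target the path is guaranteed to be a shortest one; the hop cap only bounds how far the search explores.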
Search phrase: Ancestor tree
Last but not least, we will create a search phrase for analyzing the family ancestor tree of a given character. I have prepared two variations of the Cypher query. The first variation only traverses HAS_FATHER and HAS_MOTHER relationships.
MATCH (c:Character) WHERE c.name = $name MATCH p=(c)-[:HAS_FATHER|HAS_MOTHER*..20]->() RETURN p
The second variation visualizes the whole family component as computed before with the weakly connected component algorithm.
MATCH (c:Character) WHERE c.name = $name WITH c.familyComponent as family MATCH p=(c1)--(c2) WHERE c1.familyComponent = family AND c2.familyComponent = family RETURN p
We will use the first variation as it produces prettier visualizations for a blog post, but I encourage you to try out the second variation yourself.
We have added yet another search phrase. We can now use it in the search bar.
Results
Conclusion
I really enjoyed writing this blog post and scraping the WikiData knowledge graph. It contains a wealth of information that we can analyze in Neo4j. I could have broken this blog post into two or even three parts. Still, I like to keep it all in one place to show you how easy it is to complete the whole circle of graph analysis, from importing and enriching the graph to basic data exploration and analysis, topped off with some beautiful graph visualizations. Try it out today by downloading Neo4j Desktop. If you have any feedback or questions, you can share them on the Neo4j Community Site.
As always, the code is available on GitHub.
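The above is “Lord of the Wiki Ring: Importing Wikidata into Neo4j and analyzing family trees” as introduced by the editor; we hope it is helpful. If you have any questions, please leave a message and the editor will reply promptly. Many thanks for your support of 码农网!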