JanusGraph批量导入数据代码总结

栏目: 服务器 · 发布时间: 5年前

内容简介:版权声明:本文为博主原创文章,未经博主允许不得转载。如有问题,欢迎指正! https://blog.csdn.net/sun7545526/article/details/90757800

版权声明:本文为博主原创文章,未经博主允许不得转载。如有问题,欢迎指正! https://blog.csdn.net/sun7545526/article/details/90757800

这里写自定义目录标题

  • 1. Json导入到本地TinkerGraph
  • 2. CSV导入到本地TinkerGraph
  • 3. Json导入到分布式存储(berkeleyje-es)

本文中的代码基于janusgraph 0.3.1进行演示。数据文件都为janusgraph包中自带的数据文件。

1. Json导入到本地TinkerGraph

1.1 配置

conf/hadoop-graph/hadoop-load-json.properties 配置如下:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true


#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

1.2 样例Json

{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"pr
operties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"we
ight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"
cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"i
d":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"perfo
rmances":[{"id":4,"value":1}]}}
s

1.3 代码

readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()

1.4 文件校验

新生成的文件如下

[root@vm03 data]# ls -l /tmp/csv-graph.kryo 
-rw-r--r--. 1 root root 726353 May 29 04:09 /tmp/csv-graph.kryo

2. CSV导入到本地TinkerGraph

2.1 配置

conf/hadoop-graph/hadoop-load-csv.properties 配置如下:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.txt
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.scriptInputFormat.script=./data/script-input-grateful-dead.groovy

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

2.2 样例CSV

1,song,HEY BO DIDDLEY,cover,5   followedBy,2,1|followedBy,3,2|followedBy,4,1|followedBy,5,1|followedBy,6,1|sungBy,340|writtenBy,527     followedBy,3,2|followedBy,5,2|followedBy,62,1|followedBy,153,1
2,song,IM A MAN,cover,1 followedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525       followedBy,1,1|followedBy,34,1
3,song,NOT FADE AWAY,cover,531  followedBy,81,1|followedBy,86,5|followedBy,127,10|followedBy,59,1|followedBy,83,3|followedBy,103,2|followedBy,68,1|followedBy,134,2|followedBy,131,1|followedBy,151,1|followedBy,3

2.3 代码

script-input-grateful-dead.groovy 代码如下:

def parse(line) {
    def (vertex, outEdges, inEdges) = line.split(/\t/, 3)
    def (v1id, v1label, v1props) = vertex.split(/,/, 3)
    def v1 = graph.addVertex(T.id, v1id.toInteger(), T.label, v1label)
    switch (v1label) {
        case "song":
            def (name, songType, performances) = v1props.split(/,/)
            v1.property("name", name)
            v1.property("songType", songType)
            v1.property("performances", performances.toInteger())
            break
        case "artist":
            v1.property("name", v1props)
            break
        default:
            throw new Exception("Unexpected vertex label: ${v1label}")
    }
    [[outEdges, true], [inEdges, false]].each { def edges, def out ->
        edges.split(/\|/).grep().each { def edge ->
            def parts = edge.split(/,/)
            def otherV, eLabel, weight = null
            if (parts.size() == 2) {
                (eLabel, otherV) = parts
            } else {
                (eLabel, otherV, weight) = parts
            }
            def v2 = graph.addVertex(T.id, otherV.toInteger())
            def e = out ? v1.addOutEdge(eLabel, v2) : v1.addInEdge(eLabel, v2)
            if (weight != null) e.property("weight", weight.toInteger())
        }
    }
    return v1
}

janusgraph代码:

readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-csv.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph2.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()

g = GraphFactory.open(writeGraphConf).traversal()
g.V().valueMap(true)

2.4 文件校验

新生成的文件如下

[root@vm03 data]# ls -l /tmp/csv-graph2.kryo 
-rw-r--r--. 1 root root 339939 May 29 04:56 /tmp/csv-graph2.kryo

3. Json导入到分布式存储(berkeleyje-es)

3.1 配置

conf/hadoop-graph/hadoop-load-json-ber-es.properties 配置如下:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true


#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

./conf/janusgraph-berkeleyje-es-bulkload.properties 配置如下:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=../db/berkeley
index.search.backend=elasticsearch

3.2 样例Json

{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"pr
operties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"we
ight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"
cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"i
d":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"perfo
rmances":[{"id":4,"value":1}]}}
s

3.3 代码

outputGraphConfig = './conf/janusgraph-berkeleyje-es-bulkload.properties'
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json-ber-es.properties')

blvp = BulkLoaderVertexProgram.build().writeGraph(outputGraphConfig).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(outputGraphConfig).traversal()
g.V().valueMap(true)

3.4 验证

通过gremlin-server搭建服务进行验证

  1. gremline-server配置文件如下(gremlin-server-berkeleyje-bulkload.yaml),与gremlin-server-berkeleyje.yaml类似,下面的位置进行调整:
graph: conf/janusgraph-berkeleyje-es-bulkload.properties
  1. ./gremlin-server.sh conf/gremlin-server/gremlin-server-berkeleyje-bulkload.yaml 启动服务
  2. 通过graphexp进行查询

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

An Introduction to the Analysis of Algorithms

An Introduction to the Analysis of Algorithms

Robert Sedgewick、Philippe Flajolet / Addison-Wesley Professional / 1995-12-10 / CAD 67.99

This book is a thorough overview of the primary techniques and models used in the mathematical analysis of algorithms. The first half of the book draws upon classical mathematical material from discre......一起来看看 《An Introduction to the Analysis of Algorithms》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具