Summary of JanusGraph Bulk Data Import Code

Copyright notice: this is an original post by the author and may not be reposted without permission. Questions and corrections are welcome. https://blog.csdn.net/sun7545526/article/details/90757800

Contents

  • 1. Importing JSON into a local TinkerGraph
  • 2. Importing CSV into a local TinkerGraph
  • 3. Importing JSON into distributed storage (berkeleyje-es)

The code in this article is demonstrated on JanusGraph 0.3.1. All data files are the sample files bundled with the JanusGraph distribution.
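All of the snippets below are run in the Gremlin console shipped with the JanusGraph distribution (started with bin/gremlin.sh from the install directory). A minimal preparation sketch, in case the Hadoop and Spark plugins are not already active in your console (they usually are in the bundled console):

:plugin use tinkerpop.hadoop
:plugin use tinkerpop.spark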

1. Importing JSON into a local TinkerGraph

1.1 Configuration

The configuration in conf/hadoop-graph/hadoop-load-json.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true


#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
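Before bulk loading, the input side can be sanity-checked by opening the Hadoop graph and counting vertices with Spark. This is a quick sketch, not part of the original walkthrough:

// open the Hadoop graph defined above and count the input vertices via SparkGraphComputer
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json.properties')
readGraph.traversal().withComputer(SparkGraphComputer).V().count()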

1.2 Sample JSON

{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"pr
operties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"we
ight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"
cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"i
d":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"perfo
rmances":[{"id":4,"value":1}]}}
s

1.3 Code

// open the Hadoop graph that reads the GraphSON input
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json.properties')
// the write graph is a local TinkerGraph persisted as a gryo (kryo) file
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph.kryo")
// bulk-load the read graph into the write graph with Spark
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
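Besides the file check in section 1.4 below, the result can also be verified directly from the console by reopening the generated TinkerGraph; this sketch uses the same pattern as section 2.3:

// reopen the gryo file written above and inspect the loaded vertices
g = GraphFactory.open(writeGraphConf).traversal()
g.V().valueMap(true)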

1.4 File verification

The newly generated file:

[root@vm03 data]# ls -l /tmp/csv-graph.kryo 
-rw-r--r--. 1 root root 726353 May 29 04:09 /tmp/csv-graph.kryo

2. Importing CSV into a local TinkerGraph

2.1 Configuration

The configuration in conf/hadoop-graph/hadoop-load-csv.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.txt
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.scriptInputFormat.script=./data/script-input-grateful-dead.groovy

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

2.2 Sample CSV

1,song,HEY BO DIDDLEY,cover,5   followedBy,2,1|followedBy,3,2|followedBy,4,1|followedBy,5,1|followedBy,6,1|sungBy,340|writtenBy,527     followedBy,3,2|followedBy,5,2|followedBy,62,1|followedBy,153,1
2,song,IM A MAN,cover,1 followedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525       followedBy,1,1|followedBy,34,1
3,song,NOT FADE AWAY,cover,531  followedBy,81,1|followedBy,86,5|followedBy,127,10|followedBy,59,1|followedBy,83,3|followedBy,103,2|followedBy,68,1|followedBy,134,2|followedBy,131,1|followedBy,151,1|followedBy,3
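Each line is tab-delimited into three columns: the vertex fields, the outgoing edges, and the incoming edges; individual edges are separated by | and their fields by commas (the tabs render as plain whitespace above). A minimal sketch of how one sample line splits, using vertex 2 from above:

// the three tab-separated columns of one input line (vertex 2 above)
line = "2,song,IM A MAN,cover,1\tfollowedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525\tfollowedBy,1,1|followedBy,34,1"
def (vertex, outEdges, inEdges) = line.split(/\t/, 3)
// vertex   == "2,song,IM A MAN,cover,1"
// outEdges == "followedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525"
// inEdges  == "followedBy,1,1|followedBy,34,1"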

2.3 Code

The script script-input-grateful-dead.groovy:

def parse(line) {
    // each input line: vertex fields <TAB> outgoing edges <TAB> incoming edges
    def (vertex, outEdges, inEdges) = line.split(/\t/, 3)
    def (v1id, v1label, v1props) = vertex.split(/,/, 3)
    def v1 = graph.addVertex(T.id, v1id.toInteger(), T.label, v1label)
    // vertex properties depend on the label
    switch (v1label) {
        case "song":
            def (name, songType, performances) = v1props.split(/,/)
            v1.property("name", name)
            v1.property("songType", songType)
            v1.property("performances", performances.toInteger())
            break
        case "artist":
            v1.property("name", v1props)
            break
        default:
            throw new Exception("Unexpected vertex label: ${v1label}")
    }
    // edges are separated by "|"; each edge is "label,otherVertexId[,weight]"
    [[outEdges, true], [inEdges, false]].each { def edges, def out ->
        edges.split(/\|/).grep().each { def edge ->
            def parts = edge.split(/,/)
            def otherV, eLabel, weight = null
            if (parts.size() == 2) {
                (eLabel, otherV) = parts
            } else {
                (eLabel, otherV, weight) = parts
            }
            def v2 = graph.addVertex(T.id, otherV.toInteger())
            def e = out ? v1.addOutEdge(eLabel, v2) : v1.addInEdge(eLabel, v2)
            if (weight != null) e.property("weight", weight.toInteger())
        }
    }
    return v1
}

Code run in the JanusGraph Gremlin console:

readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-csv.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph2.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()

g = GraphFactory.open(writeGraphConf).traversal()
g.V().valueMap(true)

2.4 File verification

The newly generated file:

[root@vm03 data]# ls -l /tmp/csv-graph2.kryo 
-rw-r--r--. 1 root root 339939 May 29 04:56 /tmp/csv-graph2.kryo

3. Importing JSON into distributed storage (berkeleyje-es)

3.1 Configuration

The configuration in conf/hadoop-graph/hadoop-load-json-ber-es.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true


#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator

The configuration in ./conf/janusgraph-berkeleyje-es-bulkload.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=../db/berkeley
index.search.backend=elasticsearch
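Before running the bulk load it can help to open the target graph once, just to confirm that the BerkeleyDB and Elasticsearch backends are reachable. This is a small sketch, not part of the original walkthrough; it assumes Elasticsearch is running locally, which is what the configuration above expects:

// open and close the target JanusGraph to validate the backend configuration
graph = JanusGraphFactory.open('./conf/janusgraph-berkeleyje-es-bulkload.properties')
graph.close()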

3.2 Sample JSON

{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"pr
operties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"we
ight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"
cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"i
d":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"perfo
rmances":[{"id":4,"value":1}]}}
s

3.3 Code

outputGraphConfig = './conf/janusgraph-berkeleyje-es-bulkload.properties'
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json-ber-es.properties')

blvp = BulkLoaderVertexProgram.build().writeGraph(outputGraphConfig).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(outputGraphConfig).traversal()
g.V().valueMap(true)

3.4 Verification

Verify the load by standing up a service with Gremlin Server:

  1. The Gremlin Server configuration file (gremlin-server-berkeleyje-bulkload.yaml) is similar to gremlin-server-berkeleyje.yaml, with the following setting adjusted:
     graph: conf/janusgraph-berkeleyje-es-bulkload.properties
  2. Start the server: ./gremlin-server.sh conf/gremlin-server/gremlin-server-berkeleyje-bulkload.yaml
  3. Query the graph through graphexp (or from the Gremlin console, as sketched below).
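
As an alternative to graphexp, the server can also be queried from the Gremlin console. A sketch, assuming the bundled conf/remote.yaml points at this Gremlin Server instance and the server's init script binds g to a traversal source (as the sample scripts do):

:remote connect tinkerpop.server conf/remote.yaml
:> g.V().count()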

That is all for this article; I hope it is helpful for your study of JanusGraph bulk loading.
