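This is the 20th post in my ongoing technical-writing plan (translations included). The modest goal is 999 posts, at least two a week.
This post shows how to connect Apache Kylin 2.6.1, the "mythical beast" of big data, to CDH 6.2.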
Notes
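- CDH 6.2 ships Hadoop 3, while the current Kylin 3.0 beta only supports Hadoop 2, so only Kylin 2.5+ can be installed. This post uses kylin 2.6.1-cdh60 (the build for CDH 6.0).
- CDH enables HDFS permission checking by default, but Kylin keeps running into HDFS permission errors, so the check has to be turned off: log in to Cloudera Manager (CM), disable the HDFS "Check HDFS Permissions" setting (dfs.permissions), and remember to restart the HDFS instance afterwards.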
Install Kylin
Download the Kylin 2.6.1 binary package
wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.6.1/apache-kylin-2.6.1-bin-cdh60.tar.gz
tar zxf apache-kylin-2.6.1-bin-cdh60.tar.gz -C /usr/local/
ln -s /usr/local/apache-kylin-2.6.1-bin-cdh60 /usr/local/kylin
Configure the Kylin environment variables
cat << EOF | sudo tee -a /etc/profile
# Java environment
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera/
export CLASSPATH=.:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib:\$CLASSPATH
export KYLIN_HOME=/usr/local/kylin
export PATH=\$JAVA_HOME/bin:\$JAVA_HOME/jre/bin:\$PATH
export CDH_HOME=/opt/cloudera/parcels/CDH
export SPARK_HOME=\${CDH_HOME}/lib/spark
export HBASE_HOME=\${CDH_HOME}/lib/hbase
EOF
source /etc/profile
If $HBASE_HOME is not set, Kylin fails with "hbase-common lib not found":
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin
Retrieving hive dependency...
Retrieving hbase dependency...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
hbase-common lib not found
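To catch a missing or mistyped path before starting Kylin, the bash function below is a minimal sketch that checks the variables exported above. It assumes only the variable names from this post; the function name is hypothetical, and looking for any hbase-common*.jar under $HBASE_HOME is my own heuristic for the "hbase-common lib not found" case, not Kylin's actual check.

```shell
# Hypothetical helper (not part of Kylin): sanity-check the variables exported to /etc/profile.
# Prints one line per problem and returns non-zero if any problem was found. Requires bash.
check_kylin_env() {
  local status=0 var dir
  for var in JAVA_HOME KYLIN_HOME CDH_HOME SPARK_HOME HBASE_HOME; do
    dir="${!var:-}"          # indirect expansion: the value of the variable named by $var
    if [ -z "$dir" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points to a missing directory: $dir"
      status=1
    fi
  done
  # Heuristic: expect an hbase-common jar somewhere under $HBASE_HOME (symlinks followed).
  if [ -n "${HBASE_HOME:-}" ] && [ -d "$HBASE_HOME" ] &&
     [ -z "$(find -L "$HBASE_HOME" -name 'hbase-common*.jar' -print -quit 2>/dev/null)" ]; then
    echo "no hbase-common*.jar found under $HBASE_HOME"
    status=1
  fi
  return "$status"
}
```

Run it right after source /etc/profile; empty output and exit status 0 mean every directory exists.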
Create the Kylin and Spark history directory in HDFS
sudo -u hdfs hadoop fs -mkdir -p /kylin/spark-history
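Otherwise check-env.sh reports the error below. (In fact, once HDFS permission checking is off, the directory no longer has to be created as the hdfs user.)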
$KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
KYLIN_HOME is set to /usr/local/kylin
mkdir: Permission denied: user=root, access=WRITE, inode="/kylin":hdfs:supergroup:drwxr-xr-x
Failed to create hdfs:///kylin/spark-history. Please make sure the user has right to access hdfs:///kylin/spark-history
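If you script this step, a check-then-create helper only touches HDFS when the directory is missing, and can optionally hand the top-level directory (the /kylin inode from the error above) to the user that runs Kylin, root in this post, which only matters if permission checking stays on. The function below is a hypothetical sketch built on the standard hadoop fs subcommands -test -d, -mkdir -p and -chown -R, run as the hdfs superuser.

```shell
# Hypothetical helper: ensure_hdfs_dir PATH [OWNER]
# Creates PATH in HDFS as the hdfs superuser if it does not exist yet. If OWNER is
# given, recursively chowns the top-level directory of PATH (e.g. /kylin for
# /kylin/spark-history) to OWNER.
ensure_hdfs_dir() {
  local path="$1" owner="${2:-}"
  local rel="${path#/}"
  local top="/${rel%%/*}"    # first path component, e.g. /kylin
  if ! sudo -u hdfs hadoop fs -test -d "$path"; then
    sudo -u hdfs hadoop fs -mkdir -p "$path" || return 1
  fi
  if [ -n "$owner" ]; then
    sudo -u hdfs hadoop fs -chown -R "$owner" "$top" || return 1
  fi
}
```

For the setup in this post the call would be ensure_hdfs_dir /kylin/spark-history root.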
Install net-tools (it provides netstat)
yum install -y net-tools
Otherwise check-env.sh fails with:
$KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
KYLIN_HOME is set to /usr/local/kylin
/usr/local/kylin/bin/check-port-availability.sh: line 27: netstat: command not found
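check-port-availability.sh needs netstat to find out whether Kylin's port (7070 by default) is already taken. To run a similar check yourself on a host that only has ss from iproute2, a bash sketch could look like this; the function name and the netstat/ss fallback are my own, not part of Kylin.

```shell
# Hypothetical helper: port_in_use PORT
# Returns 0 if some socket is listening on TCP PORT, 1 if not, 2 if neither tool exists.
port_in_use() {
  local port="$1" listing
  if command -v netstat >/dev/null 2>&1; then
    listing="$(netstat -tln 2>/dev/null)"
  elif command -v ss >/dev/null 2>&1; then
    listing="$(ss -tln 2>/dev/null)"
  else
    echo "neither netstat nor ss is installed" >&2
    return 2
  fi
  # Listening addresses look like 0.0.0.0:7070, :::7070 or *:7070,
  # so succeed if any whitespace-separated field ends in ":PORT".
  printf '%s\n' "$listing" | awk -v suffix=":$port" '{ for (i = 1; i <= NF; i++) if (substr($i, length($i) - length(suffix) + 1) == suffix) found = 1 } END { exit !found }'
}
```

For example, port_in_use 7070 && echo "7070 is busy" before starting Kylin.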
Start Kylin
$KYLIN_HOME/bin/kylin.sh start
On success it prints:
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
Open http://IP:7070/kylin in a browser; the default username/password is ADMIN/KYLIN.
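The same ADMIN/KYLIN account also works against Kylin's REST API, which uses HTTP Basic authentication, with POST /kylin/api/user/authentication as the login endpoint. A minimal bash sketch follows; the function names are hypothetical, HOST is whatever you typed as IP above, and kylin-cookie.txt is a placeholder file name.

```shell
# kylin_auth_header USER PASSWORD: print an HTTP Basic authorization header.
kylin_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# kylin_login HOST: log in with the default ADMIN/KYLIN account and keep the
# session cookie in kylin-cookie.txt (placeholder name) for later requests.
kylin_login() {
  curl -s -c kylin-cookie.txt -X POST \
    -H "$(kylin_auth_header ADMIN KYLIN)" \
    -H 'Content-Type: application/json' \
    "http://$1:7070/kylin/api/user/authentication"
}
```

kylin_auth_header ADMIN KYLIN prints Authorization: Basic QURNSU46S1lMSU4=, the header a browser would send for the default account.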
Using Kylin (with the official sample)
Import the data
$KYLIN_HOME/bin/sample.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
Going to create sample tables in hive to database DEFAULT by cli
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/hive-common-2.1.1-cdh6.2.0.jar!/hive-log4j2.properties Async: false
OK
//....
Sample cube is created successfully in project 'learn_kylin'.
** Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect **
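Reload the metadata (System tab => Reload Metadata, as the script output says)
Select the learn_kylin project
Build the cube
Go to Model, select kylin_sales_model, and choose Build.
Pick the start and end dates here.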
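The build can also be submitted without the UI through PUT /kylin/api/cubes/{cubeName}/build, whose JSON body carries startTime and endTime in milliseconds since the epoch plus buildType "BUILD". The sketch below is hypothetical: it converts YYYY-MM-DD dates with GNU date and treats them as midnight UTC (an assumption; check how your partition column's time zone is handled), and it authenticates each request with curl -u and the default account. The sample cube is kylin_sales_cube; HOST and the dates are placeholders.

```shell
# to_epoch_ms YYYY-MM-DD: midnight UTC of that date in milliseconds since the epoch.
# Uses GNU date's -d option; treating the dates as UTC is an assumption.
to_epoch_ms() {
  echo "$(( $(date -u -d "$1" +%s) * 1000 ))"
}

# build_body START END: JSON body of a BUILD request for the range START..END.
build_body() {
  printf '{"startTime": %s, "endTime": %s, "buildType": "BUILD"}' \
    "$(to_epoch_ms "$1")" "$(to_epoch_ms "$2")"
}

# kylin_build HOST CUBE START END: submit the build; the response is expected to be
# the new job's JSON.
kylin_build() {
  curl -s -u ADMIN:KYLIN -X PUT \
    -H 'Content-Type: application/json' \
    -d "$(build_body "$3" "$4")" \
    "http://$1:7070/kylin/api/cubes/$2/build"
}
```

Usage: kylin_build HOST kylin_sales_cube 2012-01-01 2014-01-01, where build_body turns those two dates into startTime 1325376000000 and endTime 1388534400000.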
If HDFS permission checking has not been disabled, the build is sure to fail at this step. Click the > icon on the right to view its progress.
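The progress behind that icon can also be polled from a script. Assuming, as in the Kylin 2.x REST API, that GET /kylin/api/jobs/{jobId} returns the job's state in a job_status field (values such as PENDING, RUNNING, FINISHED, ERROR) and that the job id is the uuid field of the build response, a bash sketch without a jq dependency could be the following; the function names and the 30-second interval are arbitrary choices.

```shell
# json_field NAME: read JSON on stdin and print the first string value of "NAME".
# Crude grep/sed extraction to avoid needing jq; only suitable for simple string fields.
json_field() {
  grep -o "\"$1\" *: *\"[^\"]*\"" | head -n 1 | sed 's/^"[^"]*" *: *"//; s/"$//'
}

# wait_for_job HOST JOB_ID: poll every 30 s until the job leaves NEW/PENDING/RUNNING,
# print the final status, and succeed only if it is FINISHED.
wait_for_job() {
  local status
  while :; do
    status="$(curl -s -u ADMIN:KYLIN "http://$1:7070/kylin/api/jobs/$2" | json_field job_status)"
    case "$status" in
      NEW|PENDING|RUNNING) sleep 30 ;;
      *) echo "$status"; [ "$status" = FINISHED ]; return ;;
    esac
  done
}
```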
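Once the build succeeds, go back to the Insight page: the five tables have now been built.
About the sample tables
Kylin's sample is a sales-analysis scenario: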
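- KYLIN_SALES: fact table holding sales order details (seller, product category, order amount, item quantity, etc.)
- KYLIN_COUNTRY: dimension table holding country information (abbreviation, name, etc.)
- KYLIN_CATEGORY_GROUPINGS: dimension table describing product categories (category names, etc.)
- KYLIN_CAL_DT: dimension table holding extended date information (the start of the year, month and week each date falls in, the year, the month, etc.)
- KYLIN_ACCOUNT: dimension table holding account information (account ID, seller level, buyer level, country, etc.)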
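Run queries
Run select count(1) from kylin_sales
Click Submit; the result and the elapsed time (1.8 seconds here) are shown below. Kylin caches query results, so running the same query again took only 0.18 seconds.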
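Queries can also be sent to POST /kylin/api/query with a JSON body containing sql and project. The bash sketch below is hypothetical: it escapes the SQL only as much as the statements in this post need (backslashes and double quotes; newlines, carriage returns and tabs become spaces) and prints the raw JSON response. HOST is a placeholder and learn_kylin is the sample project.

```shell
# json_escape TEXT: make TEXT safe inside a JSON string (enough for simple SQL).
json_escape() {
  printf '%s' "$1" | tr '\n\r\t' '   ' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# query_body SQL PROJECT: JSON body for POST /kylin/api/query.
query_body() {
  printf '{"sql": "%s", "project": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# kylin_query HOST SQL: run SQL in the learn_kylin project and print the raw JSON response.
kylin_query() {
  curl -s -u ADMIN:KYLIN -X POST \
    -H 'Content-Type: application/json' \
    -d "$(query_body "$2" learn_kylin)" \
    "http://$1:7070/kylin/api/query"
}
```

Usage: kylin_query HOST 'select count(1) from kylin_sales'. A multi-line statement can be passed as-is, since its line breaks are turned into spaces.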
Run a slightly more complex SQL statement
select sum(KYLIN_SALES.PRICE) as price_sum,
       KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
       KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
from KYLIN_SALES
inner join KYLIN_CATEGORY_GROUPINGS
  on KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
 and KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
group by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
         KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
order by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME asc,
         KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME desc
Kylin also comes with simple built-in visualization of query results.