This is the 20th post in my ongoing technical writing plan (translations included). The modest goal is 999 posts, at least two per week.
This post shows how to connect Kylin (2.6.1), the "mythical beast" of big data, to CDH 6.2.
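Notes
- CDH 6.2 ships Hadoop 3, while at the time of writing the Kylin 3.0 beta supports only Hadoop 2, so a Kylin 2.5+ release is required. This post uses kylin2.6.1-cdh60 (the build for CDH 6.0).
- CDH enables HDFS permission checking by default, but Kylin frequently hits HDFS permission errors while running, so the check needs to be turned off. Log in to CM (Cloudera Manager), disable it, and remember to restart the HDFS instances afterwards.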
Install Kylin
Download the Kylin 2.6.1 binary package
wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.6.1/apache-kylin-2.6.1-bin-cdh60.tar.gz
tar zxf apache-kylin-2.6.1-bin-cdh60.tar.gz -C /usr/local/
ln -s /usr/local/apache-kylin-2.6.1-bin-cdh60 /usr/local/kylin
Configure the Kylin environment variables
cat << EOF | sudo tee -a /etc/profile
# Java environment
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera/
export CLASSPATH=.:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib:\$CLASSPATH
export KYLIN_HOME=/usr/local/kylin
export PATH=\$JAVA_HOME/bin:\$JAVA_HOME/jre/bin:\$PATH
export CDH_HOME=/opt/cloudera/parcels/CDH
export SPARK_HOME=\${CDH_HOME}/lib/spark
export HBASE_HOME=\${CDH_HOME}/lib/hbase
EOF
source /etc/profile
If $HBASE_HOME is not set, Kylin reports "hbase-common lib not found":
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin
Retrieving hive dependency...
Retrieving hbase dependency...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
hbase-common lib not found
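Because a missing variable only surfaces later as a class-loading error, it can help to check the exports before running any Kylin script. The following is a minimal sketch for a POSIX shell; `missing_vars` is a hypothetical helper name, not something Kylin ships.

```shell
#!/bin/sh
# missing_vars NAME...  (hypothetical helper, not part of Kylin)
# Prints the name of every listed variable that is unset or empty.
# Returns non-zero if at least one variable is missing.
missing_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

For example, `missing_vars JAVA_HOME KYLIN_HOME CDH_HOME SPARK_HOME HBASE_HOME` prints nothing when all five variables from the profile snippet are set in the current shell, and prints `HBASE_HOME` if that export was left out.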
Create the Kylin and Spark directories in HDFS
sudo -u hdfs hadoop fs -mkdir -p /kylin/spark-history
Otherwise check-env.sh fails with the error below. (Once HDFS permission checking is disabled, creating the directory as the hdfs user is actually no longer necessary.)
$KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
KYLIN_HOME is set to /usr/local/kylin
mkdir: Permission denied: user=root, access=WRITE, inode="/kylin":hdfs:supergroup:drwxr-xr-x
Failed to create hdfs:///kylin/spark-history. Please make sure the user has right to access hdfs:///kylin/spark-history
Install net-tools, which provides netstat
yum install -y net-tools
Otherwise check-env.sh fails with
$KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
KYLIN_HOME is set to /usr/local/kylin
/usr/local/kylin/bin/check-port-availability.sh: line 27: netstat: command not found
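Since the Kylin scripts shell out to external tools such as netstat, a quick look at PATH beforehand avoids this kind of failure. Below is a small sketch for a POSIX shell; `missing_commands` is a hypothetical helper, and which commands to pass is an assumption based on the log output above.

```shell
#!/bin/sh
# missing_commands CMD...  (hypothetical helper, not part of Kylin)
# Prints every listed command that cannot be found on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        # command -v is the portable way to test whether a command resolves.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}
```

A call such as `missing_commands netstat hadoop hive hbase` would list `netstat` on a host where net-tools has not been installed yet.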
Start Kylin
$KYLIN_HOME/bin/kylin.sh start
On success it prints
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
Open http://IP:7070/kylin in a browser. The username and password are ADMIN/KYLIN.
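The Web UI may not answer the moment kylin.sh returns, so a script that continues automatically probably wants to poll until the page responds. This is a generic retry sketch rather than anything Kylin provides; `wait_until` is a hypothetical helper name.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS tries have been used up.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

One possible use is `wait_until 30 5 curl -sf -o /dev/null http://IP:7070/kylin`, which retries for a couple of minutes; the attempt count and delay are arbitrary choices.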
Using Kylin (with the official demo)
Import the data
$KYLIN_HOME/bin/sample.sh
Retrieving hadoop conf dir...
Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
Going to create sample tables in hive to database DEFAULT by cli
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/hive-common-2.1.1-cdh6.2.0.jar!/hive-log4j2.properties Async: false
OK
//....
Sample cube is created successfully in project 'learn_kylin'.
** Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect **
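Reload the metadata (Web UI => System Tab => Reload Metadata)
Select the learn_kylin project
Build the Cube
Open Model, select kylin_sales_model, and choose Build
Select the start and end dates here.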
If HDFS permission checking has not been disabled, the build will fail at this point. Click the > icon on the right to view the progress.
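The same build can presumably be triggered without the UI through Kylin's REST API, which takes the date range as epoch milliseconds. The sketch below only prepares the request body. The helper names are hypothetical, GNU date is assumed (as on CentOS), and the cube name in the usage note is an assumption about the sample project.

```shell
#!/bin/sh
# date_to_epoch_ms YYYY-MM-DD  (hypothetical helper; needs GNU date)
# Prints the given date at 00:00 UTC as epoch milliseconds.
date_to_epoch_ms() {
    seconds=$(date -u -d "$1" +%s) || return 1
    # Whole seconds to milliseconds: append three zeros.
    echo "${seconds}000"
}

# build_payload START_DATE END_DATE  (hypothetical helper)
# Prints a JSON body with startTime, endTime and buildType for a cube build.
build_payload() {
    start_ms=$(date_to_epoch_ms "$1") || return 1
    end_ms=$(date_to_epoch_ms "$2") || return 1
    printf '{"startTime": %s, "endTime": %s, "buildType": "BUILD"}' \
        "$start_ms" "$end_ms"
}
```

A request could then look like `curl -u ADMIN:KYLIN -X PUT -H 'Content-Type: application/json' -d "$(build_payload 2012-01-01 2013-01-01)" http://IP:7070/kylin/api/cubes/kylin_sales_cube/build`. Check the endpoint and the cube name against the REST API documentation for your Kylin version before relying on it.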
After the build succeeds, go back to the Insight page. The five tables are now built and ready to query.
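The demo tables
Kylin's sample is a sales analysis scenario
- KYLIN_SALES: fact table holding the details of each sales order (seller, product category, order amount, item count, and so on)
- KYLIN_COUNTRY: dimension table holding country information (abbreviation, name, and so on)
- KYLIN_CATEGORY_GROUPINGS: dimension table holding detailed product category descriptions (category names and so on)
- KYLIN_CAL_DT: dimension table holding extended date information (start of the year, month, and week containing each date, plus year, month, and so on)
- KYLIN_ACCOUNT: dimension table holding account information (account id, seller level, buyer level, country, and so on)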
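Run queries
Run select count(1) from kylin_sales
Click Submit. The result appears below along with the execution time (1.8 seconds here). Kylin caches query results, so running the same query again took only 0.18 seconds.
Run a slightly more complex SQL statement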
select sum(KYLIN_SALES.PRICE) as price_sum,
       KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
       KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
from KYLIN_SALES
inner join KYLIN_CATEGORY_GROUPINGS
    on KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
   and KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
group by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
         KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
order by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME asc,
         KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME desc
Kylin also comes with simple built-in visualization of the query results.
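Queries like the ones above can likely also be sent from a script, since Kylin exposes a REST query endpoint that accepts the SQL text and the project name as JSON. The sketch below only builds the request body; `query_payload` is a hypothetical helper, and its escaping covers just newlines, backslashes and double quotes.

```shell
#!/bin/sh
# query_payload SQL PROJECT  (hypothetical helper, not part of Kylin)
# Prints a JSON body carrying the SQL statement and the project name.
query_payload() {
    # Flatten newlines, then escape backslashes and double quotes for JSON.
    sql=$(printf '%s' "$1" | tr '\n' ' ' | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"sql": "%s", "project": "%s"}' "$sql" "$2"
}
```

For instance, `curl -u ADMIN:KYLIN -X POST -H 'Content-Type: application/json' -d "$(query_payload 'select count(1) from kylin_sales' learn_kylin)" http://IP:7070/kylin/api/query` should return the result as JSON, assuming the endpoint matches the REST API documentation for your Kylin version.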