内容简介:本文将逐一介绍要修改的文件在
软件 | 版本 | 下载地址 |
---|---|---|
linux | Ubuntu Server 18.04.2 LTS | www.ubuntu.com/download/se… |
hadoop | hadoop-2.7.1 | archive.apache.org/dist/hadoop… |
java | jdk-8u211-linux-x64 | www.oracle.com/technetwork… |
hive | hive-2.3.5 | mirror.bit.edu.cn/apache/hive… |
mysql-connector-java | mysql-connector-java-5.1.45.jar | 命令行安装 |
postgresql-jdbc4 | postgresql-jdbc4.jar | 命令行安装 |
1.2 节点安排
名称 | ip | hostname |
---|---|---|
主节点 | 192.168.233.200 | Master |
子节点1 | 192.168.233.201 | Slave01 |
子节点2 | 192.168.233.202 | Slave02 |
1.3 说明
注意:本文的 hive
、 MySQL
、 PostgreSQL
均只安装在 Master
节点上,实际生产环境中,需根据实际情况调整
Hive
默认元数据保存在内嵌的 Derby
数据库中,这是最简单的一种存储方式,使用 derby
存储方式时,运行 hive
会在当前目录生成一个 derby
文件和一个 metastore_db
目录。 Derby
数据库中,只能允许一个会话连接,只适合简单的测试,实际生产环境中不适用。 为了支持多用户会话,则需要一个独立的元数据库,使用 MySQL
或者 PostgreSQL
作为元数据库, Hive
内部对 MySQL
和 PostgreSQL
提供了很好的支持。
本文将逐一介绍 hive
连接 Derby
、 PostgreSQL
、 MySQL
这三种数据库数据库的安装和配置。
2 hive连接Derby
2.1 解压
$ tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /usr/local/bigdata & cd /usr/local/bigdata $ mv apache-hive-2.3.5-bin hive-2.3.5 $ sudo chown -R hadoop:hadoop hive #之前bigdata目录已经修改过权限了 复制代码
2.2 修改配置文件
要修改的文件在 /usr/local/hive-2.3.5/conf
目录下,需要修改 hive-site.xml
、 hive-env.sh
、 hive-log4j2.properties
这3个文件。
先把 .template
文件复制一份出来,然后进行修改。
$ cd /usr/local/hive-2.3.5/conf $ cp hive-default.xml.template hive-site.xml $ cp hive-env.sh.template hive-env.sh $ cp hive-log4j.properties.template hive-log4j.properties 复制代码
2.2.1 hive-site.xml(Derby)
配置Derby只需要修改 javax.jdo.option.ConnectionURL
指定 metastore_db
的存储位置即可
具体修改如下:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property> 复制代码
2.2.2 hive-env .sh
添加:
export HADOOP_HOME=/usr/local/bigdata/hadoop-2.7.1 export HIVE_CONF_DIR=/usr/local/bigdata/hive-2.3.5/conf 复制代码
2.2.3 hive-log4j2.properties
日志配置可以先默认,暂时不修改什么。
2.3 配置环境变量
在 ~/.bashrc
文件中添加如下内容,执行 source ~/.bashrc
使其生效。
export HIVE_HOME=/usr/local/bigdata/hive-2.3.5 export PATH=$PATH:/usr/local/bigdata/hive-2.3.5/bin 复制代码
2.4 为hive创建数据仓库存储目录
注意先启动 hadoop
集群
$ hadoop fs -mkdir -p /user/hive/warehouse $ hadoop fs -mkdir -p /tmp $ hadoop fs -chmod g+w /user/hive/warehouse $ hadoop fs -chmod g+w /tmp 复制代码
2.4 启动hive
初始化元数据数据库
$ schematool -initSchema -dbType derby 复制代码
成功初始化应该出现如下内容:
$ schematool -initSchema -dbType derby SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Metastore connection URL: jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Starting metastore schema initialization to 2.3.0 Initialization script hive-schema-2.3.0.derby.sql Initialization script completed schemaTool completed 复制代码
启动hive
$ hive 复制代码
如果成功运行将出现如下内容:
$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> > 复制代码
创建表
create table t1( id int ,name string ,hobby array<string> ,add map<String,string> ) row format delimited fields terminated by ',' collection items terminated by '-' map keys terminated by ':' ; 复制代码
hive> > > > show databases; OK default Time taken: 22.279 seconds, Fetched: 1 row(s) hive> create table t1( > id int > ,name string > ,hobby array<string> > ,add map<String,string> > ) > row format delimited > fields terminated by ',' > collection items terminated by '-' > map keys terminated by ':' > ; OK Time taken: 1.791 seconds hive> 复制代码
至此,以 Derby
做元数据库的hive连接方式就配置完成了。
下面介绍如何将 hive
连接到 PostgreSQL
和 MySQL
3 PostgreSQL的安装
3.1 安装
执行如下命令:
$ sudo apt install postgresql postgresql-contrib 复制代码
安装完成后默认会有一个 postgres
的用户,且没有密码,作为管理员
3.2 启动PostgreSQL
$ sudo systemctl enable postgresql $ sudo systemctl start postgresql 复制代码
3.3 登录
hadoop@Master:~$ sudo -i -u postgres postgres@Master:~$ psql psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1)) Type "help" for help. postgres=# help You are using psql, the command-line interface to PostgreSQL. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit postgres=# 复制代码
4 hive连接PostgreSQL
4.1 安装PostgreSQL-JDBC驱动
$ sudo apt-get install libpostgresql-jdbc-java $ ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/bigdata/hive-2.3.5/lib 复制代码
4.2 修改pg_hba.conf文件
修改 /etc/postgresql/10/main/pg_hba.conf文件
# Database administrative login by Unix domain socket #local all postgres peer local all postgres trust # TYPE DATABASE USER ADDRESS METHOD # "local" is for Unix domain socket connections only #local all all peer local all all trust # IPv4 local connections: #host all all 127.0.0.1/32 md5 host all all 127.0.0.1/32 trust # IPv6 local connections: #host all all ::1/128 md5 host all all ::1/128 trust # Allow replication connections from localhost, by a user with the # replication privilege. #local replication all peer #local replication all peer #local replication all peer local replication all trust host replication all 127.0.0.1/32 trust host replication all ::1/128 trust 复制代码
4.3 在PostpreSQL中创建数据库和用户
先创建一个名为 hiveuser
的用户,密码: 123456
,
然后创建一个名为 metastore
的数据库:
$ sudo -u postgres psql postgres=# CREATE USER hiveuser WITH PASSWORD '123456'; postgres=# CREATE DATABASE metastore; 复制代码
测试用户和数据库是否能登录
$ psql -h localhost -U hiveuser -d pymetastore 复制代码
登录成功说明配置完成
hadoop@Master:~$ psql -h localhost -U hiveuser -d metastore Password for user hive: psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1)) SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. pymetastore=> 复制代码
4.5 修改hive-site.xml(PostgreSQL)
之前配置的是以 Derby
做元数据库,现在同样也是修改 hive-site.xml
文件。
首先在开头添加如下内容:
<property> <name>system:java.io.tmpdir</name> <value>/tmp/hive/java</value> </property> <property> <name>system:user.name</name> <value>${user.name}</value> </property> 复制代码
然后修改如下属性:
name | value | description |
---|---|---|
javax.jdo.option.ConnectionURL
|
jdbc:postgresql://localhost/metastore
|
指定连接的数据库(之前创建的) |
javax.jdo.option.ConnectionDriverName
|
org.postgresql.Driver
|
数据库驱动 |
javax.jdo.option.ConnectionUserName
|
hiveuser
|
用户名(之前创建的) |
javax.jdo.option.ConnectionPassword
|
123456
|
用户名密码 |
具体如下:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:postgresql://localhost/metastore</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>org.postgresql.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> <description>Username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>123456</value> <description>password to use against metastore database</description> </property> 复制代码
4.6 启动Hive
先运行 schematool
进行初始化:
schematool -dbType postgres -initSchema 复制代码
然后执行 $ hive
启动hive。
创建表格进行测试
hadoop@Master:~$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/bigdata/hadoop-2.7.7/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'. Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> > show databases; OK default Time taken: 12.294 seconds, Fetched: 1 row(s) hive> create table t1( > id int > ,name string > ,hobby array<string> > ,add map<String,string> > ) > row format delimited > fields terminated by ',' > collection items terminated by '-' > map keys terminated by ':' > ; OK Time taken: 1.239 seconds hive> Connection reset by 192.168.233.200 port 22 复制代码
查看是否创建成功:
$ psql -h localhost -U hiveuser -d metastore psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1)) SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. metastore=> SELECT * from "TBLS"; TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED --------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+-------------------- 1 | 1560074934 | 1 | 0 | hadoop | 0 | 1 | t1 | MANAGED_TABLE | | | f (1 row) 复制代码
5 MySQL 安装
5.1 安装
$ sudo apt install mysql-server 复制代码
5.2 设置MySQL的root用户密码
如果没有设置密码的话,设置密码。
这里密码设置为 hadoop
$ mysql -u root -p 复制代码
6 Hive连接MySQL
6.1 在MySQL中为Hive新建数据库
用来存放Hive的元数据。
与Hive配置文件 hive-site.xml
中的 mysql://localhost:3306/metastore
对应
#建立数据库和用户 mysql> create database if not exists metastore; mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY '123456'; #设置远程登录的权限 mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hiveuser'@'localhost'; mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost'; #刷新配置 mysql> FLUSH PRIVILEGES; mysql> quit; 复制代码
6.2 安装MySQL-JDBC驱动
$ sudo apt-get install libmysql-java $ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/bigdata/hive-2.3.5/lib 复制代码
6.3 修改修改hive-site.xml(MySQL)
首先在开头添加如下内容:
<property> <name>system:java.io.tmpdir</name> <value>/tmp/hive/java</value> </property> <property> <name>system:user.name</name> <value>${user.name}</value> </property> 复制代码
然后修改如下属性:
name | value | description |
---|---|---|
javax.jdo.option.ConnectionURL
|
jdbc:mysql://localhost:3306/metastore?useSSL=true
|
指定连接的数据库(之前创建的) |
javax.jdo.option.ConnectionDriverName
|
com.mysql.jdbc.Driver
|
数据库驱动 |
javax.jdo.option.ConnectionUserName
|
hiveuser
|
用户名(之前创建的) |
javax.jdo.option.ConnectionPassword
|
123456
|
用户名密码 |
具体如下:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost:3306/metastore?useSSL=true</value> <description> JDBC connect string for a JDBC metastore. To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. </description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> <description>Username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>123456</value> <description>password to use against metastore database</description> </property> 复制代码
6.4 启动hive
先初始化
schematool -dbType mysql -initSchema 复制代码
和前面一样,执行
$ hive 复制代码
7 问题总结
问题1
初始化derby时报如下错误,提示没有 hive-exec-*.jar
hadoop@Master:~$ schematool -initSchema -dbType derby Missing Hive Execution Jar: /usr/local/biddata/hive-2.3.5/lib/hive-exec-*.jar 复制代码
解决:
检查该目录下是否确实不存在 hive-exec-2.35.jar
,如果不存在,下载一个放到该目录下。
下载地址: mvnrepository.com/artifact/or…
如果存在,那一定是环境变量配置有问题,查看 HIVE_HOME
及 $HIVE_HOME/bin
是否配置正确。
问题2
报错:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.na at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.<init>(Path.java:171) at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582) at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D at java.net.URI.checkPath(URI.java:1823) at java.net.URI.<init>(URI.java:745) at org.apache.hadoop.fs.Path.initialize(Path.java:202) ... 12 more 复制代码
解决
在 hive-site.xml
文件开头加入如下配置:
<property> <name>system:java.io.tmpdir</name> <value>/tmp/hive/java</value> </property> <property> <name>system:user.name</name> <value>${user.name}</value> </property> 复制代码
问题3
执行 $ schematool -dbType postgres -initSchema
时报错
hadoop@Master:~$ schematool -dbType postgres -initSchema SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Metastore connection URL: jdbc:postgresql://localhost/pymetastore Metastore Connection Driver : org.postgresql.Driver Metastore connection User: hive Starting metastore schema initialization to 2.3.0 Initialization script hive-schema-2.3.0.postgres.sql Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! Underlying cause: java.io.IOException : Schema script failed, errorcode 2 Use --verbose for detailed stacktrace. *** schemaTool failed *** 复制代码
另外也会有这个错:
Error: ERROR: relation "txns" already exists (state=42P07,code=0) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! Underlying cause: java.io.IOException : Schema script failed, errorcode 2 Use --verbose for detailed stacktrace. *** schemaTool failed *** 复制代码
这个问题,我尝试了很久也没有找到原因,网上有说是 hive
版本的原因,我换了 hive-1.2.1
、 hive-1.2.2
等低版本的hive,依然时候有这个问题。
最后是重新创建用户和数据库就没有这个问题了,感觉是数据库有冲突。
问题4
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !! Underlying cause: java.io.IOException : Schema script failed, errorcode 2 Use --verbose for detailed stacktrace. *** schemaTool failed *** 复制代码
解决:
注意使用MySQL存储元数据的时候,使用root用户有可能权限不够,会报错。另外, $ schematool -dbType postgres -initSchema
执行一次就好了。
以上所述就是小编给大家介绍的《Hive的安装及配置》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Flash与后台
刘明伟 / 清华大学出版社 / 2007-6 / 52.00元
《Flash与后台:ASP/ASP.NET/PHP/Java/JavaScript/Delphi总动员》从目前热门的F1ash与ASP、ASP.NET、PHP、Java、JavaScript和Delphi的交互知识入手,深入浅出地讲解了F1ash与后台通信的原理和交互的过程,力求使阅读《Flash与后台:ASP/ASP.NET/PHP/Java/JavaScript/Delphi总动员》的每一位读......一起来看看 《Flash与后台》 这本书的介绍吧!