Hive的安装及配置

栏目: 服务器 · 发布时间: 5年前

内容简介:本文将逐一介绍要修改的文件在
软件 版本 下载地址
linux Ubuntu Server 18.04.2 LTS www.ubuntu.com/download/se…
hadoop hadoop-2.7.1 archive.apache.org/dist/hadoop…
java jdk-8u211-linux-x64 www.oracle.com/technetwork…
hive hive-2.3.5 mirror.bit.edu.cn/apache/hive…
mysql-connector-java mysql-connector-java-5.1.45.jar 命令行安装
postgresql-jdbc4 postgresql-jdbc4.jar 命令行安装

1.2 节点安排

名称 ip hostname
主节点 192.168.233.200 Master
子节点1 192.168.233.201 Slave01
子节点2 192.168.233.202 Slave02

1.3 说明

注意:本文的 hiveMySQLPostgreSQL 均只安装在 Master 节点上,实际生产环境中,需根据实际情况调整

Hive 默认元数据保存在内嵌的 Derby 数据库中,这是最简单的一种存储方式,使用 derby 存储方式时,运行 hive 会在当前目录生成一个 derby 文件和一个 metastore_db 目录。 Derby 数据库中,只能允许一个会话连接,只适合简单的测试,实际生产环境中不适用。 为了支持多用户会话,则需要一个独立的元数据库,使用 MySQL 或者 PostgreSQL 作为元数据库, Hive 内部对 MySQLPostgreSQL 提供了很好的支持。

本文将逐一介绍 hive 连接 DerbyPostgreSQLMySQL 这三种数据库数据库的安装和配置。

2 hive连接Derby

2.1 解压

$ tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /usr/local/bigdata & cd /usr/local/bigdata
$ mv apache-hive-2.3.5-bin hive-2.3.5
$ sudo chown -R hadoop:hadoop hive #之前bigdata目录已经修改过权限了
复制代码

2.2 修改配置文件

要修改的文件在 /usr/local/hive-2.3.5/conf 目录下,需要修改 hive-site.xmlhive-env.shhive-log4j2.properties 这3个文件。

先把 .template 文件复制一份出来,然后进行修改。

$ cd /usr/local/hive-2.3.5/conf
$ cp hive-default.xml.template hive-site.xml
$ cp hive-env.sh.template hive-env.sh
$ cp hive-log4j.properties.template hive-log4j.properties
复制代码

2.2.1 hive-site.xml(Derby)

配置Derby只需要修改 javax.jdo.option.ConnectionURL 指定 metastore_db 的存储位置即可 具体修改如下:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>
复制代码

2.2.2 hive-env .sh

添加:

export HADOOP_HOME=/usr/local/bigdata/hadoop-2.7.1
export HIVE_CONF_DIR=/usr/local/bigdata/hive-2.3.5/conf
复制代码

2.2.3 hive-log4j2.properties

日志配置可以先默认,暂时不修改什么。

2.3 配置环境变量

~/.bashrc 文件中添加如下内容,执行 source ~/.bashrc 使其生效。

export HIVE_HOME=/usr/local/bigdata/hive-2.3.5
export PATH=$PATH:/usr/local/bigdata/hive-2.3.5/bin
复制代码

2.4 为hive创建数据仓库存储目录

注意先启动 hadoop 集群

$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -mkdir -p /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
复制代码

2.4 启动hive

初始化元数据数据库

$ schematool -initSchema -dbType derby
复制代码

成功初始化应该出现如下内容:

$ schematool -initSchema -dbType derby
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:derby:;databaseName=/usr/local/bigdata/hive-2.3.5/metastore/metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
复制代码

启动hive

$ hive

复制代码

如果成功运行将出现如下内容:

$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
    >
复制代码

创建表

create table t1(
     id      int
    ,name    string
    ,hobby   array<string>
    ,add     map<String,string>
    )
    row format delimited
    fields terminated by ','
    collection items terminated by '-'
    map keys terminated by ':'
    ;
复制代码
hive>
    >
    >
    > show databases;
OK
default
Time taken: 22.279 seconds, Fetched: 1 row(s)
hive> create table t1(
    >     id      int
    >    ,name    string
    >    ,hobby   array<string>
    >    ,add     map<String,string>
    > )
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;
OK
Time taken: 1.791 seconds
hive>
复制代码

至此,以 Derby 做元数据库的hive连接方式就配置完成了。

下面介绍如何将 hive 连接到 PostgreSQLMySQL

3 PostgreSQL的安装

3.1 安装

执行如下命令:

$ sudo apt install postgresql postgresql-contrib
复制代码

安装完成后默认会有一个 postgres 的用户,且没有密码,作为管理员

3.2 启动PostgreSQL

$ sudo systemctl enable postgresql
$ sudo systemctl start postgresql
复制代码

3.3 登录

hadoop@Master:~$ sudo -i -u postgres
postgres@Master:~$ psql
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
Type "help" for help.

postgres=# help
You are using psql, the command-line interface to PostgreSQL.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
postgres=#
复制代码

4 hive连接PostgreSQL

4.1 安装PostgreSQL-JDBC驱动

$ sudo apt-get install libpostgresql-jdbc-java
$ ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/bigdata/hive-2.3.5/lib
复制代码

4.2 修改pg_hba.conf文件

修改 /etc/postgresql/10/main/pg_hba.conf文件

# Database administrative login by Unix domain socket
#local   all             postgres                                peer
local   all             postgres                                trust

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
#local   all             all                                     peer
local   all             all                                     trust
# IPv4 local connections:
#host    all             all             127.0.0.1/32            md5
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
#host    all             all             ::1/128                 md5
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     all                                     peer
#local   replication     all                                     peer
#local   replication     all                                     peer
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
复制代码

4.3 在PostpreSQL中创建数据库和用户

先创建一个名为 hiveuser 的用户,密码: 123456

然后创建一个名为 metastore 的数据库:

$ sudo -u postgres psql 

postgres=# CREATE USER hiveuser WITH PASSWORD '123456';
postgres=# CREATE DATABASE metastore;
复制代码

测试用户和数据库是否能登录

$ psql -h localhost -U hiveuser -d pymetastore
复制代码

登录成功说明配置完成

hadoop@Master:~$  psql -h localhost -U hiveuser -d metastore
Password for user hive:
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

pymetastore=>
复制代码

4.5 修改hive-site.xml(PostgreSQL)

之前配置的是以 Derby 做元数据库,现在同样也是修改 hive-site.xml 文件。 首先在开头添加如下内容:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
 <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
 </property>
复制代码

然后修改如下属性:

name value description
javax.jdo.option.ConnectionURL jdbc:postgresql://localhost/metastore 指定连接的数据库(之前创建的)
javax.jdo.option.ConnectionDriverName org.postgresql.Driver 数据库驱动
javax.jdo.option.ConnectionUserName hiveuser 用户名(之前创建的)
javax.jdo.option.ConnectionPassword 123456 用户名密码

具体如下:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost/metastore</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>Username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>

复制代码

4.6 启动Hive

先运行 schematool 进行初始化:

schematool -dbType postgres -initSchema
复制代码

然后执行 $ hive 启动hive。

创建表格进行测试

hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/usr/local/bigdata/hive-2.3.5/conf/hive-log4j2.properties Async: true
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/bigdata/hadoop-2.7.7/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
    > show databases;
OK
default
Time taken: 12.294 seconds, Fetched: 1 row(s)
hive> create table t1(
    >     id      int
    >    ,name    string
    >    ,hobby   array<string>
    >    ,add     map<String,string>
    > )
    > row format delimited
    > fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > ;
OK
Time taken: 1.239 seconds
hive> Connection reset by 192.168.233.200 port 22

复制代码

查看是否创建成功:

$ psql -h localhost -U hiveuser -d metastore
psql (10.8 (Ubuntu 10.8-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

metastore=> SELECT * from "TBLS";
 TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME |   TBL_TYPE    | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED
--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------
      1 |  1560074934 |     1 |                0 | hadoop |         0 |     1 | t1       | MANAGED_TABLE |                    |                    | f
(1 row)

复制代码

5 MySQL 安装

5.1 安装

$ sudo apt install mysql-server
复制代码

5.2 设置MySQL的root用户密码

如果没有设置密码的话,设置密码。

这里密码设置为 hadoop

$ mysql -u root -p
复制代码

6 Hive连接MySQL

6.1 在MySQL中为Hive新建数据库

用来存放Hive的元数据。

与Hive配置文件 hive-site.xml 中的 mysql://localhost:3306/metastore 对应

#建立数据库和用户
mysql> create database if not exists metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY '123456';

#设置远程登录的权限
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hiveuser'@'localhost';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';

#刷新配置
mysql> FLUSH PRIVILEGES;
mysql> quit;
复制代码

6.2 安装MySQL-JDBC驱动

$ sudo apt-get install libmysql-java
$ ln -s /usr/share/java/mysql-connector-java-5.1.45.jar /usr/local/bigdata/hive-2.3.5/lib
复制代码

6.3 修改修改hive-site.xml(MySQL)

首先在开头添加如下内容:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
 <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
 </property>
复制代码

然后修改如下属性:

name value description
javax.jdo.option.ConnectionURL jdbc:mysql://localhost:3306/metastore?useSSL=true 指定连接的数据库(之前创建的)
javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver 数据库驱动
javax.jdo.option.ConnectionUserName hiveuser 用户名(之前创建的)
javax.jdo.option.ConnectionPassword 123456 用户名密码

具体如下:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?useSSL=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>Username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>

复制代码

6.4 启动hive

先初始化

schematool -dbType mysql -initSchema
复制代码

和前面一样,执行

$ hive
复制代码

7 问题总结

问题1

初始化derby时报如下错误,提示没有 hive-exec-*.jar

hadoop@Master:~$ schematool -initSchema -dbType derby
Missing Hive Execution Jar: /usr/local/biddata/hive-2.3.5/lib/hive-exec-*.jar
复制代码

解决:

检查该目录下是否确实不存在 hive-exec-2.35.jar ,如果不存在,下载一个放到该目录下。 下载地址: mvnrepository.com/artifact/or… 如果存在,那一定是环境变量配置有问题,查看 HIVE_HOME$HIVE_HOME/bin 是否配置正确。

问题2

报错:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.na
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:659)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:549)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:750)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 12 more
复制代码

解决

hive-site.xml 文件开头加入如下配置:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>
复制代码

问题3

执行 $ schematool -dbType postgres -initSchema 时报错

hadoop@Master:~$ schematool -dbType postgres -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:postgresql://localhost/pymetastore
Metastore Connection Driver :    org.postgresql.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.postgres.sql
Error: ERROR: relation "BUCKETING_COLS" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
复制代码

另外也会有这个错:

Error: ERROR: relation "txns" already exists (state=42P07,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
复制代码

这个问题,我尝试了很久也没有找到原因,网上有说是 hive 版本的原因,我换了 hive-1.2.1hive-1.2.2 等低版本的hive,依然时候有这个问题。 最后是重新创建用户和数据库就没有这个问题了,感觉是数据库有冲突。

问题4

Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
复制代码

解决:

注意使用MySQL存储元数据的时候,使用root用户有可能权限不够,会报错。另外, $ schematool -dbType postgres -initSchema 执行一次就好了。


以上所述就是小编给大家介绍的《Hive的安装及配置》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Flash与后台

Flash与后台

刘明伟 / 清华大学出版社 / 2007-6 / 52.00元

《Flash与后台:ASP/ASP.NET/PHP/Java/JavaScript/Delphi总动员》从目前热门的F1ash与ASP、ASP.NET、PHP、Java、JavaScript和Delphi的交互知识入手,深入浅出地讲解了F1ash与后台通信的原理和交互的过程,力求使阅读《Flash与后台:ASP/ASP.NET/PHP/Java/JavaScript/Delphi总动员》的每一位读......一起来看看 《Flash与后台》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具