Hive 配置

栏目: 数据库 · Mysql · 发布时间: 6年前

内容简介：如何将一个基于传统关系型数据库和结构化查询语句（SQL）的现有数据转移到 Hadoop ，对于大量的 SQL 用户来说 HIVE 就是解决这个问题的方案。它提供了一个被称为 Hive 查询语言（简称 HiveQL 或 HQL）的 SQL 方言，来查询储存在 Hadoop 集群中的数据。这里我选择版本较老的 HIVE 1.1.0 使用，因为它和我之前安装的 Hadoop比较配因为版本太老了在 Mirror 里面都找不到，只有 Archive 里面才有

如何将一个基于传统关系型数据库和结构化查询语句（SQL）的现有数据转移到 Hadoop ，对于大量的 SQL 用户来说 HIVE 就是解决这个问题的方案。它提供了一个被称为 Hive 查询语言（简称 HiveQL 或 HQL）的 SQL 方言，来查询储存在 Hadoop 集群中的数据。

准备工作

这里我选择版本较老的 HIVE 1.1.0 使用，因为它和我之前安装的 Hadoop比较配

因为版本太老了在 Mirror 里面都找不到，只有 Archive 里面才有

curl -O https://archive.apache.org/dist/hive/hive-1.1.0/apache-hive-1.1.0-bin.tar.gz
tar -xf apache-hive-1.1.0-bin.tar.gz

配置环境变量

vi /etc/profile

在配置下方添加以下内容

export HIVE_HOME=/root/apache-hive-1.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin

让系统更新配置文件

source /etc/profile

修改配置文件

cp /root/apache-hive-1.1.0-bin/conf/hive-default.xml.template /root/apache-hive-1.1.0-bin/conf/hive-site.xml # 使用默认配置文件为正式配置文件
vi /root/apache-hive-1.1.0-bin/conf/hive-site.xml

在文件头部增加以下信息，如果不加启动会报错

<property>
    <name>system:java.io.tmpdir</name>
    <value>/root/apache-hive-1.1.0-bin/tmp</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>root</value>
  </property>
  <property>

启动

hive

如果你的的命令行变为了

hive>

说明 HIVE 已经启动成功了

如果你遇到了这类错误

SLF4J: Class path contains multiple SLF4J bindings.

可以执行以下指令，删除冲突的 jar 包

rm /root/apache-hive-1.1.0-bin/lib/hive-jdbc-1.1.0-standalone.jar

建表

create database test; -- 创建数据库
show databases; -- 查询已创建的数据库
use test; -- 使用该库
create table test1(hold string, city string, url string)row format delimited fields terminated by '\t';
-- 新建表，并将 \t 作为字段分隔符
desc test1; -- 查看表结构
quit;

插入数据

这是我使用的测试数据 small

curl -O https://st.blackyau.net/blog/13/small # 将文件下载到本地
hdfs dfs -put ./small / # 将文件传到HDFS

接下来的命令在 HIVE 下进行

use test; -- 使用 test 库
load data inpath "/small" into table test1; -- 将文件加载进 test1 表中
select * from test1;
select city, url from tesst1;

输出数据如下即导入成功

OK
白银	by.58.com/
庆阳	qingyang.58.com/
嘉峪关	jyg.58.com/
... 手动省略
宜宾	yb.58.com/
自贡	zg.58.com/
乐山	ls.58.com/
	NULL
Time taken: 0.062 seconds, Fetched: 51 row(s)

HIVE 参数设置

以下列出的参数大概是有点用的

<property>
  <!--- HIVE 大部分操作都会触发一个 MapReduce 修改该参数后它会尝试使用本地模式 以降低资源消耗-->
  <name>hive.exec.mode.local.auto</name>
  <value>true</value>
</property>

<property>
  <!--- 这是 HIVE 加载 JAR 的路径 -->
  <name>hive.aux.jars.path</name>
  <value>/root/apache-hive-1.1.0-bin/lib</value>
</property>

<property>
  <!--- 如果启用了 LIMIT 优化，这个用来控制 LIMIT 的最小行取样量 -->
  <name>hive.limit.row.max.size</name>
  <value>100000</value>
  <description>When trying a smaller subset of data for simple LIMIT, how much size we need to guarantee each row to have at least.</description>
</property>
<property>
  <!--- 如果启用了 LIMIT 优化，这个用来控制 LIMIT 的最大文件数 -->
  <name>hive.limit.optimize.limit.file</name>
  <value>10</value>
  <description>When trying a smaller subset of data for simple LIMIT, maximum number of files we can sample.</description>
</property>
<property>
  <!--- 是否启用 LIMIT 优化，这是对元数据进行抽样统计，有可能输入有用的数据永远不会被处理到。毕竟是抽样 -->
  <name>hive.limit.optimize.enable</name>
  <value>true</value>
  <description>Whether to enable to optimization to trying a smaller subset of data for simple LIMIT first.</description>
</property>

<property>
  <!--- 并行执行，一个查询可能会有多个阶段而且这些阶段可能并非完全相互依赖，所以阶段越多job可能就更快完成。同时它会增加对集群的利用率 -->
  <name>hive.exec.parallel</name>
  <value>true</value>
  <description>Whether to execute jobs in parallel</description>
</property>
<property>
  <!--- 并行执行最大线程数 -->
  <name>hive.exec.parallel.thread.number</name>
  <value>8</value>
  <description>How many jobs at most can be executed in parallel</description>
</property>

<property>
  <!--- 严格模式,如果修改为 strict 那么会禁止3种类型的查询:1.不限制分区查询(不允许用户扫描所有分区,where必须有两个或以上) 2.order by必须要LIMIT 3.限制笛卡积尔查询,多表连接查询应该使用join
  和on -->
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>
    The mode in which the Hive operations are being performed.
    In strict mode, some risky queries are not allowed to run. They include:
      Cartesian Product.
      No partition being picked up for a query.
      Comparing bigints and strings.
      Comparing bigints and doubles.
      Orderby without limit.
  </description>
</property>

<property>
  <!-- 因为Hive使用输入数据量的大小来确定reducer个数，修改这个数量就可以更改使用reducer的个数 -->
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>256000000</value>
  <description>size per reducer.The default is 256Mb, i.e if the input size is 1G, it will use 4 reducers.</description>
</property>
<property>
  <!-- 设置一个查询最多可消耗的reducers的量-->
  <name>hive.exec.reducers.max</name>
  <value>1009</value>
  <description>
    max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is
    negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.
  </description>
</property>

Scala 数据推送

先下载程序为配置做准备

curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
tar -xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

配置环境变量

vi /etc/profile

在文本下方写入以下配置

export SQOOP_HOME=/root/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

sqoop help

当终端输出以下信息时说明你的配置成功了

Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/04/20 18:55:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information

See 'sqoop help COMMAND' for information on a specific command.

如果你没有使用 HBase、HCatalog、Accumulo、Zookeeper 你可以忽略它的警告，但是如果你和我一样觉得烦。你可以通过注释相关代码以跳过检查。

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi

安装 mysql

下载安装包

curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-common-5.7.25-1.el7.x86_64.rpm
curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-libs-5.7.25-1.el7.x86_64.rpm
curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-client-5.7.25-1.el7.x86_64.rpm
curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-server-5.7.25-1.el7.x86_64.rpm

卸载自带的 MariaDB 或之前安装的 MySQL

安装

rpm -ivh mysql-community-common-5.7.25-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.25-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.25-1.el7.x86_64.rpm
rpm -ivh mysql-community-server-5.7.25-1.el7.x86_64.rpm

配置

mysqld --initialize --user=mysql # 使用 mysql 用户运行MySQL
cat /var/log/mysqld.log # 查看一下生成的临时密码

2019-04-21T01:53:51.675008Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-04-21T01:53:52.388438Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-04-21T01:53:52.484682Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-04-21T01:53:52.549955Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 5075e10a-63d8-11e9-9cd1-000c2937e0f0.
2019-04-21T01:53:52.552680Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-04-21T01:53:52.583134Z 1 [Note] A temporary password is generated for root@localhost: zeAxOg16AZ!7

如上: zeAxOg16AZ!7 就是密码

接下来启动 MySQL

systemctl start mysqld # 启动
systemctl status mysqld # 查看运行状态,看到active (running)就行了

mysql -u root -p # 以root身份登录MySQL,密码就是上面的密码
SET PASSWORD = PASSWORD('your_new_password'); # 修改一下密码
select version();

输出以下信息说明配置成功

+-----------+
| version() |
+-----------+
| 5.7.25    |
+-----------+
1 row in set (0.00 sec)

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

四步创业法

[美] Steven Gary Blank / 七印部落 / 华中科技大学出版社 / 2012-8-1 / 48.00

《四步创业法》获李开复推荐，是精益创业理论的奠基之作。作者Steven Gary Blank博士是硅谷资深企业家，他一共创办了八家企业，并担任多家硅谷公司的董事和创业顾问。本书总结作者25年创业经验，提出全新的客户发展方法（诣在弥补传统产品开发方法的缺陷），掀起了硅谷近年精益创业的浪潮。七印部落正在翻译作者的博客和授课视频，欢迎访问微博：http://weibo.com/7seals ......一起来看看《四步创业法》这本书的介绍吧!

码农工具

Hive 配置

准备工作

配置环境变量

修改配置文件

启动

建表

插入数据

HIVE 参数设置

Scala 数据推送

配置环境变量

安装 mysql

四步创业法

CSS 压缩/解压工具

html转js在线工具

UNIX 时间戳转换