Trying Out Hadoop on macOS


Original article: https://crowall.com/topic/84

PS: Make sure brew and a JDK are already installed and working.

1. Install Hadoop

brew install hadoop

2. Configure

export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/

By default, the Hadoop configuration directory is /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/ (the version number may differ; list /usr/local/Cellar/hadoop/ first to see which version you installed).

MBP:~ tony$ ll /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
total 304
drwxr-xr-x  30 tony  admin   960B  5  3 17:43 .
drwxr-xr-x   3 tony  admin    96B 12  9 03:17 ..
-rw-r--r--   1 tony  admin   7.7K 12  9 03:30 capacity-scheduler.xml
-rw-r--r--   1 tony  admin   1.3K 12  9 03:32 configuration.xsl
-rw-r--r--   1 tony  admin   1.2K 12  9 03:30 container-executor.cfg
-rw-r--r--   1 tony  admin   774B 12  9 03:17 core-site.xml
-rw-r--r--   1 tony  admin    16K 12  9 03:42 hadoop-env.sh
-rw-r--r--   1 tony  admin   3.2K 12  9 03:17 hadoop-metrics2.properties
-rw-r--r--   1 tony  admin    10K 12  9 03:17 hadoop-policy.xml
-rw-r--r--   1 tony  admin   3.3K 12  9 03:17 hadoop-user-functions.sh.example
-rw-r--r--   1 tony  admin   775B 12  9 03:19 hdfs-site.xml
-rw-r--r--   1 tony  admin   1.4K 12  9 03:19 httpfs-env.sh
-rw-r--r--   1 tony  admin   1.6K 12  9 03:19 httpfs-log4j.properties
-rw-r--r--   1 tony  admin    21B 12  9 03:19 httpfs-signature.secret
-rw-r--r--   1 tony  admin   620B 12  9 03:19 httpfs-site.xml
-rw-r--r--   1 tony  admin   3.4K 12  9 03:17 kms-acls.xml
-rw-r--r--   1 tony  admin   1.3K 12  9 03:17 kms-env.sh
-rw-r--r--   1 tony  admin   1.7K 12  9 03:17 kms-log4j.properties
-rw-r--r--   1 tony  admin   682B 12  9 03:17 kms-site.xml
-rw-r--r--   1 tony  admin    13K 12  9 03:17 log4j.properties
-rw-r--r--   1 tony  admin   1.7K 12  9 03:32 mapred-env.sh
-rw-r--r--   1 tony  admin   4.0K 12  9 03:32 mapred-queues.xml.template
-rw-r--r--   1 tony  admin   758B 12  9 03:32 mapred-site.xml
drwxr-xr-x   3 tony  admin    96B 12  9 03:17 shellprofile.d
-rw-r--r--   1 tony  admin   2.3K 12  9 03:17 ssl-client.xml.example
-rw-r--r--   1 tony  admin   2.6K 12  9 03:17 ssl-server.xml.example
-rw-r--r--   1 tony  admin   2.6K 12  9 03:19 user_ec_policies.xml.template
-rw-r--r--   1 tony  admin    10B 12  9 03:17 workers
-rw-r--r--   1 tony  admin   5.3K 12  9 03:30 yarn-env.sh
-rw-r--r--   1 tony  admin   690B 12  9 03:30 yarn-site.xml
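
Because the version number in the path depends on what brew installed, it can help to look the directory up instead of typing it. The following is a minimal sketch that assumes the Cellar layout shown above; the function name hadoop_conf_dir is made up for this example, and it simply picks the lexicographically last version directory.

```shell
# Hypothetical helper (not part of Hadoop or brew): print the config
# directory of the last version found under a Homebrew Cellar root.
# Assumes the <cellar>/<version>/libexec/etc/hadoop layout shown above.
hadoop_conf_dir() {
  cellar="${1:-/usr/local/Cellar/hadoop}"
  found=""
  # Globs expand in lexicographic order, so the last existing match wins.
  for dir in "$cellar"/*/libexec/etc/hadoop; do
    [ -d "$dir" ] && found="$dir"
  done
  [ -n "$found" ] || { echo "no Hadoop config dir under $cellar" >&2; return 1; }
  printf '%s\n' "$found"
}
```

With the function defined, cd "$(hadoop_conf_dir)" would jump to the configuration directory without hard-coding 3.0.0.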

Edit hadoop-env.sh

cd /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
vim hadoop-env.sh

# Find the HADOOP_OPTS lines
MBP:hadoop tony$ cat hadoop-env.sh |grep -n "export HADOOP_OPTS"
90:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
92:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
106:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
107:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
108:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "

# Uncomment line 92 and add a JAVA_HOME line (use your own JDK path; do not copy mine verbatim)

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home"
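
To avoid copying someone else's JDK path, macOS ships /usr/libexec/java_home, which prints the home directory of an installed JDK. The sketch below turns its output into the line to paste into hadoop-env.sh. The function name and its optional argument (which only exists so a different tool path can be passed in) are assumptions of this example, not anything Hadoop provides.

```shell
# Hypothetical helper: print an "export JAVA_HOME=..." line built from
# whatever the macOS java_home tool reports.
java_home_export_line() {
  finder="${1:-/usr/libexec/java_home}"
  if [ -x "$finder" ]; then
    printf 'export JAVA_HOME="%s"\n' "$("$finder")"
  else
    echo "java_home tool not found at $finder" >&2
    return 1
  fi
}
```

Running java_home_export_line on a Mac with a JDK installed should print a line that can be appended to hadoop-env.sh as-is.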

Configure the HDFS address and storage path

# Create the Hadoop tmp directory (any location works; feel free to pick your own)

mkdir -p /tmp/hadoop/hdfs/tmp
chmod -R 777 /tmp/hadoop/hdfs/tmp

# Edit core-site.xml
vim core-site.xml

# Add the following properties

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
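
After editing a site file it is easy to lose track of what it actually sets. As a quick sanity check that does not need Hadoop running, the sketch below pulls one property value out of a file with awk. The function name get_property is hypothetical, and it assumes each <name> and <value> tag sits on its own line, as in the snippets in this article; it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml file. Assumes one tag per line.
get_property() {
  file="$1"
  key="$2"
  awk -v key="$key" '
    # Remember that the wanted <name> line has been seen.
    index($0, "<name>" key "</name>") { found = 1; next }
    # The next <value> line belongs to it: strip the tags and print.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$file"
}
```

For example, get_property core-site.xml hadoop.tmp.dir should print the tmp directory configured above. When Hadoop is on the PATH, hdfs getconf -confKey hadoop.tmp.dir reports the value Hadoop itself resolves.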

Set the MapReduce address

vim mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
    </property>
</configuration>

Set the replication factor

We are running locally in pseudo-distributed mode, so the default of 3 replicas is unnecessary; 1 is enough.

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Format the NameNode

hdfs namenode -format

Output

MBP:hadoop tony$ hdfs namenode -format
WARNING: /usr/local/Cellar/hadoop/3.0.0/libexec/logs does not exist. Creating.
2018-05-18 14:13:14,899 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = MBP.local/{my IP...}
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.0.0

..... (a large chunk of output omitted here)

2018-05-18 14:13:16,119 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1297562978-{my IP...}-1526623996110
2018-05-18 14:13:16,137 INFO common.Storage: Storage directory /tmp/hadoop/hdfs/tmp/dfs/name has been successfully formatted.
2018-05-18 14:13:16,177 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-18 14:13:16,312 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds.
2018-05-18 14:13:16,329 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-05-18 14:13:16,335 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MBP.local/{my IP...}
************************************************************/
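
Formatting is a one-time step: running it again on a directory that is already formatted would wipe the existing filesystem metadata. A small guard can make the step safe to repeat. The sketch below assumes the name directory from the log above (/tmp/hadoop/hdfs/tmp/dfs/name) and relies on the current/VERSION file that a formatted NameNode directory contains; the function name is made up for this example.

```shell
# Hypothetical guard: succeed if the NameNode directory already holds
# formatted metadata (a current/VERSION file), fail otherwise.
namenode_is_formatted() {
  name_dir="${1:-/tmp/hadoop/hdfs/tmp/dfs/name}"
  [ -f "$name_dir/current/VERSION" ]
}
```

A possible use is namenode_is_formatted || hdfs namenode -format, which only formats when nothing is there yet.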

Configuration is finally done; time to run it.

3. Run

Start HDFS

Hadoop's start and stop scripts are in /usr/local/Cellar/hadoop/3.0.0/sbin/ (mind the version number).

cd /usr/local/Cellar/hadoop/3.0.0/sbin/

./start-dfs.sh  # start HDFS
./stop-dfs.sh   # stop HDFS


MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [MBP.local]
2018-05-18 14:38:31,125 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

# Use jps to check the running processes
MBP:sbin tony$ jps
16816 Jps
56752 
90759 NameNode
69335 Launcher
91002 SecondaryNameNode
98799 
90863 DataNode
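
Reading the jps listing by eye works, but a scripted check is handy when restarting often. The sketch below reads jps output on standard input and prints any of the three HDFS daemons seen above that is missing. The function name is hypothetical, and the list of expected daemons is simply taken from the listing above.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected HDFS daemon that is not listed. Prints nothing if all run.
missing_hdfs_daemons() {
  listing="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode; do
    # Compare against the whole second field, so that
    # "SecondaryNameNode" does not count as "NameNode".
    printf '%s\n' "$listing" |
      awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' ||
      printf '%s\n' "$daemon"
  done
}
```

Usage would be jps | missing_hdfs_daemons; empty output means all three daemons are up.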

Common problems when starting HDFS

Problem 1: localhost: ssh: connect to host localhost port 22: Connection refused

Solution:

Allow remote login for all users:

System Preferences -> Sharing -> Remote Login -> Allow access for => All users

Then set up SSH keys

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Finally, start HDFS again.
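
One caveat with the key setup above: rerunning the cat ... >> line appends the same key again each time, and sshd can reject an authorized_keys file whose permissions are too loose. The sketch below is one way to make that step repeatable. The function name and its two optional arguments are assumptions of this example; the defaults match the paths used above.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not there yet, then tighten permissions.
authorize_key_once() {
  pub="${1:-$HOME/.ssh/id_rsa.pub}"
  auth="${2:-$HOME/.ssh/authorized_keys}"
  [ -f "$pub" ] || { echo "public key not found: $pub" >&2; return 1; }
  touch "$auth"
  # -x whole-line match, -F fixed string, -f read the pattern from the key file
  grep -qxFf "$pub" "$auth" || cat "$pub" >> "$auth"
  chmod 600 "$auth"
}
```

Afterwards, ssh -o BatchMode=yes localhost true should return without asking for a password if key login works.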

Problem 2: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Turn on debug logging, rerun ./start-dfs.sh, and look at the log:

MBP:sbin tony$ ./start-dfs.sh 
Starting namenodes on [localhost]
localhost: namenode is running as process 80560.  Stop it first.
Starting datanodes
localhost: datanode is running as process 80661.  Stop it first.
Starting secondary namenodes [MBP.local]
MBP.local: secondarynamenode is running as process 80796.  Stop it first.
2018-05-18 14:58:45,871 DEBUG util.Shell: setsid is not available on this machine. So not using it.
2018-05-18 14:58:45,872 DEBUG util.Shell: setsid exited with exit code 0
2018-05-18 14:58:46,065 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since startup], valueName=Time)
2018-05-18 14:58:46,077 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since last successful login], valueName=Time)
2018-05-18 14:58:46,078 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-05-18 14:58:46,108 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-05-18 14:58:46,138 DEBUG security.Groups:  Creating new Groups object
2018-05-18 14:58:46,140 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: java.library.path=/Users/tn-ma-l30000122/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2018-05-18 14:58:46,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-18 14:58:46,143 DEBUG util.PerformanceAdvisory: Falling back to shell based
2018-05-18 14:58:46,145 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2018-05-18 14:58:46,256 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2018-05-18 14:58:46,260 DEBUG security.UserGroupInformation: hadoop login
2018-05-18 14:58:46,261 DEBUG security.UserGroupInformation: hadoop login commit
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: tony" with name tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: User entry: "tony"
2018-05-18 14:58:46,265 DEBUG security.UserGroupInformation: UGI loginUser:tony (auth:SIMPLE)
2018-05-18 14:58:46,266 DEBUG security.UserGroupInformation: PrivilegedAction as:tony (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)

The fixes I found online were all for Hadoop 2.x and did not work on my 3.0 install. After some Googling I found this thread on Stack Overflow:

hadoop WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

The answer there is simply to change the log level, so that only errors are shown and the warning disappears.

# Edit the logging config etc/hadoop/log4j.properties
vim /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/log4j.properties

# Add this line
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

4. Try it out

Command-line operations

Create a directory

MBP:sbin tony$ hadoop fs -ls /
MBP:sbin tony$ hadoop fs -mkdir /demo
MBP:sbin tony$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - tony supergroup          0 2018-05-22 11:06 /demo

Create directories with the hdfs command

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/tony

Copy some files in

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put libexec/etc/hadoop/*.xml input

Run the example

bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'

Check the result

bin/hdfs dfs -get output output
cat output/*

Or

bin/hdfs dfs -cat output/*
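
To get a feel for what the grep example computes, the same count can be approximated locally with ordinary tools and compared against the job output. This is only a rough local sketch, not how the MapReduce job is implemented; the function name is made up, and it assumes the job reports each matched string with its number of occurrences, most frequent first.

```shell
# Local sketch of what the grep example computes: count every match of
# the pattern across the given files, most frequent first.
# Output format: <count><TAB><match>
local_grep_count() {
  pattern="$1"
  shift
  # -o print only the matched text, -h omit file names, -E extended regex
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn |
    awk '{ print $1 "\t" $2 }'
}
```

Run from /usr/local/Cellar/hadoop/3.0.0/, local_grep_count 'dfs[a-z.]+' libexec/etc/hadoop/*.xml scans the same files that were copied into input.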

Web UI

Open http://localhost:9870/; it should now be reachable.

The page shows detailed information about the cluster.

You can also browse the directories created from the command line and the files copied in (each file takes up one block, 128M...).
