Trying Out Hadoop on macOS


Original article: https://crowall.com/topic/84

PS: Make sure you already have brew and a JDK installed.

1. Install Hadoop

brew install hadoop

2. Configure

export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/

By default, the Hadoop configuration directory lives at /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/ (note that the version number may differ; check /usr/local/Cellar/hadoop/ first to see which version you actually installed).

MBP:~ tony$ ll /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
total 304
drwxr-xr-x  30 tony  admin   960B  5  3 17:43 .
drwxr-xr-x   3 tony  admin    96B 12  9 03:17 ..
-rw-r--r--   1 tony  admin   7.7K 12  9 03:30 capacity-scheduler.xml
-rw-r--r--   1 tony  admin   1.3K 12  9 03:32 configuration.xsl
-rw-r--r--   1 tony  admin   1.2K 12  9 03:30 container-executor.cfg
-rw-r--r--   1 tony  admin   774B 12  9 03:17 core-site.xml
-rw-r--r--   1 tony  admin    16K 12  9 03:42 hadoop-env.sh
-rw-r--r--   1 tony  admin   3.2K 12  9 03:17 hadoop-metrics2.properties
-rw-r--r--   1 tony  admin    10K 12  9 03:17 hadoop-policy.xml
-rw-r--r--   1 tony  admin   3.3K 12  9 03:17 hadoop-user-functions.sh.example
-rw-r--r--   1 tony  admin   775B 12  9 03:19 hdfs-site.xml
-rw-r--r--   1 tony  admin   1.4K 12  9 03:19 httpfs-env.sh
-rw-r--r--   1 tony  admin   1.6K 12  9 03:19 httpfs-log4j.properties
-rw-r--r--   1 tony  admin    21B 12  9 03:19 httpfs-signature.secret
-rw-r--r--   1 tony  admin   620B 12  9 03:19 httpfs-site.xml
-rw-r--r--   1 tony  admin   3.4K 12  9 03:17 kms-acls.xml
-rw-r--r--   1 tony  admin   1.3K 12  9 03:17 kms-env.sh
-rw-r--r--   1 tony  admin   1.7K 12  9 03:17 kms-log4j.properties
-rw-r--r--   1 tony  admin   682B 12  9 03:17 kms-site.xml
-rw-r--r--   1 tony  admin    13K 12  9 03:17 log4j.properties
-rw-r--r--   1 tony  admin   1.7K 12  9 03:32 mapred-env.sh
-rw-r--r--   1 tony  admin   4.0K 12  9 03:32 mapred-queues.xml.template
-rw-r--r--   1 tony  admin   758B 12  9 03:32 mapred-site.xml
drwxr-xr-x   3 tony  admin    96B 12  9 03:17 shellprofile.d
-rw-r--r--   1 tony  admin   2.3K 12  9 03:17 ssl-client.xml.example
-rw-r--r--   1 tony  admin   2.6K 12  9 03:17 ssl-server.xml.example
-rw-r--r--   1 tony  admin   2.6K 12  9 03:19 user_ec_policies.xml.template
-rw-r--r--   1 tony  admin    10B 12  9 03:17 workers
-rw-r--r--   1 tony  admin   5.3K 12  9 03:30 yarn-env.sh
-rw-r--r--   1 tony  admin   690B 12  9 03:30 yarn-site.xml

Edit the hadoop-env.sh file:

cd /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
vim hadoop-env.sh

# Find HADOOP_OPTS
MBP:hadoop tony$ cat hadoop-env.sh |grep -n "export HADOOP_OPTS"
90:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
92:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
106:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
107:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
108:    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "

# Uncomment line 92 and add a JAVA_HOME line (don't copy my path verbatim)

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home"

Configure the HDFS address and storage path

# Create a Hadoop tmp directory (any location works; feel free to pick your own)

mkdir -p /tmp/hadoop/hdfs/tmp
chmod -R 777 /tmp/hadoop/hdfs/tmp

# Edit the core-site.xml file
vim core-site.xml

# Add these properties

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
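A note in passing: `fs.default.name` is the legacy key name; since Hadoop 2 the preferred key is `fs.defaultFS`, though the old one still works with a deprecation warning. If you want to sanity-check the XML before starting HDFS, a short Python sketch like this works (the XML is embedded as a string here for illustration; in practice you would read libexec/etc/hadoop/core-site.xml from disk):

```python
# Parse a Hadoop-style config file and pull out its name/value pairs.
import xml.etree.ElementTree as ET

CORE_SITE = """
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop/hdfs/tmp</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
"""

def read_props(xml_text):
    """Return {name: value} for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

props = read_props(CORE_SITE)
assert props["fs.default.name"].startswith("hdfs://")
print(props)
```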

Set the MapReduce address

vim mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
    </property>
</configuration>

Set the replication factor

Since this is a local pseudo-distributed setup, the default replication factor of 3 is unnecessary; 1 is enough.

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Format the NameNode

hdfs namenode -format

The output:

MBP:hadoop tony$ hdfs namenode -format
WARNING: /usr/local/Cellar/hadoop/3.0.0/libexec/logs does not exist. Creating.
2018-05-18 14:13:14,899 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = MBP.local/{my IP...}
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.0.0

..... (a long stretch omitted here)

2018-05-18 14:13:16,119 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1297562978-{my IP...}-1526623996110
2018-05-18 14:13:16,137 INFO common.Storage: Storage directory /tmp/hadoop/hdfs/tmp/dfs/name has been successfully formatted.
2018-05-18 14:13:16,177 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-18 14:13:16,312 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds.
2018-05-18 14:13:16,329 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-05-18 14:13:16,335 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MBP.local/{my IP...}
************************************************************/

Configuration is finally done; time to run it.

3. Run

Start HDFS

The Hadoop start/stop scripts live in /usr/local/Cellar/hadoop/3.0.0/sbin/ (mind the version number).

cd /usr/local/Cellar/hadoop/3.0.0/sbin/

./start-dfs.sh  # start HDFS
./stop-dfs.sh   # stop HDFS


MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [MBP.local]
2018-05-18 14:38:31,125 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

# Check the running processes with jps
MBP:sbin tony$ jps
16816 Jps
56752 
90759 NameNode
69335 Launcher
91002 SecondaryNameNode
98799 
90863 DataNode

Common problems when starting HDFS

Problem 1: localhost: ssh: connect to host localhost port 22: Connection refused

Solution: allow remote login for all users:

System Preferences -> Sharing -> Remote Login -> Allow access for: All Users


Then set up passwordless SSH:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Finally, restart HDFS and the error goes away.

Problem 2: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Enable debug logging, re-run ./start-dfs.sh, and inspect the output:

MBP:sbin tony$ ./start-dfs.sh 
Starting namenodes on [localhost]
localhost: namenode is running as process 80560.  Stop it first.
Starting datanodes
localhost: datanode is running as process 80661.  Stop it first.
Starting secondary namenodes [MBP.local]
MBP.local: secondarynamenode is running as process 80796.  Stop it first.
2018-05-18 14:58:45,871 DEBUG util.Shell: setsid is not available on this machine. So not using it.
2018-05-18 14:58:45,872 DEBUG util.Shell: setsid exited with exit code 0
2018-05-18 14:58:46,065 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since startup], valueName=Time)
2018-05-18 14:58:46,077 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since last successful login], valueName=Time)
2018-05-18 14:58:46,078 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-05-18 14:58:46,108 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-05-18 14:58:46,138 DEBUG security.Groups:  Creating new Groups object
2018-05-18 14:58:46,140 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: java.library.path=/Users/tn-ma-l30000122/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2018-05-18 14:58:46,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-18 14:58:46,143 DEBUG util.PerformanceAdvisory: Falling back to shell based
2018-05-18 14:58:46,145 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2018-05-18 14:58:46,256 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2018-05-18 14:58:46,260 DEBUG security.UserGroupInformation: hadoop login
2018-05-18 14:58:46,261 DEBUG security.UserGroupInformation: hadoop login commit
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: tony" with name tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: User entry: "tony"
2018-05-18 14:58:46,265 DEBUG security.UserGroupInformation: UGI loginUser:tony (auth:SIMPLE)
2018-05-18 14:58:46,266 DEBUG security.UserGroupInformation: PrivilegedAction as:tony (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)

The fixes I found online were all for Hadoop 2.x; I'm on 3.0, so they didn't apply. After some Googling I found this Stack Overflow post:

hadoop WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

The answer there is simply to raise the log level, so only errors are shown and the warning disappears. :joy:

# Edit the log config etc/hadoop/log4j.properties
vim /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/log4j.properties

# Add this line
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

4. Try It Out

Command-line operations

Create a directory:

MBP:sbin tony$ hadoop fs -ls /
MBP:sbin tony$ hadoop fs -mkdir /demo
MBP:sbin tony$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - tony supergroup          0 2018-05-22 11:06 /demo

Create directories with the hdfs command:

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/tony

Copy some files in:

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put libexec/etc/hadoop/*.xml input

Run one of the bundled examples:

bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
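To make the example concrete, here is a minimal local sketch (plain Python, not MapReduce) of what the `grep` example job computes: it counts occurrences of the regex across the input files and orders the matched strings by frequency. The sample lines below stand in for the XML files uploaded above.

```python
# Local sketch of the hadoop-mapreduce-examples "grep" job:
# count regex matches across input lines, sort by frequency descending.
import re
from collections import Counter

pattern = re.compile(r"dfs[a-z.]+")

# Stand-ins for the contents of the *.xml files put into HDFS above.
sample_lines = [
    "<name>dfs.replication</name>",
    "<name>dfs.namenode.name.dir</name>",
    "<value>dfs.replication is 1</value>",
]

counts = Counter(m for line in sample_lines for m in pattern.findall(line))
for match, n in counts.most_common():
    print(n, match)
# prints:
# 2 dfs.replication
# 1 dfs.namenode.name.dir
```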

Check the result:

bin/hdfs dfs -get output output
cat output/*

or:

bin/hdfs dfs -cat output/*

Web UI

Open http://localhost:9870/ and you'll find the NameNode web interface is now reachable. As shown:

(screenshot: NameNode overview page)

It shows detailed cluster information:

(screenshot: cluster summary)

You can also browse the directories created from the command line, and the files we copied in (each file occupies one HDFS block; the default block size is 128 MB).

(screenshots: HDFS file browser)
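A quick sanity check on the block math: HDFS splits files into fixed-size blocks (128 MB by default, the `dfs.blocksize` setting), so each of these small config files maps to exactly one block entry in the namespace; a block only consumes as much disk as the file actually contains. A sketch:

```python
# How many HDFS blocks a file of a given size occupies,
# assuming the default 128 MB block size (dfs.blocksize).
import math

BLOCK_SIZE = 128 * 1024 * 1024  # bytes

def num_blocks(file_size_bytes):
    """Number of HDFS blocks for a file of the given size (min 1)."""
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

assert num_blocks(774) == 1             # core-site.xml from the listing above
assert num_blocks(BLOCK_SIZE) == 1      # exactly one full block
assert num_blocks(BLOCK_SIZE + 1) == 2  # one byte over spills into a second
```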
