Original article: https://crowall.com/topic/84
PS: Before you start, make sure Homebrew (brew) and a JDK are installed and working.
1. Install Hadoop
brew install hadoop
2. Configure
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.0.0/
By default, the Hadoop configuration directory is /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/ (the version number may differ on your machine; list /usr/local/Cellar/hadoop/ first to see which version you installed).
MBP:~ tony$ ll /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
total 304
drwxr-xr-x 30 tony admin 960B 5 3 17:43 .
drwxr-xr-x 3 tony admin 96B 12 9 03:17 ..
-rw-r--r-- 1 tony admin 7.7K 12 9 03:30 capacity-scheduler.xml
-rw-r--r-- 1 tony admin 1.3K 12 9 03:32 configuration.xsl
-rw-r--r-- 1 tony admin 1.2K 12 9 03:30 container-executor.cfg
-rw-r--r-- 1 tony admin 774B 12 9 03:17 core-site.xml
-rw-r--r-- 1 tony admin 16K 12 9 03:42 hadoop-env.sh
-rw-r--r-- 1 tony admin 3.2K 12 9 03:17 hadoop-metrics2.properties
-rw-r--r-- 1 tony admin 10K 12 9 03:17 hadoop-policy.xml
-rw-r--r-- 1 tony admin 3.3K 12 9 03:17 hadoop-user-functions.sh.example
-rw-r--r-- 1 tony admin 775B 12 9 03:19 hdfs-site.xml
-rw-r--r-- 1 tony admin 1.4K 12 9 03:19 httpfs-env.sh
-rw-r--r-- 1 tony admin 1.6K 12 9 03:19 httpfs-log4j.properties
-rw-r--r-- 1 tony admin 21B 12 9 03:19 httpfs-signature.secret
-rw-r--r-- 1 tony admin 620B 12 9 03:19 httpfs-site.xml
-rw-r--r-- 1 tony admin 3.4K 12 9 03:17 kms-acls.xml
-rw-r--r-- 1 tony admin 1.3K 12 9 03:17 kms-env.sh
-rw-r--r-- 1 tony admin 1.7K 12 9 03:17 kms-log4j.properties
-rw-r--r-- 1 tony admin 682B 12 9 03:17 kms-site.xml
-rw-r--r-- 1 tony admin 13K 12 9 03:17 log4j.properties
-rw-r--r-- 1 tony admin 1.7K 12 9 03:32 mapred-env.sh
-rw-r--r-- 1 tony admin 4.0K 12 9 03:32 mapred-queues.xml.template
-rw-r--r-- 1 tony admin 758B 12 9 03:32 mapred-site.xml
drwxr-xr-x 3 tony admin 96B 12 9 03:17 shellprofile.d
-rw-r--r-- 1 tony admin 2.3K 12 9 03:17 ssl-client.xml.example
-rw-r--r-- 1 tony admin 2.6K 12 9 03:17 ssl-server.xml.example
-rw-r--r-- 1 tony admin 2.6K 12 9 03:19 user_ec_policies.xml.template
-rw-r--r-- 1 tony admin 10B 12 9 03:17 workers
-rw-r--r-- 1 tony admin 5.3K 12 9 03:30 yarn-env.sh
-rw-r--r-- 1 tony admin 690B 12 9 03:30 yarn-site.xml
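Since the version number in the path changes between installs, a small helper can look the directory up instead of hard-coding 3.0.0. This is only a sketch: hadoop_conf_dir is a hypothetical name, not something Hadoop or Homebrew provides, and it assumes the Cellar layout shown above.

```shell
# Hypothetical helper: print the config directory of the last version
# directory found under a Homebrew Cellar root (assumed layout:
# <cellar>/<version>/libexec/etc/hadoop).
hadoop_conf_dir() {
  cellar="${1:-/usr/local/Cellar/hadoop}"
  latest=""
  # Globs expand in lexicographic order, so the last directory wins.
  for dir in "$cellar"/*/; do
    if [ -d "$dir" ]; then
      latest="${dir%/}"
    fi
  done
  [ -n "$latest" ] || return 1
  printf '%s/libexec/etc/hadoop\n' "$latest"
}

# Usage: print the directory, or a message when nothing is installed there.
hadoop_conf_dir || echo "no Hadoop version found under the default Cellar path"
```

The ordering is lexicographic, so with several versions installed it is worth checking the printed path by eye.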
Edit the hadoop-env.sh file

cd /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/
vim hadoop-env.sh

# Look for HADOOP_OPTS
MBP:hadoop tony$ cat hadoop-env.sh |grep -n "export HADOOP_OPTS"
90:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
92:# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
106: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
107: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
108: export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "

# Uncomment line 92 and add a JAVA_HOME line (do not copy my path verbatim; use your own JDK path)
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home"
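Because the JDK path differs per machine, one way to avoid copying someone else's path is to ask macOS for it. The sketch below uses the /usr/libexec/java_home tool that ships with macOS; detect_java_home is a hypothetical wrapper name, and the helper path is a parameter only so the function can be tried on machines without that tool.

```shell
# Hypothetical wrapper around the macOS java_home tool.
# Prints the JDK home on success, returns non-zero otherwise.
detect_java_home() {
  helper="${1:-/usr/libexec/java_home}"
  if [ -x "$helper" ]; then
    "$helper"
  else
    return 1
  fi
}

if java_home="$(detect_java_home)"; then
  # Paste the printed line into hadoop-env.sh.
  printf 'export JAVA_HOME="%s"\n' "$java_home"
else
  echo "java_home helper not available; set JAVA_HOME by hand"
fi
```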
Configure the HDFS address and storage path

# Create the Hadoop tmp directory (any location works; pick your own if you like)
mkdir -p /tmp/hadoop/hdfs/tmp
chmod -R 777 /tmp/hadoop/hdfs/tmp

# Edit core-site.xml
vim core-site.xml

# Add these properties
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
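If you would rather script this step than edit the file in vim, the same configuration can be generated with a here-document. A minimal sketch: write_core_site is a hypothetical helper, and the demo writes into a scratch directory rather than the real config directory so nothing is overwritten by accident.

```shell
# Hypothetical helper: write a core-site.xml like the one above.
#   $1 = directory to write into, $2 = value for hadoop.tmp.dir
write_core_site() {
  conf_dir="$1"
  tmp_dir="$2"
  cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
  <property>
    <!-- fs.default.name is the older name of fs.defaultFS; kept to match the article -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
}

# Demo against a scratch directory; point it at the real config dir when ready.
scratch="$(mktemp -d)"
write_core_site "$scratch" /tmp/hadoop/hdfs/tmp
cat "$scratch/core-site.xml"
```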
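Set the MapReduce address
Note: mapred.job.tracker is a Hadoop 1.x property name that is deprecated on Hadoop 3, so this step may have no practical effect; it is shown as I configured it.

vim mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>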
Set the replication factor
We are running locally in pseudo-distributed mode, so the default of 3 replicas is unnecessary; 1 is enough.

vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Format the NameNode
hdfs namenode -format
Output

MBP:hadoop tony$ hdfs namenode -format
WARNING: /usr/local/Cellar/hadoop/3.0.0/libexec/logs does not exist. Creating.
2018-05-18 14:13:14,899 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = MBP.local/{my IP...}
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.0.0
..... (a large chunk omitted here)
2018-05-18 14:13:16,119 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1297562978-{my IP...}-1526623996110
2018-05-18 14:13:16,137 INFO common.Storage: Storage directory /tmp/hadoop/hdfs/tmp/dfs/name has been successfully formatted.
2018-05-18 14:13:16,177 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-18 14:13:16,312 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop/hdfs/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds.
2018-05-18 14:13:16,329 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-05-18 14:13:16,335 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at MBP.local/{my IP...}
************************************************************/
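Formatting again later wipes the NameNode metadata, so it can help to check whether the storage directory from the log above has already been formatted. A rough sketch, assuming a formatted name directory contains a current/VERSION file; namenode_formatted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given NameNode storage directory
# already looks formatted (assumption: it then contains current/VERSION).
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Path taken from the "successfully formatted" log line above.
name_dir=/tmp/hadoop/hdfs/tmp/dfs/name
if namenode_formatted "$name_dir"; then
  echo "already formatted: $name_dir (formatting again erases HDFS metadata)"
else
  echo "not formatted yet: run 'hdfs namenode -format'"
fi
```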
The configuration is finally done; time to run it.
3. Run
Start HDFS
The Hadoop start and stop scripts are in the /usr/local/Cellar/hadoop/3.0.0/sbin/ directory (mind the version number).

cd /usr/local/Cellar/hadoop/3.0.0/sbin/
./start-dfs.sh   # start HDFS
./stop-dfs.sh    # stop HDFS

MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [MBP.local]
2018-05-18 14:38:31,125 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

# Use the jps command to check the processes
MBP:sbin tony$ jps
16816 Jps
56752
90759 NameNode
69335 Launcher
91002 SecondaryNameNode
98799
90863 DataNode
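Reading the jps list by eye works, but the check is easy to script. The sketch below reads jps output on standard input and prints whichever of the three HDFS daemons seen above is absent; missing_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected HDFS daemon that is not listed. Prints nothing if all run.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode; do
    # Compare the second column exactly, so NameNode does not match
    # SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
    fi
  done
}

# Usage (after ./start-dfs.sh):
#   jps | missing_daemons
```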
Common problems when starting HDFS
Problem 1: localhost: ssh: connect to host localhost port 22: Connection refused
Solution:
Allow remote login for all users:
System Preferences -> Sharing -> Remote Login -> Allow access for => All users
Then configure SSH

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Finally, run the start script again.
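Running the cat line more than once appends the same key again, and sshd may ignore an authorized_keys file with loose permissions. A small sketch that handles both, assuming the key paths used above; authorize_key is a hypothetical helper name.

```shell
# Hypothetical helper: add a public key to an authorized_keys file once,
# then restrict the file to the owner.
#   $1 = public key file, $2 = authorized_keys file
authorize_key() {
  pub_key="$1"
  auth_file="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # Append only if this exact key line is not present yet.
  if ! grep -qxF -- "$(cat "$pub_key")" "$auth_file"; then
    cat "$pub_key" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Usage:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   ssh -o BatchMode=yes localhost true && echo "passwordless ssh works"
```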
Problem 2: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Turn on debug logging, rerun ./start-dfs.sh, and look at the output:
MBP:sbin tony$ ./start-dfs.sh
Starting namenodes on [localhost]
localhost: namenode is running as process 80560. Stop it first.
Starting datanodes
localhost: datanode is running as process 80661. Stop it first.
Starting secondary namenodes [MBP.local]
MBP.local: secondarynamenode is running as process 80796. Stop it first.
2018-05-18 14:58:45,871 DEBUG util.Shell: setsid is not available on this machine. So not using it.
2018-05-18 14:58:45,872 DEBUG util.Shell: setsid exited with exit code 0
2018-05-18 14:58:46,065 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
2018-05-18 14:58:46,076 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since startup], valueName=Time)
2018-05-18 14:58:46,077 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, value=[Renewal failures since last successful login], valueName=Time)
2018-05-18 14:58:46,078 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-05-18 14:58:46,108 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-05-18 14:58:46,138 DEBUG security.Groups: Creating new Groups object
2018-05-18 14:58:46,140 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
2018-05-18 14:58:46,142 DEBUG util.NativeCodeLoader: java.library.path=/Users/tn-ma-l30000122/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2018-05-18 14:58:46,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-05-18 14:58:46,143 DEBUG util.PerformanceAdvisory: Falling back to shell based
2018-05-18 14:58:46,145 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
2018-05-18 14:58:46,256 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2018-05-18 14:58:46,260 DEBUG security.UserGroupInformation: hadoop login
2018-05-18 14:58:46,261 DEBUG security.UserGroupInformation: hadoop login commit
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: tony" with name tony
2018-05-18 14:58:46,264 DEBUG security.UserGroupInformation: User entry: "tony"
2018-05-18 14:58:46,265 DEBUG security.UserGroupInformation: UGI loginUser:tony (auth:SIMPLE)
2018-05-18 14:58:46,266 DEBUG security.UserGroupInformation: PrivilegedAction as:tony (auth:SIMPLE) from:org.apache.hadoop.hdfs.tools.GetConf.run(GetConf.java:315)
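The fixes I found online were all for Hadoop 2.x and did not work on my 3.0 install. After some Googling I found a post on Stack Overflow.
The answer there is simply to change the log level... so that only errors are shown and the warning is hidden 😂 (this silences the message; the native library is still not loaded).

# Edit the logging config etc/hadoop/log4j.properties
vim /usr/local/Cellar/hadoop/3.0.0/libexec/etc/hadoop/log4j.properties

# Add this line
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR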
4. Try it out
Command-line operations
Create a directory

MBP:sbin tony$ hadoop fs -ls /
MBP:sbin tony$ hadoop fs -mkdir /demo
MBP:sbin tony$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - tony supergroup 0 2018-05-22 11:06 /demo

Create directories with the hdfs command

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/tony

Copy some files in

cd /usr/local/Cellar/hadoop/3.0.0/
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put libexec/etc/hadoop/*.xml input

Run the example
bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
Check the result

bin/hdfs dfs -get output output
cat output/*

or

bin/hdfs dfs -cat output/*
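To sanity-check what the job printed, the same count can be approximated locally with ordinary tools. As I understand it, the bundled grep example counts how often each string matching the regex occurs in the input; the sketch below does that with grep, sort and uniq on local files. count_matches is a hypothetical helper, and its tab-separated output format is my own choice, not necessarily what Hadoop writes.

```shell
# Hypothetical helper: count occurrences of every string matching an
# extended regex in the given files, most frequent first.
#   $1 = regex, remaining arguments = files
count_matches() {
  pattern="$1"
  shift
  grep -ohE -- "$pattern" "$@" | sort | uniq -c | sort -rn |
    awk '{ print $1 "\t" $2 }'
}

# Demo on a scratch file; for the real input use libexec/etc/hadoop/*.xml.
scratch="$(mktemp -d)"
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n<name>dfs.namenode.name.dir</name>\n' > "$scratch/sample.xml"
count_matches 'dfs[a-z.]+' "$scratch/sample.xml"
```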
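Web UI
Open http://localhost:9870/; it is now reachable and shows:
Detailed information about the cluster.
The directories created from the command line.
The files that were copied in (each file occupies one block; 128 MB is the block size, not the space a small file actually uses).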
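The one-block-per-file observation follows from simple arithmetic: a file needs ceil(size / block size) blocks, and every config file here is far smaller than 128 MB. A small sketch of that calculation; blocks_needed is a hypothetical helper, and 134217728 bytes (128 MiB) is assumed as the block size.

```shell
# Hypothetical helper: number of HDFS blocks a file of a given size needs.
#   $1 = file size in bytes, $2 = block size in bytes (default 128 MiB)
blocks_needed() {
  size_bytes="$1"
  block_bytes="${2:-134217728}"
  # Integer ceiling division.
  echo $(( (size_bytes + block_bytes - 1) / block_bytes ))
}

blocks_needed 775          # a small config file such as hdfs-site.xml
blocks_needed 300000000    # a file of roughly 300 MB
```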