杜龙少(sdvdxl)

Installing and Starting a Hadoop Cluster

2016/03/09

Environment

UCloud cloud hosts, kernel 2.6.32-431.11.15.el6.ucloud.x86_64
Assume the three hosts have the internal IPs 10.10.1.10 (master), 10.10.1.11, and 10.10.1.12, with the hostnames 10-10-1-10, 10-10-1-11, and 10-10-1-12 respectively.

Configuring the JDK

This setup was tested with JDK 8; download the matching version from the Oracle website.

Configuring jdk

Assuming the unpacked JDK lives in /usr/local/jdk8, run sudo vi /etc/profile and add the following:

export JAVA_HOME=/usr/local/jdk8
export CLASSPATH=.:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

Then run source /etc/profile to make the environment variables take effect. Run java -version; if the Java version information is printed, the configuration succeeded. Configure all three hosts the same way.

Configuring the hadoop user

In the console, run sudo useradd -m -U hadoop to add the hadoop user, then run sudo passwd hadoop to set the hadoop user's password. Run su -l hadoop and enter the password you just set to switch to the hadoop user.
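For convenience, the same steps in one block; since the ssh setup below expects a hadoop user on every host, run them on all three machines:

sudo useradd -m -U hadoop   # create the hadoop user with a home directory and a matching group
sudo passwd hadoop          # set the hadoop user's password
su -l hadoop                # switch to the hadoop user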

Configuring ssh

The following assumes you have already switched to the hadoop user.
On each host, run ssh-keygen and press Enter at every prompt until it finishes. Finally, on the master host, use ssh-copy-id to copy the authentication information to this host and to the other two hosts, so that passwordless login works: ssh-copy-id hadoop@<host address>. A command sketch follows.
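A minimal sketch of these commands, assuming the hostnames master, slave1, and slave2 from the /etc/hosts mapping configured below:

# on every host, as the hadoop user: generate a key pair (accept all defaults)
ssh-keygen
# on master only: copy the public key to this host and to both slaves
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2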

Configuring the network

On all three hosts, add the following mappings to /etc/hosts:

10.10.1.10 master
10.10.1.11 slave1
10.10.1.12 slave2

Configuring Hadoop

  1. Download Hadoop; version 2.7.1 is used here. In the console, run wget http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz and wait for the download to finish.
  2. In the console, run tar -xvf hadoop-2.7.1.tar.gz to unpack it; this creates a hadoop-2.7.1 directory.
  3. Run cd hadoop-2.7.1/etc/hadoop to enter the configuration-file directory.
  4. In hadoop-env.sh, edit the export JAVA_HOME= line so that everything after the equals sign is the absolute JDK path configured above, here /usr/local/jdk8; after the change it should read export JAVA_HOME=/usr/local/jdk8. Save and exit.
  5. Modify core-site.xml; set the configuration to:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<!-- note: change this to your own host/IP -->
<value>hdfs://master:9000</value>
</property>
<property>
<name>fs.default.namenode</name>
<!-- note: change this to your own host/IP -->
<value>hdfs://master:8082</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
  6. Modify hdfs-site.xml; set the configuration to:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>cluster</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<!-- note: change this to your own IP -->
<value>master:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
  7. Modify yarn-site.xml:
<configuration>

<!-- note: change the hosts/IPs to your own -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
  8. Modify mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
  9. Modify the slaves file and add the other two hosts:
slave1
slave2

hadoop目录覆盖到其余机器对应目录
The hadoop commands are used from here on; if you run into hadoop native errors, see the Hadoop Native Configuration section at the end of this post.
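One way to copy the hadoop directory to the other machines, assuming it sits in the hadoop user's home directory and the passwordless ssh configured above:

# run on master as the hadoop user
scp -r ~/hadoop-2.7.1 hadoop@slave1:~/
scp -r ~/hadoop-2.7.1 hadoop@slave2:~/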

  1. Format the file system.
    Note that this is not a hard-disk format; it only cleans up the dfs.namenode.name.dir and dfs.datanode.data.dir directories configured in hdfs-site.xml on the master server. Switch to the Hadoop home directory and run bin/hdfs namenode -format.
  2. Start and stop the services.
    sbin/start-dfs.sh starts the services on the master and slave nodes in one go, and sbin/start-yarn.sh starts the YARN resource management services. To stop the services, use the corresponding sbin/stop-dfs.sh and sbin/stop-yarn.sh.
  3. Start a single datanode.
    When adding a node, or when a restarted node needs to be started individually, use the following commands:
    sbin/hadoop-daemon.sh start datanode starts a datanode; sbin/yarn-daemon.sh start nodemanager starts a nodeManager. The same works for the namenode side: sbin/hadoop-daemon.sh start namenode and sbin/yarn-daemon.sh start resourcemanager.
    Note: the original documentation uses sbin/yarn-daemons.sh and sbin/hadoop-daemons.sh, but those did not start the daemons successfully here; dropping the trailing s made them start. (The commands above are collected in the block below.)
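For convenience, the commands from the three steps above collected in one place (all paths are relative to the Hadoop home directory):

# on master: format HDFS (cleans the name/data directories configured in hdfs-site.xml)
bin/hdfs namenode -format

# on master: start and stop the whole cluster
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
sbin/stop-dfs.sh

# on an individual node: start single daemons
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start nodemanager
sbin/hadoop-daemon.sh start namenode
sbin/yarn-daemon.sh start resourcemanager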

Hadoop Native Configuration

Run hadoop checknative to check the Hadoop native library version and related dependency information:

16/03/10 12:17:56 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/03/10 12:17:56 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: /home/hadoop/hadoop-2.6.3/lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/hadoop/hadoop-2.6.3/lib/native/libhadoop.so.1.0.0)
16/03/10 12:17:56 DEBUG util.NativeCodeLoader: java.library.path=/home/hadoop/hadoop-2.6.3/lib/native
16/03/10 12:17:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/10 12:17:56 DEBUG util.Shell: setsid exited with exit code 0
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
16/03/10 12:17:56 INFO util.ExitUtil: Exiting with status 1

The /lib64/libc.so.6: version `GLIBC_2.14' not found message shows that this Hadoop build requires glibc 2.14, so install the required version as follows:

  1. mkdir glib_build && cd glib_build
  2. wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz && wget http://ftp.gnu.org/gnu/glibc/glibc-linuxthreads-2.5.tar.bz2
  3. tar zxf glibc-2.14.tar.gz && cd glibc-2.14 && tar jxf ../glibc-linuxthreads-2.5.tar.bz2
  4. cd ../ && export CFLAGS="-g -O2" && ./glibc-2.14/configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
  5. make
  6. make install
    The make install step ends with the following error message:
CC="gcc -B/usr/bin/" /usr/bin/perl scripts/test-installation.pl /root/
/usr/bin/ld: cannot find -lnss_test1
collect2: ld returned 1 exit status
Execution of gcc -B/usr/bin/ failed!
The script has found some problems with your installation!
Please read the FAQ and the README file and check the following:
- Did you change the gcc specs file (necessary after upgrading from
Linux libc5)?
- Are there any symbolic links of the form libXXX.so to old libraries?
Links like libm.so -> libm.so.5 (where libm.so.5 is an old library) are wrong,
libm.so should point to the newly installed glibc file - and there should be
only one such link (check e.g. /lib and /usr/lib)
You should restart this script from your build directory after you've
fixed all problems!
Btw. the script doesn't work if you're installing GNU libc not as your
primary library!
make[1]: *** [install] Error 1
make[1]: Leaving directory `/root/glibc-2.14'
make: *** [install] Error 2

This error can be ignored. Check whether the installation succeeded:
ls -l /lib64/libc.so.6
lrwxrwxrwx 1 root root 12 Mar 10 12:12 /lib64/libc.so.6 -> libc-2.14.so
If /lib64/libc.so.6 -> libc-2.14.so appears, the installation succeeded.

Install openssl:
yum install openssl-static.x86_64
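Once glibc 2.14 and openssl-static are installed, you can rerun the earlier check to confirm that the native libraries are now picked up:

hadoop checknative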

How to change the hostname

Edit the file /etc/sysconfig/network,
then run /etc/rc.d/init.d/network restart to restart the networking service.
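A sketch of what /etc/sysconfig/network might contain on the master, assuming the hostname 10-10-1-10 from the environment section above:

NETWORKING=yes
HOSTNAME=10-10-1-10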

secondaryNameNode configuration

  1. Modify the masters file (create it if it does not exist) and add one hostname to serve as the secondaryNameNode (a sketch follows after the configuration below).
  2. Modify hdfs-site.xml: delete the existing dfs.namenode.secondary.http-address configuration and add the new configuration below, taking care to use your own IP:
<property>
<name>dfs.http.address</name>
<value>master:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>10.10.1.11</value>
</property>
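As mentioned in step 1, a minimal masters file matching the configuration above might contain just the host chosen as the secondaryNameNode, here assumed to be slave1 (10.10.1.11):

slave1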
References:
  1. Hadoop-2.5.2 cluster installation and configuration in detail
  2. Configuring the namenode and SecondaryNameNode on separate machines with Hadoop 2.2