Introduction
Common Hadoop distributions and how to choose
- Apache Hadoop
- CDH: Cloudera's Distribution including Apache Hadoop
- HDP: Hortonworks Data Platform
CDH is the common choice; download it from
http://archive.cloudera.com/cdh5/cdh/5/
This guide uses hadoop-2.6.0-cdh5.7.0. As long as every component carries the same CDH version suffix, the components will not conflict with each other.
The package link shown in the listing is
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/
// Change the link to the following form to download the tarball
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
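The URL rewrite above can be sketched as a small shell step (a sketch; the same directory/tarball naming pattern appears to hold for the other CDH components on this mirror):

```shell
# Build the tarball URL from the listing URL by appending .tar.gz
CDH_BASE="http://archive.cloudera.com/cdh5/cdh/5"
PKG="hadoop-2.6.0-cdh5.7.0"
TARBALL_URL="${CDH_BASE}/${PKG}.tar.gz"
echo "$TARBALL_URL"
# wget "$TARBALL_URL"   # uncomment to actually download
```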
Preparation
Set the hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop000
Map hostnames to IP addresses
vi /etc/hosts
Role assignment
hadoop000 NameNode/DataNode ResourceManager/NodeManager
hadoop001 DataNode NodeManager
hadoop002 DataNode NodeManager
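The three hostnames above must resolve consistently on every node. A sketch of the /etc/hosts entries, with placeholder IP addresses (substitute your machines' real addresses); written to a temporary file here for illustration, but on the real hosts these lines go into /etc/hosts:

```shell
# Placeholder IPs -- replace with the real addresses of your three machines.
HOSTS_SNIPPET=$(mktemp)
cat > "$HOSTS_SNIPPET" <<'EOF'
192.168.1.100 hadoop000
192.168.1.101 hadoop001
192.168.1.102 hadoop002
EOF
cat "$HOSTS_SNIPPET"
```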
Prerequisites
Passwordless SSH login; run this on every machine:
// Generate an RSA key pair
ssh-keygen -t rsa
The generated files live under ~/.ssh:
id_rsa (private key) and id_rsa.pub (public key)
Run the following commands and enter the password when prompted (this places hadoop000's public key on hadoop000, hadoop001, and hadoop002, so you can then ssh to them directly).
Alternatively, manually append the contents of hadoop000's ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on hadoop000, hadoop001, and hadoop002.
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
Key-based SSH access requires the client's public key to be listed in the server's ~/.ssh/authorized_keys file; that is exactly what the commands above do.
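One common pitfall: sshd ignores authorized_keys when its permissions are too open. A sketch of the required modes, demonstrated on a scratch directory (on the real hosts, apply the same chmod to ~/.ssh and ~/.ssh/authorized_keys):

```shell
# sshd requires ~/.ssh to be 700 and authorized_keys to be 600 (or stricter);
# shown here on a throwaway directory so nothing real is touched.
DEMO_SSH=$(mktemp -d)/.ssh
mkdir -p "$DEMO_SSH"
touch "$DEMO_SSH/authorized_keys"
chmod 700 "$DEMO_SSH"
chmod 600 "$DEMO_SSH/authorized_keys"
stat -c '%a' "$DEMO_SSH" "$DEMO_SSH/authorized_keys"
```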
Extract the Hadoop tarball into the app directory
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
Configure user-level environment variables
// Open the file
vi ~/.bash_profile
// Add the following lines
export HADOOP_HOME=/data/qa/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
// Reload the profile
source ~/.bash_profile
Look up the JAVA_HOME path
echo $JAVA_HOME
Edit hadoop-env.sh under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
Set JAVA_HOME:
export JAVA_HOME=/data/lib/jdk8
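The edit can also be done non-interactively with sed (a sketch, assuming the file contains the stock `export JAVA_HOME=${JAVA_HOME}` line; demonstrated on a temporary stand-in so nothing real is modified):

```shell
# Point ENV_FILE at $HADOOP_HOME/etc/hadoop/hadoop-env.sh to do this for real.
ENV_FILE=$(mktemp)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$ENV_FILE"   # stand-in for the stock line
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/data/lib/jdk8|' "$ENV_FILE"
grep '^export JAVA_HOME' "$ENV_FILE"
```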
Edit core-site.xml under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop000:8020</value>
</property>
</configuration>
Edit hdfs-site.xml under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/qa/app/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/qa/app/tmp/dfs/data</value>
</property>
</configuration>
Edit yarn-site.xml under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop000</value>
</property>
</configuration>
Edit mapred-site.xml under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit the slaves file under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
// Replace localhost with the following
hadoop000
hadoop001
hadoop002
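The slaves file can also be written in one step (a sketch; demonstrated on a temporary file, whereas the real file lives at $HADOOP_HOME/etc/hadoop/slaves):

```shell
# Write the three worker hostnames in one shot.
SLAVES_FILE=$(mktemp)
cat > "$SLAVES_FILE" <<'EOF'
hadoop000
hadoop001
hadoop002
EOF
cat "$SLAVES_FILE"
```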
Distribute the installation to the hadoop001 and hadoop002 nodes
scp -r ~/app hadoop@hadoop001:~/
scp -r ~/app hadoop@hadoop002:~/
## Distribute the environment variables
scp ~/.bash_profile hadoop@hadoop001:~/
scp ~/.bash_profile hadoop@hadoop002:~/
## Apply the profile (run on hadoop001 and hadoop002)
source ~/.bash_profile
Startup
Format the HDFS NameNode (this only needs to run on hadoop000):
cd $HADOOP_HOME/bin
./hdfs namenode -format
Start the cluster (this only needs to run on hadoop000):
cd $HADOOP_HOME/sbin
./start-all.sh
## To stop the cluster later, run ./stop-all.sh from the same directory
Verification
Run jps on each node.
On hadoop000 you should see:
NameNode
SecondaryNameNode
ResourceManager
DataNode
NodeManager
On hadoop001 and hadoop002:
NodeManager
DataNode
Web UIs
HDFS NameNode: http://hadoop000:50070
YARN ResourceManager: http://hadoop000:8088
Commands for reference
hadoop-daemons.sh start datanode    starts the DataNode daemon on all slave nodes
hadoop-daemon.sh start datanode     starts the DataNode daemon on the current node only