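The steps below apply to Hive 3.x, paired with Hadoop 3.x.
Installation

Download
➜  ~ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Extract
➜  ~ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/Apache/
Configure environment variables
vim /etc/profile
...
export HIVE_HOME=/opt/Apache/apache-hive-3.1.2-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
...
After saving, run source /etc/profile so the variables take effect in the current shell.
Startup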
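Before starting anything, it may help to confirm that the variables exported in /etc/profile are visible in the current shell. The following is a minimal sketch rather than part of the original setup; the function names path_contains and check_hive_env are hypothetical, and the check only looks for $HIVE_HOME/bin as an entry in PATH.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report whether Hive's bin directory is reachable through PATH.
check_hive_env() {
  if [ -z "$HIVE_HOME" ]; then
    echo "HIVE_HOME is not set"
    return 1
  fi
  if path_contains "$HIVE_HOME/bin"; then
    echo "OK: $HIVE_HOME/bin is on PATH"
  else
    echo "$HIVE_HOME/bin is missing from PATH"
    return 1
  fi
}
```

Calling check_hive_env in a fresh shell should print the OK line if the profile was loaded.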
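Initialize the database
By default Hive manages its metadata in Derby, which has a serious limitation: only one Hive client can be connected at a time. A later section switches the metastore to MySQL.
- Initialize the metastore schema
➜  apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema
Note: this step is very likely to fail. For the fix, see the Derby initialization error entry in the troubleshooting section at the end of this post.
- Start the client
➜  apache-hive-3.1.2-bin bin/hive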
which: no hbase in (/opt/Java/jdk1.8.0_261/bin:/opt/Apache/apache-maven-3.6.3/bin:/opt/node-v12.18.4-linux-x64/bin:/opt/Apache/apache-ant-1.9.15/bin:/opt/Apache/hadoop-3.2.1/bin:/opt/Apache/apache-hive-3.1.2-bin/bin:/usr/local/bin:/usr/bin:/home/sairo/bin:/usr/local/sbin:/usr/sbin)
Hive Session ID = 9ea641f6-4c3b-49db-877e-93cf945cea77
Logging initialized using configuration in jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 5dcd28fe-648a-4e5c-99cd-853719674c78
hive>
Basic SQL operations
hive> show databases;
OK
default
Time taken: 0.864 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.056 seconds
hive> show tables;
OK
Time taken: 0.054 seconds
hive> create table test (id string, name string);
OK
Time taken: 0.837 seconds
hive> insert into test values('aaa', 'Tom');
Query ID = sairo_20201126194635_ab618e32-953d-4fd2-983b-dfc2d8abd4d2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1606377510588_0001, Tracking URL = http://dev-jsj.com:8088/proxy/application_1606377510588_0001/
Kill Command = /opt/Apache/hadoop-3.2.1/bin/mapred job  -kill job_1606377510588_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-11-26 19:46:51,787 Stage-1 map = 0%,  reduce = 0%
2020-11-26 19:46:58,055 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.96 sec
2020-11-26 19:47:05,333 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.81 sec
MapReduce Total cumulative CPU time: 9 seconds 810 msec
Ended Job = job_1606377510588_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://dev-jsj.com:9000/user/hive/warehouse/test/.hive-staging_hive_2020-11-26_19-46-35_648_9001647310701991693-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 9.81 sec   HDFS Read: 14565 HDFS Write: 243 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 810 msec
OK
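Time taken: 32.519 seconds
Using MySQL to manage metadata
Install MySQL
This is routine, so it is skipped here.
Create the Hive metastore database in advance
mysql> create database hive_metastore default charset utf8;
Add the MySQL driver JAR to Hive
➜  apache-hive-3.1.2-bin cp /opt/Apache/repository/mysql/mysql-connector-java/8.0.13/mysql-connector-java-8.0.13.jar ./lib
Configure the JDBC connection parameters
The conf directory contains many configuration templates. Here, edit hive-default.xml.template and save it as hive-site.xml.

hive-site.xml: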
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- JDBC connection URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dev-jsj.com:3306/hive_metastore?useSSL=false&amp;characterEncoding=utf8</value>
  </property>
  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <!-- JDBC connection username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>  
  </property>
  <!-- JDBC connection password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <!-- Metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- Metastore notification API authorization -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
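</configuration>
Notes:
- In the JDBC URL, write & as &amp; (the & character has special meaning in XML and must be escaped).
- If you start from the official template, delete every <description> child element of the <property> tags. The descriptions explain what each property does, but many of them contain special characters, and leaving them in causes an error at startup.
- Some tutorials also set Hive's storage path in HDFS, for example: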
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>
Unless you have a specific need, this setting is unnecessary; the default value is already /user/hive/warehouse.
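The escaping rule for the JDBC URL can be sketched with a small shell function. This is an illustration only; the name xml_escape_amp is hypothetical, and it assumes the input is a raw, unescaped URL (an existing &amp; would be escaped a second time).

```shell
# Hypothetical helper: escape every & in a string for use inside an XML value.
# Assumes the input is a raw URL; an already-escaped &amp; would be escaped twice.
xml_escape_amp() {
  printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

# Prints the URL with & replaced by &amp;, ready to paste into hive-site.xml.
xml_escape_amp 'jdbc:mysql://dev-jsj.com:3306/hive_metastore?useSSL=false&characterEncoding=utf8'
```

The backslash before & in the sed replacement matters, because a bare & there stands for the matched text.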
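Initialize the database
➜  apache-hive-3.1.2-bin  bin/schematool -dbType mysql -initSchema -verbose
-verbose: prints the execution progress instead of the long stretch of blank output seen earlier.
At this point the basic configuration is complete and Hive can be used directly from the command line. The next sections configure remote access to Hive by running it as background services, much like starting Hadoop's daemons.
Accessing Hive through the metastore service

- Step 1: edit the configuration file hive-site.xml
➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
# Add the following configuration
<!-- Address of the metastore service to connect to -->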
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://dev-jsj.com:9083</value>
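</property>
- Step 2: start the metastore service
➜  apache-hive-3.1.2-bin bin/hive --service metastore
# or
➜  apache-hive-3.1.2-bin nohup bin/hive --service metastore &
Tip: the metastore service runs in the foreground and occupies the current terminal session by default; use nohup to run it in the background.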
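Accessing Hive through JDBC
JDBC access means running a Hive service on the server side that exposes a port for remote clients to connect to, similar to MySQL and Hadoop.

- Step 1: edit the configuration file hive-site.xml
➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
# Add the following configuration
<!-- Host that hiveserver2 binds to -->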
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>dev-jsj.com</value>
</property>
<!-- Port that hiveserver2 listens on -->
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
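</property>
- Step 2: start the hiveserver2 service
➜  apache-hive-3.1.2-bin bin/hive --service hiveserver2
# or
➜  apache-hive-3.1.2-bin nohup bin/hive --service hiveserver2 &
Tips:
- Start the metastore service before starting hiveserver2.
- hiveserver2 runs in the foreground and occupies the current terminal session by default; use nohup to run it in the background.
- Step 3: simulate a remote connection from the command line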
➜  apache-hive-3.1.2-bin bin/beeline -u jdbc:hive2://dev-jsj.com:10000 -n sairo
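The connection string passed to beeline is built from the host and port configured in step 1. As a small illustration, it can be assembled by a helper so the values live in one place. The function name hive_jdbc_url is hypothetical, and appending a database name to the URL is an addition not shown in the command above.

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL from host, port, and
# an optional database name (defaults to "default").
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "${3:-default}"
}

# Prints jdbc:hive2://dev-jsj.com:10000/default
hive_jdbc_url dev-jsj.com 10000
```

It could then be used as bin/beeline -u "$(hive_jdbc_url dev-jsj.com 10000)" -n sairo.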
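Other configuration
Log configuration
The $HIVE_HOME/conf/ directory contains many configuration templates. Find hive-log4j2.properties.template and set the relevant properties, then rename the file to hive-log4j2.properties.
This is where the log location is configured; by default logs go to the /tmp/{user} directory.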
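As a sketch of changing the log location, the function below rewrites one property line in a properties file. It assumes the log directory is controlled by a line beginning with property.hive.log.dir, which is what the Hive 3.x template is believed to use; check your own copy of the file before relying on it. The function name and the target directory /var/log/hive are hypothetical.

```shell
# Hypothetical helper: rewrite the property.hive.log.dir line of a
# properties file ($1) to point at directory $2.
# Assumes the directory path contains no "|" or "&" characters.
set_hive_log_dir() {
  tmp=$(mktemp) || return 1
  sed "s|^property\.hive\.log\.dir[[:space:]]*=.*|property.hive.log.dir = $2|" "$1" > "$tmp" \
    && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}
```

After copying the template to conf/hive-log4j2.properties, a call such as set_hive_log_dir conf/hive-log4j2.properties /var/log/hive would apply the change; lines that do not match are left untouched.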

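Command-line display configuration
By default the Hive prompt on the command line is very plain, showing only hive>.

The following configuration makes the prompt show the current database (hive.cli.print.current.db) and makes query results print column headers (hive.cli.print.header):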
➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
...
<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>
...
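Aside
Readers who are interested can try connecting HUE to Hive after finishing the steps above. Expect plenty of pitfalls and assorted errors; it took me an entire afternoon to get it working. It is worth a try if you have the time and like to tinker. When I find time I will also write a post on setting up a HUE environment.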

Troubleshooting
- Derby schema initialization fails
➜  apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
        at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
        at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
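        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Cause: the guava*.jar bundled with Hive conflicts with the guava version shipped with Hadoop. Remove the older one and replace it with the newer one.
Solution:
- Locate guava*.jar in Hive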
➜  apache-hive-3.1.2-bin find ./ -name  "*guava*"                           
./lib/guava-19.0.jar
./lib/jersey-guava-2.25.1.jar
- Locate guava*.jar in Hadoop; it is usually under /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/
➜  apache-hive-3.1.2-bin find /opt/Apache/hadoop-3.2.1 -name "*guava*"
/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
- Remove Hive's older guava and replace it with Hadoop's newer guava
➜  apache-hive-3.1.2-bin mv ./lib/guava-19.0.jar ./lib/guava-19.0.jar.bak
➜  apache-hive-3.1.2-bin cp /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar ./lib
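Which copy is the older one can be read from the file names. The sketch below extracts the version from a guava JAR name and compares two versions; the function names are hypothetical, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: print the version embedded in a guava JAR file name,
# e.g. guava-27.0-jre.jar -> 27.0. Prints nothing for other JARs such as
# jersey-guava-2.25.1.jar.
guava_version() {
  basename "$1" | sed -n 's/^guava-\([0-9][0-9.]*\).*\.jar$/\1/p'
}

# Hypothetical helper: succeed if version $1 is strictly older than version $2.
# Assumes sort supports -V.
is_older() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

hive_v=$(guava_version ./lib/guava-19.0.jar)
hadoop_v=$(guava_version /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar)
if is_older "$hive_v" "$hadoop_v"; then
  echo "Hive ships guava $hive_v, older than Hadoop's $hadoop_v"
fi
```

With the paths from the find output above, this prints that Hive ships guava 19.0, older than Hadoop's 27.0.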









