Hive函数大全

Hive函数大全

[TOC]

Relational Operators:关系运算符

=
Operator OperandTypes Description
A = B 所有基本类型 如果表达式 a 等于表达式 b,则为 TRUE,否则为 FALSE。
1
2
3
4
5
6
7
0: jdbc:hive2://192.168.150.150:10000> select (1=3);
+--------+
| _c0 |
+--------+
| false |
+--------+
1 row selected (0.484 seconds)

Mathematical Functions:数学函数

​ 在 Hive 中支持一下内置的数学函数;当参数为 NULL 时, 大多数返回 NULL 。

round()
ReturnType Name(Signature) Description
Double round(Double a) 返回 a 的四舍五入后 BigInt 的值
Double round(Double a, Int d) 返回 a 四舍五入到小数点后 d 位的值。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0: jdbc:hive2://192.168.150.150:10000> select round(pi());
+------+
| _c0 |
+------+
| 3.0 |
+------+
1 row selected (0.085 seconds)
0: jdbc:hive2://192.168.150.150:10000> select round(pi(),2);
+-------+
| _c0 |
+-------+
| 3.14 |
+-------+
1 row selected (0.478 seconds)
bround()
ReturnType Name(Signature) Description
Double bround(Double a ) 返回向最接近数字方向舍入的舍入模式,如果与两个相邻数字的距离相等,则向相邻的偶数舍入。如果舍弃部分左边的数字为奇数,则左边数字加1,如果为偶数,则左边数字不变。【也称为高斯舍入或银行家的舍入】
Double bround(Double a , Int d) 返回 a 四舍五入到小数点后 d 位的值。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
0: jdbc:hive2://192.168.150.150:10000> select bround(3.5);
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.086 seconds)
0: jdbc:hive2://192.168.150.150:10000> select bround(2.5);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.076 seconds)
0: jdbc:hive2://192.168.150.150:10000> select bround(pi(), 5);
+----------+
| _c0 |
+----------+
| 3.14159 |
+----------+
1 row selected (0.07 seconds)
floor()/ceil()
ReturnType Name(Signature) Description
BigInt floor(Double a) 返回 a 向下取整的数值
BigInt ceil(Double a) 返回 a 向上取整的数值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-----------------------------------------floor(Double)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select floor(pi());
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (0.091 seconds)
-----------------------------------------ceil(Double)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select ceil(pi());
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.078 seconds)
rand()
ReturnType Name(Signature) Description
Double rand() 返回一个从0到1均匀分布的随机数(从一行到另一行不断变化)
Double rand(Int seed) 指定 seed 将确保生成的随机数序列是确定的。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select rand();
+----------------------+
| _c0 |
+----------------------+
| 0.14502120180326072 |
+----------------------+
1 row selected (0.087 seconds)
0: jdbc:hive2://192.168.150.150:10000> select rand(13);
+---------------------+
| _c0 |
+---------------------+
| 0.7298032243379924 |
+---------------------+
1 row selected (0.072 seconds)
0: jdbc:hive2://192.168.150.150:10000> select rand(13);
+---------------------+
| _c0 |
+---------------------+
| 0.7298032243379924 |
+---------------------+
1 row selected (0.489 seconds)

exp()
ReturnType Name(Signature) Description
Double exp(Double a), exp(Decimal a) 返回e为底,a为指数,幂运算后的数值【其中 e 是自然对数的基数,在 Hive 0.13.0中添加了十进制版本。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select exp(2);
+-------------------+
| _c0 |
+-------------------+
| 7.38905609893065 |
+-------------------+
1 row selected (0.074 seconds)
0: jdbc:hive2://192.168.150.150:10000> select e() * e();
+---------------------+
| _c0 |
+---------------------+
| 7.3890560989306495 |
+---------------------+
1 row selected (0.473 seconds)

log()
ReturnType Name(Signature) Description
Double ln(Double a), ln(Decimal a) 返回参数的自然对数值 a
Double log10(Double a), log2(Decimal a) 返回参数 a 的以10为底的对数
Double log2(Double a), log2(Decimal a) 返回参数的以2为底的对数
Double log(Double/Decimal base, Double/Decimal a) 返回参数的以为底的对数
1

pow()
ReturnType Name(Signature) Description
Double pow(Double a , Double p) 返回 a 为底数,p为指数 的幂运算数值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select pow(2,3);
+------+
| _c0 |
+------+
| 8.0 |
+------+
1 row selected (0.574 seconds)
0: jdbc:hive2://192.168.150.150:10000> select pow(3,3);
+-------+
| _c0 |
+-------+
| 27.0 |
+-------+
1 row selected (0.071 seconds)

sqrt()
ReturnType Name(Signature) Description
Double sqrt(Double/Decimal a ) 返回 a.Decimal 的平方根数值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select sqrt(3);
+---------------------+
| _c0 |
+---------------------+
| 1.7320508075688772 |
+---------------------+
1 row selected (0.073 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sqrt(4);
+------+
| _c0 |
+------+
| 2.0 |
+------+
1 row selected (0.077 seconds)

bin()/hex()/unhex()/conv()
ReturnType Name(Signature) Description
String bin(BigInt a) 返回二进制格式的数字
String hex( BigInt / String / Binary a) 如果参数为Int或者为二进制(Binary),返回数字 a 的16进制格式 | 如果参数为String,则它将每个字符转换为其十六进制表示形式,并返回结果 STRING
Binary unhex( String a) 反转十六进制。将每对字符解释为十六进制数,并将其转换为该数字的字节表示形式。
String conv(BigInt/String num, Int from_base, Int to_base) 在不同的数字基数之间转换数字。返回数字 n 的字符串表示形式.【最小基数为2,最大基数为36。如果 from_ base 是负数,则 n 被视为有符号数。否则,n 被视为无符号,conv()以64位精度转换。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
----------------------------------------bin----------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select bin(4);
+------+
| _c0 |
+------+
| 100 |
+------+
1 row selected (0.483 seconds)
0: jdbc:hive2://192.168.150.150:10000> select bin(9);
+-------+
| _c0 |
+-------+
| 1001 |
+-------+
1 row selected (0.071 seconds)
-----------------------------------------hex(Int)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select hex(14);
+------+
| _c0 |
+------+
| E |
+------+
1 row selected (0.468 seconds)
0: jdbc:hive2://192.168.150.150:10000> select hex(19);
+------+
| _c0 |
+------+
| 13 |
+------+
1 row selected (0.075 seconds)
-----------------------------------------hex(String)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select hex('LOVE');
+-----------+
| _c0 |
+-----------+
| 4C4F5645 |
+-----------+
1 row selected (0.087 seconds)
0: jdbc:hive2://192.168.150.150:10000> select hex('L');
+------+
| _c0 |
+------+
| 4C |
+------+
1 row selected (0.47 seconds)
-----------------------------------------unhex(String)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select unhex('4C4F5645');
+-------+
| _c0 |
+-------+
| LOVE |
+-------+
1 row selected (0.076 seconds)
0: jdbc:hive2://192.168.150.150:10000> select unhex('4C');
+------+
| _c0 |
+------+
| L |
+------+
1 row selected (0.144 seconds)
-----------------------------------------conv(String ...)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select conv('a', 16,2);
+-------+
| _c0 |
+-------+
| 1010 |
+-------+
1 row selected (0.143 seconds)
0: jdbc:hive2://192.168.150.150:10000> select conv('6E', 18,8);
+------+
| _c0 |
+------+
| 172 |
+------+
1 row selected (0.092 seconds)
-----------------------------------------conv(Int ...)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select conv(-17, 18,-8);
+------+
| _c0 |
+------+
| -31 |
+------+
1 row selected (0.08 seconds)
-----------------------------------------conv(Int|String ...)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select conv(10+'10'+'10',10,10);
+------+
| _c0 |
+------+
| 30 |
+------+
1 row selected (0.076 seconds)

abs()
ReturnType Name(Signature) Description
Double abs(Double a) 返回 a 的绝对值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select abs(-10);
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (0.101 seconds)
0: jdbc:hive2://192.168.150.150:10000> select abs(10);
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (0.085 seconds)

pmod()
ReturnType Name(Signature) Description
Int/Double pmod(Int/Double a, Int/Double b) 返回 a 余 b 的结果
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
1 row selected (0.092 seconds)
0: jdbc:hive2://192.168.150.150:10000> select pmod(10, 5);
+------+
| _c0 |
+------+
| 0 |
+------+
1 row selected (0.503 seconds)
0: jdbc:hive2://192.168.150.150:10000> select pmod(-10, 5);
+------+
| _c0 |
+------+
| 0 |
+------+
1 row selected (0.084 seconds)
0: jdbc:hive2://192.168.150.150:10000> select pmod(-8, 5);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.086 seconds)

—–[反]三角函数:sin() / asin() / cos() / acos() / tan() / atan()
ReturnType Name(Signature) Description
Double sin(Double / Decimal a) 返回 a 的正弦值【a 以弧度表示】
Double asin(Double / Decimal a)
Double cos(Double / Decimal a)
Double acos(Double / Decimal a)
Double tan(Double / Decimal a)
Double atan(Double / Decimal a)
1

degrees()/radians()
ReturnType Name(Signature) Description
Double degrees( Double / Decimal a) 返回将 a 的值从弧度转换为度数【弧度>>>度数】
Double radians( Double / Decimal a) 返回将 a 的值从度数转换为弧度【度数>>>弧度】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-----------------------------------------degrees()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select degrees(60);
+---------------------+
| _c0 |
+---------------------+
| 3437.7467707849396 |
+---------------------+
1 row selected (0.506 seconds)
0: jdbc:hive2://192.168.150.150:10000> select degrees(10);
+--------------------+
| _c0 |
+--------------------+
| 572.9577951308232 |
+--------------------+
1 row selected (0.107 seconds)
-----------------------------------------radians()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select radians(10);
+----------------------+
| _c0 |
+----------------------+
| 0.17453292519943295 |
+----------------------+
1 row selected (0.079 seconds)
0: jdbc:hive2://192.168.150.150:10000> select radians(60);
+---------------------+
| _c0 |
+---------------------+
| 1.0471975511965976 |
+---------------------+
1 row selected (0.063 seconds)

positive()
ReturnType Name(Signature) Description
Int / Double positive( Int / Double a) 返回数值 a
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select positive(60);
+------+
| _c0 |
+------+
| 60 |
+------+
1 row selected (0.083 seconds)
0: jdbc:hive2://192.168.150.150:10000> select positive(10);
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (0.072 seconds)

negative()
ReturnType Name(Signature) Description
Int / Double negative(Int / Double a) 返回数值 a 的相反数
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select negative(10);
+------+
| _c0 |
+------+
| -10 |
+------+
1 row selected (0.077 seconds)
0: jdbc:hive2://192.168.150.150:10000> select negative(-10);
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (0.498 seconds)

sign()
ReturnType Name(Signature) Description
Double / Int sign(Double / Decimal a) 返回1.0,则表示 a 为正数| 返回-1.0,则表示 a 为负数
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select sign(-10);
+-------+
| _c0 |
+-------+
| -1.0 |
+-------+
1 row selected (0.526 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sign(10);
+------+
| _c0 |
+------+
| 1.0 |
+------+
1 row selected (0.077 seconds)

e()
ReturnType Name(Signature) Description
Double e() 返回自然数 e 的值
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select e();
+--------------------+
| _c0 |
+--------------------+
| 2.718281828459045 |
+--------------------+
1 row selected (0.097 seconds)

pi()
Operator OperandTypes Description
Double pi() 返回圆周率 π 的值
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select pi();
+--------------------+
| _c0 |
+--------------------+
| 3.141592653589793 |
+--------------------+
1 row selected (0.487 seconds)

factorial()
Operator OperandTypes Description
BigInt factorial( Int a) 返回 a 的阶乘 【a 值的有效返回为[0…20]】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select factorial(5);
+------+
| _c0 |
+------+
| 120 |
+------+
1 row selected (0.117 seconds)
0: jdbc:hive2://192.168.150.150:10000> select factorial(20);
+----------------------+
| _c0 |
+----------------------+
| 2432902008176640000 |
+----------------------+
1 row selected (0.104 seconds)

cbrt()
Operator OperandTypes Description
Double cbrt( Double a) 返回 a 的立方根
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select cbrt(8);
+------+
| _c0 |
+------+
| 2.0 |
+------+
1 row selected (0.485 seconds)
0: jdbc:hive2://192.168.150.150:10000> select cbrt(27);
+------+
| _c0 |
+------+
| 3.0 |
+------+
1 row selected (0.087 seconds)

shiftleft() / shiftright() / shiftrightunsugned()
Operator OperandTypes Description
Int | BigInt shiftleft( TinyInt | SmallInt | Int a, Int b) | shiftleft( BigInt a, Int b) 返回 a 向左位移 b 次的数值
Int | BigInt shiftright( TinyInt | SmallInt | Int a, Int b) | shiftright(TinyInt | SmallInt | Int a, Int b) 返回 a 向右位移 b 次的数值
Int | BigInt shiftrightunsigned( TinyInt | SmallInt | Int a, Int b) | shiftrightunsigned(TinyInt | SmallInt | Int a, Int b) 返回 a 无符号向右位移 b 次的数值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-----------------------------------------shiftleft()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select shiftleft(2,1);
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.11 seconds)
0: jdbc:hive2://192.168.150.150:10000> select shiftleft(2,3);
+------+
| _c0 |
+------+
| 16 |
+------+
1 row selected (0.519 seconds)
-----------------------------------------shiftright()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select shiftright(16,3);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.089 seconds)
0: jdbc:hive2://192.168.150.150:10000> select shiftright(4,1);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.075 seconds)

-----------------------------------------shiftrightunsigned()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select shiftrightunsigned(16,3);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.479 seconds)
0: jdbc:hive2://192.168.150.150:10000> select shiftrightunsigned(4,1);
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.105 seconds)

greatest() / least()
Operator OperandTypes Description
T greatest(T v1,T v2, …) 返回 T 中的最大值 【若 T 中存在null,则返回null】
T least(T v1,T v2, …) 返回 T 中的最小值 【若 T 中存在null,则返回null】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-----------------------------------------greatest()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select greatest(3,null,4,1);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.088 seconds)
0: jdbc:hive2://192.168.150.150:10000> select greatest(3,4,1);
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.076 seconds)
-----------------------------------------least()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select least(3,null,4,1);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.492 seconds)
0: jdbc:hive2://192.168.150.150:10000> select least(3,4,1);
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.078 seconds)

—–☆width_bucket()☆
Operator OperandTypes Description
Int width_bucket(Numeric expr, Numeric min_value, Numeric max_value, Int num_buckets) 对于给定的表达式,返回该表达式的值在计算后将落入的桶数 【expr :为其创建直方图的表达式。此表达式必须计算为数值或日期时间值或可隐式转换为数值或日期时间值的值。如果 expr 计算结果为 null,则表达式返回 null。**|** min_value 和 max_value 是解析到 expr 可接受范围的端点的表达式。这两个表达式还必须计算为数值或日期时间值,两者都不能计算为空。**|** num_buckets :是解析为指示存储桶数的常量的表达式。此表达式必须计算为正整数。】
1

Collection Functions:集合函数

​ Hive 支持以下内置的集合函数:

—–size()
Operator OperandTypes Description
Int size( Map<K,V> ) 返回Map中键值对类型的元素数量
Int size( Array ) 返回Array中数组类型的元素数量
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-----------------------------------------size(Map)---------------------------------------
select map_keys(Map("name":"Tony")); "name":"Tony", "age":18
-----------------------------------------size(Array)---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select size(Array(1,2,3,4,5,6,8,9));
+------+
| _c0 |
+------+
| 8 |
+------+
1 row selected (0.483 seconds)
0: jdbc:hive2://192.168.150.150:10000> select size(Array(1,2,3,4));
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.081 seconds)

—–map_keys() / map_values()
Operator OperandTypes Description
array map_keys(Map<K.V>) 返回一个映射键的无须数组
array map_values(Map<K.V>) 返回一个映射值的无须数组
1
2
3
-----------------------------------------map_keys()---------------------------------------

-----------------------------------------map_values()---------------------------------------
array_contains()
Operator OperandTypes Description
Boolean array_contains(Array, value) 如果 Array 中包含 value,则返回 Ture,否则 Flase
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select array_contains(Array(1,2,3,4),1);
+-------+
| _c0 |
+-------+
| true |
+-------+
1 row selected (0.086 seconds)
0: jdbc:hive2://192.168.150.150:10000> select array_contains(Array(1,2,3,4),5);
+--------+
| _c0 |
+--------+
| false |
+--------+
1 row selected (0.11 seconds)

sort_array(Array)

Operator OperandTypes Description
Array sort_array(Array) 返回排序好后的Array
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select sort_array(Array(1,2,5,3,4));
+--------------+
| _c0 |
+--------------+
| [1,2,3,4,5] |
+--------------+
1 row selected (0.132 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sort_array(Array(14,5,89,32,48));
+------------------+
| _c0 |
+------------------+
| [5,14,32,48,89] |
+------------------+
1 row selected (0.092 seconds)

Type Conversion Functions:类型转换函数

​ 在 Hive 中支持以下类型转换函数:

—–binary()
Operator OperandTypes Description
Binary binary(string|binary a) 返回将 a 强制转换为二进制的数据
1
2
select binary('我爱你');
select bin(111);
cast()
Operator OperandTypes Description
Expected “=” to follow “type” cast(expr as ) 返回将表达式 expr 的结果转换为 type 的结果
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select cast('12345' as int);
+--------+
| _c0 |
+--------+
| 12345 |
+--------+
1 row selected (0.08 seconds)
0: jdbc:hive2://192.168.150.150:10000> select cast('12345' as float);
+----------+
| _c0 |
+----------+
| 12345.0 |
+----------+
1 row selected (0.534 seconds)

Date Functions:日期函数

​ Hive 支持以下内置的日期函数:

from_unixtime()
Operator OperandTypes Description
String from_unixtime(BigInt unixtime[, String format]) 返回 时间戳(unixtime[单位:]) 默认转换为日期(精确到),若指定format格式,则依据格式转换
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select from_unixtime(1646875942);
+----------------------+
| _c0 |
+----------------------+
| 2022-03-10 01:32:22 |
+----------------------+
1 row selected (0.075 seconds)
0: jdbc:hive2://192.168.150.150:10000> select from_unixtime(1646875942,'yyyy-MM-dd');
+-------------+
| _c0 |
+-------------+
| 2022-03-10 |
+-------------+
1 row selected (0.081 seconds)
0: jdbc:hive2://192.168.150.150:10000> select from_unixtime(1646875942,'yyyy/MM');
+----------+
| _c0 |
+----------+
| 2022/03 |
+----------+
1 row selected (0.076 seconds)

unix_timestamp()
Operator OperandTypes Description
BigInt unix_timestamp() 默认返回获取当前 Unix 时间戳【单位:】;这个函数不是确定的,它的值对于查询执行的范围也不固定,因此妨碍了查询的适当优化——自从2.0以来,这个函数一直被弃用,而采用 current_ timestamp 常量。
BigInt unix_timestamp(String date) 默认返回获取 date 的 Unix 时间戳【单位:】;将格式为 yyyy-MM-dd HH: mm: ss 的时间字符串 date 转换为 Unix 时间戳
BigInt unix_timestamp(String date,String format) 默认返回获取指定 format 格式的 date的 Unix 时间戳【单位:】;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-----------------------------------------unix_timestamp()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select unix_timestamp();
+-------------+
| _c0 |
+-------------+
| 1623942105 |
+-------------+
1 row selected (0.493 seconds)
-----------------------------------------unix_timestamp(String date))---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select unix_timestamp('2022-03-10 01:32:22');
+-------------+
| _c0 |
+-------------+
| 1646875942 |
+-------------+
1 row selected (0.087 seconds)
-----------------------------------------unix_timestamp(String date,String format)-------------------------
0: jdbc:hive2://192.168.150.150:10000> select unix_timestamp('2022-03-10 01:32:22','yy-MM-dd');
+-------------+
| _c0 |
+-------------+
| 1646870400 |
+-------------+
1 row selected (0.113 seconds)
0: jdbc:hive2://192.168.150.150:10000> select unix_timestamp('2022-03-10 01:32','yy-MM-dd HH:MM');
+-------------+
| _c0 |
+-------------+
| 1723251600 |
+-------------+
1 row selected (0.078 seconds)

to_date()
Operator OperandTypes Description
String(Hive_version:Pre 2.1.0) | Date(Hive_version:On 2.1.0) to_date(String date) 返回时间字符串的日期部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select to_date ('2022-03-10 01:32:22');
+-------------+
| _c0 |
+-------------+
| 2022-03-10 |
+-------------+
1 row selected (0.479 seconds)
0: jdbc:hive2://192.168.150.150:10000> select to_date ('2022-03-10 01:32');
+-------------+
| _c0 |
+-------------+
| 2022-03-10 |
+-------------+
1 row selected (0.476 seconds)

year()
Operator OperandTypes Description
Int year(String date) 返回时间字符串的年份部分
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select year('2022-03-10 01:32:22');
+-------+
| _c0 |
+-------+
| 2022 |
+-------+
1 row selected (0.485 seconds)

quarter()
Operator OperandTypes Description
Int quarter(String date) 返回时间字符串的季度部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select quarter('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.106 seconds)
0: jdbc:hive2://192.168.150.150:10000> select quarter('2022-06-10');
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.076 seconds)

month()
Operator OperandTypes Description
Int month(String date) 返回时间字符串的月份部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select month('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (0.077 seconds)
0: jdbc:hive2://192.168.150.150:10000> select month('2022-06-10');
+------+
| _c0 |
+------+
| 6 |
+------+
1 row selected (0.482 seconds)

day() / dayofmonth()/dayofweek()
Operator OperandTypes Description
Int day(String date) 返回时间字符串的日部分
Int dayofweek(Date date |String date) 返回时间字符串的当周的第几天【因为返回的天数为是外国的时间习惯,故我们会自定义函数重写该函数,改成符合 China 的时间习惯】
Int dayofmonth(String date) 返回时间字符串的在当月的第几天
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-----------------------------------------day()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select day('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (0.089 seconds)
-----------------------------------------dayofweek()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select dayofweek(current_date()), current_date();
+------+-------------+
| _c0 | _c1 |
+------+-------------+
| 7 | 2021-06-19 |
+------+-------------+
1 row selected (0.081 seconds)
-- 2021-06-19 为周六!(故需要重写...)
-----------------------------------------dayofmonth()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select dayofmonth('2022-06-12');
+------+
| _c0 |
+------+
| 12 |
+------+
1 row selected (0.104 seconds)

hour()
Operator OperandTypes Description
Int hour(String date) 返回时间字符串的小时部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select hour('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.499 seconds)
0: jdbc:hive2://192.168.150.150:10000> select hour('2022-06-12');
+------+
| _c0 |
+------+
| 0 |
+------+
1 row selected (0.484 seconds)

minute()
Operator OperandTypes Description
Int minute(String date) 返回时间字符串的分钟部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select minute('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 32 |
+------+
1 row selected (0.08 seconds)
0: jdbc:hive2://192.168.150.150:10000> select minute('2022-06-12');
+------+
| _c0 |
+------+
| 0 |
+------+
1 row selected (0.492 seconds)

second()
Operator OperandTypes Description
Int second(String date) 返回时间字符串的秒部分
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select second('2022-03-10 01:32:22');
+------+
| _c0 |
+------+
| 22 |
+------+
1 row selected (0.499 seconds)
0: jdbc:hive2://192.168.150.150:10000> select second('2022-06-12');
+------+
| _c0 |
+------+
| 0 |
+------+
1 row selected (0.074 seconds)

weekofyear()
Operator OperandTypes Description
Int weekofyear(String date) 返回时间字符串的周号【该年中的第几周】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
0: jdbc:hive2://192.168.150.150:10000> select weekofyear('2021-01-02');
+------+
| _c0 |
+------+
| 53 |
+------+
1 row selected (0.093 seconds)
-- 时间明明是年初啊,为什么周数这么多呢?
-- 因为这个时间的周数还未到周末,还是属于去年的周数,故该时间的周数计算的是去年的周数。
0: jdbc:hive2://192.168.150.150:10000> select weekofyear('2021-01-10');
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.063 seconds)
0: jdbc:hive2://192.168.150.150:10000> select weekofyear('2022-01-10 01:32:22');
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.075 seconds)
0: jdbc:hive2://192.168.150.150:10000> select weekofyear('2022-01-10');
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.49 seconds)

interval()
Operator OperandTypes Description
String +|- interval String num field 该函数让在时间的加减上更为简单;【field:year、month、day、 hour、 minute、 second ;field 的顺序需要按照时间的格式依次从 year 到second】
String +|- interval String nums field to field 在 interval 函数的进行多个字段同时做同样的运算(当然执行不同的运算需要依次调用interval函数进行运算)【field to field :field只能为相邻的两个字段,不能越级且大单位在前【目前测只能year to month 可用;多个字段连续运算,目前测得只有day to second 可用】;年月日的分隔符为’-‘,时分秒的分隔符为’:’,且时间的运算基于当前时间,不是预定义的时间字符串】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
0: jdbc:hive2://192.168.150.150:10000> select current_date() - interval '2' year;
+-------------+
| _c0 |
+-------------+
| 2019-06-19 |
+-------------+
1 row selected (0.515 seconds)
0: jdbc:hive2://192.168.150.150:10000> select current_date() - interval '2' month;
+-------------+
| _c0 |
+-------------+
| 2021-04-19 |
+-------------+
1 row selected (0.476 seconds)
0: jdbc:hive2://192.168.150.150:10000> select current_date() - interval '2' year + interval '3' month;
+-------------+
| _c0 |
+-------------+
| 2019-09-19 |
+-------------+
1 row selected (0.087 seconds)
0: jdbc:hive2://192.168.150.150:10000> select current_date() - interval '2-3' year to month;
+-------------+
| _c0 |
+-------------+
| 2019-03-19 |
+-------------+
0: jdbc:hive2://192.168.150.150:10000> select current_timestamp - interval '3 12:20:30' day to second;
+--------------------------+
| _c0 |
+--------------------------+
| 2021-06-15 22:28:38.168 |
+--------------------------+
1 row selected (0.075 seconds)
0: jdbc:hive2://192.168.150.150:10000> select current_timestamp - interval '3 12:20' day to minute;
Error: Error while compiling statement: FAILED: ParseException line 1:50 cannot recognize input near 'to' 'minute' '<EOF>' in selection target (state=42000,code=40000)
-- 该问题不知什么情况,希望了解的小伙伴评论一起解决吧...

extract()
Operator OperandTypes Description
Int extract(field From source) 从源检索字段;【源:时间字符串字段:day、 dayofweek、 hour、 minute、 month、 quarter、 second、 week 和 year】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
0: jdbc:hive2://192.168.150.150:10000> select extract(year from '2022-01-10 01:32:22');
+-------+
| _c0 |
+-------+
| 2022 |
+-------+
1 row selected (0.085 seconds)
0: jdbc:hive2://192.168.150.150:10000> select extract(quarter from '2022-01-10 01:32:22');
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.112 seconds)
0: jdbc:hive2://192.168.150.150:10000> select extract(month from '2022-01-10 01:32:22');
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.082 seconds)
0: jdbc:hive2://192.168.150.150:10000> select extract(week from '2022-01-10 01:32:22');
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.104 seconds)
0: jdbc:hive2://192.168.150.150:10000> select extract(month from interval '1-3' year to month);
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (0.064 seconds)
0: jdbc:hive2://192.168.150.150:10000> select extract(minute from interval '3 12:20:30' day to second) ;
+------+
| _c0 |
+------+
| 20 |
+------+
1 row selected (0.064 seconds)

datediff()
Operator OperandTypes Description
Int datediff( String enddate, String startdate) 返回开始时间到结束日期的天数
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select datediff(current_date(), '2008-08-08');
+-------+
| _c0 |
+-------+
| 4698 |
+-------+
1 row selected (0.059 seconds)

date_add()
Operator OperandTypes Description
String(Hive_version:Pre 2.1.0) | Date(Hive_version:On 2.1.0) date_add(date/timestamp/string startdate, tinyint/smallint/int days) 返回向开始时期加上一个天数的日期
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select date_add(current_date(), 10);
+-------------+
| _c0 |
+-------------+
| 2021-06-29 |
+-------------+
1 row selected (0.085 seconds)
0: jdbc:hive2://192.168.150.150:10000> select date_add(current_date(), -10);
+-------------+
| _c0 |
+-------------+
| 2021-06-09 |
+-------------+
1 row selected (0.072 seconds)

date_sub()
Operator OperandTypes Description
String(Hive_version:Pre 2.1.0) | Date(Hive_version:On 2.1.0) date_sub(date/timestamp/string startdate, tinyint/smallint/int days) 返回向开始时期减去一个天数的日期
1
2
3
4
5
6
7
8
9
10
11
12
13
14
0: jdbc:hive2://192.168.150.150:10000> select date_sub(current_date(), -10);
+-------------+
| _c0 |
+-------------+
| 2021-06-29 |
+-------------+
1 row selected (0.469 seconds)
0: jdbc:hive2://192.168.150.150:10000> select date_sub(current_date(), 10);
+-------------+
| _c0 |
+-------------+
| 2021-06-09 |
+-------------+
1 row selected (0.071 seconds)
from_utc_timestamp() / to_utc_timestamp()
Operator OperandTypes Description
Timestamp from_utc_timestamp(Timestamp ts| String date , String timezone) 返回将UTC格式的date,转换为指定时区的时间
Timestamp to_utc_timestamp(Timestamp ts| String date , String timezone) 返回将date按照指定时区的时间转换为UTC格式的时间
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-----------------------------------------from_utc_timestamp()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select from_utc_timestamp('1970-01-01 00:1654:00','PRC');
+------------------------+
| _c0 |
+------------------------+
| 1970-01-01 08:00:00.0 |
+------------------------+
1 row selected (0.066 seconds)
-- PRC为北京时间(没记错是东八区)
0: jdbc:hive2://192.168.150.150:10000> select from_utc_timestamp(0,'PRC');
+------------------------+
| _c0 |
+------------------------+
| 1970-01-01 08:00:00.0 |
+------------------------+
1 row selected (0.073 seconds)
-----------------------------------------to_utc_timestamp()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select to_utc_timestamp(0,'PRC');
+------------------------+
| _c0 |
+------------------------+
| 1969-12-31 16:00:00.0 |
+------------------------+
1 row selected (0.068 seconds)

current_date()
Operator OperandTypes Description
Date current_date() 返回当前系统的时间
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select current_date();
+-------------+
| _c0 |
+-------------+
| 2021-06-19 |
+-------------+
1 row selected (0.104 seconds)

current_timestamp()
Operator OperandTypes Description
Timestamp current_timestamp() 返回当前系统的时间戳【单位:毫秒
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select current_timestamp();
+--------------------------+
| _c0 |
+--------------------------+
| 2021-06-19 11:24:40.936 |
+--------------------------+
1 row selected (0.07 seconds)

add_months()
Operator OperandTypes Description
String add_months( String start_date, Int num_months,[,output_date_format]) 返回给定时间增加的月份数,默认返回格式:’yyyy-MM-dd’,如果给定输出时间格式,则按指定格式返回时间格式
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select add_months(current_date(),3);
+-------------+
| _c0 |
+-------------+
| 2021-09-19 |
+-------------+
1 row selected (0.06 seconds)
0: jdbc:hive2://192.168.150.150:10000> select add_months(current_date(),3,'yyyy-MM-dd HH:MM:SS');
+----------------------+
| _c0 |
+----------------------+
| 2021-09-19 00:09:00 |
+----------------------+
1 row selected (0.061 seconds)

last_day()
Operator OperandTypes Description
String last_day(String date) 返回该日期所属月的最后一天
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select last_day(current_date());
+-------------+
| _c0 |
+-------------+
| 2021-06-30 |
+-------------+
1 row selected (0.063 seconds)

next_day()
Operator OperandTypes Description
String next_day( String start_date, String day_of_week) 返回start_date 的最近的下一个day_of_week【day_of_week:mo(2);mon(3);monday(全称)】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select next_day(current_date(),'mo');
+-------------+
| _c0 |
+-------------+
| 2021-06-21 |
+-------------+
1 row selected (0.066 seconds)
0: jdbc:hive2://192.168.150.150:10000> select next_day(current_date(),'mon');
+-------------+
| _c0 |
+-------------+
| 2021-06-21 |
+-------------+
1 row selected (0.06 seconds)
0: jdbc:hive2://192.168.150.150:10000> select next_day(current_date(),'monday');
+-------------+
| _c0 |
+-------------+
| 2021-06-21 |
+-------------+
1 row selected (0.061 seconds)

trunc()
Operator OperandTypes Description
String trunc(String date, String format ) 返回 date 对应 format 单位的第一天日期【supported formats:MONTH/MON/MM,YEAR/YYYY/YY】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select trunc(current_date(),'YYYY');
+-------------+
| _c0 |
+-------------+
| 2021-01-01 |
+-------------+
1 row selected (0.46 seconds)
0: jdbc:hive2://192.168.150.150:10000> select trunc(current_date(),'MONTH');
+-------------+
| _c0 |
+-------------+
| 2021-06-01 |
+-------------+
1 row selected (0.084 seconds)

months_between()
Operator OperandTypes Description
Double months_between(String date1,String date2) 返回两个日期的天数【如果date1晚于date2,那么结果是正的。如果date1早于date2,则结果为负值。如果date1和date2是一个月的同一天或两个月的最后一天,那么结果总是一个整数。否则,UDF根据31天的月份计算结果的小数部分,并考虑date1和date2时间成分的差异。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select months_between('2021-5-19','2021-7-18');
+--------------+
| _c0 |
+--------------+
| -1.96774194 |
+--------------+
1 row selected (0.063 seconds)
0: jdbc:hive2://192.168.150.150:10000> select months_between('2021-5-30','2021-7-30');
+-------+
| _c0 |
+-------+
| -2.0 |
+-------+
1 row selected (0.075 seconds)

date_format()
Operator OperandTypes Description
String date_format(Date/String ts, String format) 返回将 ts 对应的 format 格式时间
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select date_format(current_date(), 'yyyy-MM-dd');
+-------------+
| _c0 |
+-------------+
| 2021-06-19 |
+-------------+
1 row selected (0.071 seconds)
0: jdbc:hive2://192.168.150.150:10000> select date_format(current_date(), 'yyyy-MM-dd HH:MM:SS');
+----------------------+
| _c0 |
+----------------------+
| 2021-06-19 00:06:00 |
+----------------------+
1 row selected (0.479 seconds)

Conditional Functions:条件函数

if()
Operator OperandTypes Description
T if(Boolean testCondition,T valuetrue, T valueFalseOrNull) 当 testCondition 为 true 时返回 valueTrue,否则返回 valueFalseOrNull
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select if(1>2, 1111,2222);
+-------+
| _c0 |
+-------+
| 2222 |
+-------+
1 row selected (0.139 seconds)
-- 与Java、Python等中的三元表达式相似
isnull() / isnotnull()
Operator OperandTypes Description
Boolean isnull(a) 如果 a 为 NULL,则返回 true,否则返回 false
Boolean isnotnull(a) 如果 a 不为 NULL,则返回 true,否则返回 false
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-----------------------------------------isnull()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select isnull(111);
+--------+
| _c0 |
+--------+
| false |
+--------+
1 row selected (0.106 seconds)
-----------------------------------------isnotnull()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select isnotnull(null);
+--------+
| _c0 |
+--------+
| false |
+--------+
1 row selected (0.076 seconds)

nvl()
Operator OperandTypes Description
T nvl(T value, T default_value) 如果 value 不为空,则返回value,否则返回default_value
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select nvl(null,1111);
+-------+
| _c0 |
+-------+
| 1111 |
+-------+
1 row selected (0.066 seconds)
0: jdbc:hive2://192.168.150.150:10000> select nvl(2222,1111);
+-------+
| _c0 |
+-------+
| 2222 |
+-------+
1 row selected (0.07 seconds)

coalesce()
Operator OperandTypes Description
T coalesce( T v1,T v2,…) 返回第一个非NULL的值,如果全是NULL,则返回NULL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select coalesce(null,1111);
+-------+
| _c0 |
+-------+
| 1111 |
+-------+
1 row selected (0.059 seconds)
0: jdbc:hive2://192.168.150.150:10000> select coalesce(null,null);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.071 seconds)

—–case()
Operator OperandTypes Description
T case a when b then c [when d then e] * [else f] end 当 a = b 时,返回 c; 当 a = d 时,返回 e; else 返回 f
T case when a then b [when c then d] * [else e] end 当 a = true 时,返回 b; 当 c = true 时,返回 d; else 返回 e
1

nullif()
Operator OperandTypes Description
T nullif(a,b) 如果 a = b,则返回NULL,否则返回 a;【case语句:case when a = b then NULL else a】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select nullif(1,2);
+------+
| _c0 |
+------+
| 1 |
+------+
1 row selected (0.057 seconds)
0: jdbc:hive2://192.168.150.150:10000> select nullif(1,1);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.068 seconds)

assert_true()
Operator OperandTypes Description
void assert_true( Boolean condition) 如果 condition 不为 true,则抛出异常,否则返回 null
1
2
3
4
5
6
7
8
9
10
0: jdbc:hive2://192.168.150.150:10000> select assert_true(1>2);
Error: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed. (state=,code=0)
0: jdbc:hive2://192.168.150.150:10000> select assert_true(1<2);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.069 seconds)

String Functions:字符串函数

​ 在 Hive 中支持一下内置的字符串函数:

ascii()
Operator OperandTypes Description
Int ascii( String str) 返回 str 的第一个字符的阿斯克码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select ascii('LOVE');
+------+
| _c0 |
+------+
| 76 |
+------+
1 row selected (0.149 seconds)
0: jdbc:hive2://192.168.150.150:10000> select ascii('L');
+------+
| _c0 |
+------+
| 76 |
+------+
1 row selected (0.066 seconds)

base64() / unbase64()
Operator OperandTypes Description
String base64( Binary bin) 返回 将参数从二进制转换为base64后的字符串
Binary unbase64(String str) 返回 将字符串 str 转换为base64后的Binary
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-----------------------------------------base64()---------------------------------------: jdbc:hive2://192.168.150.150:10000> select base64(binary('LOVE'));
+-----------+
| _c0 |
+-----------+
| TE9WRQ== |
+-----------+
1 row selected (0.487 seconds)
-----------------------------------------unbase64()---------------------------------------0: jdbc:hive2://192.168.150.150:10000> select unbase64('TE9WRQ==');
+-------+
| _c0 |
+-------+
| LOVE |
+-------+
1 row selected (0.468 seconds)

character_length()
Operator OperandTypes Description
Int character_length(string str) 返回str中包含的 UTF-8 的字符数
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select character_length('LOVE');
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.068 seconds)

chr()
Operator OperandTypes Description
String char( BigInt | Double a) 返回 与 a 等价的二进制字符的ASCII字符
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select ascii('L');
+------+
| _c0 |
+------+
| 76 |
+------+
1 row selected (0.065 seconds)
0: jdbc:hive2://192.168.150.150:10000> select chr(76);
+------+
| _c0 |
+------+
| L |
+------+
1 row selected (0.063 seconds)

concat()
Operator OperandTypes Description
String concat(String|Binary A, String|Binary B…) 按顺序返回串联传入的字符串或字节所产生的字符串或字节
1
2
3
4
5
6
7
8
9
0: jdbc:hive2://192.168.150.150:10000> select concat('LOVE','YOU'),concat('foot','ball');
+----------+-----------+
| _c0 | _c1 |
+----------+-----------+
| LOVEYOU | football |
+----------+-----------+
1 row selected (0.467 seconds)


concat_ws()
Operator OperandTypes Description
String concat_ws(String sep, String a, String b,…) 类似于上面的concat(), 但是有定制的 sep 分隔符
String concat_ws(String sep, Array) 与上面的 concat _ ws ()类似,用定制的sep分割符将array元素连接
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select concat_ws('_','LOVE','YOU');
+-----------+
| _c0 |
+-----------+
| LOVE_YOU |
+-----------+
1 row selected (0.064 seconds)
0: jdbc:hive2://192.168.150.150:10000> select concat_ws('_',Array('foot','ball','basket','ball'));
+------------------------+
| _c0 |
+------------------------+
| foot_ball_basket_ball |
+------------------------+
1 row selected (0.076 seconds)

decode()/encode()
Operator OperandTypes Description
String decode(Binary bin, String charset) 提供charset 将 bin 解码为字符串【charset:”US-ASCII”、”ISO-8859-1”、”UTF-8”、”UTF-16BE”、”UTF-16LE”、”UTF-16”】
String encode(String src, String charset) 提供charset 将 src解码为字符串【charset:”US-ASCII”、”ISO-8859-1”、”UTF-8”、”UTF-16BE”、”UTF-16LE”、”UTF-16”】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-----------------------------------------decode()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select decode(binary('LOVE'),'UTF-16LE'),decode(binary('ME'),'UTF-16LE');
+------+------+
| _c0 | _c1 |
+------+------+
| 佌䕖 | 䕍 |
+------+------+
1 row selected (0.46 seconds)
-----------------------------------------encode()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select encode('佌䕖','UTF-16LE'),encode('䕍','UTF-16LE');
+-------+------+
| _c0 | _c1 |
+-------+------+
| LOVE | ME |
+-------+------+
1 row selected (0.085 seconds)

elt()
Operator OperandTypes Description
String elt(N Int,String str1, Stringstr2 ,String str3,…) 返回索引号处的字符串【如果 n 小于1或大于参数数目,则返回 NULL。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select elt(2, 'I','LOVE','YOU');
+-------+
| _c0 |
+-------+
| LOVE |
+-------+
1 row selected (0.597 seconds)
0: jdbc:hive2://192.168.150.150:10000> select elt(4, 'I','LOVE','YOU');
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.096 seconds)

field()
Operator OperandTypes Description
Int field(val T,val1 T,val2 T,val3 T,…) 返回 val1、 val2、 val3、 … 列表中 val首次出现的索引,如果没有找到则返回0【支持所有基本类型,使用 str.equals (x)比较参数。如果 val 为 NULL,则返回值为0】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select field( 'Love','I','LOVE','YOU','ME','TOO'),field(1,1,1,1);
+------+------+
| _c0 | _c1 |
+------+------+
| 0 | 1 |
+------+------+
1 row selected (0.106 seconds)

find_in_set()
Operator OperandTypes Description
Int find_in_set(String str, String strList) 返回 strList 中 str 首次出现的下标,其中 strList 是逗号分隔的字符串【如果任一参数为 null,则返回 null;如果第一个参数包含逗号,则返回0】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select find_in_set( 'LOVE','I,LOVE,YOU,ME,TOO'),find_in_set('NULL','say,hello,world');
+------+------+
| _c0 | _c1 |
+------+------+
| 2 | 0 |
+------+------+
1 row selected (0.09 seconds)

format_number()
Operator OperandTypes Description
String format_number(Number x, Int d) 将数字 x 格式化为’# ,#### ,#### .##’,四舍五入到小数点后的 d 位,并以字符串形式返回结果【如果 d 是0,则结果没有小数点或小数部分】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select format_number(123456789.12345,3),format_number(123456789.12345,0);
+------------------+--------------+
| _c0 | _c1 |
+------------------+--------------+
| 123,456,789.123 | 123,456,789 |
+------------------+--------------+
1 row selected (0.103 seconds)

get_json_object()
Operator OperandTypes Description
String get_json_object(String json_string, String path) 基于指定的 json 路径从 json 字符串中提取 json 对象,并返回提取的 json 对象的 json 字符串【一次解析一个,但可以多层,如果输入 json 字符串无效,则返回 null。注意: json 路径只能有字符[0-9|a-z] ,即没有大写字母或特殊字符。此外,键 * 不能以数字开头。* 这是由于对 Hive 列名的限制。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select get_json_object('{"sentence":"I LOVE YOU","age":18,"date":"2016-12-10"}','$.date');
+-------------+
| _c0 |
+-------------+
| 2016-12-10 |
+-------------+
1 row selected (0.548 seconds)
0: jdbc:hive2://192.168.150.150:10000> select get_json_object('{"sentence":"I LOVE YOU","age":18,"date":"2016-12-10"}','$.name');
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.156 seconds)
0: jdbc:hive2://192.168.150.150:10000> select get_json_object('{"sentence":"I LOVE YOU","age":18,"date":"2016-12-10","person":{"person1":"demo"}}','$.person.person1');
+-------+
| _c0 |
+-------+
| demo |
+-------+
1 row selected (0.149 seconds)

json_tuple()
Operator OperandTypes Description
String1,…,Stringn json_tuple(String jsonStr,String k1,…,String kn) 基于指定的 json 路径从 json 字符串中提取 json 对象,并返回提取的 json 对象的 json 字符串【一次解析多个,但只能一层,如果输入 json 字符串无效,则返回 null。注意: json 路径只能有字符[0-9|a-z] ,即没有大写字母或特殊字符。此外,键 * 不能以数字开头。* 这是由于对 Hive 列名的限制。】
1
2
3
4
5
6
7
0: jdbc:hive2://192.168.150.150:10000> select json_tuple('{"sentence":"I LOVE YOU","age":18,"date":"2016-12-10"}','date','sentence','name');
+-------------+-------------+-------+
| c0 | c1 | c2 |
+-------------+-------------+-------+
| 2016-12-10 | I LOVE YOU | NULL |
+-------------+-------------+-------+

in_file()
Operator OperandTypes Description
Boolean in_file(String str, String filename) 如果字符串 str 在文件名中显示为整行,则返回 true。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select in_file('aaa', '/scripts/student.log'); 
+-------+
| _c0 |
+-------+
| true |
+-------+
1 row selected (0.152 seconds)
0: jdbc:hive2://192.168.150.150:10000> select in_file('bbb', '/scripts/student.log');
+--------+
| _c0 |
+--------+
| false |
+--------+
1 row selected (0.142 seconds)

instr()
Operator OperandTypes Description
Int instr(String str, String substr) 返回 str 中 substr 第一次出现的位置【如果两个参数中有一个为 null,返回 null; 如果在 str 中找不到 substr,返回0。请注意,这不是从零开始的。Str 中的第一个字符的索引为1
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select instr('I,LOVE,YOU,ME,TOO','LOVE'),instr('I,LOVE,YOU,ME,TOO','null');
+------+------+
| _c0 | _c1 |
+------+------+
| 3 | 0 |
+------+------+
1 row selected (0.168 seconds)

length()
Operator OperandTypes Description
Int length(String A) 返回字符串的长度【空格也算一个字符】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select length('I LOVE YOU'),length('LOVE');
+------+------+
| _c0 | _c1 |
+------+------+
| 10 | 4 |
+------+------+
1 row selected (0.136 seconds)

locate()
Operator OperandTypes Description
Int locate(String substr, String str[, Int pos]) 返回位置 pos 后 str 中第一个出现substr 的位置【找不到substr则返回0】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select locate('LOVE','I LOVE YOU'),locate('L','I LOVE YOU',5),locate('L','I LOVE YOU L',5);
+------+------+------+
| _c0 | _c1 | _c2 |
+------+------+------+
| 3 | 0 | 12 |
+------+------+------+
1 row selected (0.565 seconds)

lower() / lcase()
Operator OperandTypes Description
String lower / lcase(String A) 返回将 A 的所有字符转换为小写所得到的字符串
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select lower('LOVE'),lcase('LOVE');
+-------+-------+
| _c0 | _c1 |
+-------+-------+
| love | love |
+-------+-------+
1 row selected (0.129 seconds)

lpad()
Operator OperandTypes Description
String lpad(String str, Int len, String pad) 返回在 str左边填充了一个长度为 len 的填充区域【如果 str 长于 len,则返回值缩短为 len 字符。对于空填充字符串,返回值为空】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select lpad('LOVE',2,'I'), lpad('LOVE',6,'I'), lpad('',0,'I');
+------+---------+------+
| _c0 | _c1 | _c2 |
+------+---------+------+
| LO | IILOVE | |
+------+---------+------+
1 row selected (0.129 seconds)

ltrim() / rtrim()
Operator OperandTypes Description
String ltrim(String A) 返回从 A 的开头(左侧)去空得到的字符串
String rtrim(string A) 返回从 A 的尾部(右侧)去空得到的字符串
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-----------------------------------------ltrim()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select ltrim(' LOVE'),ltrim(' I LOVE');
+-------+---------+
| _c0 | _c1 |
+-------+---------+
| LOVE | I LOVE |
+-------+---------+
1 row selected (0.536 seconds)
-----------------------------------------rtrim()---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select rtrim('LOVE '),rtrim('I LOVE ');
+-------+---------+
| _c0 | _c1 |
+-------+---------+
| LOVE | I LOVE |
+-------+---------+
1 row selected (0.508 seconds)

octet_length()
Operator OperandTypes Description
Int octet_length(string str) 返回在 UTF-8编码中保存字符串 str 所需的八位字节数【中文的UTF-8的字节数为一个字三个字节】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select octet_length('LOVE'),octet_length('我爱你');
+------+------+
| _c0 | _c1 |
+------+------+
| 4 | 9 |
+------+------+
1 row selected (0.155 seconds)

—–parse_url()
Operator OperandTypes Description
String parse_url(String urlString, String partToExtract [, String keyToExtract]) 从 URL 返回指定的部分,一次只能提取一个part【partToExtract 的有效值包括 HOST:域名,PROTOCOL:协议,PORT:端口,REF:引用来源,PATH:路径,QUERY:解析—可指定QUERY解析的part、 AUTHORITY:、 FILE :和 USERINFO:】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
-----------------------------------------HOST---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/14/Python函数/?k1=v1&k2=v2','HOST');
+----------------+
| _c0 |
+----------------+
| yangyangmm.cn |
+----------------+
1 row selected (0.131 seconds)

-----------------------------------------PATH---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','PATH');
+------------------------+
| _c0 |
+------------------------+
| /2021/01/06/Python函数/ |
+------------------------+
1 row selected (0.167 seconds)

-----------------------------------------QUERY---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','QUERY;)
+--------------+
| _c0 |
+--------------+
| k1=v1&k2=v2 |
+--------------+
1 row selected (0.52 seconds)

0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','QUERY'k1');
+------+
| _c0 |
+------+
| v1 |
+------+
1 row selected (0.57 seconds)

-----------------------------------------REF---------------------------------------
select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','REF');
-----------------------------------------PROTOCOL---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','PROTOL');
+--------+
| _c0 |
+--------+
| https |
+--------+
1 row selected (0.123 seconds)

-----------------------------------------AUTHORITY---------------------------------------
0: jdbc:hive2://192.168.150.150:10000> select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','AUTHOTY');
+----------------+
| _c0 |
+----------------+
| yangyangmm.cn |
+----------------+
1 row selected (0.142 seconds)

-----------------------------------------FILE---------------------------------------
select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','FILE ');
-----------------------------------------USERINFO---------------------------------------
select parse_url('https://yangyangmm.cn/2021/01/06/Python函数/?k1=v1&k2=v2','USERINFO');
parse_url_tuple()
Operator OperandTypes Description
String1,…,StirngN parse_url_tuple(String urlStr,String p1,…,String pn) 从 URL 返回指定的部分,一次能提取多个part【pn 的有效值包括 HOST:域名,PROTOCOL:协议,PORT:端口,REF:引用来源,PATH:路径,QUERY:解析—可指定QUERY解析的part、 AUTHORITY:、 FILE :和 USERINFO:】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select parse_url_tuple('https://yangyangmm.cn/2021/01/14/Python函数/?k1=v1&k2=','HOST','PATH','QUERY:k1');
+----------------+------------------------+-----+
| c0 | c1 | c2 |
+----------------+------------------------+-----+
| yangyangmm.cn | /2021/01/14/Python函数/ | v1 |
+----------------+------------------------+-----+
1 row selected (0.119 seconds)

printf()
Operator OperandTypes Description
String printf(String format, Obj… args) 返回将参数 args 按照 format 格式化
1
2
3
4
5
6
7
0: jdbc:hive2://192.168.150.150:10000> select printf('name:%s,age:%d,salary:%.2f','LOVE',18,34234.4534);
+-----------------------------------+
| _c0 |
+-----------------------------------+
| name:LOVE,age:18,salary:34234.45 |
+-----------------------------------+

quote()
Operator OperandTypes Description
String quote(String text) 返回带引号的字符串【该函数在hive4.0版本中应用】
regexp_extract()
Operator OperandTypes Description
String regexp_extract(String subject, String pattern,Int index) 将字符串subject按照pattern正则表达式的规则拆分,返回index指定的字符。【^ 表示开头;$ 表示结尾;. 表示任意字符;* 表示任意多个】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
0: jdbc:hive2://192.168.150.150:10000> select regexp_extract('3,300,1,27.01,"Soup - Campbells, Minestrone",2018/7/17,11:53','(.*?),(.*?),(.*?),(.*?),"?(.*?)"?,(\\d{4}/\\d{1,2}/\\d{1,2}),(.*?)',4) price;
+--------+
| price |
+--------+
| 27.01 |
+--------+
1 row selected (0.116 seconds)
0: jdbc:hive2://192.168.150.150:10000>
0: jdbc:hive2://192.168.150.150:10000> select regexp_extract('3,300,1,27.01,"Soup - Campbells, Minestrone",2018/7/17,11:53','(.*?),(.*?),(.*?),(.*?),"?(.*?)"?,(\\d{4}/\\d{1,2}/\\d{1,2}),(.*?)',5) product;
+-------------------------------+
| product |
+-------------------------------+
| Soup - Campbells, Minestrone |
+-------------------------------+
1 row selected (0.116 seconds)
0: jdbc:hive2://192.168.150.150:10000>
0: jdbc:hive2://192.168.150.150:10000> select regexp_extract('3,300,1,27.01,"Soup - Campbells, Minestrone",2018/7/17,11:53','(.*?),(.*?),(.*?),(.*?),"?(.*?)"?,(\\d{4}/\\d{1,2}/\\d{1,2}),(.*?)',6) tran_date;
+------------+
| tran_date |
+------------+
| 2018/7/17 |
+------------+
1 row selected (0.108 seconds)

regexp_replace()
Operator OperandTypes Description
String regexp_replace(String initial_string, String pattern, String replacement) 将字符串 initial_string 中的符合正则表达式 pattern 的部分替换为 replacement
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select regexp_replace('L234VE', '\\d+', 'O'),regexp_replace('123abc456def789','[a-z]+',' ');
+-------+--------------+
| _c0 | _c1 |
+-------+--------------+
| LOVE | 123 456 789 |
+-------+--------------+
1 row selected (0.125 seconds)

replace(string A, string OLD, string NEW)

Operator OperandTypes Description
String replace(String A, String OLD, String NEW) 返回将 A 中的 OLD 替换为 NEW
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select replace('L234VE', '234', 'O'),replace('123abc456abc789','abc',' ');
+-------+--------------+
| _c0 | _c1 |
+-------+--------------+
| LOVE | 123 456 789 |
+-------+--------------+
1 row selected (0.109 seconds)

repeat()
Operator OperandTypes Description
String repeat(String str, Int n) 将 str 重复 n 次
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select repeat('LOVE ', 4);
+-----------------------+
| _c0 |
+-----------------------+
| LOVE LOVE LOVE LOVE |
+-----------------------+
1 row selected (0.128 seconds)

reverse()
Operator OperandTypes Description
String reverse(string A) 返回 A 的反向字符串
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select reverse('LOVE'),reverse('EVOL');
+-------+-------+
| _c0 | _c1 |
+-------+-------+
| EVOL | LOVE |
+-------+-------+
1 row selected (0.5 seconds)

rpad(string str, int len, string pad)
Operator OperandTypes Description
String rpad(String str, Int len, String pad) 返回在 str 的右边填充了一个长度为 len 的 pad【如果 str 长于 len,则返回值缩短为 len 字符。对于空填充字符串,返回值为空】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select rpad('LOVE', 3, ' '),rpad('LOVE', 10, ' YOU');
+------+-------------+
| _c0 | _c1 |
+------+-------------+
| LOV | LOVE YOU Y |
+------+-------------+
1 row selected (0.105 seconds)

sentences()
Operator OperandTypes Description
array<array> sentences(String str, [String lang, String locale]) 返回参数 str 分词后的单词二维数组【将str的标点符号和空格作为第1、2个维度为分割符,lang 和 locale 为可选参数】
1
2
3
4
5
6
7
8
9
0: jdbc:hive2://192.168.150.150:10000> select sentences('hello word!Hello Hive,Hello,I LOVE YOU');
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| [["hello","word"],["Hello","Hive","Hello","I","LOVE","YOU"]] |
+----------------------------------------------------+
1 row selected (0.133 seconds)


——ngrams()
Operator OperandTypes Description
array<struct<string,double>> ngrams(array<array>, int N, int K, int pf) 与 sentences()函数一起使用,分词后,返回arr中连续的n个单词的词频统计结果【针对arr中连续n个单词做词频统计并倒序排列,将topk个 统计结果返回】
1
select ngrams(sentences('hello word!Hello Hive,Hello,I LOVE YOU'),1,1);
—–context_ngrams()
Operator OperandTypes Description
Array<struct<String,Double>> context_ngrams(Array<Array> arr, Array cnt, Int K, Int pf) 返回arr中连续的size(cnt)个单词组合以cnt中非null的内容匹配统计,按数量到处排列,将K个统计结果返回 返回arr中连续的size(cnt)个单词组合以cnt中非null内容匹配统计,按数量倒序排列,将topk个统计结果返回
1

space()
Operator OperandTypes Description
String space( Int n ) 返回 n 个空格的字符串
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select space(0), space(4),length(space(0)), length(space(4));
+------+-------+------+------+
| _c0 | _c1 | _c2 | _c3 |
+------+-------+------+------+
| | | 0 | 4 |
+------+-------+------+------+
1 row selected (0.165 seconds)

split()
Operator OperandTypes Description
Array split(String str, String pat**|**parten) 返回按 pat【正则也可】 内容分割的 str 返回的数组
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select split('ILOVEYOU','LOVE');
+--------------+
| _c0 |
+--------------+
| ["I","YOU"] |
+--------------+
1 row selected (0.178 seconds)
0: jdbc:hive2://192.168.150.150:10000> select split('1234556sdfasd1231234','[a-z]+');
+------------------------+
| _c0 |
+------------------------+
| ["1234556","1231234"] |
+------------------------+
1 row selected (0.561 seconds)

str_to_map()
Operator OperandTypes Description
Map<String,String> str_to_map(Text[, Delimiter1, Delimiter2]) 返回使用两个分隔符将文本拆分的键值对【Delimiter1 将 Text 分隔为 K-V 对,Delimiter2 将每个 K-V 对分隔。默认分隔符是’ , ‘表示分隔符1,’ : ‘表示分隔符2】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select str_to_map('name:洋群满满,age:18');
+-----------------------------+
| _c0 |
+-----------------------------+
| {"name":"洋群满满","age":"18"} |
+-----------------------------+
1 row selected (0.252 seconds)

substr/substring()
Operator OperandTypes Description
String substr/substring(string|binary A, int start) 返回字符串 A 的字节数组的子字符串或片段,从开始位置开始直到字符串 A 的结束,
String substr/substring(string|binary A, int start, int len) 返回 a 的字节数组的子字符串或片段,从起始位置开始,长度为 len。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
0: jdbc:hive2://192.168.150.150:10000> select substr('洋群满满',2);
+------+
| _c0 |
+------+
| 群满满 |
+------+
1 row selected (0.133 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substring('洋群满满',2);
+------+
| _c0 |
+------+
| 群满满 |
+------+
1 row selected (0.498 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substr('洋群满满',2,1);
+------+
| _c0 |
+------+
| 群 |
+------+
1 row selected (0.101 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substring('洋群满满',2,1);
+------+
| _c0 |
+------+
| 群 |
+------+
1 row selected (0.122 seconds)

substring_index()
Operator OperandTypes Description
String substring_index(string A, string delim, int count) 返回字符串 A 中分隔符 delim 在计数count之前出现的子字符串【count越界则返回A】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
1 row selected (0.494 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substring_index('洋群,满,满,',',',3);
+---------+
| _c0 |
+---------+
| 洋群,满,满 |
+---------+
1 row selected (0.095 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substring_index('洋群,满,满,',',',2);
+-------+
| _c0 |
+-------+
| 洋群,满 |
+-------+
1 row selected (0.468 seconds)
0: jdbc:hive2://192.168.150.150:10000> select substring_index('洋群,满,满',',',50);
+---------+
| _c0 |
+---------+
| 洋群,满,满 |
+---------+
1 row selected (0.081 seconds)

translate()
Operator OperandTypes Description
String translate(string|char|varchar Input, string|char|varchar From, string|char|varchar To) 将 From 字符串中的字符替换为 To 字符串中的相应字符,从而转换输入字符串。这类似于 PostgreSQL 中的 translate 函数。如果该 UDF 的任何参数都是 NULL,那么结果也是 NULL。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
0: jdbc:hive2://192.168.150.150:10000> select translate('洋群满满','洋','羊');
+-------+
| _c0 |
+-------+
| 羊群满满 |
+-------+
1 row selected (0.092 seconds)
# 一个字符的From,只会替换一个字符的To
0: jdbc:hive2://192.168.150.150:10000> select translate('洋群满满','洋','yang');
+-------+
| _c0 |
+-------+
| y群满满 |
+-------+
1 row selected (0.09 seconds)
0: jdbc:hive2://192.168.150.150:10000> select translate('洋群满满','洋群','yang');
+-------+
| _c0 |
+-------+
| ya满满 |
+-------+
1 row selected (0.491 seconds)


trim()
Operator OperandTypes Description
String trim(string A) 去除A两端的空格
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select length(' 洋群满满 ');
+------+
| _c0 |
+------+
| 6 |
+------+
1 row selected (0.087 seconds)
0: jdbc:hive2://192.168.150.150:10000> select length(trim(' 洋群满满 '));
+------+
| _c0 |
+------+
| 4 |
+------+
1 row selected (0.096 seconds)

upper()/ucase()
Operator OperandTypes Description
String upper(String A) 返回将 A 的所有字符转换为大写形式所得到的字符串
String ucase(String A) 返回将 A 的所有字符转换为大写形式所得到的字符串
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select upper('yqmm');
+-------+
| _c0 |
+-------+
| YQMM |
+-------+
1 row selected (0.485 seconds)
0: jdbc:hive2://192.168.150.150:10000> select ucase('yqmm');
+-------+
| _c0 |
+-------+
| YQMM |
+-------+
1 row selected (0.073 seconds)

initcap()
Operator OperandTypes Description
String Initcap(string A) 返回字符串,每个单词的第一个大写字母为大写,其他所有字母为小写。【单词由空格分隔】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select initcap('yqmm');
+-------+
| _c0 |
+-------+
| Yqmm |
+-------+
1 row selected (0.096 seconds)
0: jdbc:hive2://192.168.150.150:10000> select initcap('yqmm i love you');
+------------------+
| _c0 |
+------------------+
| Yqmm I Love You |
+------------------+
1 row selected (0.065 seconds)

levenshtein()
Operator OperandTypes Description
Int levenshtein(string A, string B) 返回两个字符串之间的莱文斯坦距离【Levenshtein 距离,又称编辑距离,指的是两个字符串之间,由一个转换成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符,删除一个字符。编辑距离的算法是首先由俄国科学家Levenshtein提出的,故又叫Levenshtein Distance。】【用途:模糊查询
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select levenshtein('yq','mm');
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.08 seconds)
0: jdbc:hive2://192.168.150.150:10000> select levenshtein('y','qmm');
+------+
| _c0 |
+------+
| 3 |
+------+
1 row selected (0.11 seconds)

soundex()
Operator OperandTypes Description
String soundex(string A) 返回字符串的 soundex 代码【soundex是一个将任何文本串转换为描述其语音表示的字母数字模式的算法。soundex考虑了类似的发音字符和音节,使得对字符串进行发音比较而不是字母比较。虽然,soundex不是SQL概念,但是多数DBMS都提供对soundex的支持。】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select soundex('洋群满满');
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.484 seconds)
0: jdbc:hive2://192.168.150.150:10000> select soundex('yangqunmanman');
+-------+
| _c0 |
+-------+
| Y525 |
+-------+
1 row selected (0.096 seconds)
0: jdbc:hive2://192.168.150.150:10000> select soundex('hive');
+-------+
| _c0 |
+-------+
| H100 |
+-------+
1 row selected (0.482 seconds)

Data Masking Functions:数据屏蔽函数/脱敏函数

mask()
Operator OperandTypes Description
String mask(string Str[, string upper[, string lower[, string number]]]) 返回 str 的掩码版本。默认情况下,大写字母转换为“ X”,小写字母转换为“ x”,数字转换为“ n”。通过提供附加参数,可以覆盖掩码中使用的字符: 第二个参数控制大写字母的掩码字符,第三个参数控制小写字母,第四个参数控制数字。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select mask("abcd-EFGH-8765-4321");
+----------------------+
| _c0 |
+----------------------+
| xxxx-XXXX-nnnn-nnnn |
+----------------------+
1 row selected (0.527 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask("abcd-EFGH-8765-4321", "U", "l", "#");
+----------------------+
| _c0 |
+----------------------+
| llll-UUUU-####-#### |
+----------------------+
1 row selected (0.068 seconds)

mask_first_n()
Operator OperandTypes Description
String mask_first_n(string Str[, int N]) 返回带有前 N 个掩码值的掩码 Str 的掩码版本【大写字母转换为“X”,小写字母转换为“ x”,数字转换为“ n”】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select mask_first_n("abcd-EFGH-8765-4321", 4);
+----------------------+
| _c0 |
+----------------------+
| xxxx-EFGH-8765-4321 |
+----------------------+
1 row selected (0.083 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_first_n("abcd-EFGH-8765-4321", 6);
+----------------------+
| _c0 |
+----------------------+
| xxxx-XFGH-8765-4321 |
+----------------------+
1 row selected (0.498 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_first_n("abcd-EFGH-8765-4321", 8);
+----------------------+
| _c0 |
+----------------------+
| xxxx-XXXH-8765-4321 |
+----------------------+
1 row selected (0.077 seconds)

mask_last_n()
Operator OperandTypes Description
String mask_last_n(string Str[, int N]) 返回带有最后 N 个掩码值的掩码 Str 的掩码版本【大写字母转换为“X”,小写字母转换为“ x”,数字转换为“ n”】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select mask_last_n("abcd-EFGH-8765-4321", 4);
+----------------------+
| _c0 |
+----------------------+
| abcd-EFGH-8765-nnnn |
+----------------------+
1 row selected (0.082 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_last_n("abcd-EFGH-8765-4321", 6);
+----------------------+
| _c0 |
+----------------------+
| abcd-EFGH-876n-nnnn |
+----------------------+
1 row selected (0.085 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_last_n("abcd-EFGH-8765-4321", 8);
+----------------------+
| _c0 |
+----------------------+
| abcd-EFGH-8nnn-nnnn |
+----------------------+
1 row selected (0.479 seconds)

mask_show_first_n()
Operator OperandTypes Description
String mask_show_first_n(string Str[, int N]) 返回 Str 的屏蔽版本,显示前 N 个未屏蔽字符【大写字母转换为“X”,小写字母转换为“ x”,数字转换为“ n”】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select mask_show_first_n("1234-5678-8765-4321", 4);
+----------------------+
| _c0 |
+----------------------+
| 1234-nnnn-nnnn-nnnn |
+----------------------+
1 row selected (0.08 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_show_first_n("1234-5678-8765-4321", 6);
+----------------------+
| _c0 |
+----------------------+
| 1234-5nnn-nnnn-nnnn |
+----------------------+
1 row selected (0.484 seconds)

mask_show_last_n()
Operator OperandTypes Description
String mask_show_first_n(string Str[, int N]) 返回 Str 的屏蔽版本,显示最后 N个未屏蔽字符【大写字母转换为“X”,小写字母转换为“ x”,数字转换为“n”】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select mask_show_last_n("1234-5678-8765-4321", 4);
+----------------------+
| _c0 |
+----------------------+
| nnnn-nnnn-nnnn-4321 |
+----------------------+
1 row selected (0.497 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_show_last_n("1234-5678-8765-4321", 6);
+----------------------+
| _c0 |
+----------------------+
| nnnn-nnnn-nnn5-4321 |
+----------------------+
1 row selected (0.066 seconds)

mask_hash()
Operator OperandTypes Description
String mask_hash(string|char|varchar Str) 对str进行hash操作并返回(从Hive 2.1.0开始)。散列是一致的,可用于跨表将masked值连接在一起。对于非string类型的值,此函数返回null。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
0: jdbc:hive2://192.168.150.150:10000> select mask_hash('洋群满满');
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| 7425da882f430537e1fa310dee2663b6dceac12ee9dba061ec9c61d4ddb89b0a |
+----------------------------------------------------+
1 row selected (0.084 seconds)
# 散列是一致的,可用于跨表将masked值连接在一起
0: jdbc:hive2://192.168.150.150:10000> select mask_hash('洋群满满');
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| 7425da882f430537e1fa310dee2663b6dceac12ee9dba061ec9c61d4ddb89b0a |
+----------------------------------------------------+
1 row selected (0.495 seconds)
0: jdbc:hive2://192.168.150.150:10000> select mask_hash(1234);
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.733 seconds)

Misc Functions:加/解密函数

java_method()/reflect()
Operator OperandTypes Description
varies java_method(Class, Method[, arg1[, arg2..]]) 使用反射机制调用java的方法【与reflect作用一样】
varies reflect(Class, Method[, arg1[, arg2..]]) 使用反射机制调用java的方法【与java_method作用一样】
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> SELECT reflect("java.lang.Math", "max", 1,9);
+------+
| _c0 |
+------+
| 9 |
+------+
1 row selected (0.511 seconds)
# 当然了,把它归于加/解密函数中,调用的当然不是这么简单的java.lang.math方法,而是更加容易去加/解密的方法
0: jdbc:hive2://192.168.150.150:10000> SELECT java_method("java.lang.Math", "max", 1,9);
+------+
| _c0 |
+------+
| 9 |
+------+
1 row selected (0.483 seconds
hash()
Operator OperandTypes Description
Int hash(a1[, a2…]) 返回参数的hash散列值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select hash('洋群满满');
+-------------+
| _c0 |
+-------------+
| -391132048 |
+-------------+
1 row selected (0.079 seconds)
0: jdbc:hive2://192.168.150.150:10000> select hash(11,22);
+------+
| _c0 |
+------+
| 363 |
+------+
1 row selected (0.079 seconds)

current_user()
Operator OperandTypes Description
String current_user() 从配置的身份验证器管理器返回当前用户名【.可能与连接时提供的用户相同,但是对于某些身份验证管理器(例如 :HadoopDefaultAuthenticatorr)它可能不同】
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select current_user();
+------------+
| _c0 |
+------------+
| anonymous | # 匿名
+------------+
1 row selected (0.489 seconds)

logged_in_user()
Operator OperandTypes Description
String logged_in_user() 从会话状态返回当前用户名,就是连接到 Hive 时提供的用户名
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select logged_in_user();
+------------+
| _c0 |
+------------+
| anonymous | # 匿名
+------------+
1 row selected (0.092 seconds)

current_database()
Operator OperandTypes Description
String current_database() 返回当前数据库名称
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select current_database();
+----------+
| _c0 |
+----------+
| default |
+----------+
1 row selected (0.079 seconds)

md5()
Operator OperandTypes Description
String md5( String / binary str) 返回 将参数转换为MD5加密后的字符串【结果不可逆】,计算字符串或二进制的 MD5 128位校验,该值作为一个包含32个十六进制数字的字符串返回,如果参数为 NULL,则返回 NULL.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select md5('LOVE');
+-----------------------------------+
| _c0 |
+-----------------------------------+
| c9122fd7bae0681b62a39ddfc1c7fb19 |
+-----------------------------------+
1 row selected (0.061 seconds)
0: jdbc:hive2://192.168.150.150:10000> select md5('洋群满满');
+-----------------------------------+
| _c0 |
+-----------------------------------+
| e92c7e61076ad578ee89930f63d9b325 |
+-----------------------------------+
1 row selected (0.083 seconds)

sha1()/sha()
Operator OperandTypes Description
String sha1(string/binary) 为字符串或二进制文件计算 SHA-1摘要,并以十六进制字符串
String sha(string/binary) 为字符串或二进制文件计算 SHA摘要,并以十六进制字符串
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select sha1('洋群满满');
+-------------------------------------------+
| _c0 |
+-------------------------------------------+
| a1e638c93d7286770a5c79134f0db146ee77dabc |
+-------------------------------------------+
1 row selected (0.082 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sha('洋群满满');
+-------------------------------------------+
| _c0 |
+-------------------------------------------+
| a1e638c93d7286770a5c79134f0db146ee77dabc |
+-------------------------------------------+
1 row selected (0.469 seconds)

crc32()
Operator OperandTypes Description
BigInt crc32(string/binary) 计算字符串或二进制参数的循环冗余校验值并返回 bigint 值
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0: jdbc:hive2://192.168.150.150:10000> select crc32('洋群满满');
+-------------+
| _c0 |
+-------------+
| 1515698111 |
+-------------+
1 row selected (0.089 seconds)
0: jdbc:hive2://192.168.150.150:10000> select crc32(010111);
+------------+
| _c0 |
+------------+
| 409081509 |
+------------+
1 row selected (0.066 seconds)

sha2()
Operator OperandTypes Description
String sha2(string/binary, int) 计算SHA-2系列哈希函数(SHA-224, SHA-256, SHA-384, and SHA-512)(从Hive1.3.0开始)。第一个参数是要hash的字符串或二进制。第二个参数表示结果所需的位长度,其值必须为224、256、384、512或0(相当于256)。从Java 8开始支持SHA-224。如果参数为空或哈希长度不是允许的值之一,则返回值为NULL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select sha2('洋群满满',0);
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| 7425da882f430537e1fa310dee2663b6dceac12ee9dba061ec9c61d4ddb89b0a |
+----------------------------------------------------+
1 row selected (0.084 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sha2('洋群满满',256);
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| 7425da882f430537e1fa310dee2663b6dceac12ee9dba061ec9c61d4ddb89b0a |
+----------------------------------------------------+
1 row selected (0.119 seconds)
0: jdbc:hive2://192.168.150.150:10000> select sha2('洋群满满',512);
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| f7fd48e88a6dae6f39f1b622d8d82eecb6ead9a354bff2364cc790d46e04159704ddceae09b6afde86b53dc6cfed1a9cf2d17ee825dff4c6f09648703f746207 |
+----------------------------------------------------+
1 row selected (0.491 seconds)

aes_encrypt()
Operator OperandTypes Description
Binary aes_encrypt(Input string/binary, Key string/binary) 使用AES对Input加密,可以使用128、192或256位的密钥长度。如果安装了 Java 加密扩展(JCE)无限强度管辖权策略文件,则可以使用192和256位密钥。如果任一参数为 NULL 或Key长度不是允许的值之一,则返回值为 NULL。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
0: jdbc:hive2://192.168.150.150:10000> select aes_encrypt('洋群满满','520');
+-------+
| _c0 |
+-------+
| NULL |
+-------+
1 row selected (0.128 seconds)
0: jdbc:hive2://192.168.150.150:10000> select aes_encrypt('洋群满满','1234567890123456');
+-------------------+
| _c0 |
+-------------------+
| ��@��gE3,��$�D� |
+-------------------+
1 row selected (0.08 seconds)
0: jdbc:hive2://192.168.150.150:10000> select base64(aes_encrypt('洋群满满','1234567890123456'));
+---------------------------+
| _c0 |
+---------------------------+
| xstA29dnRTMCLJTeJKlEzw== |
+---------------------------+
1 row selected (0.073 seconds)

aes_decrypt()
Operator OperandTypes Description
Binary aes_decrypt(Input binary, Key string/binary) 使用AES对Input进行解密,可以使用128、192或256位的密钥长度。如果安装了 Java 加密扩展(JCE)无限强度管辖权策略文件,则可以使用192和256位密钥。如果任一参数为 NULL 或Key长度不是允许的值之一,则返回值为 NULL。
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select aes_decrypt(unbase64('xstA29dnRTMCLJTeJKlEzw=='), '1234567890123456'); 
+-------+
| _c0 |
+-------+
| 洋群满满 |
+-------+
1 row selected (0.093 seconds)

version()
Operator OperandTypes Description
String version() 返回 Hive 版本,该字符串包含两个字段,第一个字段是生成编号,第二个字段是生成的hash散列。例如: “ select version () ;实际的结果将取决于你的构建
1
2
3
4
5
6
7
8
0: jdbc:hive2://192.168.150.150:10000> select version();
+--------------------------------------------------+
| _c0 |
+--------------------------------------------------+
| 3.1.2 r8190d2be7b7165effa62bd21b7d60ef81fb0e4af |
+--------------------------------------------------+
1 row selected (0.089 seconds)

surrogate_key()
Operator OperandTypes Description
BigInt surrogate_key([write_id_bits, task_id_bits]) 在向表中输入数据时,自动为行生成数字 id。只能用作 acid 或只插入表的默认值
1

Built-in Aggregate Functions :内置聚合函数(UDAF)—多对一

count()
Operator OperandTypes Description
BigInt count(*) 返回检索到的行的总数,包括包含 NULL 值的行。
BigInt count(expr) 返回提供的表达式为非 null 的行数。
BigInt count(DISTINCT expr[, expr…]) 返回提供的表达式唯一且非 null 的行数。可以使用 hive.optimize.distinct.rewrite 来优化这个操作。
1

sum()
Operator OperandTypes Description
Double sum(col) 对组内某列求和(包含重复值)
Double sum(DISTINCT col) 对组内某列求和(不包含重复值)
1

avg()
Operator OperandTypes Description
Double avg(col) 对组内某列求平均值(包含重复值)
Double avg(DISTINCT col) 对组内某列求平均值(不包含重复值)
1

min()/max()
Operator OperandTypes Description
Double min(col) 返回组中列的最小值
Double max(col) 返回组中列的最大值
1

variance()/var_pop()
Operator OperandTypes Description
Double variance(col) 返回组中数值列的方差
Double var_pop(col) 返回组中数值列的方差
1

var_samp()
Operator OperandTypes Description
Double var_samp(col) 返回组中一个数值列的无偏样本方差
1

stddev_pop()
Operator OperandTypes Description
Double stddev_pop(col) 返回组中一个数值列的标准差
1

stddev_samp()
Operator OperandTypes Description
Double stddev_samp(col) 返回组中一个数值列的无偏样本标准差
1

covar_pop()
Operator OperandTypes Description
Double covar_pop(col1, col2) 返回组中一对数值列的总体协方差
1

covar_samp()
Operator OperandTypes Description
Double covar_samp(col1, col2) 返回组中一对数值列的样本协方差
1

corr()
Operator OperandTypes Description
Double corr(col1, col2) 返回组中一对数字列的皮尔逊相关系数
1

percentile()
Operator OperandTypes Description
Double percentile(BIGINT col, p) 返回组中某一列的精确位数【第p位百分数】(不适用于浮点类型)。p 一定在0和1之间。注意: 只能为整数值计算真正的百分位数。如果您的输入是非整数的,请使用 percentile_approx
Array<Double> percentile(BIGINT col, array(p1 [, p2]…)) 返回组中某列的精确百分位数 p1、 p2、 … (不适用于浮点类型)。p一定在0和1之间。注意: 只能为整数值计算真正的百分位数。如果您的输入是非整数的,请使用 PERCENTILE _ approx。
1

percentile_approx()
Operator OperandTypes Description
Double percentile_approx(DOUBLE col, p [, B]) 返回组中数值列(包括浮点类型)的近似百分位数。B 参数以内存为代价来控制近似精度。更高的值产生更好的近似值,默认值是10,000。当 col 中不同值的数目小于 B 时,这就给出了一个精确的百分比值。
Array<Double> percentile_approx(DOUBLE col, array(p1 [, p2]…) [, B]) 与上面相同,但是接受并返回百分位数值数组而不是单个百分位数值数组。
1

regr_avgx()
Operator OperandTypes Description
Double regr_avgx(independent, dependent) 相当于avg(dependent):计算自变量的平均值。该函数将任意一对数字类型作为参数,并返回一个double。任何具有null的对都将被忽略。如果应用于空集:返回null。
1

regr_avgy()
Operator OperandTypes Description
Double regr_avgy(independent, dependent) 相当于avg(independent):计算因变量的平均值。该函数将任意一对数字类型作为参数,并返回一个double。任何具有null的对都将被忽略。如果应用于空集:返回null
1

regr_count()
Operator OperandTypes Description
Double regr_count(independent, dependent) 返回用于匹配线性回归线的非空对的数量。【返回independent和dependent都非空的对数】
1

regr_intercept()
Operator OperandTypes Description
Double regr_intercept(independent, dependent) 返回线性回归线的 y 轴截距,也就是方程a * independent+ b 中 b 的值【返回线性回归的截距项】
1

regr_r2()
Operator OperandTypes Description
Double regr_r2(independent, dependent) 返回线性回归的确定系数
1

regr_slope()
Operator OperandTypes Description
Double regr_slope(independent, dependent) 返回线性回归的斜率,也就是依赖于方程 a * independent+ b 中的 a 的值
1

regr_sxx()
Operator OperandTypes Description
Double regr_sxx(independent, dependent) 等价于regr_count(independent, dependent) * var_pop(dependent)。【用如下公式计算: ∑ i = 1 n x i x i − ∑ i = 1 n x i ∗ ∑ i = 1 n x i n \frac{\sum_{i=1}^nx_ix_i-\sum_{i=1}^nx_i\ast\sum_{i=1}^nx_i}n n∑i=1nxixi−∑i=1nxi∗∑i=1nxi,并返回结果。】
1

regr_sxy()
Operator OperandTypes Description
Double regr_sxy(independent, dependent) 等价于regr_count(independent, dependent) * covar_pop(independent, dependent)。【用如下公式计算: ∑ i = 1 n y i x i − ∑ i = 1 n y i ∗ ∑ i = 1 n x i n \frac{\sum_{i=1}^ny_ix_i-\sum_{i=1}^ny_i\ast\sum_{i=1}^nx_i}n n∑i=1nyixi−∑i=1nyi∗∑i=1nxi,并返回结果。】
1

regr_syy()
Operator OperandTypes Description
Double regr_syy(independent, dependent) 等价于 regr_count(independent, dependent) * var_pop(independent)。【用如下公式计算: ∑ i = 1 n y i y i − ∑ i = 1 n y i ∗ ∑ i = 1 n y i n \frac{\sum_{i=1}^ny_iy_i-\sum_{i=1}^ny_i\ast\sum_{i=1}^ny_i}n n∑i=1nyiyi−∑i=1nyi∗∑i=1nyi,并返回结果。】
1

histogram_numeric()
Operator OperandTypes Description
Array<struct {‘x’,’y’}> histogram_numeric(col, b) 用于画直方图。返回一个长度为b的数组,数组中元素为(x,y)形式的键值对,x代表了直方图中该柱形的中心,y代表可其高度。
1

collect_set()
Operator OperandTypes Description
Array collect_set(col) 返回查询列col去重后的集合,与distinct不同,distinct查询结果为一列数据,collect_set查询后结果为一个集合形式的元素
1

collect_listl)
Operator OperandTypes Description
Array collect_list(col) 返回查询列col的列表
1

ntile()
Operator OperandTypes Description
INTEGER ntile(INTEGER x) 将有序分区划分为x个称为存储桶的组,并为该分区中的每一行分配存储桶编号。 (此方式存储可以快速计算分位数)
1

Built-in Table-Generating Functions (UDTF):内置的表生成函数(UDTF)—一对多

一般的用户定义函数,如 concat () ,接受单个输入行并输出单个输出行。相反,表生成函数将单个输入行转换为多个输出行。

explode()
Operator OperandTypes Description
T explode(ARRAY a) 将数组爆炸为多行。返回一个带有单列(col)的行集,该数组中的每个元素对应一行。
Tkey,Tvalue explode(MAP<Tkey,Tvalue> m) 将map爆炸为多行。返回一个包含两列(键、值)的行集,输入映射中的每个键值对都有一行
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
0: jdbc:hive2://192.168.150.150:10000> select explode(array('A','B','C'));
+------+
| col |
+------+
| A |
| B |
| C |
+------+
3 rows selected (0.252 seconds)
0: jdbc:hive2://192.168.150.150:10000> select explode(array('A','B','C')) as col;
+------+
| col |
+------+
| A |
| B |
| C |
+------+
3 rows selected (0.314 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
+---------+
| tf.col |
+---------+
| A |
| B |
| C |
+---------+
3 rows selected (0.046 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
+---------+
| tf.col |
+---------+
| A |
| B |
| C |
+---------+
3 rows selected (0.043 seconds)
0: jdbc:hive2://192.168.150.150:10000> select explode(map('A',10,'B',20,'C',30));
+------+--------+
| key | value |
+------+--------+
| A | 10 |
| B | 20 |
| C | 30 |
+------+--------+
3 rows selected (0.309 seconds)
0: jdbc:hive2://192.168.150.150:10000> select explode(map('A',10,'B',20,'C',30)) as (key,value);
+------+--------+
| key | value |
+------+--------+
| A | 10 |
| B | 20 |
| C | 30 |
+------+--------+
3 rows selected (0.639 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
+---------+-----------+
| tf.key | tf.value |
+---------+-----------+
| A | 10 |
| B | 20 |
| C | 30 |
+---------+-----------+
3 rows selected (0.048 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;
+---------+-----------+
| tf.key | tf.value |
+---------+-----------+
| A | 10 |
| B | 20 |
| C | 30 |
+---------+-----------+
3 rows selected (0.299 seconds)


posexplode()
Operator OperandTypes Description
Int,T posexplode(ARRAY a) 使用 int 类型的附加位置列(原始数组中项目的位置,从 0 开始)将数组分解为多行。 返回一个包含两列 (pos,val) 的行集,数组中的每个元素占一行。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
0: jdbc:hive2://192.168.150.150:10000> select posexplode(array('A','B','C'));
+------+------+
| pos | val |
+------+------+
| 0 | A |
| 1 | B |
| 2 | C |
+------+------+
3 rows selected (0.674 seconds)
0: jdbc:hive2://192.168.150.150:10000> select posexplode(array('A','B','C')) as (pos,val);
+------+------+
| pos | val |
+------+------+
| 0 | A |
| 1 | B |
| 2 | C |
+------+------+
3 rows selected (0.711 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
+---------+---------+
| tf.pos | tf.val |
+---------+---------+
| 0 | A |
| 1 | B |
| 2 | C |
+---------+---------+
3 rows selected (0.042 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;
+---------+---------+
| tf.pos | tf.val |
+---------+---------+
| 0 | A |
| 1 | B |
| 2 | C |
+---------+---------+
3 rows selected (0.253 seconds)
inline()
Operator OperandTypes Description
T1,…,Tn inline(ARRAY<STRUCT<f1:T1,…,fn:Tn>> a) 将结构数组分解为多行。 返回一个包含 N 列的行集(N = 结构中顶级元素的数量),数组中每个结构一行。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
0: jdbc:hive2://192.168.150.150:10000> select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
+-------+-------+-------------+
| col1 | col2 | col3 |
+-------+-------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-02-02 |
+-------+-------+-------------+
2 rows selected (0.356 seconds)
0: jdbc:hive2://192.168.150.150:10000> select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
+-------+-------+-------------+
| col1 | col2 | col3 |
+-------+-------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-02-02 |
+-------+-------+-------------+
2 rows selected (0.232 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
+----------+----------+-------------+
| tf.col1 | tf.col2 | tf.col3 |
+----------+----------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-02-02 |
+----------+----------+-------------+
2 rows selected (0.468 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;
+----------+----------+-------------+
| tf.col1 | tf.col2 | tf.col3 |
+----------+----------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-02-02 |
+----------+----------+-------------+
2 rows selected (0.379 seconds)

stack()
Operator OperandTypes Description
T1,…,Tn/r stack(int r,T1 V1,…,Tn/r Vn) 将 n 个值 V1,…,Vn 分解为 r 行。 每行将有 n/r 列。 r 必须是常数。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
0: jdbc:hive2://192.168.150.150:10000> select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
+-------+-------+-------------+
| col0 | col1 | col2 |
+-------+-------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-01-01 |
+-------+-------+-------------+
2 rows selected (0.3 seconds)
0: jdbc:hive2://192.168.150.150:10000> select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
+-------+-------+-------------+
| col0 | col1 | col2 |
+-------+-------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-01-01 |
+-------+-------+-------------+
2 rows selected (0.252 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
+----------+----------+-------------+
| tf.col0 | tf.col1 | tf.col2 |
+----------+----------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-01-01 |
+----------+----------+-------------+
2 rows selected (0.453 seconds)
0: jdbc:hive2://192.168.150.150:10000> select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;
+----------+----------+-------------+
| tf.col0 | tf.col1 | tf.col2 |
+----------+----------+-------------+
| A | 10 | 2015-01-01 |
| B | 20 | 2016-01-01 |
+----------+----------+-------------+
2 rows selected (0.189 seconds)

+——————————+
| tab_name |
+——————————+
| ! |
| != |
| $sum0 |
| % |
| & |
| * |

+
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
aes_decrypt
aes_encrypt
and
array
array_contains
ascii
asin
assert_true
assert_true_oom
atan
avg
base64
between
bin
bloom_filter
bround
cardinality_violation
case
cbrt
ceil
ceiling
char_length
character_length
chr
coalesce
collect_list
collect_set
compute_stats
concat
concat_ws
context_ngrams
conv
corr
cos
count
covar_pop
covar_samp
crc32
create_union
cume_dist
current_authorizer
current_database
current_date
current_groups
current_timestamp
current_user
date_add
date_format
date_sub
datediff
day
dayofmonth
dayofweek
decode
degrees
dense_rank
div
e
elt
encode
enforce_constraint
exp
explode
extract_union
factorial
field
find_in_set
first_value
floor
floor_day
floor_hour
floor_minute
floor_month
floor_quarter
floor_second
floor_week
floor_year
format_number
from_unixtime

+——————————+
| tab_name |
+——————————+
| from_utc_timestamp |
| get_json_object |
| get_splits |
| greatest |
| grouping |
| hash |
| hex |
| histogram_numeric |
| hour |
| if |
| in |
| in_bloom_filter |
| in_file |
| index |
| initcap |
| inline |
| instr |
| internal_interval |
| isfalse |
| isnotfalse |
| isnotnull |
| isnottrue |
| isnull |
| istrue |
| java_method |
| json_tuple |
| lag |
| last_day |
| last_value |
| lcase |
| lead |
| least |
| length |
| levenshtein |
| like |
| likeall |
| likeany |
| ln |
| locate |
| log |
| log10 |
| log2 |
| logged_in_user |
| lower |
| lpad |
| ltrim |
| map |
| map_keys |
| map_values |
| mask |
| mask_first_n |
| mask_hash |
| mask_last_n |
| mask_show_first_n |
| mask_show_last_n |
| matchpath |
| max |
| md5 |
| min |
| minute |
| mod |
| month |
| months_between |
| murmur_hash |
| named_struct |
| negative |
| next_day |
| ngrams |
| noop |
| noopstreaming |
| noopwithmap |
| noopwithmapstreaming |
| not |
| ntile |
| nullif |
| nvl |
| octet_length |
| or |
| parse_url |
| parse_url_tuple |
| percent_rank |
| percentile |
| percentile_approx |
| pi |
| pmod |
| posexplode |
| positive |
| pow |
| power |
| printf |
| quarter |
| radians |
| rand |
| rank |
| reflect |
| reflect2 |
| regexp |
| regexp_extract |
| regexp_replace |
| regr_avgx |
+——————————+
| tab_name |
+——————————+
| regr_avgy |
| regr_count |
| regr_intercept |
| regr_r2 |
| regr_slope |
| regr_sxx |
| regr_sxy |
| regr_syy |
| repeat |
| replace |
| replicate_rows |
| restrict_information_schema |
| reverse |
| rlike |
| round |
| row_number |
| rpad |
| rtrim |
| second |
| sentences |
| sha |
| sha1 |
| sha2 |
| shiftleft |
| shiftright |
| shiftrightunsigned |
| sign |
| sin |
| size |
| sort_array |
| sort_array_by |
| soundex |
| space |
| split |
| sq_count_check |
| sqrt |
| stack |
| std |
| stddev |
| stddev_pop |
| stddev_samp |
| str_to_map |
| struct |
| substr |
| substring |
| substring_index |
| sum |
| tan |
| to_date |
| to_epoch_milli |
| to_unix_timestamp |
| to_utc_timestamp |
| translate |
| trim |
| trunc |
| ucase |
| udftoboolean |
| udftobyte |
| udftodouble |
| udftofloat |
| udftointeger |
| udftolong |
| udftoshort |
| udftostring |
| unbase64 |
| unhex |
| unix_timestamp |
| upper |
| uuid |
| var_pop |
| var_samp |
| variance |
| version |
| weekofyear |
| when |
| width_bucket |
| windowingtablefunction |
| xpath |
| xpath_boolean |
| xpath_double |
| xpath_float |
| xpath_int |
| xpath_long |
| xpath_number |
| xpath_short |
| xpath_string |
| year |
| | |
| ~ |
+——————————+


本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!