Hive中行列转换的实现

Hive行列转换

[TOC]

1、行转列 (根据主键,进行多行合并一列)

使用函数:concat_ws(‘,’,collect_set(column))

  • collect_list 不去重
  • collect_set 去重
  • column 的数据类型要求是 string

1.1、构建测试数据

创建数据

vi row_to_col.txt

1
2
3
4
5
孙悟空,白羊座,A
大海,射手座,A
宋宋,白羊座,B
猪八戒,白羊座,A
凤姐,射手座,A

上传数据

hdfs dfs -put ./row_to_col.txt /data/hive_test/row_to_col

1.2、建表

1
2
3
4
5
6
create table row_to_col(
name string,
constellation string,
blood_type string)
row format delimited fields terminated by ","
location '/data/hive_test/row_to_col';

1.3 执行转换

1
2
3
select t.base, concat_ws('|', collect_set(t.name)) from (
select name, concat(constellation, ",", blood_type) as base from row_to_col) t
group by t.base;

image-20211006190827553

2、列转行 (对某列拆分,一列拆多行)

使用函数:lateral view explode(split(column, ‘,’)) num

2.1、构建测试数据

vi col_to_row.txt

1
2
3
疑犯追踪|悬疑,动作,科幻,剧情
Lie to me|悬疑,警匪,动作,心理,剧情
战狼2|战争,动作,灾难

hdfs dfs -put ./col_to_row.txt /data/hive_test/col_to_row

2.2、建表

1
2
3
4
5
6
create table col_to_row(
movie string,
category array<string>)
row format delimited fields terminated by "|"
collection items terminated by ","
location '/data/hive_test/col_to_row';

2.3、执行转换

1
2
3
4
5
select
movie,
category_name
from
col_to_row lateral view explode(category) table_tmp as category_name;

image-20211006190808044

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
select count(t.movie) from 
(
select
movie,
category_name
from
col_to_row lateral view explode(category) table_tmp as category_name
) t
group by t.movie;


select sum(*) from
(
select
movie,
category_name
from
col_to_row lateral view explode(category) table_tmp as category_name
) t
group by t.movie;



select count(distinct(t.movie)) from
(
select
movie,
category_name
from
col_to_row lateral view explode(category) table_tmp as category_name
) t;