impala 解析json

前提

直接通过hql查询字段为json字符串里的值,比如

1
获取'{"keyword":1}' 的keyword值 1

网上相关的文章

  1. 直接引用hive的udf
  2. 开源的实现: https://github.com/nazgul33/impala-get-json-object-udf

在CDH 集群上的实现

1
2
3
4
5
6
7
8
sudo -u impala hadoop fs -mkdir -p /user/impala/udf
sudo -u impala hadoop fs -put -f /opt/cloudera/parcels/CDH/jars/hive-exec-*.jar /user/impala/udf/hive-exec.jar
impala-shell -q "
CREATE DATABASE IF NOT EXISTS udf;
USE udf;
DROP FUNCTION IF EXISTS udf.get_json_object(string, string);
CREATE FUNCTION udf.get_json_object(string, string) returns string location '/user/impala/udf/hive-exec.jar' symbol='org.apache.hadoop.hive.ql.udf.UDFJson';
"

测试

1
2
3
4
5
6
Query: select udf.get_json_object('{"keyword":1}', '$.keyword')
+---------------------------------------------------+
| udf.get_json_object('{"keyword":1}', '$.keyword') |
+---------------------------------------------------+
| 1 |
+---------------------------------------------------+