Hive: optimizing inserts into dynamic-partition tables or HDFS dynamic-partition directories with DISTRIBUTE BY
Inserting data into dynamic partitions can make the map tasks create a large number of partitions in a short time (more than the number of distinct values of the partition columns), which consumes excessive resources. Hive therefore lets you set three limit parameters as a safeguard.

Troubleshooting and best practices:

    beeline> set hive.exec.dynamic.partition.mode=nonstrict;
    beeline> FROM page_view_stg pvs
          INSERT OVERWRITE TABLE page_view PARTITION(dt, country)
                 SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
                        from_unixtime(pvs.viewTime, 'yyyy-MM-dd') dt, pvs.country;
    ...
    2010-05-07 11:10:19,816 Stage-1 map = 0%, reduce = 0%
    [Fatal Error] Operator FS_28 (id=41): fatal error. Killing the job.
    Ended Job = job_201005052204_28178 with errors
    ...

The problem is that each mapper takes a random set of rows, so the number of distinct (dt, country) pairs it sees is very likely to exceed the limit set by hive.exec.max.dynamic.partitions.pernode. One way around this is to group the rows by the dynamic partition columns in the mappers and distribute them to the reducers, where the dynamic partitions are then created. The number of distinct dynamic partitions each node has to handle is then significantly reduced. The example query above can be rewritten as:

    beeline> set hive.exec.dynamic.partition.mode=nonstrict;
    beeline> FROM page_view_stg pvs
          INSERT OVERWRITE TABLE page_view PARTITION(dt, country)
                 SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
                        from_unixtime(pvs.viewTime, 'yyyy-MM-dd') dt, pvs.country
                 DISTRIBUTE BY dt, country;

This query generates a MapReduce job rather than a map-only job. The SELECT clause is converted into a plan for the mappers, and their output is distributed to the reducers based on the value of the (dt, country) pair. The INSERT clause is converted into the reducer-side plan, which writes to the dynamic partitions.

In my own work the situation is not complex enough to need DISTRIBUTE BY as an optimization. A scheduled job runs every day and processes the previous day's data for a single province, with date and province as the two partition columns. Each run therefore only handles one partition's data to begin with, so neither the mappers nor the reducers create too many partitions.
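The text refers to three protective parameters without listing them. As a rough sketch, they are presumably the three limits described in the Hive configuration documentation; the values shown below are the documented defaults as far as I know, and should be treated as assumptions to verify against your own Hive version.

```sql
-- Assumption: the values below are the defaults from the Hive configuration
-- documentation; confirm them for your Hive version before relying on them.

-- Dynamic partitioning must be enabled, and nonstrict mode allows all
-- partition columns to be dynamic (as in the queries above).
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Limit 1: maximum number of dynamic partitions a single mapper or reducer
-- node may create. This is the limit the failing query above ran into.
set hive.exec.max.dynamic.partitions.pernode=100;

-- Limit 2: maximum number of dynamic partitions one statement may create in total.
set hive.exec.max.dynamic.partitions=1000;

-- Limit 3: maximum number of HDFS files that all mappers and reducers of
-- one MapReduce job may create.
set hive.exec.max.created.files=100000;
```

Raising hive.exec.max.dynamic.partitions.pernode is the other way to get past the fatal error, but it only moves the limit; the DISTRIBUTE BY rewrite reduces the number of partitions each node has to open in the first place.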