2016-08-09

Apache Kylin: cube build fails at step 5 - KeyValue size too large

I have started using Apache Kylin (version 1.5.3). When building a cube, I get an error at step 5, "Save Cuboid Statistics". The log says:

java.lang.IllegalArgumentException: KeyValue size too large 
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) 
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1038) 
at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:242) 
at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208) 
at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:113) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 

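First, I tried building the same cube with fewer dimensions, and that works. Building another cube with the dimensions I had left out also works. But when I try to build a cube with all 13 dimensions, it fails. I also tried setting hbase.client.keyvalue.maxsize to 0 to disable the check, but I still get the same error.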

Does anyone know what the problem is and how I can fix it?

By the way, I am running Kylin on the HDP 2.4 sandbox.

Thanks in advance for your help

Sören
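A possible explanation for why the 13-dimension cube fails while the smaller cubes succeed (an assumption, not confirmed in this thread): the cuboid statistics saved in step 5 hold one entry per cuboid, and without aggregation groups or other pruning a full cube over n dimensions has 2^n - 1 non-empty dimension combinations. The statistics resource therefore roughly doubles with every added dimension and can cross a fixed KeyValue size limit. The sketch below only does the counting; the class and method names are illustrative.

```java
// Sketch: number of cuboids in a full cube, assuming no aggregation groups,
// mandatory, hierarchy or joint dimensions prune any combinations.
class CuboidGrowth {
    // Every non-empty subset of the dimensions is a candidate cuboid: 2^n - 1.
    static long fullCubeCuboids(int dimensions) {
        if (dimensions < 0 || dimensions > 62) {
            throw new IllegalArgumentException("dimensions out of range: " + dimensions);
        }
        return (1L << dimensions) - 1;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 13; n++) {
            System.out.println(n + " dimensions -> " + fullCubeCuboids(n) + " cuboids");
        }
    }
}
```

Going from 12 to 13 dimensions takes the count from 4095 to 8191 cuboids.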


What is hbase.client.keyvalue.maxsize set to in your HBase configuration?


hbase.client.keyvalue.maxsize is set to 0 at the moment, so the check should normally be disabled.


Please try kylin.hbase.client.keyvalue.maxsize=1048576
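On the point raised in these comments about setting the limit to 0: in HBase 1.x the client-side check that throws "KeyValue size too large" is only applied when the configured limit is positive, so 0 does disable it. The following is a simplified, stand-alone model of that logic, not HBase code: cell sizes are plain integers and the class name is hypothetical. If the exception still appears with hbase.client.keyvalue.maxsize=0, that suggests the client Kylin uses is running with a different, positive limit than the one that was edited, which is what the answers in this thread address.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the client-side KeyValue size check (HBase 1.x style).
// It does not call HBase; each integer stands for the serialized size of one cell.
class KeyValueSizeCheck {
    static void validate(List<Integer> cellSizes, int maxKeyValueSize) {
        if (cellSizes.isEmpty()) {
            throw new IllegalArgumentException("No columns to insert");
        }
        if (maxKeyValueSize > 0) { // a limit <= 0 disables the size check entirely
            for (int size : cellSizes) {
                if (size > maxKeyValueSize) {
                    throw new IllegalArgumentException("KeyValue size too large");
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> cells = Arrays.asList(2_000_000); // one hypothetical ~2 MB cell
        validate(cells, 0); // passes: check disabled
        try {
            validate(cells, 1_048_576); // 1 MiB limit: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // KeyValue size too large
        }
    }
}
```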

Answers


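Make sure that the value of kylin.hbase.client.keyvalue.maxsize (in the Kylin config file, conf/kylin.properties) and the value of hbase.client.keyvalue.maxsize (in the HBase config file) are the same. We usually get this error when kylin.hbase.client.keyvalue.maxsize is larger than hbase.client.keyvalue.maxsize.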
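The rule in this answer can be expressed as a small check. This is a sketch under assumptions: the Kylin properties are passed in as text, the HBase limit is passed in as a number rather than read from hbase-site.xml, and the class and enum names are made up for illustration. It does not special-case an HBase value of 0 (check disabled).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: compare the Kylin-side and HBase-side KeyValue size limits (bytes).
class MaxSizeConsistency {
    static final String KYLIN_KEY = "kylin.hbase.client.keyvalue.maxsize";

    enum Result { MISSING, SAME, KYLIN_LARGER, KYLIN_SMALLER }

    static Result check(String kylinProperties, long hbaseMaxSize) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(kylinProperties)); // standard .properties parsing
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KYLIN_KEY);
        if (raw == null) {
            return Result.MISSING; // key not defined in the Kylin config at all
        }
        long kylinMaxSize = Long.parseLong(raw.trim());
        if (kylinMaxSize > hbaseMaxSize) {
            return Result.KYLIN_LARGER; // the situation the answer links to the error
        }
        return kylinMaxSize == hbaseMaxSize ? Result.SAME : Result.KYLIN_SMALLER;
    }

    public static void main(String[] args) {
        String kylinConf = "kylin.hbase.client.keyvalue.maxsize = 1048576\n";
        System.out.println(check(kylinConf, 1048576L));                 // SAME
        System.out.println(check(kylinConf, 524288L));                  // KYLIN_LARGER
        System.out.println(check("kylin.server.mode=all\n", 1048576L)); // MISSING
    }
}
```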

Please see the following sample kylin.properties:

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
kylin.owner=whoami@kylin.apache.org

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50 
kylin.hbase.client.keyvalue.maxsize=1048576 

For the "KeyValue size too large" error, set kylin.hbase.client.keyvalue.maxsize=1048576 in the properties file.
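If the key is not present in kylin.properties at all, it has to be added by hand. A minimal sketch of doing that programmatically, assuming you pass in the path to your installation's kylin.properties; the class name is hypothetical, and the method only appends the line when the key is not already defined. 1048576 bytes is 1 MiB (1024 × 1024).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Properties;

// Sketch: make sure a properties file defines kylin.hbase.client.keyvalue.maxsize.
class EnsureKeyValueMaxSize {
    static final String KEY = "kylin.hbase.client.keyvalue.maxsize";

    /** Returns true if the key was appended, false if it was already defined. */
    static boolean ensure(Path propertiesFile, long maxSizeBytes) throws IOException {
        Properties props = new Properties();
        // .properties files are traditionally ISO-8859-1 encoded
        try (Reader in = Files.newBufferedReader(propertiesFile, StandardCharsets.ISO_8859_1)) {
            props.load(in);
        }
        if (props.getProperty(KEY) != null) {
            return false; // leave an existing value untouched
        }
        // Leading line separator guards against a file that does not end with a newline
        String line = System.lineSeparator() + KEY + "=" + maxSizeBytes + System.lineSeparator();
        Files.write(propertiesFile, line.getBytes(StandardCharsets.ISO_8859_1), StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("kylin", ".properties");
        Files.write(file, "kylin.server.mode=all\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(ensure(file, 1024L * 1024L)); // true: key appended
        System.out.println(ensure(file, 1024L * 1024L)); // false: already present
        Files.delete(file);
    }
}
```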


@Nishi K Anil

I can't find kylin.hbase.client.keyvalue.maxsize in kylin.properties. My kylin.properties looks like this:

[root@sandbox conf]# cat kylin.properties
# 
# Licensed to the Apache Software Foundation (ASF) under one or more 
# contributor license agreements. See the NOTICE file distributed with 
# this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 
# (the "License"); you may not use this file except in compliance with 
# the License. You may obtain a copy of the License at 
# 
# http://www.apache.org/licenses/LICENSE-2.0 
# 
# Unless required by applicable law or agreed to in writing, software 
# distributed under the License is distributed on an "AS IS" BASIS, 
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and 
# limitations under the License. 
# 

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
kylin.owner=whoami@kylin.apache.org

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50 

Set kylin.hbase.client.keyvalue.maxsize=1048576 in the properties file.


We hit the key limit before in Splice Machine as well...

Also remember from the KeyValue spec that the row key length has to fit in a short; see KeyValue#getRowOffset().
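The KeyValue format stores the row length in a 2-byte short, so a row key can be at most Short.MAX_VALUE = 32767 bytes (HBase exposes this as HConstants.MAX_ROW_LENGTH). This limit is separate from hbase.client.keyvalue.maxsize. A dependency-free sketch that hard-codes the same value; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch: the row length in a KeyValue is a 2-byte short, which caps the row key size.
class RowKeyLimit {
    // Same value as HBase's HConstants.MAX_ROW_LENGTH, hard-coded to avoid the dependency
    static final int MAX_ROW_LENGTH = Short.MAX_VALUE; // 32767 bytes

    static boolean fitsInShort(byte[] rowKey) {
        return rowKey.length <= MAX_ROW_LENGTH;
    }

    public static void main(String[] args) {
        byte[] small = "cube_row".getBytes(StandardCharsets.UTF_8);
        byte[] huge = new byte[40_000]; // hypothetical oversized row key
        System.out.println(fitsInShort(small)); // true
        System.out.println(fitsInShort(huge));  // false
    }
}
```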