
HBase Run Modes: Standalone and Distributed

HBase has two run modes: standalone and distributed. Out of the box, HBase runs in standalone mode. Whichever mode you use, you will need to configure HBase by editing files in the HBase conf directory. At a minimum, you must edit conf/hbase-env.sh to tell HBase which java to use. In this file you set HBase environment variables such as the heap size and other options for the JVM, the preferred location for log files, and so on. Set JAVA_HOME to point at the root of your java install.
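For example, a one-line sketch of that setting in conf/hbase-env.sh (the JDK path is only an example and must match your machine):

  # Tell HBase which java to use.
  export JAVA_HOME=/usr/java/jdk1.8.0/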

5.1. Standalone HBase

This is the default mode. Standalone mode is what is described in the quickstart section. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients may talk to HBase.

5.1.1. Standalone HBase over HDFS

A sometimes useful variation on standalone HBase is to have all daemons run inside one JVM but, rather than persisting to the local filesystem, persist to an HDFS instance.

You might consider this profile when you intend a simple deploy profile and the load is light, but the data must persist across node comings and goings. Writing to HDFS, where data is replicated, ensures the latter.

To configure this standalone variant, edit your hbase-site.xml, setting hbase.rootdir to point at a directory in your HDFS instance, but then set hbase.cluster.distributed to false. For example:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>

5.2. Distributed

Distributed mode can be subdivided into distributed but all daemons run on a single node (also known as pseudo-distributed), and fully-distributed, where the daemons are spread across all nodes in the cluster. The pseudo-distributed vs. fully-distributed nomenclature comes from Hadoop.

Pseudo-distributed mode can run against the local filesystem or against an instance of the Hadoop Distributed File System (HDFS). Fully-distributed mode can ONLY run on HDFS. See the Hadoop documentation for how to set up HDFS. A good walk-through for setting up HDFS on Hadoop 2 can be found at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.

5.2.1. Pseudo-distributed

A pseudo-distributed quickstart has been added to the quickstart chapter. See quickstart-pseudo. Some of the information that was originally in this section has been moved there.

Pseudo-distributed mode is simply fully-distributed mode run on a single host. Use this HBase configuration for testing and prototyping only. Do not use this configuration for production or for performance evaluation.

5.3. Fully-distributed

By default, HBase runs in standalone mode. Both standalone mode and pseudo-distributed mode are provided for the purposes of small-scale testing. For a production environment, distributed mode is appropriate. In distributed mode, multiple instances of HBase daemons run on multiple servers in the cluster.

Just as in pseudo-distributed mode, a fully distributed configuration requires that you set the hbase.cluster.distributed property to true. Typically, hbase.rootdir is configured to point to a highly-available HDFS filesystem.

In addition, the cluster is configured so that multiple cluster nodes enlist as RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers. These configuration basics are all demonstrated in quickstart-fully-distributed.

Distributed RegionServers

Typically, your cluster will contain multiple RegionServers all running on different servers, as well as primary and backup Master and ZooKeeper daemons. The conf/regionservers file on the master server contains a list of hosts whose RegionServers are associated with this cluster, one host per line. All hosts listed in this file will have their RegionServer processes started and stopped when the master server starts or stops.

ZooKeeper and HBase

See the ZooKeeper section for ZooKeeper setup instructions for HBase.

Example 6. Example Distributed HBase Cluster

This is a bare-bones conf/hbase-site.xml for a distributed HBase cluster. A cluster that is used for real-world work would contain more custom configuration parameters. Most HBase configuration directives have default values, which are used unless the value is overridden in hbase-site.xml. See "Configuration Files" for more information.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
  </property>
</configuration>

This is an example conf/regionservers file, which contains a list of the nodes that should run a RegionServer in the cluster. These nodes need HBase installed and they need to use the same contents of the conf/ directory as the Master server.

node-a.example.com
node-b.example.com
node-c.example.com

This is an example conf/backup-masters file, which contains a list of each node that should run a backup Master instance. The backup Master instances will sit idle unless the main Master becomes unavailable.

node-b.example.com
node-c.example.com

Distributed HBase Quickstart

See quickstart-fully-distributed for a walk-through of a simple three-node cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer instances.

Procedure: HDFS Client Configuration

  1. Of note, if you have made HDFS client configuration changes on your Hadoop cluster, such as configuration directives for HDFS clients, as opposed to server-side configurations, you must use one of the following methods to enable HBase to see and use these configuration changes:

    1. Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.

    2. Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or

    3. if only a small set of HDFS client configurations, add them to hbase-site.xml.

An example of such an HDFS client configuration is dfs.replication. If for example you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do one of the above to make the configuration available to HBase (see the sketch below).
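A minimal sketch of options 1 and 3, assuming a Hadoop client configuration directory of /etc/hadoop/conf (adjust to your own HADOOP_CONF_DIR):

  # Option 1 -- in conf/hbase-env.sh: put the Hadoop client configs on HBase's classpath.
  export HBASE_CLASSPATH=/etc/hadoop/conf

  <!-- Option 3 -- in conf/hbase-site.xml: carry over just the client settings you need. -->
  <property>
    <name>dfs.replication</name>
    <value>5</value>
  </property>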

6. Running and Confirming Your Installation

Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-dfs.sh over in the HADOOP_HOME directory. You can ensure it started properly by testing the put and get of files into the Hadoop filesystem, as in the sketch below. HBase does not normally use the MapReduce or YARN daemons; these do not need to be started.
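A quick HDFS smoke test might look as follows (the file and paths are arbitrary examples):

  $ hadoop fs -put /etc/hosts /tmp/hosts-smoke-test   # write a small local file into HDFS
  $ hadoop fs -cat /tmp/hosts-smoke-test              # read it back
  $ hadoop fs -rm /tmp/hosts-smoke-test               # clean up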

If you manage your own ZooKeeper, start it and confirm that it is running; otherwise HBase will start up ZooKeeper for you as part of its start process.

Start HBase with the following command:

bin/start-hbase.sh

Run the above from the HBASE_HOME directory.

You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out, especially if HBase had trouble starting.

HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named master.example.org on the default port, point your browser at http://master.example.org:16010 to see the web interface.

Prior to HBase 0.98, the Master UI was deployed on port 60010, and the HBase RegionServers UI on port 60030.

Once HBase has started, see the shell exercises section for how to create tables, add data, scan your insertions, and finally disable and drop your tables. A minimal shell session along those lines is sketched below.
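The table and column family names below are placeholders; this is only a sketch of the exercise:

  $ ./bin/hbase shell
  hbase> create 'test', 'cf'                    # create a table with one column family
  hbase> put 'test', 'row1', 'cf:a', 'value1'   # add a cell
  hbase> scan 'test'                            # scan the whole table
  hbase> get 'test', 'row1'                     # read a single row
  hbase> disable 'test'                         # a table must be disabled before it can be dropped
  hbase> drop 'test'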

To stop HBase after exiting the HBase shell, enter

$ ./bin/stop-hbase.sh
stopping hbase...............

Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.

7. Default Configuration

7.1. hbase-site.xml and hbase-default.xml

Just as in Hadoop, where you add site-specific HDFS configuration to the hdfs-site.xml file, for HBase site-specific customizations go into the file conf/hbase-site.xml. For the list of configurable properties, see the HBase default configuration below, or view the raw hbase-default.xml source file in the HBase source code at src/main/resources.

Not all configuration options make it out to hbase-default.xml. Some configurations that are deemed unlikely to change exist only in code; the only way to discover such options is by reading the source code itself.

Currently, changes here will require a cluster restart for HBase to notice the change.

7.2. HBase Default Configuration

The documentation below is generated using the default hbase configuration file, hbase-default.xml, as source.

  • hbase.tmp.dir

  • 描述

Temporary directory on the local filesystem. Change this setting to point to a location more permanent than '/tmp', the usual resolve for java.io.tmpdir, as the '/tmp' directory is cleared on machine restart.

    默认

    ${java.io.tmpdir}/hbase-${user.name}

  • hbase.rootdir

  • 描述

The directory shared by region servers and into which HBase persists. The URL should be 'fully-qualified' to include the filesystem scheme. For example, to specify the HDFS directory '/hbase' where the HDFS instance's namenode is running at namenode.example.org on port 9000, set this value to hdfs://namenode.example.org:9000/hbase. By default, we write into whatever ${hbase.tmp.dir} is set to as well, usually /tmp, so change this configuration or else all data will be lost on machine restart.

    默认

    ${hbase.tmp.dir}/hbase

  • hbase.fs.tmp.dir

  • 描述

    用于保留临时数据的默认文件系统(HDFS)中的暂存目录。

    默认

    /user/${user.name}/hbase-staging

  • hbase.cluster.distributed

  • 描述

    集群将处于的模式。对于独立模式,可能的值为false,对于分布式模式为true。如果为false,启动将在一个JVM中一起运行所有HBase和ZooKeeper守护进程。

    默认

    false

  • hbase.zookeeper.quorum

  • 描述

Comma-separated list of servers in the ZooKeeper ensemble (this config should have been named hbase.zookeeper.ensemble). For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to the full list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh, this is the list of servers on which hbase will start/stop ZooKeeper as part of cluster start/stop. Client-side, we will take this list of ensemble members, put it together with the hbase.zookeeper.clientPort config, and pass it into the zookeeper constructor as the connectString parameter.

    默认

    localhost

  • hbase.local.dir

  • 描述

    要用作本地存储的本地文件系统上的目录。

    默认

    ${hbase.tmp.dir}/local/

  • hbase.master.port

  • 描述

    HBase主机应绑定的端口。

    默认

    16000

  • hbase.master.info.port

  • 描述

    HBase Master Web UI的端口。如果不想运行UI实例,请设置为-1。

    默认

    16010

  • hbase.master.info.bindAddress

  • 描述

    HBase Master Web UI的绑定地址

    默认

    0.0.0.0

  • hbase.master.logcleaner.plugins

  • 描述

A comma-separated list of BaseLogCleanerDelegates invoked by the LogsCleaner service. These WAL cleaners are called in order, so put the cleaner that prunes the most files first. To implement your own BaseLogCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default log cleaners in the list.

    默认

    org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner

  • hbase.master.logcleaner.ttl

  • 描述

    WAL可以保留在.oldlogdir目录中的最大时间,之后它将被主线程清除。

    默认

    600000

  • hbase.master.hfilecleaner.plugins

  • 描述

A comma-separated list of BaseHFileCleanerDelegates invoked by the HFileCleaner service. These HFile cleaners are called in order, so put the cleaner that prunes the most files first. To implement your own BaseHFileCleanerDelegate, just put it in HBase's classpath and add the fully qualified class name here. Always add the above default hfile cleaners in the list, as they will be overwritten in hbase-site.xml.

    默认

    org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner

  • hbase.master.infoserver.redirect

  • 描述

    主机是否监听Master Web UI端口(hbase.master.info.port),并将请求重定向到由Master和RegionServer共享的Web UI服务器。

    默认

    true

  • hbase.regionserver.port

  • 描述

    HBase RegionServer绑定的端口。

    默认

    16020

  • hbase.regionserver.info.port

  • 描述

    HBase RegionServer Web UI的端口如果不希望运行RegionServer UI,请设置为-1。

    默认

    16030

  • hbase.regionserver.info.bindAddress

  • 描述

    HBase RegionServer Web UI的地址

    默认

    0.0.0.0

  • hbase.regionserver.info.port.auto

  • 描述

    主服务器或RegionServer UI是否应搜索要绑定的端口。如果hbase.regionserver.info.port已在使用,则启用自动端口搜索。用于测试,默认情况下关闭。

    默认

    false

  • hbase.regionserver.handler.count

  • 描述

    在RegionServers上启动的RPC侦听器实例计数。主器件使用相同的属性来计算主处理器。

    默认

    30

  • hbase.ipc.server.callqueue.handler.factor

  • 描述

    因素确定呼叫队列的数量。值0表示在所有处理程序之间共享的单个队列。值为1表示每个处理程序都有自己的队列。

    默认

    0.1

  • hbase.ipc.server.callqueue.read.ratio

  • 描述

    将呼叫队列拆分为读写队列。指定的间隔(应在0.0和1.0之间)将乘以呼叫队列的数量。值为0表示不拆分调用队列,这意味着读取和写入请求都将推送到同一组队列。低于0.5的值意味着将有比写队列少的读队列。值0.5表示将有相同数量的读取和写入队列。大于0.5的值意味着将有比写队列更多的读队列。值为1.0表示除了一个之外的所有队列都用于分派读取请求。示例:假设呼叫队列的总数为10,则read.ratio为0意味着:10个队列将包含读/写请求。read.ratio为0.3表示:3个队列将只包含读请求,7个队列只包含写请求。read.ratio为0.5意味着:5个队列只包含读请求,5个队列只包含写请求。read.ratio为0.8表示:8个队列只包含读请求,2个队列只包含写请求。read.ratio为1表示:9个队列只包含读请求,1个队列只包含写请求。

    默认

    0

  • hbase.ipc.server.callqueue.scan.ratio

  • 描述

    给定读取调用队列的数量,根据调用队列的总数乘以callqueue.read.ratio计算,scan.ratio属性将读取调用队列分为小读队列和长读队列。低于0.5的值意味着长读队列比短读队列少。值为0.5意味着将存在相同数量的短读和长读队列。大于0.5的值表示将存在比短读队列更多的长读队列值为0或1表示对get和scan使用相同的队列集。示例:给定读取调用队列的总数为8,scan.ratio为0或1表示:8个队列将包含长读取请求和短读取请求。scan.ratio为0.3表示:2个队列只包含长读请求,6个队列只包含短读请求。scan.ratio为0.5表示:4个队列只包含长读请求,4个队列只包含短读请求。scan.ratio为0.8表示:6个队列只包含长读请求,2个队列只包含短读请求。

    默认

    0

  • hbase.regionserver.msginterval

  • 描述

    从RegionServer到主服务器的消息之间的时间间隔(以毫秒为单位)。

    默认

    3000

  • hbase.regionserver.logroll.period

  • 描述

    无论有多少修改,我们将滚动提交日志的时间段。

    默认

    3600000

  • hbase.regionserver.logroll.errors.tolerated

  • 描述

    在触发服务器中止之前我们将允许的连续WAL关闭错误的数量。如果在日志滚动期间关闭当前WAL写入器失败,设置为0将导致区域服务器中止。即使一个小的值(2或3)将允许一个区域服务器骑过临时HDFS错误。

    默认

    2

  • hbase.regionserver.hlog.reader.impl

  • 描述

    WAL文件读取器实现。

    默认

    org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader

  • hbase.regionserver.hlog.writer.impl

  • 描述

    WAL文件写入器实现。

    默认

    org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter

  • hbase.regionserver.global.memstore.size

  • 描述

    在阻止新更新并强制刷新之前,区域服务器中所有memstore的最大大小。默认为堆的40%(0.4)。更新被阻止,并且刷新被强制,直到区域服务器中的所有memstore的大小达到hbase.regionserver.global.memstore.size.lower.limit。为了尊重旧的hbase.regionserver.global.memstore.upperLimit属性(如果存在),此配置中的默认值已被有意留为空。

    默认

    没有

  • hbase.regionserver.global.memstore.size.lower.limit

  • 描述

    强制刷新之前区域服务器中所有memstore的最大大小。默认为hbase.regionserver.global.memstore.size(0.95)的95%。此值的100%值导致在由于memstore限制而阻止更新时发生最小可能的刷新。此配置中的默认值已被故意留为空,以便尊重旧的hbase.regionserver.global.memstore.lowerLimit属性(如果存在)。

    默认

    没有

  • hbase.regionserver.optionalcacheflushinterval

  • 描述

    在自动刷新之前,编辑在内存中存在的最长时间。默认1小时。将其设置为0以禁用自动冲洗。

    默认

    3600000

  • hbase.regionserver.dns.interface

  • 描述

    区域服务器应报告其IP地址的网络接口的名称。

    默认

    default

  • hbase.regionserver.dns.nameserver

  • 描述

    名称服务器(DNS)的主机名或IP地址,区域服务器应使用该名称或IP地址来确定主机用于通信和显示目的的主机名。

    默认

    default

  • hbase.regionserver.region.split.policy

  • 描述

    拆分策略确定何时应拆分区域。当前可用的各种其他拆分策略是BusyRegionSplitPolicy,ConstantSizeRegionSplitPolicy,DisabledRegionSplitPolicy,DelimitedKeyPrefixRegionSplitPolicy和KeyPrefixRegionSplitPolicy。DisabledRegionSplitPolicy阻止手动区域分割。

    默认

    org.apache.hadoop.hbase.regionserver.SteppingSplitPolicy

  • hbase.regionserver.regionSplitLimit

  • 描述

    对其后不再发生区域拆分的区域数量的限制。这不是区域数量的硬限制,而是作为区域服务器在一定限制之后停止拆分的指南。默认值设置为1000。

    默认

    1000

  • zookeeper.session.timeout

  • 描述

ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is used in the ZK client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZK server, in which case it is passed as the 'maxSessionTimeout'. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions. For example, if an HBase region server connects to a ZK ensemble that is also managed by HBase, the session timeout will be the one specified by this configuration. But a region server that connects to an ensemble managed with a different configuration will be subject to that ensemble's maxSessionTimeout. So even though HBase might propose 90 seconds, the ensemble can have a max timeout lower than this, and it will take precedence.

    默认

    90000

  • zookeeper.znode.parent

  • 描述

    Root ZNode for HBase in ZooKeeper。所有使用相对路径配置的HBase的ZooKeeper文件将位于此节点下。默认情况下,所有HBase的ZooKeeper文件路径都配置有相对路径,因此它们都将位于此目录下,除非更改。

    默认

    /hbase

  • zookeeper.znode.acl.parent

  • 描述

    Root ZNode用于访问控制列表。

    默认

    acl

  • hbase.zookeeper.dns.interface

  • 描述

    ZooKeeper服务器应从其报告其IP地址的网络接口的名称。

    默认

    default

  • hbase.zookeeper.dns.nameserver

  • 描述

    名称服务器(DNS)的主机名或IP地址,ZooKeeper服务器应使用该名称或IP地址来确定主服务器用于通信和显示目的的主机名。

    默认

    default

  • hbase.zookeeper.peerport

  • 描述

    ZooKeeper对等体使用的端口相互通信。见http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper以获取更多信息。

    默认

    2888

  • hbase.zookeeper.leaderport

  • 描述

    ZooKeeper用于领导选举的端口。见http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper以获取更多信息。

    默认

    3888

  • hbase.zookeeper.property.initLimit

  • 描述

    属性从ZooKeeper的配置zoo.cfg。初始同步阶段可以采用的滴答数。

    默认

    10

  • hbase.zookeeper.property.syncLimit

  • 描述

    属性从ZooKeeper的配置zoo.cfg。在发送请求和获取确认之间可以通过的记号数。

    默认

    5

  • hbase.zookeeper.property.dataDir

  • 描述

    属性从ZooKeeper的配置zoo.cfg。存储快照的目录。

    默认

    ${hbase.tmp.dir}/zookeeper

  • hbase.zookeeper.property.clientPort

  • 描述

    属性从ZooKeeper的配置zoo.cfg。客户端将连接的端口。

    默认

    2181

  • hbase.zookeeper.property.maxClientCnxns

  • 描述

    属性从ZooKeeper的配置zoo.cfg。限制由IP地址标识的单个客户端可能对ZooKeeper集合的单个成员创建的并发连接数(在套接字级别)。设置为高以避免运行独立和伪分布式的zk连接问题。

    默认

    300

  • hbase.client.write.buffer

  • 描述

    HTable客户端写缓冲区的默认大小(以字节为单位)。更大的缓冲区需要更多的内存 - 在客户端和服务器端,因为服务器实例化传递的写缓冲区来处理它 - 但是更大的缓冲区大小减少了RPC的数量。对于使用的服务器端内存的估计,请评估hbase.client.write.buffer * hbase.regionserver.handler.count

    默认

    2097152

  • hbase.client.pause

  • 描述

    常规客户端暂停值。在运行重试失败的获取,区域查找等之前,主要用作等待的值。请参阅hbase.client.retries.number,以描述我们如何从此初始暂停量中退出,以及此暂停如何工作。

    默认

    100

  • hbase.client.pause.cqtbe

  • 描述

    是否对CallQueueTooBigException(cqtbe)使用特殊的客户端暂停。如果您观察到来自同一个RegionServer的频繁CQTBE并且呼叫队列保持充满,请将此属性设置为比hbase.client.pause更高的值

    默认

    没有

  • hbase.client.retries.number

  • 描述

Maximum number of retries. Used as the maximum for all retryable operations such as fetching a cell's value, starting a row update, etc. The retry interval is a rough function based on hbase.client.pause. At first we retry at this interval, but then with backoff we pretty quickly reach retrying every ten seconds. See HConstants#RETRY_BACKOFF for how the backoff ramps up. Change this setting and hbase.client.pause to suit your workload.

    默认

    35

  • hbase.client.max.total.tasks

  • 描述

    单个HTable实例将发送到集群的并发突变任务的最大数量。

    默认

    100

  • hbase.client.max.perserver.tasks

  • 描述

    单个HTable实例将向单个区域服务器发送的并发突变任务的最大数量。

    默认

    5

  • hbase.client.max.perregion.tasks

  • 描述

    客户端将维护到单个区域的并发突变任务的最大数量。也就是说,如果这个区域已经有hbase.client.max.perregion.tasks正在写入,则新的puts将不会发送到此区域,直到一些写入完成。

    默认

    1

  • hbase.client.perserver.requests.threshold

  • 描述

    所有客户端线程中一个服务器的并发等待请求的最大数量(进程级别)。超过请求将立即抛出ServerTooBusyException以防止用户的线程被一个缓慢区域服务器占用和阻止。如果使用固定数量的线程以同步方式访问HBase,请将此值设置为与线程数相关的合适值,这将有助于您。有关详细信息,请参见https://issues.apache.org/jira/browse/HBASE-16388。

    默认

    2147483647

  • hbase.client.scanner.caching

  • 描述

    在扫描器上调用next时,如果未从(本地,客户端)内存提供,我们尝试获取的行数。此配置与hbase.client.scanner.max.result.size一起工作,以便有效地尝试和使用网络。缺省值为Integer.MAX_VALUE,以便网络将填充由hbase.client.scanner.max.result.size定义的块大小,而不是由特定行数限制,因为行的大小将表更改为表。如果提前知道扫描时不需要超过一定数量的行,则应通过Scan#setCaching将此配置设置为该行限制。更高的缓存值将使更快的扫描程序,但将吃掉更多的内存和一些调用下一个可能需要更长和更长的时间,当缓存是空的。不要设置此值,以使调用之间的时间大于扫描程序超时; 即hbase.client.scanner.timeout.period

    默认

    2147483647

  • hbase.client.keyvalue.maxsize

  • 描述

    指定KeyValue实例的组合最大允许大小。这是为保存在存储文件中的单个条目设置上边界。因为它们不能被分割,所以有助于避免区域不能被进一步分割,因为数据太大。将此设置为最大区域大小的一小部分似乎是明智的。将其设置为零或更小将禁用检查。

    默认

    10485760

  • hbase.client.scanner.timeout.period

  • 描述

    客户端扫描程序租期(以毫秒为单位)。

    默认

    60000

  • hbase.client.localityCheck.threadPoolSize

  • 默认

    2

  • hbase.bulkload.retries.number

  • 描述

    最大重试次数。这是在面对分割操作时试图进行原子批量加载的最大迭代次数0意味着永远不会放弃。

    默认

    10

  • hbase.master.balancer.maxRitPercent

  • 描述

    平衡过渡中区域的最大百分比。默认值为1.0。所以没有平衡器节流。如果将此配置设置为0.01,则意味着平衡过程中最多有1%的区域。然后在平衡时集群的可用性至少为99%。

    默认

    1.0

  • hbase.balancer.period

  • 描述

    区域平衡器在主服务器中运行的时间段。

    默认

    300000

  • hbase.normalizer.period

  • 描述

    区域规范器在主器件中运行的时间段。

    默认

    1800000

  • hbase.regions.slop

  • 描述

    如果任何regionserver具有平均值+(平均*斜率)区域,则重新平衡。此参数的默认值在StochasticLoadBalancer(默认负载平衡器)中为0.001,而其他负载平衡器(即SimpleLoadBalancer)中的默认值为0.2。

    默认

    0.001

  • hbase.server.thread.wakefrequency

  • 描述

Time to sleep in between searches for work (in milliseconds). Used as the sleep interval by service threads such as the log roller.

    默认

    10000

  • hbase.server.versionfile.writeattempts

  • 描述

    在中止前重试尝试写入版本文件的次数。每次尝试由hbase.server.thread.wakefrequency毫秒分隔。

    默认

    3

  • hbase.hregion.memstore.flush.size

  • 描述

    如果memstore的大小超过这个字节数,Memstore将被刷新到磁盘。值由运行每个hbase.server.thread.wakefrequency的线程检查。

    默认

    134217728

  • hbase.hregion.percolumnfamilyflush.size.lower.bound.min

  • 描述

If FlushLargeStoresPolicy is used and there are multiple column families, then every time that we hit the total memstore limit we find all the column families whose memstores exceed a "lower bound" and flush only those, while retaining the others in memory. The "lower bound" is "hbase.hregion.memstore.flush.size / column_family_number" by default, unless the value of this property is larger than that. If none of the families have a memstore size larger than the lower bound, all the memstores will be flushed (as usual).

    默认

    16777216

  • hbase.hregion.preclose.flush.size

  • 描述

    如果区域中的memstore在关闭时处于此大小或更大的大小,则在放置区域关闭标志并使该区域脱机之前,运行“预刷新”以清除memstore。在关闭时,在关闭标志下运行刷新以清空内存。在这段时间,该地区是离线的,我们不接受任何写。如果memstore内容很大,这个flush可能需要很长时间才能完成。预冲器意在清除memstore的大部分,然后放置关闭标志并使该区域脱机,所以在关闭标志下运行的刷新没有什么作用。

    默认

    5242880

  • hbase.hregion.memstore.block.multiplier

  • 描述

    如果memstore有hbase.hregion.memstore.block.multiplier乘以hbase.hregion.memstore.flush.size字节,则阻止更新。有用的防止在更新流量峰值期间失控memstore。没有上限,memstore填充,当它刷新结果flush文件需要很长时间来压缩或分裂,或更糟糕的是,我们OOME。

    默认

    4

  • hbase.hregion.memstore.mslab.enabled

  • 描述

    启用MemStore - 本地分配缓冲区,这是一个功能,可防止在大量写入负载下的堆碎片。这可以减少大堆上停止GC停留的频率。

    默认

    true

  • hbase.hregion.max.filesize

  • 描述

    最大HFile大小。如果区域的HFiles的大小的总和已经增长到超过该值,则该区域被分成两部分。

    默认

    10737418240

  • hbase.hregion.majorcompaction

  • 描述

    主要压缩之间的时间,以毫秒表示。设置为0以禁用基于时间的自动主要压缩。用户请求和基于大小的主要压缩仍将运行。此值乘以hbase.hregion.majorcompaction.jitter以使压缩在给定时间窗口内的某个随机时间开始。默认值为7天,以毫秒为单位。如果主要压缩导致您的环境中断,您可以将其配置为在非高峰时间部署运行,或通过将此参数设置为0来禁用基于时间的主要压缩,并在cron作业或另一个cron作业中运行主要压缩外部机制。

    默认

    604800000

  • hbase.hregion.majorcompaction.jitter

  • 描述

    将乘数应用于hbase.hregion.majorcompaction,以使压缩在hbase.hregion.majorcompaction两侧发生给定时间量。数字越小,压缩就越接近hbase.hregion.majorcompaction间隔。

    默认

    0.50

  • hbase.hstore.compactionThreshold

  • 描述

    如果任何一个Store中存在超过此数量的StoreFiles(每次MemStore刷新一个StoreFile),则运行压缩以将所有StoreFile重写到单个StoreFile中。较大的值延迟压缩,但是当压缩确实发生时,完成需要更长时间。

    默认

    3

  • hbase.hstore.flusher.count

  • 描述

The number of flush threads. With fewer threads, MemStore flushes will be queued. With more threads, the flushes will be executed in parallel, increasing the load on HDFS and potentially causing more compactions.

    默认

    2

  • hbase.hstore.blockingStoreFiles

  • 描述

    如果任何一个Store中存在超过此数量的StoreFiles(每次清空MemStore时将写入一个StoreFile),则在完成压缩之前,或者直到超过hbase.hstore.blockingWaitTime时,将阻止此区域的更新。

    默认

    10

  • hbase.hstore.blockingWaitTime

  • 描述

    区域在达到由hbase.hstore.blockingStoreFiles定义的StoreFile限制后阻止更新的时间。在此时间过后,即使压缩尚未完成,该区域也将停止阻止更新。

    默认

    90000

  • hbase.hstore.compaction.min

  • 描述

    压缩之前必须有资格进行压缩的StoreFiles的最小数量才能运行。调整hbase.hstore.compaction.min的目的是避免结束太多的小型StoreFiles来压缩。将此值设置为2将在每次在商店中有两个StoreFiles时导致轻微的压缩,这可能是不合适的。如果将此值设置得过高,则需要相应地调整所有其他值。在大多数情况下,默认值是适当的。在以前的HBase版本中,参数hbase.hstore.compaction.min被命名为hbase.hstore.compactionThreshold。

    默认

    3

  • hbase.hstore.compaction.max

  • 描述

    将为单个次要压缩选择的StoreFiles的最大数量,而不考虑合格的StoreFiles的数量。实际上,hbase.hstore.compaction.max的值控制单个压缩完成所需的时间长度。将其设置为较大意味着更多StoreFiles包含在压缩中。在大多数情况下,默认值是适当的。

    默认

    10

  • hbase.hstore.compaction.min.size

  • 描述

    小于此大小的StoreFile(或使用ExploringCompactionPolicy的StoreFiles选择)将始终有资格进行小型压缩。此大小或更大的HFiles由hbase.hstore.compaction.ratio进行评估,以确定它们是否符合条件。由于此限制表示小于此值的所有StoreFiles的“自动包含”限制,因此可能需要在写入量很大的环境中减少此值,因为每个StoreFile都将被刷新,因此会清除1-2 MB范围内的许多StoreFiles用于压缩,并且所得到的StoreFile可能仍然处于最小尺寸并且需要进一步压实。如果该参数降低,则比率检查更快地触发。这解决了在早期版本的HBase中看到的一些问题,但在大多数情况下不再需要更改此参数。

    默认

    134217728

  • hbase.hstore.compaction.max.size

  • 描述

    大于此大小的StoreFile(或使用ExploringCompactionPolicy时的StoreFiles选择)将从压缩中排除。提高hbase.hstore.compaction.max.size的效果较少,较大的StoreFiles经常无法压缩。如果你觉得压缩过于频繁而没有很多好处,你可以尝试提高这个值。默认值:LONG.MAX_VALUE的值,以字节表示。

    默认

    9223372036854775807

  • hbase.hstore.compaction.ratio

  • 描述

    对于较小压缩,此比率用于确定大于hbase.hstore.compaction.min.size的给定StoreFile是否有资格进行压缩。它的作用是限制大StoreFiles的压缩。hbase.hstore.compaction.ratio的值表示为浮点小数。大的比率,例如10,将产生单个巨大的StoreFile。相反,低值(例如.25)将产生类似于BigTable压缩算法的行为,从而产生四个StoreFiles。建议在1.0和1.4之间的中等值。调整此值时,您将平衡写入成本与读取成本。提高值(类似1.4)将有更多的写入成本,因为您将压缩更大的StoreFiles。但是,在读取期间,HBase将需要通过更少的StoreFiles来完成读取。如果你不能利用布隆过滤器,请考虑这种方法。否则,您可以将此值降低到类似于1.0的值,以降低写入的背景成本,并使用布隆过滤器来控制在读取期间触摸的StoreFiles数。在大多数情况下,默认值是适当的。

    默认

    1.2F

  • hbase.hstore.compaction.ratio.offpeak

  • 描述

    允许您设置不同的(默认情况下,更积极)比率,以确定在非高峰时间期间是否将更大的StoreFiles包括在压缩中。工作方式与hbase.hstore.compaction.ratio相同。仅当hbase.offpeak.start.hour和hbase.offpeak.end.hour也启用时适用。

    默认

    5.0F

  • hbase.hstore.time.to.purge.deletes

  • 描述

    延迟清除具有未来时间戳的删除标记的时间量。如果未设置,或设置为0,则在下一个主要压缩期间清除所有删除标记,包括具有未来时间戳的那些标记。否则,将保留删除标记,直到在标记的时间戳加上此设置的值之后发生的主要压缩(以毫秒为单位)。

    默认

    0

  • hbase.offpeak.start.hour

  • 描述

    非高峰时间的开始,表示为0和23之间的整数,包括0和23。设置为-1可禁用非峰值。

    默认

    -1

  • hbase.offpeak.end.hour

  • 描述

    非高峰小时的结束,表示为0和23之间的整数,包括0和23。设置为-1可禁用非峰值。

    默认

    -1

  • hbase.regionserver.thread.compaction.throttle

  • 描述

    有两个不同的线程池用于压缩,一个用于大型压缩,另一个用于小型压缩。这有助于快速地保持精简表(例如hbase:meta)的压缩。如果压缩大于此阈值,它将进入大压缩池。在大多数情况下,默认值是适当的。默认值:2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size(默认为128MB)。值字段假定hbase.hregion.memstore.flush.size的值与缺省值保持不变。

    默认

    2684354560

  • hbase.hstore.compaction.kv.max

  • 描述

    在刷新或压缩时要读取并在批处理中写入的KeyValues的最大数量。如果您有大的KeyValues和内存异常出现问题,请将此值设置得更低如果您有宽的小行,请将此值设置得更高。

    默认

    10

  • hbase.storescanner.parallel.seek.enable

  • 描述

    在StoreScanner中启用StoreFileScanner并行搜索,这是一种可以减少特殊情况下的响应延迟的功能。

    默认

    false

  • hbase.storescanner.parallel.seek.threads

  • 描述

    如果启用了并行搜索功能,则为默认线程池大小。

    默认

    10

  • hfile.block.cache.size

  • 描述

    分配给StoreFile使用的块缓存的最大堆(-Xmx设置)的百分比。默认值为0.4意味着分配40%。设置为0可禁用,但不建议使用; 您至少需要足够的缓存来保存storefile索引。

    默认

    0.4

  • hfile.block.index.cacheonwrite

  • 描述

    这允许在索引被写入时将非根多级索引块放入块高速缓存中。

    默认

    false

  • hfile.index.block.max.size

  • 描述

    When the size of a leaf-level, intermediate-level, or root-level index block in a multi-level block index grows to this size, the block is written out and a new block is started.

    Default

    131072

  • hbase.bucketcache.ioengine

  • Description

    Where to store the contents of the bucketcache. One of: heap, offheap, or file. If a file, set it to file:PATH_TO_FILE. See http://hbase.apache.org/book.html#offheap.blockcache for more information.

    Default

    none

  • hbase.bucketcache.combinedcache.enabled

  • Description

    Whether or not the bucketcache is used in league with the LRU on-heap block cache. In this mode, indices and blooms are kept in the LRU blockcache and the data blocks are kept in the bucketcache.

    Default

    true

  • hbase.bucketcache.size

  • Description

    A float that EITHER represents a percentage of total heap memory size to give to the cache (if < 1.0) OR, it is the total capacity in megabytes of BucketCache. Default: 0.0

    Default

    none

  • hbase.bucketcache.bucket.sizes

  • Description

    A comma-separated list of sizes for buckets for the bucketcache. Can be multiple sizes. List block sizes in order from smallest to largest. The sizes you use will depend on your data access patterns. Must be a multiple of 1024 else you will run into 'java.io.IOException: Invalid HFile block magic' when you go to read from cache. If you specify no values here, then you pick up the default bucketsizes set in code (See BucketAllocator#DEFAULT_BUCKET_SIZES).

    Default

    none

  • hfile.format.version

  • Description

    The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). Distributed Log Replay requires that tags are enabled. Also see the configuration 'hbase.replication.rpc.codec'.

    Default

    3

  • hfile.block.bloom.cacheonwrite

  • Description

    Enables cache-on-write for inline blocks of a compound Bloom filter.

    Default

    false

  • io.storefile.bloom.block.size

  • Description

    复合Bloom过滤器的单个块(“块”)的大小(以字节为单位)。该大小是近似的,因为Bloom块只能在数据块边界处插入,并且每个数据块的密钥数量不同。

    默认

    131072

  • hbase.rs.cacheblocksonwrite

  • 描述

    当块完成时,是否应将HFile块添加到块缓存中。

    默认

    false

  • hbase.rpc.timeout

  • 描述

    这是为RPC层定义HBase客户端应用程序用于远程调用超时的时间(毫秒)。它使用ping来检查连接,但最终会抛出TimeoutException。

    默认

    60000

  • hbase.client.operation.timeout

  • 描述

    操作超时是一个顶级限制(毫秒),它确保表中的阻塞操作不会被阻塞。在每个操作中,如果rpc请求由于超时或其他原因失败,它将重试直到成功或抛出RetriesExhaustedException。但是如果总时间被阻塞达到操作超时,在重试耗尽之前,它将提前中断并抛出SocketTimeoutException。

    默认

    1200000

  • hbase.cells.scanned.per.heartbeat.check

  • 描述

    在心跳检查之间扫描的单元数。心跳检查在扫描处理期间发生,以确定服务器是否应停止扫描,以便将心跳消息发送回客户端。心跳消息用于在长时间运行的扫描期间保持客户端 - 服务器连接活动。小值意味着心跳检查将更频繁地发生,并且因此将提供对扫描的执行时间的更严格的界限。较大的值意味着心跳检查的频率较低

    默认

    10000

  • hbase.rpc.shortoperation.timeout

  • 描述

    这是另一个版本的“hbase.rpc.timeout”。对于集群中的那些RPC操作,我们依靠此配置为短操作设置短暂的超时限制。例如,区域服务器尝试向活动主节点报告的短rpc超时可以更快地实现主故障转移过程。

    默认

    10000

  • hbase.ipc.client.tcpnodelay

  • 描述

    在rpc套接字连接上不设置延迟。请参阅http://docs.oracle.com/javase/1.5.0/docs/api/java/net/Socket.html#getTcpNoDelay()

    默认

    true

  • hbase.regionserver.hostname

  • 描述

    这个配置是专家:不要设置其值,除非你真的知道你在做什么。当设置为非空值时,这表示底层服务器的(外部面向)主机名。有关详细信息,请参见https://issues.apache.org/jira/browse/HBASE-12954。

    默认

    没有

  • hbase.master.keytab.file

  • 描述

    Full path to the kerberos keytab file to use for logging in the configured HMaster server principal.

    Default

    none

  • hbase.master.kerberos.principal

  • Description

    Ex. "hbase/_HOST@EXAMPLE.COM". The kerberos principal name that should be used to run the HMaster process. The principal name should be in the form: user/hostname@DOMAIN. If "_HOST" is used as the hostname portion, it will be replaced with the actual hostname of the running instance.

    Default

    none

  • hbase.regionserver.keytab.file

  • Description

    Full path to the kerberos keytab file to use for logging in the configured HRegionServer server principal.

    Default

    none

  • hbase.regionserver.kerberos.principal

  • Description

    Ex. "hbase/_HOST@EXAMPLE.COM". The kerberos principal name that should be used to run the HRegionServer process. The principal name should be in the form: user/hostname@DOMAIN. If "_HOST" is used as the hostname portion, it will be replaced with the actual hostname of the running instance. An entry for this principal must exist in the file specified in hbase.regionserver.keytab.file

    Default

    none

  • hadoop.policy.file

  • Description

    The policy configuration file used by RPC servers to make authorization decisions on client requests. Only used when HBase security is enabled.

    Default

    hbase-policy.xml

  • hbase.superuser

  • Description

    List of users or groups (comma-separated), who are allowed full privileges, regardless of stored ACLs, across the cluster. Only used when HBase security is enabled.

    Default

    none

  • hbase.auth.key.update.interval

  • Description

    The update interval for master key for authentication tokens in servers in milliseconds. Only used when HBase security is enabled.

    Default

    86400000

  • hbase.auth.token.max.lifetime

  • Description

    The maximum lifetime in milliseconds after which an authentication token expires. Only used when HBase security is enabled.

    Default

    604800000

  • hbase.ipc.client.fallback-to-simple-auth-allowed

  • Description

    When a client is configured to attempt a secure connection, but attempts to connect to an insecure server, that server may instruct the client to switch to SASL SIMPLE (unsecure) authentication. This setting controls whether or not the client will accept this instruction from the server. When false (the default), the client will not allow the fallback to SIMPLE authentication, and will abort the connection.

    Default

    false

  • hbase.ipc.server.fallback-to-simple-auth-allowed

  • Description

    When a server is configured to require secure connections, it will reject connection attempts from clients using SASL SIMPLE (unsecure) authentication. This setting allows secure servers to accept SASL SIMPLE connections from clients when the client requests. When false (the default), the server will not allow the fallback to SIMPLE authentication, and will reject the connection. WARNING: This setting should ONLY be used as a temporary measure while converting clients over to secure authentication. It MUST BE DISABLED for secure operation.

    Default

    false

  • hbase.display.keys

  • Description

    When this is set to true the webUI and such will display all start/end keys as part of the table details, region names, etc. When this is set to false, the keys are hidden.

    默认

    true

  • hbase.coprocessor.enabled

  • Description

    启用或禁用协处理器加载。如果'false'(禁用),任何其他协处理器相关配置将被忽略。

    默认

    true

  • hbase.coprocessor.user.enabled

  • 描述

    启用或禁用用户(也称为表)协处理器加载。如果为'false'(禁用),表描述符中的任何表协处理器属性将被忽略。如果“hbase.coprocessor.enabled”为“false”,此设置不起作用。

    默认

    true

  • hbase.coprocessor.region.classes

  • 描述

    默认情况下在所有表上加载的协处理器的逗号分隔列表。对于任何覆盖协处理器方法,这些类将按顺序调用。在实现自己的协处理器之后,只需将其放在HBase的类路径中,并在此处添加完全限定类名。还可以通过设置HTableDescriptor来按需加载协处理器。

    默认

    没有

  • hbase.rest.port

  • 描述

    HBase REST服务器的端口。

    默认

    8080

  • hbase.rest.readonly

  • 描述

    定义REST服务器将启动的模式。可能的值为:false:允许所有HTTP方法 - GET / PUT / POST / DELETE。true:仅允许GET方法。

    默认

    false

  • hbase.rest.threads.max

  • 描述

    REST服务器线程池的最大线程数。池中的线程被重用来处理REST请求。这控制并发处理的请求的最大数量。它可能有助于控制REST服务器使用的内存,以避免OOM问题。如果线程池已满,传入的请求将排队等待一些空闲线程。

    默认

    100

  • hbase.rest.threads.min

  • 描述

    REST服务器线程池的最小线程数。线程池总是至少具有这些数量的线程,因此REST服务器已准备好处理传入的请求。

    默认

    2

  • hbase.rest.support.proxyuser

  • 描述

    启用运行REST服务器以支持代理用户模式。

    默认

    false

  • hbase.defaults.for.version.skip

  • 描述

    Set to true to skip the 'hbase.defaults.for.version' check. Setting this to true can be useful in contexts other than the other side of a maven generation; i.e. running in an IDE. You’ll want to set this boolean to true to avoid seeing the RuntimeException complaint: "hbase-default.xml file seems to be for and old version of HBase (\${hbase.version}), this version is X.X.X-SNAPSHOT"

    Default

    false

  • hbase.coprocessor.master.classes

  • Description

    A comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that are loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes will be called in order. After implementing your own MasterObserver, just put it in HBase’s classpath and add the fully qualified class name here.

    Default

    none

  • hbase.coprocessor.abortonerror

  • Description

    Set to true to cause the hosting server (master or regionserver) to abort if a coprocessor fails to load, fails to initialize, or throws an unexpected Throwable object. Setting this to false will allow the server to continue execution but the system wide state of the coprocessor in question will become inconsistent as it will be properly executing in only a subset of servers, so this is most useful for debugging only.

    Default

    true

  • hbase.table.lock.enable

  • Description

    Set to true to enable locking the table in zookeeper for schema change operations. Table locking from master prevents concurrent schema modifications to corrupt table state.

    Default

    true

  • hbase.table.max.rowsize

  • Description

    Maximum size of single row in bytes (default is 1 Gb) for Get’ting or Scan’ning without in-row scan flag set. If row size exceeds this limit RowTooBigException is thrown to client.

    Default

    1073741824

  • hbase.thrift.minWorkerThreads

  • Description

    The "core size" of the thread pool. New threads are created on every connection until this many threads are created.

    Default

    16

  • hbase.thrift.maxWorkerThreads

  • Description

    The maximum size of the thread pool. When the pending request queue overflows, new threads are created until their number reaches this number. After that, the server starts dropping connections.

    Default

    1000

  • hbase.thrift.maxQueuedRequests

  • Description

    The maximum number of pending Thrift connections waiting in the queue. If there are no idle threads in the pool, the server queues requests. Only when the queue overflows, new threads are added, up to hbase.thrift.maxQueuedRequests threads.

    Default

    1000

  • hbase.regionserver.thrift.framed

  • Description

    Use Thrift TFramedTransport on the server side. This is the recommended transport for thrift servers and requires a similar setting on the client side. Changing this to false will select the default transport, vulnerable to DoS when malformed requests are issued due to THRIFT-601.

    Default

    false

  • hbase.regionserver.thrift.framed.max_frame_size_in_mb

  • Description

    使用成帧传输时的默认帧大小(MB)

    默认

    2

  • hbase.regionserver.thrift.compact

  • 描述

    使用Thrift TCompactProtocol二进制序列化协议。

    默认

    false

  • hbase.rootdir.perms

  • 描述

    FS安全(kerberos)设置中根数据子目录的权限。当master启动时,它使用此权限创建rootdir,如果不匹配,则设置权限。

    默认

    700

  • hbase.wal.dir.perms

  • 描述

    FS安全(kerberos)设置中根WAL目录的权限。当master启动时,它使用此权限创建WAL目录,如果不匹配,则设置权限。

    默认

    700

  • hbase.data.umask.enable

  • 描述

    启用(如果为true),应将文件权限分配给regionserver写入的文件

    默认

    false

  • hbase.data.umask

  • 描述

    当hbase.data.umask.enable为true时,应用于写入数据文件的文件权限

    默认

    000

  • hbase.snapshot.enabled

  • 描述

    设置为true可允许拍摄/恢复/克隆快照。

    默认

    true

  • hbase.snapshot.restore.take.failsafe.snapshot

  • 描述

    设置为true以在恢复操作之前创建快照。拍摄的快照将在失败的情况下使用,以恢复之前的状态。在还原操作结束时,此快照将被删除

    默认

    true

  • hbase.snapshot.restore.failsafe.name

  • 描述

    恢复操作拍摄的故障安全快照的名称。您可以使用{snapshot.name},{table.name}和{restore.timestamp}变量根据要还原的内容创建名称。

    默认

    hbase-failsafe-{snapshot.name}-{restore.timestamp}

  • hbase.server.compactchecker.interval.multiplier

  • 描述

    确定我们扫描多长时间以查看是否需要压缩的数字。通常,压缩是在一些事件(例如memstore flush)之后完成的,但是如果一段时间内没有收到很多写入,或者由于不同的压缩策略,可能需要定期检查。检查之间的间隔为hbase.server.compactchecker.interval.multiplier乘以hbase.server.thread.wakefrequency。

    默认

    1000

  • hbase.lease.recovery.timeout

  • 描述

    How long we wait on dfs lease recovery in total before giving up.

    Default

    900000

  • hbase.lease.recovery.dfs.timeout

  • Description

    How long between dfs recover lease invocations. Should be larger than the sum of the time it takes for the namenode to issue a block recovery command as part of datanode; dfs.heartbeat.interval and the time it takes for the primary datanode, performing block recovery to timeout on a dead datanode; usually dfs.client.socket-timeout. See the end of HBASE-8389 for more.

    Default

    64000

  • hbase.column.max.version

  • Description

    New column family descriptors will use this value as the default number of versions to keep.

    Default

    1

  • dfs.client.read.shortcircuit

  • Description

    If set to true, this configuration parameter enables short-circuit local reads.

    Default

    false

  • dfs.domain.socket.path

  • Description

    This is a path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients, if dfs.client.read.shortcircuit is set to true. If the string "_PORT" is present in this path, it will be replaced by the TCP port of the DataNode. Be careful about permissions for the directory that hosts the shared domain socket; dfsclient will complain if open to other users than the HBase user.

    Default

    none

  • hbase.dfs.client.read.shortcircuit.buffer.size

  • Description

    If the DFSClient configuration dfs.client.read.shortcircuit.buffer.size is unset, we will use what is configured here as the short circuit read default direct byte buffer size. DFSClient native default is 1MB; HBase keeps its HDFS files open so number of file blocks * 1MB soon starts to add up and threaten OOME because of a shortage of direct memory. So, we set it down from the default. Make it > the default hbase block size set in the HColumnDescriptor which is usually 64k.

    Default

    131072

  • hbase.regionserver.checksum.verify

  • Description

    If set to true (the default), HBase verifies the checksums for hfile blocks. HBase writes checksums inline with the data when it writes out hfiles. HDFS (as of this writing) writes checksums to a separate file than the data file necessitating extra seeks. Setting this flag saves some on i/o. Checksum verification by HDFS will be internally disabled on hfile streams when this flag is set. If the hbase-checksum verification fails, we will switch back to using HDFS checksums (so do not disable HDFS checksums! And besides this feature applies to hfiles only, not to WALs). If this parameter is set to false, then hbase will not verify any checksums, instead it will depend on checksum verification being done in the HDFS client.

    Default

    true

  • hbase.hstore.bytes.per.checksum

  • Description

    Number of bytes in a newly created checksum chunk for HBase-level checksums in hfile blocks.

    Default

    16384

  • hbase.hstore.checksum.algorithm

  • Description

    Name of an algorithm that is used to compute checksums. Possible values are NULL, CRC32, CRC32C.

    Default

    CRC32C

  • hbase.client.scanner.max.result.size

  • Description

    Maximum number of bytes returned when calling a scanner’s next method. Note that when a single row is larger than this limit the row is still returned completely. The default value is 2MB, which is good for 1ge networks. With faster and/or high latency networks this value should be increased.

    Default

    2097152

  • hbase.server.scanner.max.result.size

  • Description

    Maximum number of bytes returned when calling a scanner’s next method. Note that when a single row is larger than this limit the row is still returned completely. The default value is 100MB. This is a safety setting to protect the server from OOM situations.

    Default

    104857600

  • hbase.status.published

  • Description

    This setting activates the publication by the master of the status of the region server. When a region server dies and its recovery starts, the master will push this information to the client application, to let them cut the connection immediately instead of waiting for a timeout.

    Default

    false

  • hbase.status.publisher.class

  • Description

    Implementation of the status publication with a multicast message.

    Default

    org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher

  • hbase.status.listener.class

  • Description

    Implementation of the status listener with a multicast message.

    Default

    org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener

  • hbase.status.multicast.address.ip

  • Description

    Multicast address to use for the status publication by multicast.

    Default

    226.1.1.3

  • hbase.status.multicast.address.port

  • Description

    Multicast port to use for the status publication by multicast.

    Default

    16100

  • hbase.dynamic.jars.dir

  • Description

    The directory from which the custom filter JARs can be loaded dynamically by the region server without the need to restart. However, an already loaded filter/co-processor class would not be un-loaded. See HBASE-1936 for more details. Does not apply to coprocessors.

    Default

    ${hbase.rootdir}/lib

  • hbase.security.authentication

  • Description

    Controls whether or not secure authentication is enabled for HBase. Possible values are 'simple' (no authentication), and 'kerberos'.

    Default

    simple

  • hbase.rest.filter.classes

  • Description

    Servlet filters for REST service.

    Default

    org.apache.hadoop.hbase.rest.filter.GzipFilter

  • hbase.master.loadbalancer.class

  • Description

    Class used to execute the regions balancing when the period occurs. See the class comment for more on how it works http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html It replaces the DefaultLoadBalancer as the default (since renamed as the SimpleLoadBalancer).

    Default

    org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer

  • hbase.master.normalizer.class

  • Description

    Class used to execute the region normalization when the period occurs. See the class comment for more on how it works http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.html

    Default

    org.apache.hadoop.hbase.master.normalizer.SimpleRegionNormalizer

  • hbase.rest.csrf.enabled

  • Description

    Set to true to enable protection against cross-site request forgery (CSRF)

    Default

    false

  • hbase.rest-csrf.browser-useragents-regex

  • Description

    A comma-separated list of regular expressions used to match against an HTTP request’s User-Agent header when protection against cross-site request forgery (CSRF) is enabled for REST server by setting hbase.rest.csrf.enabled to true. If the incoming User-Agent matches any of these regular expressions, then the request is considered to be sent by a browser, and therefore CSRF prevention is enforced. If the request’s User-Agent does not match any of these regular expressions, then the request is considered to be sent by something other than a browser, such as scripted automation. In this case, CSRF is not a potential attack vector, so the prevention is not enforced. This helps achieve backwards-compatibility with existing automation that has not been updated to send the CSRF prevention header.

    Default

    Mozilla.,Opera.

  • hbase.security.exec.permission.checks

  • Description

    如果启用此设置并且基于ACL的访问控制处于活动状态(AccessController协处理器作为系统协处理器安装或作为表协处理器安装在表上),则必须授予所有相关用户EXEC特权才能执行协处理器端点调用。EXEC权限与任何其他权限一样,可以全局授予用户,或授予每个表或每个命名空间的用户。有关协处理器端点的更多信息,请参阅HBase在线手册的协处理器部分。有关使用AccessController授予或撤消权限的更多信息,请参阅HBase在线手册的安全部分。

    默认

    false

  • hbase.procedure.regionserver.classes

  • 描述

    默认情况下在活动HRegionServer进程上加载的org.apache.hadoop.hbase.procedure.RegionServerProcedureManager过程管理器的逗号分隔列表。生命周期方法(init / start / stop)将由活动的HRegionServer进程调用,以执行特定的全局阻止过程。在实现自己的RegionServerProcedureManager之后,只需将其放在HBase的类路径中,并在此处添加完全限定类名。

    默认

    没有

  • hbase.procedure.master.classes

  • 描述

    默认情况下在活动HMaster进程上加载的org.apache.hadoop.hbase.procedure.MasterProcedureManager过程管理器的逗号分隔列表。过程由其签名来标识,并且用户可以使用签名和即时名称来触发全局保护过程的执行。在实现自己的MasterProcedureManager之后,只需将它放在HBase的类路径中,并在此处添加完全限定类名。

    默认

    没有

  • hbase.coordinated.state.manager.class

  • 描述

    实现协调状态管理器的类的完全限定名。

    默认

    org.apache.hadoop.hbase.coordination.ZkCoordinatedStateManager

  • hbase.regionserver.storefile.refresh.period

  • 描述

    刷新辅助区域的存储文件的周期(以毫秒为单位)。0表示禁用此功能。次要区域会在次要区域刷新该区域中的文件列表(没有通知机制)时从主要区域看到新文件(来自刷新和压缩)。但太频繁的刷新可能会导致额外的Namenode压力。如果文件无法刷新长于HFile TTL(hbase.master.hfilecleaner.ttl),请求将被拒绝。此设置也建议将HFile TTL配置为较大的值。

    默认

    0

  • hbase.region.replica.replication.enabled

  • 描述

    Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication.

    Default

    false

  • hbase.http.filter.initializers

  • Description

    A comma separated list of class names. Each class in the list must extend org.apache.hadoop.hbase.http.FilterInitializer. The corresponding Filter will be initialized. Then, the Filter will be applied to all user facing jsp and servlet web pages. The ordering of the list defines the ordering of the filters. The default StaticUserWebFilter add a user principal as defined by the hbase.http.staticuser.user property.

    Default

    org.apache.hadoop.hbase.http.lib.StaticUserWebFilter

  • hbase.security.visibility.mutations.checkauths

  • Description

    This property if enabled, will check whether the labels in the visibility expression are associated with the user issuing the mutation

    Default

    false

  • hbase.http.max.threads

  • Description

    The maximum number of threads that the HTTP Server will create in its ThreadPool.

    Default

    10

  • hbase.replication.rpc.codec

  • Description

    The codec that is to be used when replication is enabled so that the tags are also replicated. This is used along with HFileV3 which supports tags in them. If tags are not used or if the hfile version used is HFileV2 then KeyValueCodec can be used as the replication codec. Note that using KeyValueCodecWithTags for replication when there are no tags causes no harm.

    Default

    org.apache.hadoop.hbase.codec.KeyValueCodecWithTags

  • hbase.replication.source.maxthreads

  • Description

    The maximum number of threads any replication source will use for shipping edits to the sinks in parallel. This also limits the number of chunks each replication batch is broken into. Larger values can improve the replication throughput between the master and slave clusters. The default of 10 will rarely need to be changed.

    Default

    10

  • hbase.serial.replication.waitingMs

  • Description

    By default, in replication we can not make sure the order of operations in slave cluster is same as the order in master. If set REPLICATION_SCOPE to 2, we will push edits by the order of written. This configure is to set how long (in ms) we will wait before next checking if a log can not push right now because there are some logs written before it have not been pushed. A larger waiting will decrease the number of queries on hbase:meta but will enlarge the delay of replication. This feature relies on zk-less assignment, and conflicts with distributed log replay. So users must set hbase.assignment.usezk and hbase.master.distributed.log.replay to false to support it.

    Default

    10000

  • hbase.http.staticuser.user

  • Description

    The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI (user to be used for browsing files).

    Default

    dr.stack

  • hbase.regionserver.handler.abort.on.error.percent

  • Description

    The percent of region server RPC threads failed to abort RS. -1 Disable aborting; 0 Abort if even a single handler has died; 0.x Abort only when this percent of handlers have died; 1 Abort only when all of the handlers have died.

    Default

    0.5

  • hbase.mob.file.cache.size

  • Description

    Number of opened file handlers to cache. A larger value will benefit reads by providing more file handlers per mob file cache and would reduce frequent file opening and closing. However, if this is set too high, this could lead to a "too many opened file handlers" The default value is 1000.

    Default

    1000

  • hbase.mob.cache.evict.period

  • Description

    The amount of time in seconds before the mob cache evicts cached mob files. The default value is 3600 seconds.

    Default

    3600

  • hbase.mob.cache.evict.remain.ratio

  • Description

    The ratio (between 0.0 and 1.0) of files that remains cached after an eviction is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size. The default value is 0.5f.

    Default

    0.5f

  • hbase.master.mob.ttl.cleaner.period

  • Description

    The period that ExpiredMobFileCleanerChore runs. The unit is second. The default value is one day. The MOB file name uses only the date part of the file creation time in it. We use this time for deciding TTL expiry of the files. So the removal of TTL expired files might be delayed. The max delay might be 24 hrs.

    Default

    86400

  • hbase.mob.compaction.mergeable.threshold

  • Description

    If the size of a mob file is less than this value, it’s regarded as a small file and needs to be merged in mob compaction. The default value is 1280MB.

    Default

    1342177280

  • hbase.mob.delfile.max.count

  • Description

    The max number of del files that is allowed in the mob compaction. In the mob compaction, when the number of existing del files is larger than this value, they are merged until number of del files is not larger this value. The default value is 3.

    Default

    3

  • hbase.mob.compaction.batch.size

  • Description

    The max number of the mob files that is allowed in a batch of the mob compaction. The mob compaction merges the small mob files to bigger ones. If the number of the small files is very large, it could lead to a "too many opened file handlers" in the merge. And the merge has to be split into batches. This value limits the number of mob files that are selected in a batch of the mob compaction. The default value is 100.

    Default

    100

  • hbase.mob.compaction.chore.period

  • Description

    The period that MobCompactionChore runs. The unit is second. The default value is one week.

    Default

    604800

  • hbase.mob.compactor.class

  • Description

    Implementation of mob compactor, the default one is PartitionedMobCompactor.

    Default

    org.apache.hadoop.hbase.mob.compactions.PartitionedMobCompactor

  • hbase.mob.compaction.threads.max

  • Description

    The max number of threads used in MobCompactor.

    Default

    1

  • hbase.snapshot.master.timeout.millis

  • Description

    Timeout for master for the snapshot procedure execution

    Default

    300000

  • hbase.snapshot.region.timeout

  • Description

    Timeout for regionservers to keep threads in snapshot request pool waiting

    Default

    300000

7.3. hbase-env.sh

Set HBase environment variables in this file. Examples include options to pass the JVM on start of an HBase daemon such as heap size and garbage collector configs. You can also set configurations for HBase configuration, log directories, niceness, ssh options, where to locate process pid files, etc. Open the file at conf/hbase-env.sh and peruse its content. Each option is fairly well documented. Add your own environment variables here if you want them read by HBase daemons on startup.
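As a sketch only, a few commonly adjusted lines in conf/hbase-env.sh might look like this (the values are illustrative, not recommendations):

  # Heap size for the HBase daemons started from this install.
  export HBASE_HEAPSIZE=8G

  # Extra JVM options for all HBase daemons, e.g. choosing a garbage collector.
  export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC"

  # Where to keep daemon pid files.
  export HBASE_PID_DIR=/var/hadoop/pids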

Changes here will require a cluster restart for HBase to notice the change.

7.4. log4j.properties

Edit this file to change rate at which HBase files are rolled and to change the level at which HBase logs messages.
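For instance, a minimal sketch of log-level tweaks in conf/log4j.properties (standard log4j logger syntax; the chosen levels are only examples):

  # Turn up HBase's own logging to DEBUG while troubleshooting.
  log4j.logger.org.apache.hadoop.hbase=DEBUG

  # Quiet the ZooKeeper client, which can be chatty.
  log4j.logger.org.apache.zookeeper=WARN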

Changes here will require a cluster restart for HBase to notice the change though log levels can be changed for particular daemons via the HBase UI.

7.5. Client configuration and dependencies connecting to an HBase cluster

If you are running HBase in standalone mode, you don’t need to configure anything for your client to work provided that they are all on the same machine.

Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for current critical locations. ZooKeeper is where all these values are kept. Thus clients require the location of the ZooKeeper ensemble before they can do anything else. Usually the ensemble location is kept out in the hbase-site.xml and is picked up by the client from the CLASSPATH.

If you are configuring an IDE to run an HBase client, you should include the conf/ directory on your classpath so hbase-site.xml settings can be found (or add src/test/resources to pick up the hbase-site.xml used by tests).

Minimally, a client of HBase needs several libraries in its CLASSPATH when connecting to a cluster, including:

commons-configuration (commons-configuration-1.6.jar)
commons-lang (commons-lang-2.5.jar)
commons-logging (commons-logging-1.1.1.jar)
hadoop-core (hadoop-core-1.0.0.jar)
hbase (hbase-0.92.0.jar)
log4j (log4j-1.2.16.jar)
slf4j-api (slf4j-api-1.5.8.jar)
slf4j-log4j (slf4j-log4j12-1.5.8.jar)
zookeeper (zookeeper-3.4.2.jar)

An example basic hbase-site.xml for client only might look as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>example1,example2,example3</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>

7.5.1. Java client configuration

The configuration used by a Java client is kept in an HBaseConfiguration instance.

The factory method on HBaseConfiguration, HBaseConfiguration.create();, on invocation, will read in the content of the first hbase-site.xml found on the client’s CLASSPATH, if one is present (Invocation will also factor in any hbase-default.xml found; an hbase-default.xml ships inside the hbase.X.X.X.jar). It is also possible to specify configuration directly without having to read from a hbase-site.xml. For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:

Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally

If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the hbase-site.xml file). This populated Configuration instance can then be passed to a Table, and so on, as in the sketch below.
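A self-contained sketch using the Connection/Table client API introduced in HBase 1.0; the quorum hosts, table name, and column family are placeholders:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
      Configuration config = HBaseConfiguration.create();
      // Comma-separated ensemble, exactly as it would appear in hbase-site.xml.
      config.set("hbase.zookeeper.quorum", "example1,example2,example3");
      try (Connection connection = ConnectionFactory.createConnection(config);
           Table table = connection.getTable(TableName.valueOf("test"))) {
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
        table.put(put);  // write one cell to the RegionServer hosting 'row1'
      }
    }
  }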

8. Example Configurations

8.1. Basic Distributed HBase Install

Here is an example basic configuration for a distributed ten node cluster:

  • The nodes are named example0, example1, etc., through node example9 in this example.
  • The HBase Master and the HDFS NameNode are running on the node example0.
  • RegionServers run on nodes example1-example9.
  • A 3-node ZooKeeper ensemble runs on example1, example2, and example3 on the default ports.
  • ZooKeeper data is persisted to the directory /export/zookeeper.

Below we show what the main configuration files (hbase-site.xml, regionservers, and hbase-env.sh) found in the HBase conf directory might look like.

8.1.1. hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>example1,example2,example3</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/export/zookeeper</value>
    <description>Property from ZooKeeper config zoo.cfg.
      The directory where the snapshot is stored.</description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example0:8020/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed ZooKeeper
      true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)</description>
  </property>
</configuration>

8.1.2. regionservers

In this file you list the nodes that will run RegionServers. In our case, these nodes are example1-example9.

example1
example2
example3
example4
example5
example6
example7
example8
example9

8.1.3. hbase-env.sh

The following lines in the hbase-env.sh file show how to set the JAVA_HOME environment variable (required for HBase 0.98.5 and newer) and set the heap to 4 GB (rather than the default value of 1 GB). If you copy and paste this example, be sure to adjust the JAVA_HOME to suit your environment.

# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0/

# The maximum amount of heap to use. Default is left to JVM default.
export HBASE_HEAPSIZE=4G

Use rsync to copy the content of the conf directory to all nodes of the cluster.
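One way to do this is a small loop over the host list, sketched below; the remote install path is an assumption and should be replaced with wherever HBase lives on your nodes:

  # Run from $HBASE_HOME; repeat for conf/backup-masters hosts as well.
  for host in $(cat conf/regionservers); do
    rsync -az conf/ "$host":/opt/hbase/conf/
  done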

9. The Important Configurations

Below we list some important configurations. We’ve divided this section into required configuration and worth-a-look recommended configs.

9.1. Required Configurations

Review the os and hadoop sections.

9.1.1. Big Cluster Configurations

If you have a cluster with a lot of regions, it is possible that a Regionserver checks in briefly after the Master starts while all the remaining RegionServers lag behind. This first server to check in will be assigned all regions which is not optimal. To prevent the above scenario from happening, up the hbase.master.wait.on.regionservers.mintostart property from its default value of 1. See HBASE-6389 Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments for more detail.
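For example, a minimal hbase-site.xml sketch; the value 3 is illustrative and should be chosen to suit your cluster size:

  <property>
    <name>hbase.master.wait.on.regionservers.mintostart</name>
    <value>3</value>
  </property>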

9.2. Recommended Configurations

9.2.1. ZooKeeper Configuration

zookeeper.session.timeout

The default timeout is three minutes (specified in milliseconds). This means that if a server crashes, it will be three minutes before the Master notices the crash and starts recovery. You might like to tune the timeout down to a minute or even less so the Master notices failures sooner. Before changing this value, be sure you have your JVM garbage collection configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this: you probably want recovery to start on the server if a RegionServer has been in GC for a long period of time.)

To change this configuration, edit hbase-site.xml, copy the changed file around the cluster and restart.
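For example, to bring the timeout down to one minute, a sketch of the hbase-site.xml entry (value in milliseconds):

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>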

We set this value high to save our having to field questions up on the mailing lists asking why a RegionServer went down during a massive import. The usual cause is that their JVM is untuned and they are running into long GC pauses. Our thinking is that while users are getting familiar with HBase, we’d save them having to know all of its intricacies. Later when they’ve built some confidence, then they can play with configuration such as this.

Number of ZooKeeper Instances

See zookeeper.

9.2.2. HDFS Configurations

dfs.datanode.failed.volumes.tolerated

This is the "…number of volumes that are allowed to fail before a DataNode stops offering service. By default any volume failure will cause a datanode to shutdown" from the hdfs-default.xml description. You might want to set this to about half the amount of your available disks.
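For instance, on DataNodes with twelve data disks, a hedged hdfs-site.xml sketch could tolerate six failed volumes; the value is an assumption tied to that disk count:

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>6</value>
  <description>Allow up to half of the twelve data volumes to fail before
    the DataNode stops offering service (adjust to your disk layout).</description>
</property>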

9.2.3. hbase.regionserver.handler.count

This setting defines the number of threads that are kept open to answer incoming requests to user tables. The rule of thumb is to keep this number low when the payload per request approaches the MB (big puts, scans using a large cache) and high when the payload is small (gets, small puts, ICVs, deletes). The total size of the queries in progress is limited by the setting hbase.ipc.server.max.callqueue.size.

It is safe to set that number to the maximum number of incoming clients if their payloads are small, the typical example being a cluster that serves a website, since puts are not typically buffered and most of the operations are gets.

The danger of setting this value too high is that the aggregate size of all the puts currently in flight in a RegionServer may impose too much pressure on its memory, or even trigger an OutOfMemoryError. A RegionServer running on low memory will trigger its JVM's garbage collector to run more frequently, up to the point where GC pauses become noticeable (the reason being that all the memory used to hold the requests' payloads cannot be discarded, no matter how hard the garbage collector tries). After some time, the overall cluster throughput suffers, since every request that hits that RegionServer takes longer, which further exacerbates the problem.

You can get a sense of whether you have too few or too many handlers by enabling rpc.logging on an individual RegionServer and then tailing its logs (queued requests consume memory).
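As an illustration only, a get-heavy, small-payload workload might raise the handler count in hbase-site.xml; 100 is an assumed value, not a general recommendation:

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
  <description>Number of RPC handler threads kept open per RegionServer,
    raised here for small-payload, get-heavy traffic; keep it lower when
    requests carry large payloads.</description>
</property>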

9.2.4. Configuration for Large Memory Machines

HBase ships with a reasonable, conservative configuration that will work on nearly all machine types that people might want to test with. If you have larger machines (HBase with 8G and larger heaps), you may find the following configuration options helpful. TODO.

9.2.5. Compression

You should consider enabling ColumnFamily compression. Several options are nearly frictionless and in most cases boost performance by reducing the size of StoreFiles and thus reducing I/O.

See compression for more information.

9.2.6. Configuring the Size and Number of WAL Files

HBase uses the WAL to recover memstore data that has not yet been flushed to disk in case of an RS failure. WAL files should be configured to be slightly smaller than the HDFS block (by default an HDFS block is 64MB and a WAL file is ~60MB).

HBase also has a limit on the number of WAL files, designed to ensure there is never too much data that needs to be replayed during recovery. This limit needs to be set according to the memstore configuration, so that all the necessary data fits. It is recommended to allocate enough WAL files to store at least that much data (when all memstores are close to full). For example, with a 16GB RS heap, default memstore settings (0.4), and the default WAL file size (~60MB), 16GB * 0.4 / 60MB gives a starting point of ~109 WAL files. However, since the memstores are not all full all the time, fewer WAL files can be allocated.
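The WAL count is capped by hbase.regionserver.maxlogs in hbase-site.xml; a hedged sketch following the arithmetic above (the value 100 is illustrative for a 16GB heap with default memstore settings):

<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>100</value>
  <description>Maximum number of WAL files per RegionServer before
    memstore flushes are forced to free older WALs.</description>
</property>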

9.2.7. Managed Splitting

HBase generally handles splitting your regions, based upon the settings in your hbase-default.xml and hbase-site.xmlconfiguration files. Important settings include hbase.regionserver.region.split.policy, hbase.hregion.max.filesize, hbase.regionserver.regionSplitLimit. A simplistic view of splitting is that when a region grows to hbase.hregion.max.filesize, it is split. For most use patterns, most of the time, you should use automatic splitting. See manual region splitting decisions for more information about manual region splitting.

Instead of allowing HBase to split your regions automatically, you can choose to manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing splits works if you know your keyspace well, otherwise let HBase figure where to split for you. Manual splitting can mitigate region creation and movement under load. It also makes it so region boundaries are known and invariant (if you disable region splitting). If you use manual splits, it is easier doing staggered, time-based major compactions to spread out your network IO load.

Disable Automatic Splitting

To disable automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB. It is not recommended to set it to its absolute maximum value of Long.MAX_VALUE.
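A minimal sketch of that setting in hbase-site.xml, assuming you pre-split tables and manage further splits yourself (107374182400 bytes is 100 GB):

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
  <description>100 GB; regions effectively never reach this size, so
    automatic splitting is disabled in practice and splits must be
    performed manually.</description>
</property>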


Automatic Splitting Is Recommended

If you disable automatic splits to diagnose a problem or during a period of fast data growth, it is recommended to re-enable them when your situation becomes more stable. The potential benefits of managing region splits yourself are not undisputed.

Determine the Optimal Number of Pre-Split Regions

The optimal number of pre-split regions depends on your application and environment. A good rule of thumb is to start with 10 pre-split regions per server and watch as data grows over time. It is better to err on the side of too few regions and perform rolling splits later. The optimal number of regions depends upon the largest StoreFile in your region. The size of the largest StoreFile will increase with time if the amount of data grows. The goal is for the largest region to be just large enough that the compaction selection algorithm only compacts it during a timed major compaction. Otherwise, the cluster can be prone to compaction storms, where a large number of regions are under compaction at the same time. It is important to understand that it is the data growth that causes compaction storms, not the manual split decision.

If the regions are split into too many large regions, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. HBase 0.90 introduced org.apache.hadoop.hbase.util.RegionSplitter, which provides a network-IO-safe rolling split of all regions.

9.2.8. Managed Compactions

By default, major compactions are scheduled to run once in a 7-day period. Prior to HBase 0.96.x, major compactions were scheduled to happen once per day by default.

If you need to control exactly when and how often major compaction runs, you can disable managed major compactions. See the entry for hbase.hregion.majorcompaction in the compaction.parameters table for details.
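A hedged sketch of that change in hbase-site.xml; setting the interval to 0 turns off time-based major compactions, on the assumption that you then trigger them yourself on your own schedule:

<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
  <description>Disable time-based major compactions; run major_compact
    from the HBase shell or the Admin API instead.</description>
</property>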


Do Not Disable Major Compactions

Major compactions are absolutely necessary for StoreFile clean-up. Do not disable them altogether. You can run major compactions manually via the HBase shell or via the Admin API.

For more information about compactions and the compaction file selection process, see compaction.

9.2.9. Speculative Execution

Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off Speculative Execution at a system-level unless you need it for a specific case, where it can be configured per-job. Set the properties mapreduce.map.speculative and mapreduce.reduce.speculative to false.
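A minimal sketch of those two properties, placed in mapred-site.xml (or set per job):

<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
  <description>Disable speculative execution of map tasks.</description>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
  <description>Disable speculative execution of reduce tasks.</description>
</property>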

9.3. Other Configurations

9.3.1. Balancer

The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via hbase.balancer.period and defaults to 300000 (5 minutes).
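For example, a sketch that runs the balancer every ten minutes instead of the default five; 600000 ms is an illustrative value:

<property>
  <name>hbase.balancer.period</name>
  <value>600000</value>
  <description>Period, in milliseconds, at which the Master runs the
    region balancer.</description>
</property>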

See master.processes.loadbalancer for more information on the LoadBalancer.

9.3.2. Disabling Blockcache

Do not turn off block cache (You’d do it by setting hfile.block.cache.size to zero). Currently we do not do well if you do this because the RegionServer will spend all its time loading HFile indices over and over again. If your working set is such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you’ll see index block size accounted near the top of the webpage).
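If you shrink the cache rather than disable it, a hedged sketch might look like the following; 0.2 (20% of heap) is an assumed value, and the right figure depends on how much space your HFile indices need:

<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
  <description>Fraction of the RegionServer heap given to the block cache;
    keep it large enough that HFile indices stay resident.</description>
</property>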

9.3.3. Nagle’s or the small package problem

If a big 40ms or so occasional delay is seen in operations against HBase, try the Nagle's setting. For example, see the user mailing list thread, Inconsistent scan performance with caching set to 1, and the issue cited therein where setting notcpdelay improved scan speeds. You might also see the graphs on the tail of HBASE-7008 Set scanner caching to a better default, where our Lars Hofhansl tries various data sizes with Nagle's on and off, measuring the effect.

9.3.4. Better Mean Time to Recover (MTTR)

This section is about configurations that will make servers come back faster after a failure. See the Deveraj Das and Nicolas Liochon blog post Introduction to HBase Mean Time to Recover (MTTR) for a brief introduction.

The issue HBASE-8354 forces Namenode into loop with lease recovery requests is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery, including citation of fixes added to HDFS. Read the Varun Sharma comments. The suggested configurations below are Varun's suggestions distilled and tested. Make sure you are running on a late-version HDFS so you have the fixes he refers to and himself adds to HDFS that help HBase MTTR (e.g. HDFS-3703, HDFS-3712, and HDFS-4791; Hadoop 2 for sure has them and late Hadoop 1 has some). Set the following in the RegionServer.

<property>
  <name>hbase.lease.recovery.dfs.timeout</name>
  <value>23000</value>
  <description>How much time we allow elapse between calls to recover lease.
    Should be larger than the dfs timeout.</description>
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 60 to 10 seconds.</description>
</property>

And on the NameNode/DataNode side, set the following to enable 'staleness' introduced in HDFS-3703, HDFS-3912.

<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 60 to 10 seconds.</description>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>10000</value>
  <description>Down the DFS timeout from 8 * 60 to 10 seconds.</description>
</property>
<property>
  <name>ipc.client.connect.timeout</name>
  <value>3000</value>
  <description>Down from 60 seconds to 3.</description>
</property>
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>2</value>
  <description>Down from 45 seconds to 3 (2 == 3 retries).</description>
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
  <description>Enable stale state in hdfs</description>
</property>
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>20000</value>
  <description>Down from default 30 seconds</description>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
  <description>Enable stale state in hdfs</description>
</property>

9.3.5. JMX

JMX (Java Management Extensions) provides built-in instrumentation that enables you to monitor and manage the Java VM. To enable monitoring and management from remote systems, you need to set the system property com.sun.management.jmxremote.port (the port number through which you want to enable JMX RMI connections) when you start the Java VM. See the official documentation for more information. Historically, besides the port mentioned above, JMX opens two additional random TCP listening ports, which can lead to port conflicts. (See HBASE-10289 for details.)

As an alternative, you can use the coprocessor-based JMX implementation provided by HBase. To enable it in 0.99 or above, add the following property to hbase-site.xml:

<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.JMXListener</value>
</property>

DO NOT set com.sun.management.jmxremote.port for Java VM at the same time.

Currently it supports the Master and RegionServer Java VMs. By default, JMX listens on TCP port 10102; you can further configure the port using the following properties:

<property>
  <name>regionserver.rmi.registry.port</name>
  <value>61130</value>
</property>
<property>
  <name>regionserver.rmi.connector.port</name>
  <value>61140</value>
</property>

In most cases the registry port can be shared with the connector port, so you only need to configure regionserver.rmi.registry.port. However, if you want to use SSL communication, the two ports must be configured to different values.

By default, password authentication and SSL communication are disabled. To enable password authentication, you need to update hbase-env.sh as below:

export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.password.file=your_password_file \
  -Dcom.sun.management.jmxremote.access.file=your_access_file"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE "
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE "

See the example password/access files under $JRE_HOME/lib/management.

To enable SSL communication with password authentication, follow the steps below:

# 1. generate a key pair, stored in myKeyStore
keytool -genkey -alias jconsole -keystore myKeyStore

# 2. export it to file jconsole.cert
keytool -export -alias jconsole -keystore myKeyStore -file jconsole.cert

# 3. copy jconsole.cert to jconsole client machine, import it to jconsoleKeyStore
keytool -import -alias jconsole -keystore jconsoleKeyStore -file jconsole.cert

Then update hbase-env.sh as follows:

export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=true \
  -Djavax.net.ssl.keyStore=/home/tianq/myKeyStore \
  -Djavax.net.ssl.keyStorePassword=your_password_in_step_1 \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.password.file=your_password_file \
  -Dcom.sun.management.jmxremote.access.file=your_access_file"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE "
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE "

Finally, start jconsole on the client using the key store:

jconsole -J-Djavax.net.ssl.trustStore=/home/tianq/jconsoleKeyStore

To enable the HBase JMX implementation on the Master, you also need to add the following property in hbase-site.xml:
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.JMXListener</value>
</property>

The corresponding properties for port configuration are master.rmi.registry.port (by default 10101) and master.rmi.connector.port (by default the same as registry.port).

10. Dynamic Configuration

Since HBase 1.0.0, it is possible to change a subset of the configuration without requiring a server restart. In the HBase shell, there are new operators, update_config and update_all_config, that will prompt a server or all servers to reload configuration.

Only a subset of all configurations can currently be changed on a running server. Here is an incomplete list: hbase.regionserver.thread.compaction.large, hbase.regionserver.thread.compaction.small, hbase.regionserver.thread.split, hbase.regionserver.thread.merge, as well as compaction policy and configurations and adjustment to off-peak hours. For the full list consult HBASE-12147 Porting Online Config Change From 89-fb.
