1 HBase Basics
HBase is a distributed database that provides real-time random read and write access to data.
Unlike relational databases such as MySQL, Oracle, DB2 and SQL Server, HBase is a NoSQL (non-relational) database, with the following characteristics:
HBase's table model is different from the relational table model: an HBase table has no fixed column definitions
Each row of an HBase table stores a set of key-value pairs
An HBase table is divided into column families, and the user decides which key-value pairs are written to which column family
Physically, an HBase table is split by column family: data belonging to different column families is always stored in different files
Every row of an HBase table has a row key, and row keys must be unique within a table
All data in HBase, including row keys, keys and values, is byte[]; HBase does not maintain data types on behalf of the user
HBase has very poor support for transactions
Compared with other NoSQL databases (MongoDB, Redis, Cassandra, Hazelcast), HBase stores its table data on HDFS, so storage capacity scales linearly and the stored data is highly safe and reliable.
2 HBase Table Structure

| rowkey | base_info | extra_info |
| ------ | --------- | ---------- |
| 001 | name:zs, age:22, sex:male | hobby:read, addr:beijing |
| 002 | name:laowang, sex:male | |
The HBase table model is very different from that of relational databases such as MySQL.
It has the concept of rows, but no concept of fixed columns.
Each row stores key-value pairs, and the keys may vary from row to row.
Key points of the HBase table model:
A table has a table name
A table can be divided into multiple column families (data from different column families is stored in different files)
Every row in a table has a row key (rowkey), and row keys must be unique within the table
Each key-value pair in a table is called a cell
HBase can keep multiple historical versions of each value (the number of versions is configurable); by default only the latest version is returned (see the sketch after this list)
Because a table can hold a very large amount of data, it is split horizontally into regions (identified by rowkey ranges); data from different regions is also stored in different files
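A minimal sketch of reading several historical versions of one cell with the Java client, assuming the user_info table built later in this article (its base_info family keeps up to 3 versions); the table and column names are just those examples:

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "Master:2181,Slave01:2181,Slave02:2181");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_info"))) {
            Get get = new Get(Bytes.toBytes("001"));
            get.setMaxVersions(3); // ask for up to 3 versions instead of only the latest one
            Result result = table.get(get);
            // all stored versions of base_info:age, newest first
            List<Cell> cells = result.getColumnCells(Bytes.toBytes("base_info"), Bytes.toBytes("age"));
            for (Cell cell : cells) {
                System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}
```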
 
HBase stores inserted data in sorted order:
rows are sorted by row key
within a row, key-value pairs are sorted first by column family and then by key (column name)
The order is lexicographic over the raw bytes, as the sketch below illustrates.
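Because the order is byte-wise lexicographic rather than numeric, the row key "10" sorts before "2", while zero-padded keys keep their numeric order. A minimal sketch using the Bytes helper from the HBase client (the sample keys are made up):

```java
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyOrderDemo {
    public static void main(String[] args) {
        // Bytes.compareTo uses the same byte-wise lexicographic order HBase uses for row keys
        System.out.println(Bytes.compareTo(Bytes.toBytes("10"), Bytes.toBytes("2")));    // negative: "10" sorts before "2"
        System.out.println(Bytes.compareTo(Bytes.toBytes("002"), Bytes.toBytes("010"))); // negative: zero-padding preserves numeric order
    }
}
```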
 
HBase data types:
HBase only supports byte[]; this applies to row keys, keys (column names), values, column family names and table names alike. Converting to and from byte[] is the client's job, as shown below.
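A minimal sketch of converting between Java types and byte[] with org.apache.hadoop.hbase.util.Bytes; since HBase stores only bytes, the client has to remember which encoding each value was written with (the values here are made up for illustration):

```java
import org.apache.hadoop.hbase.util.Bytes;

public class BytesDemo {
    public static void main(String[] args) {
        // everything written to HBase must first be turned into byte[] by the client...
        byte[] ageAsInt    = Bytes.toBytes(22);   // 4-byte integer encoding
        byte[] ageAsString = Bytes.toBytes("22"); // UTF-8 string encoding

        // ...and decoded with the matching method when read back;
        // HBase itself has no idea which encoding was used
        System.out.println(Bytes.toInt(ageAsInt));        // 22
        System.out.println(Bytes.toString(ageAsString))ает; // "22"
    }
}
```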
3 How HBase Works
An HBase cluster contains two kinds of roles:
Management role: HMaster (usually two, one active and one standby)
Data node role: HRegionServer (many, co-located with the HDFS DataNodes)
 
If HBase is not used for data processing, YARN is not needed: YARN is responsible for MapReduce computation, while HBase is only responsible for data management.
4 Installing HBase
4.1 Prerequisites
First, a working HDFS cluster is required, and HBase's region servers should be co-located with the HDFS DataNodes. A working ZooKeeper cluster is also required, so ZooKeeper must be installed before HBase; ZooKeeper was installed earlier.
4.2 Node Layout
The roles are assigned to the nodes as follows:

| Node | Services |
| ---- | -------- |
| Master | namenode, datanode, regionserver, hmaster, zookeeper |
| Slave01 | datanode, regionserver, zookeeper |
| Slave02 | datanode, regionserver, zookeeper |
4.3 Installing HBase
Unpack the HBase package hbase-2.0.5-bin.tar.gz.
Edit hbase-env.sh:
```bash
export JAVA_HOME=/usr/local/bigdata/java/jdk1.8.0_211
# use the external ZooKeeper cluster instead of the one bundled with HBase
export HBASE_MANAGES_ZK=false
```
Edit hbase-site.xml:
```xml
<configuration>
	<!-- directory on HDFS where HBase stores its data -->
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://Master:9000/hbase</value>
	</property>
	<!-- run HBase in fully distributed mode -->
	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>
	<!-- the external ZooKeeper quorum -->
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>Master:2181,Slave01:2181,Slave02:2181</value>
	</property>
</configuration>
```
Edit regionservers so that it lists the hostnames of all region server nodes, one per line (here: Master, Slave01, Slave02).
After these changes, copy the installation directory to /usr/local/bigdata/ on all three nodes.
6 Starting the HBase Cluster
First check that HDFS and ZooKeeper have started properly. On Master:
```
hadoop@Master:~$ jps
4918 DataNode
2744 QuorumPeerMain
4748 NameNode
9949 Jps
5167 SecondaryNameNode
hadoop@Master:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
```
Slave01:
```
hadoop@Slave1:~$ jps
3235 QuorumPeerMain
3779 DataNode
5546 Jps
hadoop@Slave1:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
```
Slave02:
```
hadoop@Slave2:~$ jps
11958 DataNode
13656 Jps
11390 QuorumPeerMain
hadoop@Slave2:~$ /usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/bigdata/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
```
Then run start-hbase.sh.
This command starts HBase on all machines listed in the regionservers file. To start a single region server manually, use:
```
$ bin/hbase-daemon.sh start regionserver
```
After startup, Master runs both the HMaster and an HRegionServer service, while Slave01 and Slave02 each run an HRegionServer service.
A highly available HBase cluster should be configured with two masters, one active and one standby, which monitor the region servers.
An additional HMaster can be started on one of the other two machines:
```
$ bin/hbase-daemon.sh start master
```
The newly started master will be in backup (standby) state.
7 Starting the HBase Command-Line Client
Start the shell with the hbase shell command:
```
bin/hbase shell
hbase> list     // list tables
hbase> status   // show cluster status
hbase> version  // show cluster version
```
Problem: right after startup, shell commands may fail with:

```
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
        at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2932)
        at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:1084)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
```

Solution: this typically means HDFS is still in safe mode, so the HMaster cannot finish initializing. Leave safe mode and retry:

```
$ hdfs dfsadmin -safemode leave
```
8 HBase Shell Operations
8.1 Creating a table

```
create 't_user_info','base_info','extra_info'
        table name    column family  column family
```
8.2 Inserting data

```
hbase(main):011:0> put 't_user_info','001','base_info:username','zhangsan'
0 row(s) in 0.2420 seconds

hbase(main):012:0> put 't_user_info','001','base_info:age','18'
0 row(s) in 0.0140 seconds

hbase(main):013:0> put 't_user_info','001','base_info:sex','female'
0 row(s) in 0.0070 seconds

hbase(main):014:0> put 't_user_info','001','extra_info:career','it'
0 row(s) in 0.0090 seconds

hbase(main):015:0> put 't_user_info','002','extra_info:career','actoress'
0 row(s) in 0.0090 seconds

hbase(main):016:0> put 't_user_info','002','base_info:username','liuyifei'
0 row(s) in 0.0060 seconds
```
8.3 Querying data, method one: scan

```
hbase(main):017:0> scan 't_user_info'
ROW     COLUMN+CELL
 001    column=base_info:age, timestamp=1496567924507, value=18
 001    column=base_info:sex, timestamp=1496567934669, value=female
 001    column=base_info:username, timestamp=1496567889554, value=zhangsan
 001    column=extra_info:career, timestamp=1496567963992, value=it
 002    column=base_info:username, timestamp=1496568034187, value=liuyifei
 002    column=extra_info:career, timestamp=1496568008631, value=actoress
2 row(s) in 0.0420 seconds
```
8.4 Querying data, method two: get a single row

```
hbase(main):020:0> get 't_user_info','001'
COLUMN               CELL
 base_info:age       timestamp=1496568160192, value=19
 base_info:sex       timestamp=1496567934669, value=female
 base_info:username  timestamp=1496567889554, value=zhangsan
 extra_info:career   timestamp=1496567963992, value=it
4 row(s) in 0.0770 seconds
```
8.5 Deleting data
Delete a single key-value (cell):

```
hbase(main):021:0> delete 't_user_info','001','base_info:sex'
0 row(s) in 0.0390 seconds
```

Delete an entire row:

```
hbase(main):024:0> deleteall 't_user_info','001'
0 row(s) in 0.0090 seconds

hbase(main):025:0> get 't_user_info','001'
COLUMN     CELL
0 row(s) in 0.0110 seconds
```

Delete an entire table (it must be disabled first):

```
hbase(main):028:0> disable 't_user_info'
0 row(s) in 2.3640 seconds

hbase(main):029:0> drop 't_user_info'
0 row(s) in 1.2950 seconds

hbase(main):030:0> list
TABLE
0 row(s) in 0.0130 seconds
=> []
```
8.6 An important HBase feature: sort order (row keys)
Data inserted into HBase is automatically stored in sorted order: first by row key, then by column family name, then by key (column) name, in lexicographic (dictionary) order.
This property has a great impact on query efficiency.
For example, take a table of user information with name, home province, age, occupation and so on, where the typical queries are "all users from a given province" and "all users from a given province with a given surname".
Idea: if users from the same province are stored contiguously in HBase's storage files, and within the same province users with the same surname are stored contiguously, both of these queries become much more efficient.
Approach: encode the query conditions into the rowkey, as in the sketch below.
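A minimal sketch of this approach, assuming a hypothetical person_info table with a base_info column family and a rowkey built as province_surname_userid (the table, columns and key layout here are illustrative, not a prescribed scheme):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDesignDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "Master:2181,Slave01:2181,Slave02:2181");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("person_info"))) {

            // write: the query conditions (province, surname) are encoded into the rowkey
            Put put = new Put(Bytes.toBytes("beijing_zhang_000001"));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("22"));
            table.put(put);

            // read: all users from province "beijing" with surname "zhang" now form a
            // contiguous rowkey range, so a single prefix scan answers the query
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes("beijing_zhang_"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```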
9 HBase Client API Operations
9.1 DDL operations
The code flow is:
Create a connection: Connection conn = ConnectionFactory.createConnection(conf);
Get a DDL handle (table manager): Admin admin = conn.getAdmin();
Use the Admin API to create, delete and alter table definitions: admin.createTable(HTableDescriptor descriptor);
 
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.junit.Before;
import org.junit.Test;

public class HbaseClientDDL {

    Connection conn = null;

    @Before
    public void getConn() throws Exception {
        // build a connection object; only the ZooKeeper quorum is needed to locate the cluster
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.233.200:2181,192.168.233.201:2181,192.168.233.202:2181");
        conn = ConnectionFactory.createConnection(conf);
    }

    // DDL: create a table with two column families
    @Test
    public void testCreateTable() throws Exception {
        Admin admin = conn.getAdmin();

        // table descriptor: table name user_info
        HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("user_info"));

        // column family base_info, keeping up to 3 historical versions per cell
        HColumnDescriptor hColumnDescriptor_1 = new HColumnDescriptor("base_info");
        hColumnDescriptor_1.setMaxVersions(3);

        // column family extra_info
        HColumnDescriptor hColumnDescriptor_2 = new HColumnDescriptor("extra_info");

        hTableDescriptor.addFamily(hColumnDescriptor_1);
        hTableDescriptor.addFamily(hColumnDescriptor_2);

        admin.createTable(hTableDescriptor);

        admin.close();
        conn.close();
    }

    // DDL: drop a table (it must be disabled first)
    @Test
    public void testDropTable() throws Exception {
        Admin admin = conn.getAdmin();

        admin.disableTable(TableName.valueOf("user_info"));
        admin.deleteTable(TableName.valueOf("user_info"));

        admin.close();
        conn.close();
    }

    // DDL: alter a table by adding a new column family
    @Test
    public void testAlterTable() throws Exception {
        Admin admin = conn.getAdmin();

        // fetch the current table definition
        HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("user_info"));

        // new column family other_info with a row+column bloom filter
        HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("other_info");
        hColumnDescriptor.setBloomFilterType(BloomType.ROWCOL);

        tableDescriptor.addFamily(hColumnDescriptor);

        // apply the modified definition
        admin.modifyTable(TableName.valueOf("user_info"), tableDescriptor);

        admin.close();
        conn.close();
    }
}
```
9.2 DML operations
Insert, delete, update and query with the HBase client:
```java
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Before;
import org.junit.Test;

public class HbaseClientDML {

    Connection conn = null;

    @Before
    public void getConn() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "Master:2181,Slave01:2181,Slave02:2181");
        conn = ConnectionFactory.createConnection(conf);
    }

    // insert (and update: a put to an existing rowkey+key overwrites the value)
    @Test
    public void testPut() throws Exception {
        // get a Table object for table user_info
        Table table = conn.getTable(TableName.valueOf("user_info"));

        // build one Put per row; each addColumn call adds one key-value to that row
        Put put = new Put(Bytes.toBytes("001"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
        put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));

        Put put2 = new Put(Bytes.toBytes("002"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("李四"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("28"));
        put2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("上海"));

        ArrayList<Put> puts = new ArrayList<>();
        puts.add(put);
        puts.add(put2);

        table.put(puts);

        table.close();
        conn.close();
    }

    // batch-insert many rows
    @Test
    public void testManyPuts() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));
        ArrayList<Put> puts = new ArrayList<>();

        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("" + i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("张三" + i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes((18 + i) + ""));
            put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
            puts.add(put);
        }

        table.put(puts);
    }

    // delete
    @Test
    public void testDelete() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));

        // delete the whole row 001
        Delete delete1 = new Delete(Bytes.toBytes("001"));

        // delete only the cell extra_info:addr of row 002
        Delete delete2 = new Delete(Bytes.toBytes("002"));
        delete2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"));

        ArrayList<Delete> dels = new ArrayList<>();
        dels.add(delete1);
        dels.add(delete2);

        table.delete(dels);

        table.close();
        conn.close();
    }

    // query a single row by rowkey
    @Test
    public void testGet() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));

        Get get = new Get("002".getBytes());
        Result result = table.get(get);

        // fetch one specific cell from the result
        byte[] value = result.getValue("base_info".getBytes(), "age".getBytes());
        System.out.println(new String(value));

        System.out.println("-------------------------");

        // iterate over every cell of the row
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();

            byte[] rowArray = cell.getRowArray();             // backing array holding the rowkey
            byte[] familyArray = cell.getFamilyArray();       // backing array holding the column family
            byte[] qualifierArray = cell.getQualifierArray(); // backing array holding the qualifier
            byte[] valueArray = cell.getValueArray();         // backing array holding the value

            System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
            System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
            System.out.println("qualifier: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
            System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
        }

        table.close();
        conn.close();
    }

    // query a range of rows by rowkey
    @Test
    public void testScan() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));

        // scan rowkeys in the range ["10", "10000\001"); the trailing \001 makes the
        // stop row fall just after "10000", so row "10000" itself is included
        Scan scan = new Scan("10".getBytes(), "10000\001".getBytes());

        ResultScanner scanner = table.getScanner(scan);
        Iterator<Result> iterator = scanner.iterator();

        while (iterator.hasNext()) {
            Result result = iterator.next();

            CellScanner cellScanner = result.cellScanner();
            while (cellScanner.advance()) {
                Cell cell = cellScanner.current();

                byte[] rowArray = cell.getRowArray();
                byte[] familyArray = cell.getFamilyArray();
                byte[] qualifierArray = cell.getQualifierArray();
                byte[] valueArray = cell.getValueArray();

                System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
                System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
                System.out.println("qualifier: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
                System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
            }
            System.out.println("----------------------");
        }
    }

    // "000" vs "000\0": appending \0 gives the smallest rowkey strictly greater than "000"
    @Test
    public void test() {
        String a = "000";
        String b = "000\0";

        System.out.println(a);
        System.out.println(b);

        byte[] bytes = a.getBytes();
        byte[] bytes2 = b.getBytes();

        System.out.println("");
    }
}
```
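One side note on the code above: the two-argument Scan constructor is deprecated in the HBase 2.x client. If you prefer the non-deprecated API, the scan inside testScan can be built with the builder-style calls instead (a drop-in sketch for the same rowkey range):

```java
// HBase 2.x style, equivalent to new Scan("10".getBytes(), "10000\001".getBytes())
Scan scan = new Scan()
        .withStartRow(Bytes.toBytes("10"))         // start row, inclusive by default
        .withStopRow(Bytes.toBytes("10000\001"));  // stop row, exclusive by default
ResultScanner scanner = table.getScanner(scan);
```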