Spring Data - How to configure hbase - 阿貝好威的實驗室

Tuesday, March 26, 2013

Spring Data - How to configure hbase

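Following on from the previous post, "Spring Data - Hadoop and MapReduce", this post fills in the HBase configuration part, because I felt the book did not explain it very clearly (or rather, readers who are not familiar enough with Spring may find it hard to follow).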

Development environment:

Eclipse Juno SR2
Libraries

Spring Data 1.0.0.RELEASE
Spring 3.2.1.RELEASE
Hadoop 1.1.1
Hbase 0.94.2

Hadoop Platform (single node - pseudo-distributed mode)

Hortonworks sandbox

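The Hortonworks sandbox, by the way, is a complete Hadoop stack packaged inside a VM, which makes it well suited to developing and testing programs. With it I can develop and run experiments wherever I take my notebook. Once it is installed, the web UI looks like this:

Image source: screenshot

First, let's look at the Spring configuration file for HBase: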


<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:p="http://www.springframework.org/schema/p"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- Load every .properties file on the classpath -->
    <context:property-placeholder location="classpath*:*.properties"/>

    <!-- Spring Data for Hadoop configuration -->
    <hdp:configuration>
        <!-- HDFS URL -->
        fs.default.name=${hd.fs}
        <!-- Authorization -->
        hadoop.security.authorization=true
    </hdp:configuration>

    <!-- Spring Data HBase configuration -->
    <hdp:hbase-configuration delete-connection="true" stop-proxy="false">
        <!-- HBase root directory -->
        hbase.rootdir=${hbase.rootdir}
        <!-- HBase ZooKeeper quorum -->
        hbase.zookeeper.quorum=${hbase.zookeeper.quorum}
    </hdp:hbase-configuration>

    <!-- Spring Data HbaseTemplate -->
    <bean id="hTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate"
          p:configuration-ref="hbaseConfiguration"/>

    <!-- Component-scan the dao package for @Repository classes -->
    <context:component-scan base-package="com.howie.hadoop.hbase.dao">
        <context:include-filter type="annotation"
                                expression="org.springframework.stereotype.Repository"/>
    </context:component-scan>

</beans>

hbase.properties settings:

hbase.zookeeper.quorum=sandbox
zk-port=2181
hbase.rootdir=hdfs://sandbox:8020/apps/hbase/data

Of course, this is only the bare minimum needed to get the program running (for single node, pseudo-distributed mode). A real cluster needs more settings; for details, see "Hadoop Configuration, MapReduce, and Distributed Cache" and "Working with HBase".
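To make the link between the two files concrete, here is a minimal sketch of what the property placeholder does conceptually: each ${key} in the XML is replaced by the value found in the loaded .properties files. The class name PlaceholderSketch and its methods are made up for illustration and use only the JDK; this is not how Spring implements it internally. Note that ${hd.fs} is not defined in the hbase.properties shown here, so it has to come from another .properties file on the classpath. The sketch rejects an unknown key, in the same spirit as Spring's default of failing on an unresolvable placeholder.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for placeholder resolution; not Spring's actual implementation. */
public class PlaceholderSketch {

    // Matches ${some.key} and captures the key name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${key} in text with its value from props; fails on an unknown key. */
    public static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value = props.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("Could not resolve placeholder '" + key + "'");
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** The same key/value pairs as the hbase.properties in this post. */
    public static Properties sampleProperties() {
        Properties props = new Properties();
        props.setProperty("hbase.zookeeper.quorum", "sandbox");
        props.setProperty("zk-port", "2181");
        props.setProperty("hbase.rootdir", "hdfs://sandbox:8020/apps/hbase/data");
        return props;
    }

    public static void main(String[] args) {
        Properties props = sampleProperties();
        System.out.println(resolve("hbase.rootdir=${hbase.rootdir}", props));
        System.out.println(resolve("hbase.zookeeper.quorum=${hbase.zookeeper.quorum}", props));
    }
}
```

Running main prints the two lines of the hbase-configuration body with the sandbox values filled in, which is roughly what the HBase configuration bean ends up seeing.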

As a supplement, here is how to use Spring and JUnit to unit test an HBase program:


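import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

// Loads the Spring context file and runs against the HBase instance it points to.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = {"classpath:META-INF/spring/hbase-spring-context.xml"})
public class UserDaoIntegrationTest {

    // UserDao and User are the project's own classes (not shown in this post).
    @Autowired
    private UserDao userDao;

    @Test
    public void testSave() {
        User user = userDao.save("test1", "test1@gmail.com", "test1..");
        System.out.println("User:" + user);
    }
}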

This makes it easy to unit test HBase code (strictly speaking, it is an integration test...).
Unit testing MapReduce, however, probably requires MRUnit, and I am still working out how to integrate it with Spring...

Incidentally, when running the unit test I ran into the following error message:

[info] ipc.hbaserpc problem connecting to server 60020

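At first I thought my configuration was wrong, but it seems the sandbox's HBase had gone down and HMaster could not be reached...
So is the sandbox unstable, or did my program fail to disconnect cleanly and cause the problem? That still needs more investigation....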
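One quick way to narrow this kind of error down is to check whether the ports are reachable at all before suspecting the Spring configuration. The following is a minimal sketch using only the JDK. The class name PortProbe is hypothetical; the host name sandbox and port 2181 come from the hbase.properties in this post, 60020 is the port in the error message, and 60000 is assumed here to be the HMaster port (the HBase 0.94 default), so adjust it if your sandbox differs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something accepts TCP connections on a port. */
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unknown hosts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host defaults to the name used in hbase.properties.
        String host = args.length > 0 ? args[0] : "sandbox";
        // ZooKeeper, assumed HMaster default, and the port from the error message.
        int[] ports = {2181, 60000, 60020};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 2000));
        }
    }
}
```

A failed probe only shows that nothing accepted a TCP connection on that port; it does not say why, so the HBase logs inside the sandbox are still the place to look for the cause.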