For people who regularly have to deploy and operate Hadoop Clusters, there are usually two installation approaches:
- Script the installation as much as possible (the previous post, Create Hadoop Cluster on GCP, took this approach)
- Install and manage through a GUI as much as possible
Ambari exists precisely to make deployment and installation easier through a GUI. In an IaaS environment, however, you still have to plan by hand how many VMs to launch and what role each VM plays, and at that point you start wishing all of this could be handled inside Ambari as well.
I believe plenty of people in the community feel the same way, which is why Hortonworks and OpenStack teamed up to release the Hortonworks Data Platform Plugin for Savanna.
That feature, however, is specific to OpenStack; it would be nice if other IaaS providers offered it too. Until something similar shows up on other platforms, we have to deploy the VMs and set up the environment by hand. This post documents how to do that manually:
- Plan and launch the VMs on GCP
- Configure the GCP environment for the Hadoop Cluster
- Install Ambari and the Hadoop Cluster on the VMs
Installing a Fully Distributed Hadoop Cluster on GCP
VM environment:
- CentOS 6.x
- Ambari 1.4.3
- Kerberos
- JDK-1.6.0_31 (Ambari Default)
Refer to the following documents:
- Install Ambari 1.4.3 from public repositories
- Installing Kerberos
- GCP - Networking and Firewalls
- How to Install and Enable Kerberos on a Hadoop Cluster with Ambari
Planning the VMs on GCP
The main difference between Hadoop 1.x and 2.x is the addition of YARN's Node Manager (ignored here for now). I planned five machines in total, aiming for the smallest setup that still runs smoothly (a gcutil sketch for creating these VMs follows the list):
- One master: runs the Name Node, Job Tracker, HBase Master, etc.
- One master-secondary: hosts the Secondary Name Node and the log-history related services
- Three hadoop-slaves: run the Data Node and Task Tracker
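For reference, the five VMs could be created with gcutil roughly as follows; this is only a sketch, and the zone, machine type, and image name are placeholders that need to be adjusted to your own project:
# Sketch only: machine type and image are placeholders, adjust to your project and zone.
for NAME in hadoop-master hadoop-master-secondary hadoop-slave-1 hadoop-slave-2 hadoop-slave-3; do
  gcutil --project=howie-hadoop-cluster addinstance "$NAME" \
    --zone=us-central1-a \
    --machine_type=n1-standard-2 \
    --image=centos-6
done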
Configuring Google Cloud Platform
To get the Hadoop Cluster running smoothly on Google Cloud Platform, the following issues have to be sorted out first:
1. Firewall
Turn off iptables inside the VMs and rely on the firewall provided by Google Cloud Platform instead. Rules can be added with gcutil or from the Developers Console; for example:
$ gcutil --project=howie-hadoop-cluster addfirewall ambari-fw --description="Incoming http allowed." --allowed="tcp:http"
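On each CentOS 6 VM, the iptables service can be stopped and disabled along these lines; the extra rule for TCP 8080 (Ambari's default web UI port) is my own addition, written in the same style as the rule above:
# On every VM: stop iptables now and keep it off across reboots (CentOS 6)
sudo service iptables stop
sudo chkconfig iptables off

# Optional: open the Ambari web UI port (8080 by default) through the GCP firewall
gcutil --project=howie-hadoop-cluster addfirewall ambari-web \
  --description="Ambari web UI" --allowed="tcp:8080"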
2. Fully Qualified Domain Names (FQDN) for Ambari
When VMs are created from the Developers Console, GCP automatically generates an FQDN based on the name we give each VM, which saves us from editing /etc/hosts on every machine. Ambari can then reach the hosts through the FQDNs below (a quick way to verify them follows the list).
hadoop-slave-3.c.howie-hadoop-cluster.internal
hadoop-slave-2.c.howie-hadoop-cluster.internal
hadoop-slave-1.c.howie-hadoop-cluster.internal
hadoop-master.c.howie-hadoop-cluster.internal
hadoop-master-secondary.c.howie-hadoop-cluster.internal
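To double-check that every VM really reports the internal FQDN Ambari expects, you can run hostname -f on each host; the expected output shown here is for hadoop-master as an example:
# Verify that the host reports the internal FQDN that Ambari will use
hostname -f
# Expected output on hadoop-master, for example:
# hadoop-master.c.howie-hadoop-cluster.internal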
3. Let the Ambari VM SSH into the other VMs without a password
First log in to the Ambari VM and use the gcutil ssh command to log in to any one of the other VMs, for example:
$ gcutil --service_version="v1" --project="howie-hadoop-cluster" ssh --zone="us-central1-a" "hadoop-slave-1"
gcutil will generate a public key and a private key named google_compute_engine under the ~/.ssh/ directory. Later, during the Ambari installation, this private key is what gets pasted into the wizard.
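Before pasting the key into Ambari, it is worth confirming on the Ambari VM that the generated key really allows passwordless logins; a quick check along these lines (the user name howie follows the example above) should not ask for a password:
# Show the private key that will be pasted into the Ambari install wizard
cat ~/.ssh/google_compute_engine

# Confirm passwordless SSH to one of the slaves using the internal FQDN
ssh -i ~/.ssh/google_compute_engine \
  howie@hadoop-slave-1.c.howie-hadoop-cluster.internal hostname -f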
Installing and configuring Kerberos
Refer to the earlier article How to Install and Enable Kerberos on a Hadoop Cluster with Ambari. The only difference is that this time the goal is a fully distributed Hadoop Cluster, which means the KDC and Ambari are installed on separate machines. (I forgot about this difference and tripped over it several times...)
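For reference, setting up the standalone KDC on CentOS 6 looks roughly like this; the realm name EXAMPLE.COM is a placeholder, and the /etc/krb5.conf and kdc.conf details are covered in the article referenced above:
# Sketch: stand-alone KDC on CentOS 6 (realm name is a placeholder)
sudo yum install -y krb5-server krb5-libs krb5-workstation
# ... edit /etc/krb5.conf and /var/kerberos/krb5kdc/kdc.conf for your realm ...
sudo kdb5_util create -s -r EXAMPLE.COM
sudo service krb5kdc start
sudo service kadmin start
sudo chkconfig krb5kdc on
sudo chkconfig kadmin on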
Adding users and groups on the KDC
Since the KDC host is not managed by Ambari, Ambari will not create the service users and groups on it, so the following entries have to be added to /etc/passwd by hand:
ambari:x:501:501:Ambari user:/var/lib/ambari-server/keys/:/sbin/nologin
puppet:x:502:499::/:/bin/bash
nagios:x:503:502::/home/nagios:/bin/bash
yarn:x:504:503::/home/yarn:/bin/bash
hive:x:505:503::/home/hive:/bin/bash
ambari-qa:x:1003:503::/home/ambari-qa:/bin/bash
hbase:x:1001:503::/home/hbase:/bin/bash
oozie:x:508:503::/home/oozie:/bin/bash
hcat:x:1002:503::/home/hcat:/bin/bash
rrdcached:x:498:498:rrdcached:/var/rrdtool/rrdcached:/sbin/nologin
apache:x:48:48:Apache:/var/www:/sbin/nologin
zookeeper:x:497:503:ZooKeeper:/var/run/zookeeper:/bin/bash
hdfs:x:496:503:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash
mapred:x:495:503:Hadoop MapReduce:/var/lib/hadoop-mapreduce:/bin/bash
sqoop:x:494:503:Sqoop:/var/lib/sqoop:/bin/bash
And add the following to /etc/group (a useradd/groupadd sketch follows the list):
ambari:x:501:
puppet:x:499:
nagios:x:502:apache
hadoop:x:503:hbase,hdfs,mapred
rrdcached:x:498:
apache:x:48:
hdfs:x:497:
hbase:x:496:
mapred:x:495:
hive:x:494:
hcat:x:493:
nagiocmd:x:504:apache
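Rather than editing /etc/passwd and /etc/group directly, the same entries can also be created with groupadd/useradd; the sketch below only covers the hadoop group and two of the users, and the rest follow the same pattern using the UIDs/GIDs listed above:
# Create the shared hadoop group and a couple of service users with fixed IDs
sudo groupadd -g 503 hadoop
sudo useradd -u 496 -g 503 -c "Hadoop HDFS" -d /var/lib/hadoop-hdfs -s /bin/bash hdfs
sudo useradd -u 495 -g 503 -c "Hadoop MapReduce" -d /var/lib/hadoop-mapreduce -s /bin/bash mapred
# ... repeat for the remaining users and groups with the UIDs/GIDs shown above ...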
Only after these are in place can you run create_keytabs.sh to generate the keytabs, as described in How to Install and Enable Kerberos on a Hadoop Cluster with Ambari.
Copying the keytabs to each machine
create_keytabs.sh generates the following files:
keytabs_ambari-server.c.howie-hadoop-cluster.internal.tar
keytabs_hadoop-master.c.howie-hadoop-cluster.internal.tar
keytabs_hadoop-master-secondary.c.howie-hadoop-cluster.internal.tar
keytabs_hadoop-slave-1.c.howie-hadoop-cluster.internal.tar
keytabs_hadoop-slave-2.c.howie-hadoop-cluster.internal.tar
keytabs_hadoop-slave-3.c.howie-hadoop-cluster.internal.tar
Manually scp each file to the corresponding VM. In the GCP environment, the scp invocations look like this:
scp -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i $HOME/.ssh/google_compute_engine keytabs_ambari-server.c.howie-hadoop-cluster.internal.tar howie@ambari-server:/home/howie/.
scp -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i $HOME/.ssh/google_compute_engine keytabs_hadoop-master.c.howie-hadoop-cluster.internal.tar howie@hadoop-master:/home/howie/.
After the copy, extract each tarball and move the keytabs into /etc/security/keytabs (note that both the tar extraction and the mv have to be run with sudo):
sudo tar xvf keytabs_ambari-server.c.howie-hadoop-cluster.internal.tar
sudo mv etc/security/keytabs /etc/security/
After moving them, make sure the keytab permissions look like this:
-r--r-----. 1 hbase hadoop 1682 2014-02-10 07:01 hbase.headless.keytab
-r--r-----. 1 hdfs hadoop 1658 2014-02-10 07:01 hdfs.headless.keytab
-r--------. 1 mapred hadoop 502 2014-02-10 07:01 jhs.service.keytab
-r--------. 1 hdfs hadoop 954 2014-02-10 07:01 nn.service.keytab
-r--r-----. 1 ambari-qa hadoop 1778 2014-02-10 07:01 smokeuser.headless.keytab
-r--r-----. 1 root hadoop 3334 2014-02-10 07:01 spnego.service.keytab
Once all of this is in place, go back to the Ambari UI and enable security (Enable Security).
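If the owners or modes got lost somewhere during the copy, they can be restored before enabling security with something along these lines (only the files from the listing above are shown):
cd /etc/security/keytabs
# Restore the owners and modes shown in the listing above
sudo chown hdfs:hadoop nn.service.keytab && sudo chmod 400 nn.service.keytab
sudo chown mapred:hadoop jhs.service.keytab && sudo chmod 400 jhs.service.keytab
sudo chown hdfs:hadoop hdfs.headless.keytab && sudo chmod 440 hdfs.headless.keytab
sudo chown hbase:hadoop hbase.headless.keytab && sudo chmod 440 hbase.headless.keytab
sudo chown ambari-qa:hadoop smokeuser.headless.keytab && sudo chmod 440 smokeuser.headless.keytab
sudo chown root:hadoop spnego.service.keytab && sudo chmod 440 spnego.service.keytab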
Troubleshooting
Checksum failed
When you see the error message below, the keytabs you generated are probably wrong; double-check that the settings in /etc/krb5.conf are correct.
Caused by: KrbException: Checksum failed
    at sun.security.krb5.internal.crypto.ArcFourHmacEType.decrypt(ArcFourHmacEType.java:85)
    at sun.security.krb5.internal.crypto.ArcFourHmacEType.decrypt(ArcFourHmacEType.java:77)
    at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:168)
    at sun.security.krb5.KrbAsRep.<init>(KrbAsRep.java:87)
    at sun.security.krb5.KrbAsReq.getReply(KrbAsReq.java:446)
    at sun.security.krb5.Credentials.sendASRequest(Credentials.java:401)
    at sun.security.krb5.Credentials.acquireTGT(Credentials.java:350)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:672)
    ... 19 more
Caused by: java.security.GeneralSecurityException: Checksum failed
    at sun.security.krb5.internal.crypto.dk.ArcFourCrypto.decrypt(ArcFourCrypto.java:388)
    at sun.security.krb5.internal.crypto.ArcFourHmac.decrypt(ArcFourHmac.java:74)
    at sun.security.krb5.internal.crypto.ArcFourHmacEType.decrypt(ArcFourHmacEType.java:83)
    ... 26 more
Receive timed out
When you see this error message, check whether the firewall (iptables) has actually been turned off.
Caused by: javax.security.auth.login.LoginException: Receive timed out
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:700)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
    at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:832)
    ... 7 more
Caused by: java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:145)
    at java.net.DatagramSocket.receive(DatagramSocket.java:725)
    at sun.security.krb5.internal.UDPClient.receive(UDPClient.java:77)
    at sun.security.krb5.KrbKdcReq$KdcCommunication.run(KrbKdcReq.java:388)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.krb5.KrbKdcReq.send(KrbKdcReq.java:296)
    at sun.security.krb5.KrbKdcReq.send(KrbKdcReq.java:202)
    at sun.security.krb5.KrbKdcReq.send(KrbKdcReq.java:175)
    at sun.security.krb5.KrbAsReq.send(KrbAsReq.java:431)
    at sun.security.krb5.Credentials.sendASRequest(Credentials.java:400)
    at sun.security.krb5.Credentials.acquireTGT(Credentials.java:350)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:672)
    ... 19 more
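A quick way to tell whether the problem is the keytab itself or the network path to the KDC is to test a keytab directly with klist/kinit; the principal name and realm below are only examples and have to match what create_keytabs.sh actually generated:
# List the principals stored in a keytab
klist -kt /etc/security/keytabs/nn.service.keytab

# Try to obtain a TGT with that keytab (principal name and realm are examples)
kinit -kt /etc/security/keytabs/nn.service.keytab \
  nn/hadoop-master.c.howie-hadoop-cluster.internal@EXAMPLE.COM

# If kinit hangs and then times out, check iptables / UDP port 88 towards the KDC;
# if it fails right away with a checksum or preauth error, regenerate the keytab.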
Update [20140218]
More errors that may occur:
Postscript: I spun up 7 VMs in total, and a single weekend already cost me 37 USD... and I haven't even started the actual experiments yet... Orz