Tuesday, February 18, 2014

Hadoop 2.x (YARN) Kerberos Installation Notes - Client Minimum UID


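This post is a follow-up to my earlier one on installing Ambari and Kerberos on GCP. Hadoop 2.x introduced YARN and the concept of containers, and with them came a lot of new configuration files. On a fresh install you can get away with being lazy and not fully understanding them at first; simply running an MR job probably won't cause any trouble. But once Kerberos integration is involved, the problems show up one after another (sooner or later you have to pay for your shortcuts...).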

The problem I ran into this time:

14/02/18 07:00:03 INFO mapreduce.Job: 
Job job_1392102701312_0007 failed with state FAILED due to: 
Application application_1392102701312_0007 failed 2 times due to 
AM Container for appattempt_1392102701312_0007_000002 exited with  exitCode: -1000 due to: 
Application application_1392102701312_0007 initialization failed (exitCode=255) with output: 
Requested user howie is not whitelisted and has id 500,which is below the minimum allowed 1000




I looked it up online and found these:

[1] YARN & HDFS2 安装和配置Kerberos (Installing and Configuring Kerberos for YARN & HDFS2)
[2] test-task-controller fails if run as a userid < 1000
[3] Use Kerberos Authentication to Provide Spoon Users Access to Hadoop Cluster
Reference [3] mentions:
Make sure there is an operating system user account on each node in the Hadoop cluster for each user that you want to add to the Kerberos database. Add operating system user accounts if necessary. Note that the user account UIDs must be greater than the minimum user ID value (min.user.id). Usually, the minimum user ID value is set to 1000.

It turns out that YARN adds a configuration file for the container executor, /etc/hadoop/conf/container-executor.cfg, with the following contents:

yarn.nodemanager.local-dirs=/hadoop/yarn/local
yarn.nodemanager.log-dirs=/hadoop/yarn/log
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
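
To make the rule concrete, here is a minimal Python sketch of how I understand these settings to be applied. It is a simplified model, not the actual native container-executor code: the function names `parse_executor_cfg` and `check_user` are made up for illustration, and treating `allowed.system.users` as the whitelist key is my assumption based on the "not whitelisted" wording in the error, so verify it against your Hadoop version.

```python
# Simplified model of the user checks driven by container-executor.cfg.
# Assumption: the real check is done by a native binary; this only mirrors
# the rule "banned users are rejected, and non-whitelisted users must have
# uid >= min.user.id".

SAMPLE_CFG = """\
yarn.nodemanager.local-dirs=/hadoop/yarn/local
yarn.nodemanager.log-dirs=/hadoop/yarn/log
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
"""


def parse_executor_cfg(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def _split_list(value):
    """Split a comma-separated setting into a list of trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_user(settings, user, uid):
    """Return (allowed, reason) for the given user name and numeric UID."""
    banned = _split_list(settings.get("banned.users", ""))
    # Assumed whitelist key; empty when the file does not define it.
    whitelist = _split_list(settings.get("allowed.system.users", ""))
    min_uid = int(settings.get("min.user.id", "1000"))

    if user in banned:
        return False, f"user {user} is banned"
    if uid < min_uid and user not in whitelist:
        return False, f"user {user} has id {uid}, below the minimum allowed {min_uid}"
    return True, "ok"


if __name__ == "__main__":
    cfg = parse_executor_cfg(SAMPLE_CFG)
    for name, uid in [("howie", 500), ("howie", 1000), ("yarn", 1500)]:
        print(name, uid, check_user(cfg, name, uid))
```

Under this model, howie with UID 500 is rejected for the same reason as in the log above, while the same user with UID 1000 passes.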

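This file requires that the UID of the client user submitting an MR job be at least min.user.id (the default is 1000). On RHEL/CentOS (6 and earlier), however, regular user UIDs start at 500 by default, so there are a few possible fixes: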

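  1. Lower min.user.id (side effects unknown)
  2. Change the client user's UID to 1000 or higher (dangerous! one wrong move and you're done for...)
  3. Create a new account with a UID of 1000 or higher (recommended)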
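
Whichever option you pick, it helps to confirm on each node that the account exists and that its UID clears the threshold before submitting a job. Below is a small sketch using Python's standard `pwd` module (Unix only); `account_status` is a hypothetical helper name of mine, and the threshold of 1000 is just the min.user.id value from the file above.

```python
import pwd


def account_status(username, min_uid=1000, lookup=pwd.getpwnam):
    """Classify a local account as 'missing', 'below-minimum', or 'ok'.

    `lookup` defaults to pwd.getpwnam and is injectable so the check can be
    exercised without touching the real account database.
    """
    try:
        entry = lookup(username)
    except KeyError:
        # pwd.getpwnam raises KeyError when the account does not exist.
        return "missing"
    return "ok" if entry.pw_uid >= min_uid else "below-minimum"


if __name__ == "__main__":
    # "howie" is the user from the log above; replace with your own accounts.
    for name in ("root", "howie"):
        print(name, account_status(name))
```

Running this on every node in the cluster is a quick way to catch an account that is missing or still has a low UID somewhere.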
