阿貝好威的實驗室: ApacheCon2013 - Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation

Image source: adapted from an image found online

(This post sat in my drafts for a really long time.... Orz... I never had time to properly research and collect material on this topic...)

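You may remember that I previously wrote two posts, "Can Kung Fu Panda (Xen) Kick the Elephant (Hadoop)?" and "To Use an Elephant, Do You Really Have to Raise One? Why Not Use AWS EMR?". As it happens, ApacheCon 2013 had a talk on exactly this topic: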

Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation.

by Konstantin Shvachko, Jagane Sundar

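This talk is about using virtualization to *improve* Hadoop performance!? Usually the first instinct about virtualization is that it drags performance down, so how could it raise it instead? That is why this topic caught my interest. Let's look at the speakers' argument.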

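Their starting premise is "Low average CPU utilization on Hadoop Clusters". Their view is that disk I/O and network performance can be improved to a certain degree through design and planning, but CPU utilization is bad, and they give two reasons: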

IO-bound workloads preclude using more CPU time
Cluster provisioning:

peak-load performance vs. average utilization trade-off
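To make the "peak-load vs. average utilization" trade-off concrete, here is a toy calculation. It is a minimal sketch under assumed numbers, not the speakers' model: the hourly CPU demand figures and the helper names (`average_utilization`, `combined_demand`, `dedicated_utilization`, `shared_utilization`) are hypothetical. It compares two workloads whose peaks do not overlap, first on dedicated clusters each sized for its own peak, then on one shared pool of hardware (the kind of sharing virtualization allows) sized for the peak of the combined demand.

```python
from typing import List, Sequence


def average_utilization(demand: Sequence[float], capacity: float) -> float:
    """Mean fraction of provisioned capacity actually used over all samples."""
    return sum(demand) / (len(demand) * capacity)


def combined_demand(workloads: List[Sequence[float]]) -> List[float]:
    """Total demand per time slot across all workloads."""
    return [sum(slot) for slot in zip(*workloads)]


def dedicated_utilization(workloads: List[Sequence[float]]) -> float:
    """Each workload gets its own cluster, sized for that workload's own peak."""
    capacity = sum(max(w) for w in workloads)
    return average_utilization(combined_demand(workloads), capacity)


def shared_utilization(workloads: List[Sequence[float]]) -> float:
    """All workloads share one pool, sized for the peak of the combined demand."""
    total = combined_demand(workloads)
    return average_utilization(total, max(total))


# Hypothetical hourly CPU demand (in cores) for two jobs whose peaks do not coincide.
job_a = [10, 20, 80, 20]
job_b = [80, 20, 10, 20]

print(f"dedicated: {dedicated_utilization([job_a, job_b]):.2f}")  # 260 / (4 * 160)
print(f"shared:    {shared_utilization([job_a, job_b]):.2f}")     # 260 / (4 * 90)
```

With these assumed numbers the shared pool averages roughly 0.72 utilization versus about 0.41 for the dedicated clusters. That only shows the CPUs are kept busier; it is not by itself evidence that any Hadoop job finishes faster.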

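However, the charts they showed next only demonstrate that virtualization lets Hadoop use the CPU more efficiently, in other words, it keeps the CPU working very hard....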

So the conclusion is...?

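Still, the assumptions above and how they were put into practice are not really the point. What I found valuable was hearing about the following benchmarks, which are good references: once you have installed Hadoop, how do you know whether it is configured correctly, whether its performance is good, and whether it reaches a reasonable level?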

DFSIO (Standard Hadoop benchmark measuring HDFS performance)
YCSB (Yahoo! Cloud Serving Benchmark)
TeraSort benchmark suite - Sort Benchmark Home Page
NameNode benchmark ( src/test/org/apache/hadoop/hdfs/NNBench.java)
MapReduce benchmark (src/test/org/apache/hadoop/mapred/MRBench.java)
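When reading DFSIO results, it helps to know that the report contains two different averages. As I understand TestDFSIO's result analysis, "Throughput mb/sec" is the total megabytes divided by the sum of all map tasks' I/O times, while "Average IO rate mb/sec" is the mean of each task's own MB/s rate, reported together with the standard deviation of those per-task rates. The sketch below recomputes these figures from hypothetical per-task (megabytes, seconds) pairs; the function name `dfsio_summary` and the sample data are illustrative assumptions, not part of Hadoop.

```python
import math
from typing import Dict, List, Tuple


def dfsio_summary(tasks: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize per-map-task results given as (megabytes, seconds) pairs."""
    total_mb = sum(mb for mb, _ in tasks)
    total_sec = sum(sec for _, sec in tasks)
    rates = [mb / sec for mb, sec in tasks]  # each task's own MB/s
    mean_rate = sum(rates) / len(rates)
    # Population variance via E[r^2] - E[r]^2; abs() guards against tiny negative rounding.
    variance = sum(r * r for r in rates) / len(rates) - mean_rate ** 2
    return {
        "throughput_mb_s": total_mb / total_sec,  # total MB / summed task time
        "avg_io_rate_mb_s": mean_rate,            # mean of per-task rates
        "io_rate_std_dev": math.sqrt(abs(variance)),
    }


# Hypothetical run: two map tasks each write 1000 MB, one in 10 s and one in 20 s.
stats = dfsio_summary([(1000.0, 10.0), (1000.0, 20.0)])
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With one fast and one slow task the two averages diverge (about 66.7 vs. 75 MB/s here), and a large standard deviation hints that some nodes or disks are lagging. Both are per-task figures, not the cluster's aggregate bandwidth.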



Sunday, June 23, 2013
