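Lately I have spent a lot of time studying one question: how data is stored versus how it is analyzed. Both deal with the same data, yet they are often handled with different technologies. Take Amazon as an example. If you want to analyze data with EMR (Elastic MapReduce), Amazon's workflow keeps the data in object-based storage (e.g. S3) and runs the analysis on block-based storage (e.g. HDFS, which stores files as blocks on the cluster nodes' local disks). Now, however, you can even use S3 directly in place of HDFS.
So when should you use object-based storage, and when should you use block-based storage?
Before digging into this question, let's look at Wikipedia's definition of object-based storage: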
An Object-based Storage Device (OSD) is a computer storage device, similar to disk storage but working at a higher level. Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, an OSD organizes data into flexible-sized data containers, called objects.
The wording may be hard to follow on its own; it is easier to understand with a diagram.
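The definition contrasts two interfaces: fixed-size blocks versus flexible-size objects. Below is a minimal in-memory Python sketch of both. The class and method names (`BlockDevice`, `ObjectStore`, `write_block`, `create`, etc.) and the 512-byte block size are hypothetical, chosen only for illustration; a real OSD also handles persistence, space allocation and security.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 512  # hypothetical fixed block size, in bytes


class BlockDevice:
    """Block-oriented interface: fixed-size blocks addressed by block number."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, lba: int, data: bytes) -> None:
        # Every write is exactly one block; deciding which blocks belong to
        # which file is left to the layer above (e.g. a file system).
        if len(data) != BLOCK_SIZE:
            raise ValueError(f"expected {BLOCK_SIZE} bytes, got {len(data)}")
        self._blocks[lba] = data

    def read_block(self, lba: int) -> bytes:
        return self._blocks[lba]


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Object interface: variable-size objects addressed by ID, with metadata."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def create(self, object_id: str, data: bytes, **metadata: str) -> None:
        # Any size is accepted; the store decides how the bytes are laid out.
        self._objects[object_id] = StoredObject(data, dict(metadata))

    def read(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def get_metadata(self, object_id: str) -> dict[str, str]:
        return dict(self._objects[object_id].metadata)

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]


# Usage: the same 5-byte payload stored both ways.
payload = b"hello"
dev = BlockDevice(num_blocks=4)
dev.write_block(0, payload.ljust(BLOCK_SIZE, b"\0"))  # caller must pad to a full block
store = ObjectStore()
store.create("greeting", payload, content_type="text/plain")
```

The difference to notice is who manages layout: with the block interface the caller must pad the data and choose a block number, while the object interface accepts a named, variable-size object plus metadata and hides placement.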
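Another source is an IBM paper [1] that explains the differences between block devices and object stores.
So the biggest difference between the two lies in how they treat objects. Object storage is accessed at the level of whole objects, through operations such as create object and delete object. The other key point is that it exposes an HTTP (RESTful) API. Any system that follows these principles counts as object storage, regardless of whether its back end is a block file system or HDFS.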
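The two principles above, whole-object operations exposed through an HTTP/RESTful API, can be sketched as a tiny request dispatcher in Python. This is a hypothetical illustration: the `/<bucket>/<key>` path layout and the status codes follow the common S3-style convention but are assumptions here, not any specific product's API, and no real network server is started.

```python
from http import HTTPStatus


class RestObjectStore:
    """Maps HTTP verbs on /<bucket>/<key> paths to object operations (hypothetical layout)."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    @staticmethod
    def _parse(path: str) -> tuple[str, str]:
        # "/logs/2013/app.log" -> ("logs", "2013/app.log")
        bucket, _, key = path.lstrip("/").partition("/")
        if not bucket or not key:
            raise ValueError(f"expected /<bucket>/<key>, got {path!r}")
        return bucket, key

    def handle(self, method: str, path: str, body: bytes = b"") -> tuple[int, bytes]:
        """Return (status code, response body) for one request."""
        try:
            name = self._parse(path)
        except ValueError:
            return HTTPStatus.BAD_REQUEST, b""
        if method == "PUT":  # create or overwrite the whole object
            self._objects[name] = body
            return HTTPStatus.OK, b""
        if method == "GET":  # read the whole object
            if name not in self._objects:
                return HTTPStatus.NOT_FOUND, b""
            return HTTPStatus.OK, self._objects[name]
        if method == "DELETE":  # remove the object
            if self._objects.pop(name, None) is None:
                return HTTPStatus.NOT_FOUND, b""
            return HTTPStatus.NO_CONTENT, b""
        return HTTPStatus.METHOD_NOT_ALLOWED, b""
```

The sketch deliberately omits range reads and object metadata. The point is only that clients address named objects through HTTP verbs, which leaves the back end free to be a block file system, HDFS or anything else.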
Storage systems that have been offered as alternatives to HDFS for running Hadoop include:
- Cassandra (DataStax): Hadoop on Cassandra
- Ceph (Inktank): Hadoop on Ceph
- Dispersed Storage Network (Cleversafe)
- GPFS (IBM)
- Isilon (EMC)
- Lustre
- MapR File System
- NetApp Open Solution for Hadoop
What HDFS was designed to provide:
- Extremely low cost per byte
- Very high bandwidth to support MapReduce workloads
- Rock solid data reliability
Common shortcomings of the alternatives:
- Systems not designed for Hadoop’s scale
- Systems that don’t use commodity hardware or open-source software
- Not designed for MapReduce’s I/O patterns
- Unproven technology
References:
[1] "Object Storage: The Future Building Block for Storage Systems", IBM Haifa Research Laboratories
[2] "Object-Based Storage Devices"
[3] "Objectively Speaking: The Future of Objects"