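At ApacheCon 2013 I saw a project introduction for Apache Kafka, and my first thought was: yet another messaging system!?
Curious, I went and collected a lot of material, starting with a SlideShare deck that introduces Apache Kafka. Kafka's biggest claimed feature is that it combines two functions in one system: offline log handling and real-time messaging.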
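As I mentioned in my previous post, distributed log-aggregation systems such as Scribe and Flume all use a push-driven architecture. They offer high performance and scalability, but they have the following drawbacks:
- The expected end points are large clusters (e.g. Hadoop)
- End points cannot carry much real-time business logic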
- No API for batching or transactions (the broker retains each consumer's stream position)
- No message persistence, so multiple consumers reading the same data over time is impossible, which limits the architecture
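To make the last two drawbacks concrete, here is a minimal sketch of the opposite design: a persisted log that consumers pull from, each tracking its own position. This is a toy in-memory model written for illustration only; `AppendOnlyLog` and `PullConsumer` are hypothetical names, not Kafka's API.

```python
class AppendOnlyLog:
    """Toy stand-in for a persisted message log (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is deleted."""
        return self._messages[offset:offset + max_messages]


class PullConsumer:
    """Consumer that keeps its own offset instead of relying on the broker."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_messages=10):
        """Pull the next batch and advance this consumer's own position."""
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = AppendOnlyLog()
for i in range(5):
    log.append(f"event-{i}")

# A "real-time" consumer reads a small batch first.
realtime = PullConsumer(log)
first = realtime.poll(max_messages=2)

# A later "batch" consumer can still read everything from the beginning.
batch_job = PullConsumer(log)
everything = batch_job.poll(max_messages=100)
```

Because messages stay in the log and each consumer owns its offset, a consumer that shows up later, or one that wants to re-read, simply starts from an earlier offset.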
The Kafka producer currently doesn't wait for an ack from the broker. Without an ack, there is no guarantee that every published message is actually received by the broker.
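The difference an ack makes can be sketched with a toy simulation. Everything here is hypothetical: `LossyBroker` is a made-up broker that drops every third send attempt, and the two producer functions are illustrations, not Kafka's producer API.

```python
class LossyBroker:
    """Hypothetical broker that drops every `drop_every`-th send attempt."""

    def __init__(self, drop_every=3):
        self.received = []
        self._attempts = 0
        self._drop_every = drop_every

    def send(self, message):
        """Return True (an ack) if the message was stored, False if it was lost."""
        self._attempts += 1
        if self._attempts % self._drop_every == 0:
            return False
        self.received.append(message)
        return True


def fire_and_forget(broker, messages):
    """Producer that never checks for an ack, so losses go unnoticed."""
    for message in messages:
        broker.send(message)


def send_with_ack(broker, messages, max_retries=3):
    """Producer that retries each message until the broker acks it."""
    for message in messages:
        for _ in range(max_retries):
            if broker.send(message):
                break
        else:
            raise RuntimeError(f"no ack for {message!r} after {max_retries} tries")


messages = [f"m{i}" for i in range(6)]

no_ack_broker = LossyBroker()
fire_and_forget(no_ack_broker, messages)

ack_broker = LossyBroker()
send_with_ack(ack_broker, messages)
```

In this simulation the fire-and-forget producer silently loses messages, while the acking producer delivers all of them. The sketch assumes a lost send never reaches the broker; if only the ack were lost, retrying would produce duplicates, which is the "at least once" trade-off.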
RabbitMQ vs. Kafka
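Having understood Kafka's characteristics, I still couldn't help wanting to compare it with RabbitMQ. The Quora thread "RabbitMQ vs Kafka which one for durable messaging with good query feature" covers this; here are excerpts on what each system is good at: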
a) Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages, you can deal with current limitations around node-level HA (or can use trunk code), and/or you don't mind supporting incubator-level software yourself via forums/IRC.
b) Use Rabbit if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery, you need HA at the cluster-node level now, and/or you need 24x7 paid support in addition to forums/IRC.
However, the author also answers that if what you need is real-time stream processing (filter/query), neither of the two is a good fit, and you should consider Storm instead:
Neither offers great "filter/query" capabilities - if you need that, consider using Storm on top of one of these solutions to add computation, filtering, querying, on your streams
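Through these articles and explanations I am starting to understand what Kafka and Storm are for and where they apply, and why many companies use these two systems to make up for Hadoop's weakness in real-time processing. Since so many companies use this combination, here is how one of them, Infochimps, views Storm and Kafka: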
Why should you care?
With Storm and Kafka, you can conduct stream processing at linear scale, assured that every message gets processed in real-time, reliably. In tandem, Storm and Kafka can handle data velocities of tens of thousands of messages every second.
Stream processing solutions like Storm and Kafka have caught the attention of many enterprises due to their superior approach to ETL (extract, transform, load) and data integration.
Storm and Kafka are also great at in-memory analytics, and real-time decision support. Companies are quickly realizing that batch processing in Hadoop does not support real-time business needs. Real-time streaming analytics is a must-have component in any enterprise Big Data solution or stack, because of how elegantly they handle the “three V’s” — volume, velocity and variety.
Storm and Kafka are the two technologies on the list that we’re most committed to at Infochimps, and it is reasonable to expect that they’ll be a formal part of our platform soon.
P.S. Trend Micro also has its own MQ, called TME (TrendMicro Message Exchange), which is why my reaction was "yet another MQ"... XD