Monday, September 30, 2013

Data Encryption for Hadoop - HadoopCryptoCompressor


As mentioned in my previous article (Security for Hadoop - Data Encryption), data encryption is still not officially supported in Hadoop.

Today I want to show you an interesting project called HadoopCryptoCompressor, which is a simple "compressor" for Hadoop. It does not actually compress anything, but it lets you encrypt your data with AES ("AES/CBC/PKCS5Padding"), a symmetric cipher that uses a shared secret key rather than a public key.

This project has also been proposed to Hadoop; the JIRA ID is HADOOP-7857.

Unfortunately, the original version started by geisbruch did not work for me, so I decided to try to fix it. I also merged in another branch (forked by ubiquitousthey).


Here is my forked and patched version, howie/HadoopCryptoCompressor, and I will show you how to use this plugin.

Tutorial


1. Install

1.1 Clone from GitHub
# git clone https://github.com/howie/HadoopCryptoCompressor.git crypto


1.2 Build with Maven

# cd crypto
# mvn install


Maven will generate HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar in crypto/target/.

1.3 Modify /etc/hadoop/conf/core-site.xml


<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.CryptoCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>



1.4 Copy jar to Hadoop Classpath

There are two ways to put HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar on the classpath.

A. Manually copy

Copy the jar directly to /usr/lib/hadoop/lib/ on every machine, and modify /etc/hadoop/conf/hadoop-env.sh:

export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar:${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}


P.S. This may not work in some fully distributed environments.

B. Use -libjars to copy

Run a Hadoop example program such as wordcount with -libjars, and Hadoop will ship HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar to each node and add it to the job's classpath.


After installing HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar, let's look at some scenarios.

Scenario 1 - wordcount with encrypted data


1. Generate encrypted data

Choose any text file and encrypt it with HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar:

# java -jar HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar -e -aeskey 123456 test.txt test.crypto
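
To make the cipher string concrete, here is a minimal Java sketch of an AES/CBC/PKCS5Padding round trip using only the JDK's javax.crypto classes. It is not the plugin's actual code: the class name AesCbcSketch, the key derivation (SHA-256 of the passphrase, truncated to 128 bits), and the layout (random IV prepended to the ciphertext) are all assumptions made for illustration, so its output will not necessarily be readable by HadoopCryptoCompressor.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch only; names and file layout are assumptions, not the plugin's code. */
final class AesCbcSketch {

    // ASSUMPTION: turn the passphrase into a 128-bit AES key by hashing it
    // with SHA-256 and keeping the first 16 bytes.
    static SecretKeySpec keyFrom(String passphrase) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(passphrase.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // ASSUMPTION: output is a fresh 16-byte random IV followed by the ciphertext.
    static byte[] encrypt(String passphrase, byte[] plain) {
        try {
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, keyFrom(passphrase), new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plain);
            // Concatenate IV and ciphertext into one array.
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the first 16 bytes, then decrypts the rest.
    static byte[] decrypt(String passphrase, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, keyFrom(passphrase), new IvParameterSpec(data, 0, 16));
            return cipher.doFinal(data, 16, data.length - 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "hello hadoop".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = encrypt("123456", plain);
        System.out.println(new String(decrypt("123456", sealed), StandardCharsets.UTF_8));
    }
}
```

Because AES is symmetric, the same passphrase is needed to read the data back, which is why every job that reads the encrypted file has to be given the key that was used with -aeskey.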


Note that Hadoop selects a compression codec by file name extension, so the codec is triggered only if the encrypted file is named *.crypto.
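
Hadoop does this lookup in CompressionCodecFactory.getCodec(Path), which compares the file name against each registered codec's default extension. The idea can be sketched in plain Java as below. The class CodecBySuffixSketch and its hard-coded table are hypothetical stand-ins, not Hadoop code, and the .crypto entry simply assumes the extension described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; mimics suffix-based codec selection without Hadoop. */
final class CodecBySuffixSketch {

    // Hypothetical registry: file suffix -> codec class name.
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".crypto", "org.apache.hadoop.io.compress.CryptoCodec"); // assumed suffix
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    }

    /** Returns the codec class name whose suffix matches the path, or null if none does. */
    static String codecFor(String path) {
        for (Map.Entry<String, String> entry : CODECS.entrySet()) {
            if (path.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null; // no codec: the file would be read as plain bytes
    }

    public static void main(String[] args) {
        System.out.println(codecFor("/tmp/test.crypto"));
        System.out.println(codecFor("/tmp/test.txt"));
    }
}
```

A file uploaded as test.txt would therefore be read as-is, and the job would see ciphertext instead of words.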

2. Upload the file to HDFS

# hadoop fs -put test.crypto /tmp/


3. Run wordcount

# hadoop  jar /usr/lib/hadoop/hadoop-examples.jar  wordcount -libjars file:///root/HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar -Dcrypto.secret.key=123456  /tmp/test.crypto /tmp/wc-test_data_aes


4. Check the result

Finally, you can check the wordcount result in the output directory:

# hadoop fs -cat /tmp/wc-test_data_aes/part-*


Scenario 2 - Hive with encrypted data


In this scenario we load an encrypted file into Hive and query it with SELECT.

1. Create the file to encrypt

The following is the content of the example file (company_Info.txt):


# ID, company, tel, address

A1,Trend Micro,2-2378-9666, 台北市敦化南路一段198號
A2,Google,2-8729-6000, 台灣台北市信義區市府路45號
A3,Apple,0800-020-021,台北市信義區松智路1號19樓A


2. Encrypt the file

# java -jar HadoopCryptoCompressor-0.0.6-SNAPSHOT.jar -e -aeskey 123456 -in company_Info.txt -out company_Info.crypto




3. Create a Hive table and load company_Info.crypto into it

First, write a Hive script that uploads the encrypted file. Here is an example:

-- filename: upload.hive

-- The table name must match the one used by LOAD DATA and SELECT.
CREATE TABLE IF NOT EXISTS companyInfo (
  ID STRING,
  Company_Name STRING,
  Tel_Number STRING,
  Address STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- ${hiveconf:file} is supplied on the command line with -hiveconf file=...
LOAD DATA LOCAL INPATH '${hiveconf:file}' OVERWRITE INTO TABLE companyInfo;


Second, use hive to execute upload.hive:

# hive -f upload.hive -hiveconf file=company_Info.crypto


4. Select data from Hive

Run hive to enter the Hive shell, set the secret key, and query the table:

hive> set crypto.secret.key=123456;

hive> select * from companyInfo;

OK
A1    Trend Micro    2-2378-9666    台北市敦化南路一段198號
A2    Google         2-8729-6000    台灣台北市信義區市府路45號
A3    Apple          0800-020-021   台北市信義區松智路1號19樓A
Time taken: 7.128 seconds, Fetched: 3 row(s)

hive>



After finishing this post... I'm seriously considering switching to Logdown... Orz..