
Cloudera Development Kit - Tools Module

The Tools Module is a collection of command-line tools and APIs for performing common tasks with CDK.

Example - Convert Combined Log Format files to a CDK dataset

From the tools module directory, build with:

mvn install

Then copy the sample log file to /tmp/input and run the converter:

cp src/test/resources/access_log.txt /tmp/input
mvn exec:java -Dexec.mainClass="" \
-Dexec.args="file:///tmp/input file:///tmp/output logs"

Inspect the output, which is the Combined Log Format input converted to Avro files:

java -jar /path/to/avro-tools-*.jar tojson /tmp/output/logs/*.avro | head
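For readers unfamiliar with the input, the following is a minimal sketch of how one Combined Log Format line breaks down into fields. It is not the converter's implementation. The field names, the function name parse_combined_log_line, and the sample line are illustrative assumptions and may not match the schema that CDK writes.

```python
import re

# Combined Log Format layout:
#   host ident authuser [date] "request" status bytes "referer" "user-agent"
# The group names are illustrative assumptions, not the CDK dataset schema.
COMBINED_LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_combined_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means the response carried no body.
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line in Combined Log Format.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_combined_log_line(sample)
print(record["host"], record["status"], record["size"])
```

This simple pattern does not handle escaped quotes inside the request, referer, or user-agent fields. Lines like that return None instead of a partial record.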