Cloudera Development Kit - Tools Module
The Tools Module is a collection of command-line tools and APIs for performing common tasks with CDK.
Example - Convert Combined Log Format files to a CDK dataset
From the tools module directory, build with:
mvn install
Then copy the sample access log to /tmp/input and run the converter:
cp src/test/resources/access_log.txt /tmp/input
mvn exec:java -Dexec.mainClass="com.cloudera.cdk.tools.CombinedLogFormatConverter" \
  -Dexec.args="file:///tmp/input repo:file:///tmp/output logs"
Inspect the output (the Combined Log Format input converted to Avro files):
java -jar /path/to/avro-tools-*.jar tojson /tmp/output/logs/*.avro | head
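For readers unfamiliar with the input, a Combined Log Format line carries nine fields: host, identity, user, timestamp, request line, status, response size, referrer, and user agent. The following is a minimal sketch of how one such line could be split with a regular expression. It is not the code used by CombinedLogFormatConverter; the class name, method name, and field keys are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CDK: splits one Combined Log Format line.
public class CombinedLogLineSketch {

    // host ident user [time] "request" status bytes "referrer" "user agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    // Field keys are illustrative; they are assumptions, not the dataset's schema.
    private static final String[] KEYS = {
        "host", "ident", "user", "time", "request",
        "status", "bytes", "referrer", "user_agent"
    };

    // Returns the fields in order, or null if the line does not match.
    public static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            fields.put(KEYS[i], m.group(i + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" "
            + "\"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        System.out.println(parse(sample));
    }
}
```

This simple pattern assumes that quoted fields contain no embedded double quotes; lines that do not match return null rather than raising an error.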