Fork me on GitHub

Release Notes

All past CDK releases are documented on this page. Upcoming release dates can be found in JIRA.

Version 0.4.1

Version 0.4.1 has the following notable changes:

Morphlines Library
  • Expanded documentation and examples
  • Made SolrLocator and ZooKeeperDownloader collection alias aware
  • Added commands readJson and extractJsonPaths for reading, extracting, and transforming JSON files and JSON objects, in the same style as Avro support
  • Added commands split, findReplace, extractURIComponents, extractURIQueryParameters, decodeBase64
  • Fixed extractAvroPaths exception with flatten = true if path represents a non-leaf node of type Record
  • Added several performance enhancements

Version 0.4.0

Version 0.4.0 has the following notable changes:

  • Morphlines Library. A morphline is a rich configuration file that makes it easy to define an ETL transformation chain embedded in Hadoop components such as Search, Flume, MapReduce, Pig, Hive, Sqoop.
  • An Oozie example. A new example of using Oozie to run a transformation job periodically.
  • QuickStart VM update. The examples now use version 4.3.0 of the Cloudera QuickStart VM.
  • Java package changes. The com.cloudera.data package and subpackages have been renamed to com.cloudera.cdk.data, and com.cloudera.cdk.flume has become com.cloudera.cdk.data.flume.
  • Finer-grained Maven modules. The module organization and naming has changed, including making all group IDs com.cloudera.cdk. Please see the new dependencies page for details.

The full change log is available from JIRA.

Version 0.3.0

Release date: June 6, 2013

Version 0.3.0 has the following notable changes:

  • Logging to a dataset. Using log4j as the logging API and Flume as the log transport, it is now possible to log application events to datasets.
  • Crunch support. Datasets can be exposed as Crunch sources and targets.
  • Date partitioning. New partitioning functions for partitioning datasets by year/month/day/hour/minute.
  • New examples. The new examples repository has examples for all these new features. The examples use the Cloudera QuickStart VM, version 4.2.0, to make running the examples as simple as possible.

The full change log is available from JIRA.

Version 0.2.0

Release date: May 2, 2013

Version 0.2.0 has two major additions:

  • Experimental support for reading and writing datasets in Parquet format.
  • Support for storing dataset metadata in a Hive/HCatalog metastore.

The examples module has example code for both of these usages.

The full change log is available from JIRA.

Version 0.1.0

Release date: April 5, 2013

Version 0.1.0 is the first release of the CDK Data module. This is considered a beta release. As a sub-1.0.0 release, this version is not subject to the normal API compatibility guarantees. See the Compatibility Statement for information about API compatibility guarantees.