Cloudera Development Kit

Hadoop App Development Made Easier

The Cloudera Development Kit (Apache License, Version 2.0), or CDK for short, is a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.

  • Codifies expert patterns and practices for building data-oriented systems and applications
  • Lets developers focus on business logic, not plumbing or infrastructure
  • Provides smart defaults for platform choices
  • Supports gradual adoption via loosely coupled modules


  • CDK Data: The data module provides logical abstractions on top of storage subsystems (e.g. HDFS) that let users think and operate in terms of records, datasets, and dataset repositories. If you’re looking to read or write records directly to/from a storage system, the data module is for you.
  • CDK Maven Plugin: The CDK Maven Plugin provides Maven goals for packaging, deploying, and running distributed applications.
  • CDK Morphlines: The Morphlines module reduces the time and skills necessary to build and change Hadoop ETL stream processing applications that extract, transform and load data into Apache Solr, Enterprise Data Warehouses, HDFS, HBase or Analytic Online Dashboards.
  • CDK Tools: The tools module provides command-line tools and APIs for performing common tasks with the CDK.
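
To make the record, dataset, and dataset repository vocabulary of the data module more concrete, here is a minimal in-memory sketch of how those three ideas relate. It is not the CDK API: the class names Dataset and DatasetRepository, their methods, and the list-of-field-names schema are all hypothetical and chosen only for illustration, and nothing here touches HDFS or any other storage subsystem.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Dataset:
    """A named collection of records sharing one schema (hypothetical model)."""
    name: str
    schema: List[str]  # field names every record must carry
    _records: List[dict] = field(default_factory=list)

    def write(self, record: dict) -> None:
        # Reject records whose field names do not match the dataset schema.
        if sorted(record) != sorted(self.schema):
            raise ValueError(f"record fields {sorted(record)} do not match schema")
        self._records.append(dict(record))

    def read(self) -> Iterator[dict]:
        # Yield copies so callers cannot mutate the stored records.
        for record in self._records:
            yield dict(record)


class DatasetRepository:
    """Creates and looks up datasets by name (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def create(self, name: str, schema: List[str]) -> Dataset:
        if name in self._datasets:
            raise ValueError(f"dataset {name!r} already exists")
        self._datasets[name] = Dataset(name, list(schema))
        return self._datasets[name]

    def load(self, name: str) -> Dataset:
        return self._datasets[name]


# Usage: the caller works with records and datasets, never with files or paths.
repo = DatasetRepository()
users = repo.create("users", ["id", "email"])
users.write({"id": 1, "email": "a@example.com"})
users.write({"id": 2, "email": "b@example.com"})
emails = [r["email"] for r in repo.load("users").read()]
```

The point of the sketch is the layering: application code asks a repository for a dataset by name and then reads or writes whole records, so where and how the bytes are stored stays behind the abstraction.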

Example code demonstrating how to use the CDK is also available in a separate GitHub repository.

Version: 0.9.1. Last Published: 2014-01-14.