Dependency Information

To use the CDK modules in a Java project, add the Cloudera repository to your Maven POM:

<repository>
  <id>cdh.repo</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <name>Cloudera Repositories</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>

Then add a dependency for each module you want to use, taking the details from the Dependency Information pages listed below. You can also view the transitive dependencies for each module.
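
As a rough sketch, a module dependency in your POM might look like the following. The coordinates shown (`com.cloudera.cdk`, `cdk-data-core`) and the `cdk.version` property are assumptions for illustration only; take the actual values from the module's Dependency Information page.

```xml
<!-- Illustrative only: confirm groupId, artifactId, and version
     on the module's Dependency Information page. -->
<dependency>
  <groupId>com.cloudera.cdk</groupId>
  <artifactId>cdk-data-core</artifactId>
  <!-- cdk.version is a hypothetical property you would define
       in the <properties> section of your own POM. -->
  <version>${cdk.version}</version>
</dependency>
```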

Hadoop Component Dependencies

As a general rule, CDK modules declare their Hadoop component dependencies with provided scope, because in many cases the container that runs the code already supplies them.

For example,

  • CDK Data has a provided dependency on the core Hadoop libraries
  • CDK Crunch has a provided dependency on Crunch and the core Hadoop libraries
  • CDK HCatalog has a provided dependency on HCatalog
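
In Maven, a provided dependency is on the compile and test classpaths but is not passed on transitively, so the container is expected to supply it at run time. A declaration of this kind might look like the sketch below. The `hadoop-client` artifact and the `hadoop.version` property are assumptions chosen for illustration, not copied from a CDK POM.

```xml
<!-- Sketch of a provided-scope Hadoop dependency.
     The artifact choice and version property are assumptions. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <!-- Used for compiling and testing; expected from the container at run time. -->
  <scope>provided</scope>
</dependency>
```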

The following containers provide the dependencies listed:

  • The CDK Maven Plugin goal cdk:run-tool provides the Hadoop and HCatalog dependencies.
  • The CDK Maven Plugin goal cdk:run-job provides the Hadoop dependencies. HCatalog should be added as a runtime dependency (example).
  • The hadoop jar command provides the Hadoop dependencies.
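
For the cdk:run-job case, adding HCatalog as a runtime dependency could be sketched as follows. The coordinates (`org.apache.hcatalog`, `hcatalog-core`) and the `hcatalog.version` property are assumptions; the correct artifact depends on the Hadoop distribution and HCatalog release you are targeting.

```xml
<!-- Sketch only: HCatalog coordinates vary by distribution and release,
     so verify them before use. -->
<dependency>
  <groupId>org.apache.hcatalog</groupId>
  <artifactId>hcatalog-core</artifactId>
  <version>${hcatalog.version}</version>
  <!-- Not needed to compile, but must be on the classpath when the job runs. -->
  <scope>runtime</scope>
</dependency>
```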

However, there are some cases where you may have to provide the relevant Hadoop component dependencies yourself:

  • Crunch programs (even those running in the containers listed above) (example)
  • Standalone Java programs, not run using cdk:run-tool or hadoop jar (example)
  • Web apps (example)
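
Because provided dependencies are not inherited transitively, a project in one of these categories has to declare the Hadoop libraries itself, with a scope that puts them on the runtime classpath. A minimal sketch, assuming the `hadoop-client` artifact and a `hadoop.version` property defined in your own POM:

```xml
<!-- Sketch for a standalone program or web app: declare Hadoop explicitly.
     The artifact choice and version property are assumptions. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <!-- compile is Maven's default scope; it is spelled out here
       to contrast with the provided scope used by the CDK modules. -->
  <scope>compile</scope>
</dependency>
```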

CDK Data Modules

CDK Morphlines Modules

CDK Tools Modules