Full name:



Create a named dataset whose entries conform to a defined schema.


Required Parameters

Name | Type | Since | Description
avroSchemaFile | String | - | The file containing the Avro schema. If no file with the specified name is found on the local filesystem, then the classpath is searched for a matching resource. User property is: cdk.avroSchemaFile.
datasetName | String | - | The name of the dataset to create. User property is: cdk.datasetName.
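
As a rough illustration of how the two required parameters might be supplied, the sketch below sets them in a project's pom.xml. The plugin coordinates (com.cloudera.cdk:cdk-maven-plugin) and the values event.avsc and events are assumptions made for this example; they do not come from this page.

```xml
<!-- Sketch only: the plugin coordinates and both values are assumed. -->
<plugin>
  <groupId>com.cloudera.cdk</groupId>
  <artifactId>cdk-maven-plugin</artifactId>
  <version>0.8.1</version>
  <configuration>
    <!-- Looked up on the local filesystem first, then on the classpath -->
    <avroSchemaFile>event.avsc</avroSchemaFile>
    <!-- Name of the dataset to create -->
    <datasetName>events</datasetName>
  </configuration>
</plugin>
```

Because each parameter lists a user property, the same values could also be passed on the command line as -Dcdk.avroSchemaFile=event.avsc and -Dcdk.datasetName=events instead of being fixed in the POM.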

Optional Parameters

Name | Type | Since | Description
format | String | - | The file format (avro or parquet). User property is: cdk.format.
hadoopConfiguration | Properties | - | Hadoop configuration properties. User property is: cdk.hadoopConfiguration.
hcatalog | boolean | - | If true, store dataset metadata in HCatalog; otherwise, store it on the filesystem. User property is: cdk.hcatalog.
partitionExpression | String | - | The partition expression, in JEXL format (experimental). User property is: cdk.partitionExpression.
rootDirectory | String | - | The root directory of the dataset repository. Optional if using HCatalog for metadata storage. User property is: cdk.rootDirectory.
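
The fragment below sketches how some of the optional parameters might be combined when metadata is kept on the filesystem rather than in HCatalog. It is meant to sit inside the plugin's configuration element next to the required parameters. The directory shown is a hypothetical value chosen for illustration.

```xml
<!-- Sketch only: the rootDirectory value is a made-up example. -->
<configuration>
  <!-- One of the two documented formats: avro or parquet -->
  <format>avro</format>
  <!-- false: dataset metadata is stored on the filesystem -->
  <hcatalog>false</hcatalog>
  <!-- Root of the dataset repository; described as optional only when HCatalog is used -->
  <rootDirectory>/tmp/datasets</rootDirectory>
</configuration>
```

Setting hcatalog to true would, per the description above, make rootDirectory optional.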

Parameter Details


avroSchemaFile:
The file containing the Avro schema. If no file with the specified name is found on the local filesystem, then the classpath is searched for a matching resource.
  • Type: java.lang.String
  • Required: Yes
  • User Property: cdk.avroSchemaFile


datasetName:
The name of the dataset to create.
  • Type: java.lang.String
  • Required: Yes
  • User Property: cdk.datasetName


format:
The file format (avro or parquet).
  • Type: java.lang.String
  • Required: No
  • User Property: cdk.format


hadoopConfiguration:
Hadoop configuration properties.
  • Type: java.util.Properties
  • Required: No
  • User Property: cdk.hadoopConfiguration
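
Since this parameter has type java.util.Properties, it would normally be written in the POM using Maven's standard property/name/value mapping. The sketch below assumes that mapping applies here. The Hadoop key and the cluster address are example values, not something this page prescribes.

```xml
<!-- Sketch only: the key and value below are illustrative. -->
<configuration>
  <hadoopConfiguration>
    <property>
      <!-- Hadoop setting naming the default filesystem -->
      <name>fs.defaultFS</name>
      <!-- Hypothetical NameNode address -->
      <value>hdfs://localhost:8020</value>
    </property>
  </hadoopConfiguration>
</configuration>
```

Further Hadoop settings would be added as additional property elements inside hadoopConfiguration.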


hcatalog:
If true, store dataset metadata in HCatalog; otherwise, store it on the filesystem.
  • Type: boolean
  • Required: No
  • User Property: cdk.hcatalog


partitionExpression:
The partition expression, in JEXL format (experimental).
  • Type: java.lang.String
  • Required: No
  • User Property: cdk.partitionExpression


rootDirectory:
The root directory of the dataset repository. Optional if using HCatalog for metadata storage.
  • Type: java.lang.String
  • Required: No
  • User Property: cdk.rootDirectory


Version: 0.8.1. Last Published: 2013-10-23.