cdk:create-dataset
Full name:
com.cloudera.cdk:cdk-maven-plugin:0.9.0:create-dataset
Description:
Create a named dataset whose entries conform to a defined schema.
Attributes:
- Requires dependency resolution of artifacts in scope: compile.
Required Parameters
Name | Type | Since | Description |
---|---|---|---|
datasetName | String | - | The name of the dataset to create. User property is: cdk.datasetName. |
Optional Parameters
Name | Type | Since | Description |
---|---|---|---|
avroSchemaFile | String | - | The file containing the Avro schema. If no file with the specified name is found on the local filesystem, the classpath is searched for a matching resource. Either this property or cdk.avroSchemaReflectClass must be specified. User property is: cdk.avroSchemaFile. |
avroSchemaReflectClass | String | - | The fully qualified class name of the Avro reflect class used to generate a schema. The class must be available on the classpath. Either this property or cdk.avroSchemaFile must be specified. User property is: cdk.avroSchemaReflectClass. |
format | String | - | The file format (avro or parquet). User property is: cdk.format. |
hadoopConfiguration | Properties | - | Hadoop configuration properties. User property is: cdk.hadoopConfiguration. |
hcatalog | boolean | - | If true, store dataset metadata in HCatalog; otherwise, store it on the filesystem. User property is: cdk.hcatalog. |
partitionExpression | String | - | The partition expression, in JEXL format (experimental). User property is: cdk.partitionExpression. |
repositoryUri | String | - | The URI specifying the dataset repository, e.g. repo:hdfs://host:8020/data. Optional, but if specified then cdk.rootDirectory and cdk.hcatalog are ignored. User property is: cdk.repositoryUri. |
rootDirectory | String | - | The root directory of the dataset repository. Optional if using HCatalog for metadata storage. User property is: cdk.rootDirectory. |
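
As a minimal sketch of how these parameters fit together, the goal could be configured in a project's pom.xml along the following lines. The plugin coordinates and parameter names come from this page; the dataset name `events`, the schema path `src/main/avro/event.avsc`, and the root directory `/tmp/data` are hypothetical values chosen only for illustration.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>com.cloudera.cdk</groupId>
      <artifactId>cdk-maven-plugin</artifactId>
      <version>0.9.0</version>
      <configuration>
        <!-- Required: name of the dataset to create (hypothetical value) -->
        <datasetName>events</datasetName>
        <!-- Either avroSchemaFile or avroSchemaReflectClass must be given
             (hypothetical path) -->
        <avroSchemaFile>src/main/avro/event.avsc</avroSchemaFile>
        <!-- Optional: avro or parquet -->
        <format>avro</format>
        <!-- Root directory of the dataset repository (hypothetical value;
             the accepted path or URI form is not stated on this page) -->
        <rootDirectory>/tmp/data</rootDirectory>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With a configuration like this in place, running `mvn cdk:create-dataset` from the project directory should invoke the goal. The same values can instead be supplied as user properties on the command line, for example `-Dcdk.datasetName=events`.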
Parameter Details
avroSchemaFile:
The file containing the Avro schema. If no file with the specified name is found on the local filesystem, the classpath is searched for a matching resource. Either this property or cdk.avroSchemaReflectClass must be specified.
- Type: java.lang.String
- Required: No
- User Property: cdk.avroSchemaFile
avroSchemaReflectClass:
The fully qualified class name of the Avro reflect class used to generate a schema. The class must be available on the classpath. Either this property or cdk.avroSchemaFile must be specified.
- Type: java.lang.String
- Required: No
- User Property: cdk.avroSchemaReflectClass
datasetName:
The name of the dataset to create.
- Type: java.lang.String
- Required: Yes
- User Property: cdk.datasetName
format:
The file format (avro or parquet).
- Type: java.lang.String
- Required: No
- User Property: cdk.format
hadoopConfiguration:
Hadoop configuration properties.
- Type: java.util.Properties
- Required: No
- User Property: cdk.hadoopConfiguration
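
Because this parameter is a java.util.Properties, Maven's usual plugin-configuration mapping for properties should apply: each entry is written as a `<property>` element with `<name>` and `<value>` children. The fragment below is a sketch that belongs inside the plugin's `<configuration>` element. The Hadoop property `fs.defaultFS` and the NameNode address are assumptions for illustration and are not taken from this page.

```xml
<configuration>
  <hadoopConfiguration>
    <!-- One <property> element per Hadoop setting -->
    <property>
      <!-- Assumed example: default filesystem URI for Hadoop -->
      <name>fs.defaultFS</name>
      <value>hdfs://host:8020</value>
    </property>
  </hadoopConfiguration>
</configuration>
```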
hcatalog:
If true, store dataset metadata in HCatalog; otherwise, store it on the filesystem.
- Type: boolean
- Required: No
- User Property: cdk.hcatalog
partitionExpression:
The partition expression, in JEXL format (experimental).
- Type: java.lang.String
- Required: No
- User Property: cdk.partitionExpression
repositoryUri:
The URI specifying the dataset repository, e.g. repo:hdfs://host:8020/data. Optional, but if specified then cdk.rootDirectory and cdk.hcatalog are ignored.
- Type: java.lang.String
- Required: No
- User Property: cdk.repositoryUri
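
To illustrate the precedence described above, a configuration that relies on the repository URI alone might look like the following sketch. The URI is the example given on this page; the dataset name and schema path are hypothetical.

```xml
<configuration>
  <!-- Hypothetical dataset name and schema path -->
  <datasetName>events</datasetName>
  <avroSchemaFile>src/main/avro/event.avsc</avroSchemaFile>
  <!-- When set, rootDirectory and hcatalog are ignored -->
  <repositoryUri>repo:hdfs://host:8020/data</repositoryUri>
</configuration>
```

The same setting can be passed as a user property instead, for example `-Dcdk.repositoryUri=repo:hdfs://host:8020/data`.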
rootDirectory:
The root directory of the dataset repository. Optional if using HCatalog for metadata storage.
- Type: java.lang.String
- Required: No
- User Property: cdk.rootDirectory