API Usage Tutorial
Cloudera Navigator Concepts
The API terminology is similar to that used in the web UI:
- Entity: Abstract data structure that describes the structural features of any entity. An entity can be uniquely identified by its identity.
- Relation: Describes a relationship among entities. A relation is itself an entity and, like any other entity, can have properties and comments associated with it.
API Usage Examples
- Viewing Entity Metadata
- Viewing Relations
- Viewing Lineage
- Modifying Entities
- Modifying Entities using Metadata Preregistration
Viewing Entity Metadata
You can search by any entity field. In this example we search for
file entities and limit the results to just the first two that are found.
To search for the next two after these results, just increase the offset.
Note that we must put quotes around the URL, since it contains an ampersand.
$ curl 'http://localhost:7187/api/v3/entities?query=type:file&limit=2&offset=0' \
-X GET \
-u <username>:<password>
[ {
"identity" : "5c2c5fc0494f07a2d1e2219799aab5b6",
"originalName" : "jobtracker.info",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/tmp/mapred/system",
"extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"fileSystemPath" : "/tmp/mapred/system/jobtracker.info",
"type" : "FILE",
"size" : 4,
"created" : "2014-08-25T19:07:17.652Z",
"lastModified" : "2014-08-25T19:07:17.652Z",
"lastAccessed" : "2014-08-25T19:07:17.185Z",
"permissions" : "rwx------",
"owner" : "mapred",
"group" : "supergroup",
"blockSize" : null,
"mimeType" : "application/octet-stream",
"sourceType" : "HDFS",
"deleted" : false,
"replication" : null,
"internalType" : "fselement"
}, {
"identity" : "76a8a355a475396efa110434c80e92db",
"originalName" : "hbase.version",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/hbase",
"extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"fileSystemPath" : "/hbase/hbase.version",
"type" : "FILE",
"size" : 7,
"created" : "2014-08-25T19:08:25.719Z",
"lastModified" : "2014-08-25T19:08:25.719Z",
"lastAccessed" : "2014-08-25T19:08:25.401Z",
"permissions" : "rw-r--r--",
"owner" : "hbase",
"group" : "hbase",
"blockSize" : null,
"mimeType" : "application/octet-stream",
"sourceType" : "HDFS",
"deleted" : false,
"replication" : null,
"internalType" : "fselement"
} ]
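Paging through results amounts to repeating the same request with a growing offset. The Python sketch below builds such URLs. The helper names search_url and page_urls are our own illustration and not part of Navigator, the host and port are simply those used in the curl example, and we assume the offset counts results rather than pages.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example above; adjust for your deployment.
BASE_URL = "http://localhost:7187/api/v3/entities"


def search_url(query, limit, offset=0):
    """Return a search URL for one page of results (hypothetical helper)."""
    params = {"query": query, "limit": limit, "offset": offset}
    # urlencode percent-encodes reserved characters such as ':'.
    return BASE_URL + "?" + urlencode(params)


def page_urls(query, page_size, pages):
    """Return URLs for consecutive pages by stepping the offset by page_size."""
    return [search_url(query, page_size, page * page_size) for page in range(pages)]


if __name__ == "__main__":
    for url in page_urls("type:file", 2, 3):
        print(url)
```

Because the URL is assembled by urlencode, the colon in the query is sent as %3A, which a server decodes back to the same query string.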
Searching with AND and OR
You can combine fields with AND and OR. In this example we are searching
for tables that are named either "sample_07" or "sample_08". By leaving
off the limit and offset parameters, we are searching for the first 100
results.
$ curl 'http://localhost:7187/api/v3/entities?query=(((originalName:sample_07)OR(originalName:sample_08))AND(type:TABLE))' \
-X GET \
-u <username>:<password>
[ {
"identity" : "ea27302e11370a3927ac11cbb920891d",
"originalName" : "sample_07",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "/default",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"created" : "2014-08-25T19:26:48.000Z",
"lastAccessed" : "1970-01-01T00:00:00.000Z",
"fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_07",
"inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
"outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"partColNames" : null,
"clusteredByColNames" : null,
"sortByColNames" : null,
"serdeName" : null,
"serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"serdeProps" : null,
"params" : null,
"owner" : "admin",
"type" : "TABLE",
"deleted" : false,
"compressed" : false,
"sourceType" : "HIVE",
"internalType" : "hv_table"
}, {
"identity" : "9282adb88478c2ce4beb13dbba997ef5",
"originalName" : "sample_08",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "/default",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"created" : "2014-08-25T19:26:55.000Z",
"lastAccessed" : "1970-01-01T00:00:00.000Z",
"fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
"inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
"outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"partColNames" : null,
"clusteredByColNames" : null,
"sortByColNames" : null,
"serdeName" : null,
"serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"serdeProps" : null,
"params" : null,
"owner" : "admin",
"type" : "TABLE",
"deleted" : false,
"compressed" : false,
"sourceType" : "HIVE",
"internalType" : "hv_table"
} ]
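Nested AND/OR queries are easy to get wrong by hand because of the parentheses. As a minimal sketch, the hypothetical helpers below (term, any_of, and all_of are names we made up) assemble the query string used in the example above.

```python
def term(field, value):
    """Wrap a single field:value condition in parentheses."""
    return f"({field}:{value})"


def any_of(*clauses):
    """Join already-parenthesized clauses with OR."""
    return "(" + "OR".join(clauses) + ")"


def all_of(*clauses):
    """Join already-parenthesized clauses with AND."""
    return "(" + "AND".join(clauses) + ")"


if __name__ == "__main__":
    query = all_of(
        any_of(term("originalName", "sample_07"), term("originalName", "sample_08")),
        term("type", "TABLE"),
    )
    print(query)
```

The resulting string can be passed as the query parameter of an entities search.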
Searching by Date
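You can search date fields by a range of possible dates. In this example
we're looking for entities with a created date from July 12th up to but
not including July 19th. Dates are written in the form
2014-07-11T23:59:59.999Z. A date range is written as a start date and an
end date in square brackets, separated by TO; in the URL, the spaces
around TO are encoded as %20.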
$ curl 'http://localhost:7187/api/v3/entities?query=(created:\[2014-07-11T23:59:59.999Z%20TO%202014-07-19T00:00:00.000Z\])' \
-X GET \
-u <username>:<password>
[ {
"identity" : "6f45b7ea465986ec65e5ab3f8bdad377",
"originalName" : "commons-io-2.1.jar",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/user/oozie/share/lib/lib_20140717134226/hive",
"extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/filebrowser/view/user/oozie/share/lib/lib_20140717134226/hive/commons-io-2.1.jar"
},
"fileSystemPath" : "/user/oozie/share/lib/lib_20140717134226/hive/commons-io-2.1.jar",
"type" : "FILE",
"size" : 163151,
"created" : "2014-07-17T20:42:35.652Z",
"lastModified" : "2014-07-17T20:42:35.652Z",
"lastAccessed" : "2014-07-18T17:30:32.560Z",
"permissions" : "rw-r--r--",
"owner" : "oozie",
"group" : "oozie",
"blockSize" : null,
"mimeType" : "application/octet-stream",
"deleted" : false,
"sourceType" : "HDFS",
"replication" : null,
"internalType" : "fselement"
}, {
"identity" : "bb5ebd766e7ab514b53f6660eed198d7",
"originalName" : "ST4-4.0.4.jar",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/user/oozie/share/lib/lib_20140717134226/hive",
"extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/filebrowser/view/user/oozie/share/lib/lib_20140717134226/hive/ST4-4.0.4.jar"
},
"fileSystemPath" : "/user/oozie/share/lib/lib_20140717134226/hive/ST4-4.0.4.jar",
"type" : "FILE",
"size" : 236660,
"created" : "2014-07-17T20:42:35.686Z",
"lastModified" : "2014-07-17T20:42:35.686Z",
"lastAccessed" : "2014-07-18T17:30:32.066Z",
"permissions" : "rw-r--r--",
"owner" : "oozie",
"group" : "oozie",
"blockSize" : null,
"mimeType" : "application/octet-stream",
"deleted" : false,
"sourceType" : "HDFS",
"replication" : null,
"internalType" : "fselement"
} ]
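Timestamps in range queries are easy to mistype, so it can help to generate them. The sketch below formats timezone-aware Python datetimes in the millisecond UTC form seen in the responses above and builds a range clause. The function names are hypothetical, and the clause still has to be URL-encoded (and its brackets escaped for curl) before it is sent.

```python
from datetime import datetime, timezone


def navigator_timestamp(moment):
    """Format a timezone-aware datetime like 2014-07-11T23:59:59.999Z."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def date_range(field, start, end):
    """Build a clause of the form field:[start TO end]."""
    return f"{field}:[{navigator_timestamp(start)} TO {navigator_timestamp(end)}]"


if __name__ == "__main__":
    start = datetime(2014, 7, 11, 23, 59, 59, 999000, tzinfo=timezone.utc)
    end = datetime(2014, 7, 19, tzinfo=timezone.utc)
    print(date_range("created", start, end))
```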
Searching for a Property
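Properties are stored with an up_ prefix on the property name. To search
for entities whose priority property is medium, query the up_priority field: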
$ curl 'http://localhost:7187/api/v3/entities?query=(up_priority:medium)' \
-X GET \
-u <username>:<password>
[ {
"identity" : "b346a429af2c96ac68116c8c41248d1e",
"originalName" : "salary",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : "ea27302e11370a3927ac11cbb920891d",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_07",
"priority" : "medium"
},
"dataType" : "int",
"originalDescription" : null,
"type" : "FIELD",
"deleted" : false,
"sourceType" : "HIVE",
"internalType" : "hv_column"
} ]
Searching for a Tag
Use the "tags" field to search for a tag:
$ curl 'http://localhost:7187/api/v3/entities?query=(tags:required)' \
-X GET \
-u <username>:<password>
[ {
"identity" : "f0a7ae7b88333116a92278d686cc716e",
"originalName" : "total_emp",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : "ea27302e11370a3927ac11cbb920891d",
"parentPath" : "/default/sample_07",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : [ "common", "required" ],
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_07"
},
"dataType" : "int",
"originalDescription" : null,
"type" : "FIELD",
"deleted" : false,
"sourceType" : "HIVE",
"internalType" : "hv_column"
} ]
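The property and tag searches above differ only in the field name that is queried. A small sketch, using helper names of our own invention, makes the convention explicit:

```python
def property_clause(name, value):
    """Query clause for a user-defined property, searched under the up_ prefix."""
    return f"(up_{name}:{value})"


def tag_clause(tag):
    """Query clause for a tag, searched through the tags field."""
    return f"(tags:{tag})"


if __name__ == "__main__":
    print(property_clause("priority", "medium"))
    print(tag_clause("required"))
```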
Searching by ID
If you know an entity's ID, you can request that entity directly.
$ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
-X GET \
-u <username>:<password>
{
"identity" : "9282adb88478c2ce4beb13dbba997ef5",
"originalName" : "sample_08",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "/default",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"created" : "2014-08-25T19:26:55.000Z",
"lastAccessed" : "1970-01-01T00:00:00.000Z",
"fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
"inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
"outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"partColNames" : null,
"clusteredByColNames" : null,
"sortByColNames" : null,
"serdeName" : null,
"serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"serdeProps" : null,
"params" : null,
"owner" : "admin",
"type" : "TABLE",
"deleted" : false,
"compressed" : false,
"sourceType" : "HIVE",
"internalType" : "hv_table"
}
Searching for Multiple IDs
You can query for multiple entities at once by repeating the ids GET
parameter, once per entity ID.
$ curl 'http://localhost:7187/api/v3/entities/?ids=a09b0233cc58ff7d601eaa68673a20c6&ids=4fbdadc6899638782fc8cb626176dc7b' \
-X GET \
-u <username>:<password>
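When the list of IDs comes from a program, the repeated ids parameter can be generated rather than typed. The sketch below uses urlencode with doseq=True from the Python standard library; the function name entities_by_ids_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as written in the curl example above.
ENTITIES_URL = "http://localhost:7187/api/v3/entities/"


def entities_by_ids_url(ids):
    """Return a URL that repeats the ids parameter once per entity ID."""
    # doseq=True expands the list into ids=...&ids=... pairs.
    return ENTITIES_URL + "?" + urlencode({"ids": list(ids)}, doseq=True)


if __name__ == "__main__":
    print(entities_by_ids_url([
        "a09b0233cc58ff7d601eaa68673a20c6",
        "4fbdadc6899638782fc8cb626176dc7b",
    ]))
```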
Searching for Columns in a Table
By querying for a firstClassParentIdentity you can get all of the
columns belonging to a table.
$ curl 'http://localhost:7187/api/v3/entities/?query=firstClassParentIdentity:c28d0187f27749d34bff0b379c1b0853' \
-X GET \
-u <username>:<password>
Viewing Relations
Entities are joined via relations. There are different types of
relations:
- ALIAS: an alias relationship, e.g. a relation from a table to a synonym.
- CONJOINT: a non-directional relationship, e.g. a relation between a table and an index.
- CONTROL_FLOW: a relationship where the source entity controls the data flow relationship for the target entity, e.g. a relation between columns used in the INSERT clause and the WHERE clause of a Hive query.
- DATA_FLOW: a data flow relationship, e.g. a relation from a file to a MapReduce job that used the file.
- INSTANCE_OF: a relationship between a template and its instance, e.g. an Operation Execution is an instance of an Operation.
- LOGICAL_PHYSICAL: a relation between a logical entity and its physical entity, e.g. a relation between a Hive query and a MapReduce job, or a relation between a Hive table and the HDFS representation of the table.
- PARENT_CHILD: a parent-child relationship, e.g. a relation between a directory and a file within that directory.
Many of these relations are directional, so there are roles for the different sides:
- SOURCE: Source of the relationship in a directional relationship like DATA_FLOW.
- TARGET: Target of the relationship in a directional relationship like DATA_FLOW.
- PARENT: Parent entity for a PARENT_CHILD relationship.
- CHILD: Children entities for a PARENT_CHILD relationship.
- LOGICAL: Logical entities for a LOGICAL_PHYSICAL relationship.
- PHYSICAL: Physical entity for a LOGICAL_PHYSICAL relationship.
- ENDPOINT1: One end of a conjoint relationship.
- ENDPOINT2: Other end of a conjoint relationship.
As an example, if we want to search for the relations between a table and its columns, we can search for PARENT_CHILD relations of the table in which the table is the PARENT (in the example below, the entity ID is that of the table). The entity IDs listed in the "children" field of each relation are the column IDs.
$ curl 'http://localhost:7187/api/v3/relations/?entityIds=c28d0187f27749d34bff0b379c1b0853&types=PARENT_CHILD&roles=PARENT' \
-X GET \
-u <username>:<password>
[ {
"identity" : "8573b94ba011783cd8b0611c5af9206c",
"type" : "PARENT_CHILD",
"propagatorId" : null,
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##4280",
"children" : {
"entityIds" : [ "7d68c518939a32a58e7f87ff741c1654" ]
},
"parent" : {
"entityId" : "c28d0187f27749d34bff0b379c1b0853"
},
"unlinked" : false,
"propagatable" : false
}, {
"identity" : "a4b73efa6eb638d64960936351bf1c37",
"type" : "PARENT_CHILD",
"propagatorId" : null,
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##4280",
"children" : {
"entityIds" : [ "8230e93a6a2d7b7a7c2f25ae7b09fafd" ]
},
"parent" : {
"entityId" : "c28d0187f27749d34bff0b379c1b0853"
},
"unlinked" : false,
"propagatable" : false
}, ... ]
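Once the relations have been fetched, the column IDs have to be gathered from the "children" field of each relation. The sketch below does this for a response shaped like the one above; the sample is trimmed from that output, and the function name child_ids is our own.

```python
import json

# Trimmed copy of the two relations shown in the response above.
SAMPLE_RESPONSE = """
[ {
  "identity" : "8573b94ba011783cd8b0611c5af9206c",
  "type" : "PARENT_CHILD",
  "children" : { "entityIds" : [ "7d68c518939a32a58e7f87ff741c1654" ] },
  "parent" : { "entityId" : "c28d0187f27749d34bff0b379c1b0853" }
}, {
  "identity" : "a4b73efa6eb638d64960936351bf1c37",
  "type" : "PARENT_CHILD",
  "children" : { "entityIds" : [ "8230e93a6a2d7b7a7c2f25ae7b09fafd" ] },
  "parent" : { "entityId" : "c28d0187f27749d34bff0b379c1b0853" }
} ]
"""


def child_ids(relations, parent_id):
    """Collect child entity IDs from PARENT_CHILD relations of one parent."""
    ids = []
    for relation in relations:
        if relation.get("type") != "PARENT_CHILD":
            continue
        parent = relation.get("parent") or {}
        if parent.get("entityId") != parent_id:
            continue
        children = relation.get("children") or {}
        ids.extend(children.get("entityIds") or [])
    return ids


if __name__ == "__main__":
    relations = json.loads(SAMPLE_RESPONSE)
    print(child_ids(relations, "c28d0187f27749d34bff0b379c1b0853"))
```

The returned IDs could then be looked up with a multiple-ID entities request.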
Viewing Lineage
You can view the entities that make up the lineage of an entity:
$ curl 'http://localhost:7187/api/v3/lineage/?entityIds=a09b0233cc58ff7d601eaa68673a20c6' \
-X GET \
-u <username>:<password>
Modifying Entities
Entity metadata consists of technical metadata and business metadata. Technical metadata (such as originalName and owner) comes from the service that generated the data, and it cannot be modified without going to the service that generated it. For example, to modify the owner of a file, you must go to HDFS -- you cannot modify it through Navigator. Business metadata, on the other hand, can be modified. Business metadata consists of:
- name
- description
- tags
- properties
$ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
-X PUT -H "Content-Type: application/json" \
-d '{
"name":"myFavoriteTable",
"description":"This is a description of my favorite table.",
"tags":["fav"],
"properties":{"priority":"highest"}
}' \
-u <username>:<password>
{
"identity" : "9282adb88478c2ce4beb13dbba997ef5",
"originalName" : "sample_08",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "/default",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : "myFavoriteTable",
"description" : "This is a description of my favorite table.",
"tags" : [ "fav" ],
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_08",
"priority" : "highest"
},
"created" : "2014-08-25T19:26:55.000Z",
"lastAccessed" : "1970-01-01T00:00:00.000Z",
"fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
"inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
"outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"partColNames" : null,
"clusteredByColNames" : null,
"sortByColNames" : null,
"serdeName" : null,
"serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"serdeProps" : null,
"params" : null,
"owner" : "admin",
"type" : "TABLE",
"deleted" : false,
"compressed" : false,
"sourceType" : "HIVE",
"internalType" : "hv_table"
}
You can also remove the business metadata. The PUT request will set all
business metadata fields, so by leaving out name and description, they
will be removed. The __cloudera_internal__hueLink is a special property
used by Navigator that cannot be removed.
$ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
-X PUT -H "Content-Type: application/json" \
-d '{
"tags":[],
"properties":{}
}' \
-u <username>:<password>
{
"identity" : "9282adb88478c2ce4beb13dbba997ef5",
"originalName" : "sample_08",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "/default",
"extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
"name" : null,
"description" : null,
"tags" : null,
"properties" : {
"__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_08"
},
"created" : "2014-08-25T19:26:55.000Z",
"lastAccessed" : "1970-01-01T00:00:00.000Z",
"fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
"inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
"outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"partColNames" : null,
"clusteredByColNames" : null,
"sortByColNames" : null,
"serdeName" : null,
"serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
"serdeProps" : null,
"params" : null,
"owner" : "admin",
"type" : "TABLE",
"deleted" : false,
"compressed" : false,
"sourceType" : "HIVE",
"internalType" : "hv_table"
}
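Because a PUT sets all business metadata fields at once, changing a single field means resending the others unchanged. The following sketch builds such a request body from an entity that was fetched earlier. The function name put_body is hypothetical, and dropping properties that start with __cloudera_internal__ from the body is our own choice, based on the example above in which the hueLink property survives a PUT that omits it.

```python
import json

BUSINESS_FIELDS = ("name", "description", "tags", "properties")
INTERNAL_PREFIX = "__cloudera_internal__"


def put_body(entity, **changes):
    """Return a PUT body that keeps current business metadata and applies changes.

    Passing a field as None leaves it out of the body, which removes it.
    A changed "properties" value replaces the existing properties as a whole.
    """
    body = {}
    for field in BUSINESS_FIELDS:
        value = changes.get(field, entity.get(field))
        if field == "properties" and value:
            # Assumption: internal properties are managed by Navigator itself.
            value = {k: v for k, v in value.items() if not k.startswith(INTERNAL_PREFIX)}
        if value is not None:
            body[field] = value
    return body


if __name__ == "__main__":
    current = {
        "name": "myFavoriteTable",
        "description": "This is a description of my favorite table.",
        "tags": ["fav"],
        "properties": {"__cloudera_internal__hueLink": "...", "priority": "highest"},
    }
    print(json.dumps(put_body(current, tags=["fav", "reviewed"]), indent=2))
```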
Modifying Entities using Metadata Preregistration
To refer to an entity that hasn't yet been extracted, you must know the
Source that it will eventually be extracted from. A Source is Navigator's
representation of the service the data is extracted from. You can
preregister HDFS and Hive objects (files, directories, databases, tables,
views, and columns). Let's start by listing the HDFS and Hive Sources.
$ curl 'http://localhost:7187/api/v3/entities?query=((type:SOURCE)AND((sourceType:HDFS)OR(sourceType:Hive)))' \
-X GET \
-u <username>:<password>
[ {
"identity" : "a09b0233cc58ff7d601eaa68673a20c6",
"originalName" : "HDFS-1",
"sourceId" : null,
"firstClassParentId" : null,
"parentPath" : null,
"extractorRunId" : null,
"name" : "HDFS-1",
"description" : null,
"tags" : null,
"properties" : null,
"clusterName" : "Cluster 1",
"sourceUrl" : "hdfs://example.com:8020",
"sourceType" : "HDFS",
"sourceExtractIteration" : 10,
"type" : "SOURCE",
"internalType" : "source"
}, {
"identity" : "4fbdadc6899638782fc8cb626176dc7b",
"originalName" : "HIVE-1",
"sourceId" : null,
"firstClassParentId" : null,
"parentPath" : null,
"extractorRunId" : null,
"name" : "HIVE-1",
"description" : null,
"tags" : null,
"properties" : null,
"clusterName" : "Cluster 1",
"sourceUrl" : "thrift://example.com:9083",
"sourceType" : "HIVE",
"sourceExtractIteration" : 14,
"type" : "SOURCE",
"internalType" : "source"
} ]
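The preregistration requests that follow need the identity of the right Source. As a sketch, the hypothetical helper below groups Source identities by sourceType from a response like the one above (trimmed here to the fields it uses).

```python
def source_ids_by_type(entities):
    """Map each sourceType to the identities of its SOURCE entities."""
    result = {}
    for entity in entities:
        if entity.get("type") != "SOURCE":
            continue
        # A cluster may expose several Sources of one type, so keep a list.
        result.setdefault(entity.get("sourceType"), []).append(entity["identity"])
    return result


if __name__ == "__main__":
    sources = [
        {"identity": "a09b0233cc58ff7d601eaa68673a20c6", "sourceType": "HDFS", "type": "SOURCE"},
        {"identity": "4fbdadc6899638782fc8cb626176dc7b", "sourceType": "HIVE", "type": "SOURCE"},
    ]
    print(source_ids_by_type(sources))
```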
Metadata Preregistration for HDFS
Let's create business metadata for a file that doesn't exist yet.
To identify the file, we refer to the source it will belong to (use the
identity from the HDFS source found via the previous API call), the parent
directory, and the filename. The example below is for the file
/user/admin/newFile.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"a09b0233cc58ff7d601eaa68673a20c6",
"parentPath":"/user/admin",
"originalName":"newFile",
"name":"awesomeFile",
"description":"This is going to be an awesome file.",
"tags":["fav"],
"properties":{"priority":"medium"}
}' \
-u <username>:<password>
{
"identity" : "d36f222528518ed955b9d8d8255e53e0",
"originalName" : "newFile",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/user/admin",
"extractorRunId" : null,
"name" : "awesomeFile",
"description" : "This is going to be an awesome file.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
Directories are handled in the same way. The parent directory is the
directory above the directory you're modifying, and the originalName is the
directory you're modifying. The example below is for the directory
/user/admin/newDirectory/. Navigator does not actually care whether this
entity represents a file or a directory -- that will be determined when the
actual entity is extracted.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"a09b0233cc58ff7d601eaa68673a20c6",
"parentPath":"/user/admin",
"originalName":"newDirectory",
"name":"awesomeFile",
"description":"This is going to be an awesome file.",
"tags":["fav"],
"properties":{"priority":"medium"}
}' \
-u <username>:<password>
{
"identity" : "d36f222528518ed955b9d8d8255e53e0",
"originalName" : "newDirectory",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/user/admin",
"extractorRunId" : null,
"name" : "awesomeFile",
"description" : "This is going to be an awesome file.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
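For HDFS, the parentPath and originalName are just the two halves of the full path, whether it names a file or a directory. A minimal sketch of that split, using a helper name of our own, could look like this:

```python
import posixpath


def hdfs_preregistration(source_id, path, **business_metadata):
    """Build a preregistration payload for an HDFS file or directory path."""
    # A trailing slash (as in /user/admin/newDirectory/) is dropped first.
    parent_path, original_name = posixpath.split(path.rstrip("/"))
    if not original_name:
        raise ValueError("path must name a file or directory below the root")
    payload = {
        "sourceId": source_id,
        "parentPath": parent_path,
        "originalName": original_name,
    }
    payload.update(business_metadata)
    return payload


if __name__ == "__main__":
    print(hdfs_preregistration(
        "a09b0233cc58ff7d601eaa68673a20c6",
        "/user/admin/newFile",
        name="awesomeFile",
        tags=["fav"],
    ))
```

The resulting dictionary can be serialized with json.dumps and sent as the body of the POST request.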
Metadata Preregistration for Hive
Let's create business metadata for a Hive database that doesn't exist yet.
To identify the database, we refer to the source it will belong to (use the
identity from the Hive source found via the previous API call) and the
database name. In this case, we will leave the parentPath out, since a
database has no entity above it.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"4fbdadc6899638782fc8cb626176dc7b",
"originalName":"newDatabase",
"name":"awesomeDatabase",
"description":"This is going to be an awesome database.",
"tags":["fav"],
"properties":{"priority":"medium"}
}' \
-u <username>:<password>
{
"identity" : "1f29ad529c31f0cda977dfccc3186c3a",
"originalName" : "newDatabase",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : null,
"extractorRunId" : null,
"name" : "awesomeDatabase",
"description" : "This is going to be an awesome database.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
Let's create business metadata for a Hive table that doesn't exist yet.
The parentPath will refer to the database name that the table will
eventually belong to.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"4fbdadc6899638782fc8cb626176dc7b",
"parentPath":"myDatabase",
"originalName":"newTable",
"name":"awesomeTable",
"description":"This is going to be an awesome table.",
"tags":["fav"],
"properties":{"priority":"medium"}
}' \
-u <username>:<password>
{
"identity" : "1f29ad529c31f0cda977dfccc3186c3a",
"originalName" : "newTable",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "myDatabase",
"extractorRunId" : null,
"name" : "awesomeTable",
"description" : "This is going to be an awesome table.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
For a Hive column, the parentPath will refer to the database name and
table name that the column will eventually belong to.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"4fbdadc6899638782fc8cb626176dc7b",
"parentPath":"myDatabase/myTable",
"originalName":"newColumn",
"name":"awesomeColumn",
"description":"This is going to be an awesome column.",
"tags":["fav"],
"properties":{"priority":"medium"}
}' \
-u <username>:<password>
{
"identity" : "1989a647141f86f01400603f53f831c0",
"originalName" : "newColumn",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "myDatabase/myTable",
"extractorRunId" : null,
"name" : "awesomeColumn",
"description" : "This is going to be an awesome column.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
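The three Hive cases above differ only in how parentPath and originalName are derived from the database, table, and column names. The sketch below captures that rule in one hypothetical helper; it omits parentPath for a database, as in the first example.

```python
def hive_preregistration(source_id, database, table=None, column=None, **business_metadata):
    """Build a preregistration payload for a Hive database, table, or column."""
    if column is not None and table is None:
        raise ValueError("a column requires a table")
    payload = {"sourceId": source_id}
    if column is not None:
        # Column: the parent is database/table.
        payload["parentPath"] = f"{database}/{table}"
        payload["originalName"] = column
    elif table is not None:
        # Table: the parent is the database.
        payload["parentPath"] = database
        payload["originalName"] = table
    else:
        # Database: no entity above it, so parentPath is left out.
        payload["originalName"] = database
    payload.update(business_metadata)
    return payload


if __name__ == "__main__":
    source = "4fbdadc6899638782fc8cb626176dc7b"
    print(hive_preregistration(source, "newDatabase"))
    print(hive_preregistration(source, "myDatabase", table="newTable"))
    print(hive_preregistration(source, "myDatabase", table="myTable", column="newColumn"))
```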
Searching for Metadata Preregistration Entries
Let's look at a list of all of the preregistered metadata entries we
have added so far. Note that you can only view the preregistered entities
via the API -- the UI does not display these entities.
$ curl 'http://localhost:7187/api/v3/entities/?query=-internalType:*' \
-X GET \
-u <username>:<password>
[ {
"identity" : "d36f222528518ed955b9d8d8255e53e0",
"originalName" : "newChild",
"sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
"firstClassParentId" : null,
"parentPath" : "/user/admin",
"extractorRunId" : null,
"name" : "awesomeFile",
"description" : "This is going to be an awesome file.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}, {
"identity" : "1f29ad529c31f0cda977dfccc3186c3a",
"originalName" : "newDatabase",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : null,
"extractorRunId" : null,
"name" : "awesomeDatabase",
"description" : "This is going to be an awesome database.",
"tags" : [ "fav" ],
"properties" : {
"priority" : "medium"
},
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
} ]
Removing Metadata Preregistration Entries
We don't yet support deleting preregistration entries. If you create a
bad entry (for example, if you have a typo in the originalName), you can
clear out all of its business metadata to prevent it from being applied
to an entity.
$ curl http://localhost:7187/api/v3/entities/ \
-X POST -H "Content-Type: application/json" \
-d '{
"sourceId":"4fbdadc6899638782fc8cb626176dc7b",
"parentPath":"myDatabase/myTable",
"originalName":"newColumn",
"name":"",
"description":"",
"tags":[],
"properties":{}
}' \
-u <username>:<password>
{
"identity" : "1989a647141f86f01400603f53f831c0",
"originalName" : "newColumn",
"sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
"firstClassParentId" : null,
"parentPath" : "myDatabase/myTable",
"extractorRunId" : null,
"name" : null,
"description" : null,
"tags" : null,
"properties" : null,
"type" : null,
"sourceType" : null,
"internalType" : "UNDEFINED"
}
You can also do this via a PUT request. The ID in the URL is that of the
bad entry you created.
$ curl http://localhost:7187/api/v3/entities/7b453e6c48179e61d7e355b5abd76b2c \
-X PUT -H "Content-Type: application/json" \
-d '{
"tags":[],
"properties":{}
}' \
-u <username>:<password>