API Usage Tutorial

Cloudera Navigator Concepts

The API terminology is similar to that used in the web UI:

Entity

Abstract data structure that describes structural features of any entity.

An entity can be uniquely identified by its identity.

Relation

Describes a relationship among entities.

A relation is itself an entity and, like any other entity, can have properties and comments associated with it.

API Usage Examples

Viewing Entity Metadata
Viewing Relations
Viewing Lineage
Modifying Entities
Modifying Entities using Metadata Preregistration

Viewing Entity Metadata

You can search by any entity field. In this example we search for file entities and limit the results to just the first two that are found. To search for the next two after these results, just increase the offset. Note that we must put quotes around the URL, since it contains an ampersand.

        $ curl 'http://localhost:7187/api/v3/entities?query=type:file&limit=2&offset=0' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "5c2c5fc0494f07a2d1e2219799aab5b6",
          "originalName" : "jobtracker.info",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/tmp/mapred/system",
          "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "fileSystemPath" : "/tmp/mapred/system/jobtracker.info",
          "type" : "FILE",
          "size" : 4,
          "created" : "2014-08-25T19:07:17.652Z",
          "lastModified" : "2014-08-25T19:07:17.652Z",
          "lastAccessed" : "2014-08-25T19:07:17.185Z",
          "permissions" : "rwx------",
          "owner" : "mapred",
          "group" : "supergroup",
          "blockSize" : null,
          "mimeType" : "application/octet-stream",
          "sourceType" : "HDFS",
          "deleted" : false,
          "replication" : null,
          "internalType" : "fselement"
        }, {
          "identity" : "76a8a355a475396efa110434c80e92db",
          "originalName" : "hbase.version",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/hbase",
          "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "fileSystemPath" : "/hbase/hbase.version",
          "type" : "FILE",
          "size" : 7,
          "created" : "2014-08-25T19:08:25.719Z",
          "lastModified" : "2014-08-25T19:08:25.719Z",
          "lastAccessed" : "2014-08-25T19:08:25.401Z",
          "permissions" : "rw-r--r--",
          "owner" : "hbase",
          "group" : "hbase",
          "blockSize" : null,
          "mimeType" : "application/octet-stream",
          "sourceType" : "HDFS",
          "deleted" : false,
          "replication" : null,
          "internalType" : "fselement"
        } ]
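The paging behaviour described above can be sketched in Python. This is only an illustration: the helper names `search_url` and `page_urls` are hypothetical, and the sketch assumes that offset counts entities rather than pages, which the text implies but does not state outright. It builds URLs only and sends no requests.

```python
from urllib.parse import urlencode

# Host, port, and API version are copied from the curl examples in this tutorial.
BASE_URL = "http://localhost:7187/api/v3"


def search_url(query, limit=2, offset=0):
    """Build an entity-search URL with explicit paging parameters."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{BASE_URL}/entities?{params}"


def page_urls(query, page_size, pages):
    """Return one URL per page, advancing offset by page_size each time.

    Assumption: offset is an entity count, so page n starts at n * page_size.
    """
    return [search_url(query, page_size, n * page_size) for n in range(pages)]


if __name__ == "__main__":
    for url in page_urls("type:file", page_size=2, pages=3):
        print(url)
```

`urlencode` percent-encodes the colon in the query (`type%3Afile`), which is equivalent to the literal colon used in the curl command.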

Searching with AND and OR

You can combine fields with AND and OR. In this example we are searching for tables that are named either "sample_07" or "sample_08". By leaving off the limit and offset parameters, we are searching for the first 100 results.

        $ curl 'http://localhost:7187/api/v3/entities?query=(((originalName:sample_07)OR(originalName:sample_08))AND(type:TABLE))' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "ea27302e11370a3927ac11cbb920891d",
          "originalName" : "sample_07",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "/default",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "created" : "2014-08-25T19:26:48.000Z",
          "lastAccessed" : "1970-01-01T00:00:00.000Z",
          "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_07",
          "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
          "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "partColNames" : null,
          "clusteredByColNames" : null,
          "sortByColNames" : null,
          "serdeName" : null,
          "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          "serdeProps" : null,
          "params" : null,
          "owner" : "admin",
          "type" : "TABLE",
          "deleted" : false,
          "compressed" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_table"
        }, {
          "identity" : "9282adb88478c2ce4beb13dbba997ef5",
          "originalName" : "sample_08",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "/default",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "created" : "2014-08-25T19:26:55.000Z",
          "lastAccessed" : "1970-01-01T00:00:00.000Z",
          "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
          "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
          "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "partColNames" : null,
          "clusteredByColNames" : null,
          "sortByColNames" : null,
          "serdeName" : null,
          "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          "serdeProps" : null,
          "params" : null,
          "owner" : "admin",
          "type" : "TABLE",
          "deleted" : false,
          "compressed" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_table"
        } ]

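Nested AND/OR expressions are easy to mistype by hand, so one option is to assemble them from small helpers. The sketch below is illustrative only; `term`, `any_of`, and `all_of` are hypothetical names, and the parenthesization simply mirrors the query string used in the AND/OR example.

```python
def term(field, value):
    """Wrap a single field:value test in parentheses."""
    return f"({field}:{value})"


def any_of(*clauses):
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + "OR".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND and wrap the group in parentheses."""
    return "(" + "AND".join(clauses) + ")"


# Tables named either sample_07 or sample_08, as in the AND/OR example.
QUERY = all_of(
    any_of(term("originalName", "sample_07"), term("originalName", "sample_08")),
    term("type", "TABLE"),
)

if __name__ == "__main__":
    print(QUERY)
```

The resulting string can be passed as the value of the query parameter.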
Searching by Date

You can search date fields by a range of possible dates. In this example we're looking for entities with a created date from July 12th up to but not including July 19th. The date looks like this:

        <year>-<month>-<day>T<hour>:<minute>:<second>.<millis>Z

The date range looks like this:

        [<date> TO <date>]

In our query we must replace spaces with %20. Note that the created date is the date the entity was created in its respective service (for example, in HDFS it's the date the file/directory was created), not the date the entity was extracted.

        $ curl 'http://localhost:7187/api/v3/entities?query=(created:\[2014-07-11T23:59:59.999Z%20TO%202014-07-19T00:00:00.000Z\])' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "6f45b7ea465986ec65e5ab3f8bdad377",
          "originalName" : "commons-io-2.1.jar",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/user/oozie/share/lib/lib_20140717134226/hive",
          "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/filebrowser/view/user/oozie/share/lib/lib_20140717134226/hive/commons-io-2.1.jar"
          },
          "fileSystemPath" : "/user/oozie/share/lib/lib_20140717134226/hive/commons-io-2.1.jar",
          "type" : "FILE",
          "size" : 163151,
          "created" : "2014-07-17T20:42:35.652Z",
          "lastModified" : "2014-07-17T20:42:35.652Z",
          "lastAccessed" : "2014-07-18T17:30:32.560Z",
          "permissions" : "rw-r--r--",
          "owner" : "oozie",
          "group" : "oozie",
          "blockSize" : null,
          "mimeType" : "application/octet-stream",
          "deleted" : false,
          "sourceType" : "HDFS",
          "replication" : null,
          "internalType" : "fselement"
        }, {
          "identity" : "bb5ebd766e7ab514b53f6660eed198d7",
          "originalName" : "ST4-4.0.4.jar",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/user/oozie/share/lib/lib_20140717134226/hive",
          "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/filebrowser/view/user/oozie/share/lib/lib_20140717134226/hive/ST4-4.0.4.jar"
          },
          "fileSystemPath" : "/user/oozie/share/lib/lib_20140717134226/hive/ST4-4.0.4.jar",
          "type" : "FILE",
          "size" : 236660,
          "created" : "2014-07-17T20:42:35.686Z",
          "lastModified" : "2014-07-17T20:42:35.686Z",
          "lastAccessed" : "2014-07-18T17:30:32.066Z",
          "permissions" : "rw-r--r--",
          "owner" : "oozie",
          "group" : "oozie",
          "blockSize" : null,
          "mimeType" : "application/octet-stream",
          "deleted" : false,
          "sourceType" : "HDFS",
          "replication" : null,
          "internalType" : "fselement"
        } ]

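Formatting the timestamps by hand invites mistakes in the seconds and milliseconds fields, so a small helper may be useful. The following is a sketch under assumptions: the function names are hypothetical, the inputs are expected to be timezone-aware datetime values, and the brackets are left unescaped on the assumption that the HTTP client in use does not treat them specially.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def navigator_timestamp(moment):
    """Format an aware datetime as <date>T<time>.<millis>Z in UTC."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def created_between(start, end):
    """Build a created-date range clause with spaces encoded as %20."""
    expression = (
        f"created:[{navigator_timestamp(start)} TO {navigator_timestamp(end)}]"
    )
    # Keep ':' and the brackets readable; only the spaces need encoding here.
    return "(" + quote(expression, safe=":[]") + ")"


if __name__ == "__main__":
    start = datetime(2014, 7, 12, tzinfo=timezone.utc)
    end = datetime(2014, 7, 19, tzinfo=timezone.utc)
    print(created_between(start, end))
```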
Searching for a Property

Properties are stored as up_ (the up stands for user property). If you want to search by a property, you must remember to add "up_" to the name.

        $ curl 'http://localhost:7187/api/v3/entities?query=(up_priority:medium)' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "b346a429af2c96ac68116c8c41248d1e",
          "originalName" : "salary",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : "ea27302e11370a3927ac11cbb920891d",
          "parentPath" : "/default/sample_07",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_07",
            "priority" : "medium"
          },
          "dataType" : "int",
          "originalDescription" : null,
          "type" : "FIELD",
          "deleted" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_column"
        } ]

Searching for a Tag

Use the "tags" field to search for a tag:

        $ curl 'http://localhost:7187/api/v3/entities?query=(tags:required)' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "f0a7ae7b88333116a92278d686cc716e",
          "originalName" : "total_emp",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : "ea27302e11370a3927ac11cbb920891d",
          "parentPath" : "/default/sample_07",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : [ "common", "required" ],
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_07"
          },
          "dataType" : "int",
          "originalDescription" : null,
          "type" : "FIELD",
          "deleted" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_column"
        } ]

Searching by ID

If you know an entity's ID, you can request that entity directly.

        $ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
        -X GET \
        -u <username>:<password>

        {
          "identity" : "9282adb88478c2ce4beb13dbba997ef5",
          "originalName" : "sample_08",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "/default",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "created" : "2014-08-25T19:26:55.000Z",
          "lastAccessed" : "1970-01-01T00:00:00.000Z",
          "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
          "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
          "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "partColNames" : null,
          "clusteredByColNames" : null,
          "sortByColNames" : null,
          "serdeName" : null,
          "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          "serdeProps" : null,
          "params" : null,
          "owner" : "admin",
          "type" : "TABLE",
          "deleted" : false,
          "compressed" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_table"
        }

Searching for Multiple IDs

You can query for multiple entities at once by specifying ids as GET parameters.

        $ curl 'http://localhost:7187/api/v3/entities/?ids=a09b0233cc58ff7d601eaa68673a20c6&ids=4fbdadc6899638782fc8cb626176dc7b' \
        -X GET \
        -u <username>:<password>

Searching for Columns in a Table

By querying for a firstClassParentIdentity you can get all of the columns belonging to a table.

        $ curl 'http://localhost:7187/api/v3/entities/?query=firstClassParentIdentity:c28d0187f27749d34bff0b379c1b0853' \
        -X GET \
        -u <username>:<password>

Viewing Relations

Entities are joined via relations. There are different types of relations:

ALIAS: alias relationship, e.g. relation from table to synonym
CONJOINT: a non-directional relationship, e.g. a relation between a table and an index.
CONTROL_FLOW: a relationship where the source entity controls the data flow relationship for the target entity, e.g. a relation between columns used in the INSERT clause and the WHERE clause of a Hive query.
DATA_FLOW: a data flow relationship, e.g. a relation from a file to a MapReduce job that used the file.
INSTANCE_OF: a relationship between a template and its instance, e.g. an Operation Execution is an instance of an Operation.
LOGICAL_PHYSICAL: a relation between a logical entity and its physical entity, e.g. a relation between a Hive Query and an MR Job, or relation between Hive Table and HDFS representation of the table.
PARENT_CHILD: a parent child relationship, e.g. a relation between a directory and a file within that directory.

Many of these relations are directional, so there are roles for the different sides:

SOURCE: Source of the relationship in a directional relationship like DATA_FLOW.
TARGET: Target of the relationship in a directional relationship like DATA_FLOW.
PARENT: Parent entity for a PARENT_CHILD relationship.
CHILD: Children entities for a PARENT_CHILD relationship.
LOGICAL: Logical entities for a LOGICAL_PHYSICAL relationship.
PHYSICAL: Physical entity for a LOGICAL_PHYSICAL relationship.
ENDPOINT1: One end of a conjoint relationship.
ENDPOINT2: Other end of a conjoint relationship.

As an example, if we want to search for the relations between a table and its columns, we can search for PARENT_CHILD relations of the table in which the table is the PARENT (in the example below, the entity ID is that of the table). The entity IDs listed in the "children" field of each relation are the column IDs.


        $ curl 'http://localhost:7187/api/v3/relations/?entityIds=c28d0187f27749d34bff0b379c1b0853&types=PARENT_CHILD&roles=PARENT' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "8573b94ba011783cd8b0611c5af9206c",
          "type" : "PARENT_CHILD",
          "propagatorId" : null,
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##4280",
          "children" : {
            "entityIds" : [ "7d68c518939a32a58e7f87ff741c1654" ]
          },
          "parent" : {
            "entityId" : "c28d0187f27749d34bff0b379c1b0853"
          },
          "unlinked" : false,
          "propagatable" : false
        }, {
          "identity" : "a4b73efa6eb638d64960936351bf1c37",
          "type" : "PARENT_CHILD",
          "propagatorId" : null,
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##4280",
          "children" : {
            "entityIds" : [ "8230e93a6a2d7b7a7c2f25ae7b09fafd" ]
          },
          "parent" : {
            "entityId" : "c28d0187f27749d34bff0b379c1b0853"
          },
          "unlinked" : false,
          "propagatable" : false
        }, ... ]
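Once the relations response has been parsed, collecting the column IDs is a short loop over the "children" fields. The sketch below works on an abbreviated copy of the response shown above and sends no requests; the function name `child_ids` is hypothetical.

```python
import json

# Abbreviated copy of the PARENT_CHILD relations shown in this section.
SAMPLE_RESPONSE = json.loads("""
[ {
  "identity" : "8573b94ba011783cd8b0611c5af9206c",
  "type" : "PARENT_CHILD",
  "children" : { "entityIds" : [ "7d68c518939a32a58e7f87ff741c1654" ] },
  "parent" : { "entityId" : "c28d0187f27749d34bff0b379c1b0853" }
}, {
  "identity" : "a4b73efa6eb638d64960936351bf1c37",
  "type" : "PARENT_CHILD",
  "children" : { "entityIds" : [ "8230e93a6a2d7b7a7c2f25ae7b09fafd" ] },
  "parent" : { "entityId" : "c28d0187f27749d34bff0b379c1b0853" }
} ]
""")


def child_ids(relations, parent_id):
    """Collect child entity IDs from PARENT_CHILD relations of one parent."""
    ids = []
    for relation in relations:
        if relation.get("type") != "PARENT_CHILD":
            continue  # other relation types have different role fields
        if relation.get("parent", {}).get("entityId") != parent_id:
            continue  # relation belongs to a different parent
        ids.extend(relation.get("children", {}).get("entityIds", []))
    return ids


if __name__ == "__main__":
    print(child_ids(SAMPLE_RESPONSE, "c28d0187f27749d34bff0b379c1b0853"))
```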

Viewing Lineage

You can view the entities comprising the lineage of an entity:

        $ curl 'http://localhost:7187/api/v3/lineage/?entityIds=a09b0233cc58ff7d601eaa68673a20c6' \
        -X GET \
        -u <username>:<password>

Modifying Entities

Entity metadata consists of technical metadata and business metadata. Technical metadata (such as originalName and owner) comes from the service that generated the data, and it cannot be modified without going to the service that generated it. For example, to modify the owner of a file, you must go to HDFS -- you cannot modify it through Navigator. Business metadata, on the other hand, can be modified. Business metadata consists of:

name
description
tags
properties

These are the only metadata fields you can add, update, and remove. In this example we set all 4 types of business metadata:


        $ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
        -X PUT -H "Content-Type: application/json" \
        -d '{
          "name":"myFavoriteTable",
          "description":"This is a description of my favorite table.",
          "tags":["fav"],
          "properties":{"priority":"highest"}
        }' \
        -u <username>:<password>

        {
          "identity" : "9282adb88478c2ce4beb13dbba997ef5",
          "originalName" : "sample_08",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "/default",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : "myFavoriteTable",
          "description" : "This is a description of my favorite table.",
          "tags" : [ "fav" ],
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_08",
            "priority" : "highest"
          },
          "created" : "2014-08-25T19:26:55.000Z",
          "lastAccessed" : "1970-01-01T00:00:00.000Z",
          "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
          "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
          "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "partColNames" : null,
          "clusteredByColNames" : null,
          "sortByColNames" : null,
          "serdeName" : null,
          "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          "serdeProps" : null,
          "params" : null,
          "owner" : "admin",
          "type" : "TABLE",
          "deleted" : false,
          "compressed" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_table"
        }

You can also remove the business metadata. A PUT request sets all business metadata fields, so any field left out of the request body (here, name and description) is removed. The __cloudera_internal__hueLink is a special property used by Navigator that cannot be removed.

        $ curl http://localhost:7187/api/v3/entities/9282adb88478c2ce4beb13dbba997ef5 \
        -X PUT -H "Content-Type: application/json" \
        -d '{
          "tags":[],
          "properties":{}
        }' \
        -u <username>:<password>

        {
          "identity" : "9282adb88478c2ce4beb13dbba997ef5",
          "originalName" : "sample_08",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "/default",
          "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : {
            "__cloudera_internal__hueLink" : "http://erin3-4.ent.cloudera.com:8888/metastore/table/default/sample_08"
          },
          "created" : "2014-08-25T19:26:55.000Z",
          "lastAccessed" : "1970-01-01T00:00:00.000Z",
          "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
          "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
          "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "partColNames" : null,
          "clusteredByColNames" : null,
          "sortByColNames" : null,
          "serdeName" : null,
          "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          "serdeProps" : null,
          "params" : null,
          "owner" : "admin",
          "type" : "TABLE",
          "deleted" : false,
          "compressed" : false,
          "sourceType" : "HIVE",
          "internalType" : "hv_table"
        }
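Because a PUT replaces every business metadata field, changing a single field safely means sending the current values of the others along with it. The sketch below shows one way to build such a request body from an entity previously fetched with GET. The helper name `put_body` is hypothetical, and dropping the `__cloudera_internal__` properties from the body is an assumption based on the removal example, where the hueLink property survives without being sent.

```python
BUSINESS_FIELDS = ("name", "description", "tags", "properties")
INTERNAL_PREFIX = "__cloudera_internal__"


def put_body(entity, **changes):
    """Merge changes into an entity's current business metadata.

    Fields not mentioned in changes keep their current value; passing
    None for a field leaves it out of the body, which removes it.
    """
    body = {}
    for field in BUSINESS_FIELDS:
        value = changes.get(field, entity.get(field))
        if field == "properties" and value:
            # Assumption: internal properties are managed by Navigator.
            value = {
                key: val
                for key, val in value.items()
                if not key.startswith(INTERNAL_PREFIX)
            }
        if value is not None:
            body[field] = value
    return body


if __name__ == "__main__":
    current = {
        "name": "myFavoriteTable",
        "description": "This is a description of my favorite table.",
        "tags": ["fav"],
        "properties": {"priority": "highest"},
    }
    print(put_body(current, tags=["fav", "common"]))
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the PUT request.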

Modifying Entities using Metadata Preregistration

To refer to an entity that hasn't yet been extracted, you must know the Source that it will eventually be extracted from. A Source is Navigator's representation of the service the data is extracted from. You can preregister HDFS and Hive objects (files, directories, databases, tables, views, and columns). Let's start by listing the HDFS and Hive Sources.

        $ curl 'http://localhost:7187/api/v3/entities?query=((type:SOURCE)AND((sourceType:HDFS)OR(sourceType:Hive)))' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "a09b0233cc58ff7d601eaa68673a20c6",
          "originalName" : "HDFS-1",
          "sourceId" : null,
          "firstClassParentId" : null,
          "parentPath" : null,
          "extractorRunId" : null,
          "name" : "HDFS-1",
          "description" : null,
          "tags" : null,
          "properties" : null,
          "clusterName" : "Cluster 1",
          "sourceUrl" : "hdfs://example.com:8020",
          "sourceType" : "HDFS",
          "sourceExtractIteration" : 10,
          "type" : "SOURCE",
          "internalType" : "source"
        }, {
          "identity" : "4fbdadc6899638782fc8cb626176dc7b",
          "originalName" : "HIVE-1",
          "sourceId" : null,
          "firstClassParentId" : null,
          "parentPath" : null,
          "extractorRunId" : null,
          "name" : "HIVE-1",
          "description" : null,
          "tags" : null,
          "properties" : null,
          "clusterName" : "Cluster 1",
          "sourceUrl" : "thrift://example.com:9083",
          "sourceType" : "HIVE",
          "sourceExtractIteration" : 14,
          "type" : "SOURCE",
          "internalType" : "source"
        } ]

Metadata Preregistration for HDFS

Let's create business metadata for a file that doesn't exist yet. To identify the file, we refer to the source it will belong to (use the identity from the HDFS source found via the previous API call), the parent directory, and the filename. The example below is for the file /user/admin/newFile.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"a09b0233cc58ff7d601eaa68673a20c6",
          "parentPath":"/user/admin",
          "originalName":"newFile",
          "name":"awesomeFile",
          "description":"This is going to be an awesome file.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
        }' \
        -u <username>:<password>

        {
          "identity" : "d36f222528518ed955b9d8d8255e53e0",
          "originalName" : "newFile",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/user/admin",
          "extractorRunId" : null,
          "name" : "awesomeFile",
          "description" : "This is going to be an awesome file.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }

Directories are handled in the same way. The parent directory is the directory above the directory you're modifying, and the originalName is the directory you're modifying. The example below is for the directory /user/admin/newDirectory/. Navigator does not actually care whether this entity represents a file or a directory -- that will be determined when the actual entity is extracted.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"a09b0233cc58ff7d601eaa68673a20c6",
          "parentPath":"/user/admin",
          "originalName":"newDirectory",
          "name":"awesomeFile",
          "description":"This is going to be an awesome file.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
        }' \
        -u <username>:<password>

        {
          "identity" : "d36f222528518ed955b9d8d8255e53e0",
          "originalName" : "newDirectory",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/user/admin",
          "extractorRunId" : null,
          "name" : "awesomeFile",
          "description" : "This is going to be an awesome file.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }

Metadata Preregistration for Hive

Let's create business metadata for a Hive database that doesn't exist yet. To identify the database, we refer to the source it will belong to (use the identity from the Hive source found via the previous API call) and the database name. In this case, we will leave the parentPath out, since a database has no entity above it.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
          "originalName":"newDatabase",
          "name":"awesomeDatabase",
          "description":"This is going to be an awesome database.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
        }' \
        -u <username>:<password>

        {
          "identity" : "1f29ad529c31f0cda977dfccc3186c3a",
          "originalName" : "newDatabase",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : null,
          "extractorRunId" : null,
          "name" : "awesomeDatabase",
          "description" : "This is going to be an awesome database.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }

Let's create business metadata for a Hive table that doesn't exist yet. The parentPath will refer to the database name that the table will eventually belong to.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
          "parentPath":"myDatabase",
          "originalName":"newTable",
          "name":"awesomeTable",
          "description":"This is going to be an awesome table.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
        }' \
        -u <username>:<password>

        {
          "identity" : "1f29ad529c31f0cda977dfccc3186c3a",
          "originalName" : "newTable",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "myDatabase",
          "extractorRunId" : null,
          "name" : "awesomeTable",
          "description" : "This is going to be an awesome table.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }

For a Hive column, the parentPath will refer to the database name and table name that the column will eventually belong to.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
          "parentPath":"myDatabase/myTable",
          "originalName":"newColumn",
          "name":"awesomeColumn",
          "description":"This is going to be an awesome column.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
        }' \
        -u <username>:<password>

        {
          "identity" : "1989a647141f86f01400603f53f831c0",
          "originalName" : "newColumn",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "myDatabase/myTable",
          "extractorRunId" : null,
          "name" : "awesomeColumn",
          "description" : "This is going to be an awesome column.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }
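The parentPath rules for Hive objects (none for a database, the database name for a table, database/table for a column) can be captured in one helper that builds the POST body. This is a sketch only; `hive_preregistration` is a hypothetical name and the function builds the payload without sending it.

```python
import json


def hive_preregistration(source_id, original_name, database=None, table=None,
                         **business_metadata):
    """Build a preregistration body for a Hive database, table, or column.

    - database entry: pass neither database nor table (no parentPath)
    - table entry:    pass database
    - column entry:   pass database and table
    """
    if table is not None and database is None:
        raise ValueError("a column needs both a database and a table")
    payload = {"sourceId": source_id, "originalName": original_name}
    if database is not None:
        payload["parentPath"] = (
            database if table is None else f"{database}/{table}"
        )
    # name, description, tags, and properties are passed through unchanged.
    payload.update(business_metadata)
    return payload


if __name__ == "__main__":
    body = hive_preregistration(
        "4fbdadc6899638782fc8cb626176dc7b",
        "newColumn",
        database="myDatabase",
        table="myTable",
        name="awesomeColumn",
        tags=["fav"],
    )
    print(json.dumps(body, indent=2))
```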

Searching for Metadata Preregistration Entries

Let's look at a list of all of the preregistered metadata entries we have added so far. Note that you can only view the preregistered entities via the API -- the UI does not display these entities. The URL is quoted so that the shell does not interpret the ? and * characters.

        $ curl 'http://localhost:7187/api/v3/entities/?query=-internalType:*' \
        -X GET \
        -u <username>:<password>

        [ {
          "identity" : "d36f222528518ed955b9d8d8255e53e0",
          "originalName" : "newChild",
          "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
          "firstClassParentId" : null,
          "parentPath" : "/user/admin",
          "extractorRunId" : null,
          "name" : "awesomeFile",
          "description" : "This is going to be an awesome file.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }, {
          "identity" : "1f29ad529c31f0cda977dfccc3186c3a",
          "originalName" : "newDatabase",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : null,
          "extractorRunId" : null,
          "name" : "awesomeDatabase",
          "description" : "This is going to be an awesome database.",
          "tags" : [ "fav" ],
          "properties" : {
            "priority" : "medium"
          },
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        } ]

Removing Metadata Preregistration Entries

We don't yet support deleting preregistration entries. If you create a bad entry (for example, if you have a typo in the originalName), you can clear out all of the business metadata to prevent it from being applied to an entity.

        $ curl http://localhost:7187/api/v3/entities/ \
        -X POST -H "Content-Type: application/json" \
        -d '{
          "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
          "parentPath":"myDatabase/myTable",
          "originalName":"newColumn",
          "name":"",
          "description":"",
          "tags":[],
          "properties":{}
        }' \
        -u <username>:<password>

        {
          "identity" : "1989a647141f86f01400603f53f831c0",
          "originalName" : "newColumn",
          "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
          "firstClassParentId" : null,
          "parentPath" : "myDatabase/myTable",
          "extractorRunId" : null,
          "name" : null,
          "description" : null,
          "tags" : null,
          "properties" : null,
          "type" : null,
          "sourceType" : null,
          "internalType" : "UNDEFINED"
        }

You can also do this via a PUT request. The ID in the URL is that of the bad entry you created.

        $ curl http://localhost:7187/api/v3/entities/7b453e6c48179e61d7e355b5abd76b2c \
        -X PUT -H "Content-Type: application/json" \
        -d '{
          "tags":[],
          "properties":{}
        }' \
        -u <username>:<password>