API Usage Tutorial

Cloudera Navigator Concepts

The API terminology is similar to that used in the web UI:

Entity

An abstract data structure that describes the structural features of an object that Navigator tracks, such as an HDFS file, a Hive table, or a source.

Each entity is uniquely identified by the value of its identity field.

Relation

Describes a relationship among entities.

A relationship is itself an entity and, like any other entity, can have properties and comments associated with it.

API Usage Examples

Explore Around

You can search by any entity field. In this example we search for file entities and limit the results to the first two that are found. To retrieve the next two results, increase the offset by the limit (here, offset=2). Note that the URL must be quoted because it contains ampersands, which the shell would otherwise treat as control operators.

$ curl 'http://localhost:7187/api/v2/entities?query=type:file&limit=2&offset=0' \
  -u <username>:<password> \
  -X GET
[ {
  "identity" : "5c2c5fc0494f07a2d1e2219799aab5b6",
  "originalName" : "jobtracker.info",
  "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
  "firstClassParentId" : null,
  "parentPath" : "/tmp/mapred/system",
  "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "fileSystemPath" : "/tmp/mapred/system/jobtracker.info",
  "type" : "FILE",
  "size" : 4,
  "created" : "2014-08-25T19:07:17.652Z",
  "lastModified" : "2014-08-25T19:07:17.652Z",
  "lastAccessed" : "2014-08-25T19:07:17.185Z",
  "permissions" : "rwx------",
  "owner" : "mapred",
  "group" : "supergroup",
  "blockSize" : null,
  "mimeType" : "application/octet-stream",
  "sourceType" : "HDFS",
  "deleted" : false,
  "replication" : null,
  "internalType" : "fselement"
}, {
  "identity" : "76a8a355a475396efa110434c80e92db",
  "originalName" : "hbase.version",
  "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
  "firstClassParentId" : null,
  "parentPath" : "/hbase",
  "extractorRunId" : "a09b0233cc58ff7d601eaa68673a20c6##1",
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "fileSystemPath" : "/hbase/hbase.version",
  "type" : "FILE",
  "size" : 7,
  "created" : "2014-08-25T19:08:25.719Z",
  "lastModified" : "2014-08-25T19:08:25.719Z",
  "lastAccessed" : "2014-08-25T19:08:25.401Z",
  "permissions" : "rw-r--r--",
  "owner" : "hbase",
  "group" : "hbase",
  "blockSize" : null,
  "mimeType" : "application/octet-stream",
  "sourceType" : "HDFS",
  "deleted" : false,
  "replication" : null,
  "internalType" : "fselement"
} ]
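
Paging through a larger result set follows directly from the limit and offset parameters. The following is a minimal shell sketch, assuming the server answers with an empty JSON array once the offset passes the last result; NAV_URL, NAV_USER, NAV_PASS, and nav_search_all are hypothetical names introduced for illustration. It uses curl's -G and --data-urlencode options, which build the query string and avoid hand-quoting the ampersands.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_search_all QUERY [LIMIT]
# Prints each page of search results as returned by the server, advancing
# offset by LIMIT until an empty page comes back.
nav_search_all() {
  local query="$1" limit="${2:-100}" offset=0 page
  while :; do
    # -G appends the --data/--data-urlencode pairs to the URL as a query string.
    page=$(curl -sS -G "$NAV_URL/entities" \
      --data-urlencode "query=$query" \
      --data "limit=$limit" \
      --data "offset=$offset" \
      -u "$NAV_USER:$NAV_PASS") || return 1
    # Assumption: past the last result the server returns an empty array "[ ]".
    if [ "$(printf '%s' "$page" | tr -d '[:space:]')" = "[]" ]; then
      break
    fi
    printf '%s\n' "$page"
    offset=$((offset + limit))
  done
}
```

For example, nav_search_all 'type:file' 2 walks the same file listing two entities at a time. If a server ignored offset, this loop would never terminate, so a production script would also cap the number of pages.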

You can combine fields with AND and OR. In this example we search for tables named either "sample_07" or "sample_08". Because the limit and offset parameters are left off, the default limit applies and the first 100 results are returned.

$ curl 'http://localhost:7187/api/v1/entities?query=(((originalName:sample_07)OR(originalName:sample_08))AND(type:TABLE))' \
  -u <username>:<password> \
  -X GET
[ {
  "identity" : "ea27302e11370a3927ac11cbb920891d",
  "originalName" : "sample_07",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "/default",
  "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "created" : "2014-08-25T19:26:48.000Z",
  "lastAccessed" : "1970-01-01T00:00:00.000Z",
  "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_07",
  "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
  "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
  "partColNames" : null,
  "clusteredByColNames" : null,
  "sortByColNames" : null,
  "serdeName" : null,
  "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
  "serdeProps" : null,
  "params" : null,
  "owner" : "admin",
  "type" : "TABLE",
  "deleted" : false,
  "compressed" : false,
  "sourceType" : "HIVE",
  "internalType" : "hv_table"
}, {
  "identity" : "9282adb88478c2ce4beb13dbba997ef5",
  "originalName" : "sample_08",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "/default",
  "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "created" : "2014-08-25T19:26:55.000Z",
  "lastAccessed" : "1970-01-01T00:00:00.000Z",
  "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
  "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
  "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
  "partColNames" : null,
  "clusteredByColNames" : null,
  "sortByColNames" : null,
  "serdeName" : null,
  "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
  "serdeProps" : null,
  "params" : null,
  "owner" : "admin",
  "type" : "TABLE",
  "deleted" : false,
  "compressed" : false,
  "sourceType" : "HIVE",
  "internalType" : "hv_table"
} ]

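Because the query grammar wraps every clause in parentheses, longer OR lists are easier to generate than to type. The following is a minimal sketch with a hypothetical nav_names_of_type helper; it assumes the names contain no spaces, parentheses, or other characters that the query syntax would need escaped. The hypothetical nav_search helper passes the query through curl's --data-urlencode (with -G), so curl percent-encodes the parentheses and colons.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_names_of_type TYPE NAME...
# Prints a query matching entities of TYPE whose originalName is any NAME.
nav_names_of_type() {
  local type="$1" clause="" name
  shift
  for name in "$@"; do
    if [ -z "$clause" ]; then
      clause="(originalName:$name)"
    else
      clause="${clause}OR(originalName:$name)"
    fi
  done
  printf '((%s)AND(type:%s))\n' "$clause" "$type"
}

# nav_search QUERY
# Runs a search, letting curl URL-encode the query string.
nav_search() {
  curl -sS -G "$NAV_URL/entities" \
    --data-urlencode "query=$1" \
    -u "$NAV_USER:$NAV_PASS"
}
```

Here nav_names_of_type TABLE sample_07 sample_08 prints (((originalName:sample_07)OR(originalName:sample_08))AND(type:TABLE)), the same query used in the request above, and nav_search "$(nav_names_of_type TABLE sample_07 sample_08)" runs it.
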
If you know an entity's ID, you can request that entity directly.

$ curl http://localhost:7187/api/v1/entities/9282adb88478c2ce4beb13dbba997ef5 \
  -u <username>:<password> \
  -X GET
{
  "identity" : "9282adb88478c2ce4beb13dbba997ef5",
  "originalName" : "sample_08",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "/default",
  "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "created" : "2014-08-25T19:26:55.000Z",
  "lastAccessed" : "1970-01-01T00:00:00.000Z",
  "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
  "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
  "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
  "partColNames" : null,
  "clusteredByColNames" : null,
  "sortByColNames" : null,
  "serdeName" : null,
  "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
  "serdeProps" : null,
  "params" : null,
  "owner" : "admin",
  "type" : "TABLE",
  "deleted" : false,
  "compressed" : false,
  "sourceType" : "HIVE",
  "internalType" : "hv_table"
}
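
When scripting lookups by identity, it helps to have curl report HTTP errors through its exit status. A minimal sketch with the hypothetical helpers nav_get_entity and nav_entity_exists: curl's -f option makes it exit non-zero when the server returns an HTTP error status (presumably a 404 for an unknown identity; this is an assumption), and -sS hides the progress meter while still printing errors.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_get_entity IDENTITY
# Prints the entity's JSON; returns non-zero if the request fails.
nav_get_entity() {
  curl -fsS "$NAV_URL/entities/$1" -u "$NAV_USER:$NAV_PASS"
}

# nav_entity_exists IDENTITY
# Succeeds only if the entity can be fetched; discards the output.
nav_entity_exists() {
  nav_get_entity "$1" > /dev/null 2>&1
}
```

For example, nav_get_entity 9282adb88478c2ce4beb13dbba997ef5 > sample_08.json saves the entity, and the command's exit status tells the script whether the lookup succeeded.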

You can update an entity to add new business metadata.

$ curl http://localhost:7187/api/v2/entities/9282adb88478c2ce4beb13dbba997ef5 \
  -u <username>:<password> \
  -X PUT -H "Content-Type: application/json" \
  -d '{
    "name":"myFavoriteTable",
    "description":"This is a description of my favorite table.",
    "tags":["fav"],
    "properties":{"priority":"highest"}
  }'
{
  "identity" : "9282adb88478c2ce4beb13dbba997ef5",
  "originalName" : "sample_08",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "/default",
  "extractorRunId" : "4fbdadc6899638782fc8cb626176dc7b##1",
  "name" : "myFavoriteTable",
  "description" : "This is a description of my favorite table.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "highest" },
  "created" : "2014-08-25T19:26:55.000Z",
  "lastAccessed" : "1970-01-01T00:00:00.000Z",
  "fileSystemPath" : "hdfs://example.com:8020/user/hive/warehouse/sample_08",
  "inputFormat" : "org.apache.hadoop.mapred.TextInputFormat",
  "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
  "partColNames" : null,
  "clusteredByColNames" : null,
  "sortByColNames" : null,
  "serdeName" : null,
  "serdeLibName" : "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
  "serdeProps" : null,
  "params" : null,
  "owner" : "admin",
  "type" : "TABLE",
  "deleted" : false,
  "compressed" : false,
  "sourceType" : "HIVE",
  "internalType" : "hv_table"
}
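
The update body can also be built from shell variables instead of being typed inline. The following sketch uses two hypothetical helpers, nav_metadata_json and nav_update_entity, mirroring the request above with one tag and one custom property. nav_metadata_json does no JSON escaping, so it assumes the values contain no double quotes, backslashes, or newlines.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_metadata_json NAME DESCRIPTION TAG KEY VALUE
# Prints a business-metadata body with one tag and one custom property.
# No JSON escaping is done: values must not contain ", \ or newlines.
nav_metadata_json() {
  printf '{"name":"%s","description":"%s","tags":["%s"],"properties":{"%s":"%s"}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# nav_update_entity IDENTITY JSON_BODY
# Sends the body with PUT to the entity's URL.
nav_update_entity() {
  curl -fsS -X PUT "$NAV_URL/entities/$1" \
    -u "$NAV_USER:$NAV_PASS" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

For example: nav_update_entity 9282adb88478c2ce4beb13dbba997ef5 "$(nav_metadata_json myFavoriteTable 'This is a description of my favorite table.' fav priority highest)".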

Metadata Preregistration

To refer to an entity that hasn't yet been extracted, you must know the Source that it will eventually be extracted from. A Source is Navigator's representation of the service the data is extracted from. You can preregister HDFS and Hive objects (files, directories, databases, tables, views, and columns). Let's start by listing the HDFS and Hive Sources.

$ curl 'http://localhost:7187/api/v2/entities?query=((type:SOURCE)AND((sourceType:HDFS)OR(sourceType:Hive)))' \
  -u <username>:<password> \
  -X GET
[ {
  "identity" : "a09b0233cc58ff7d601eaa68673a20c6",
  "originalName" : "HDFS-1",
  "sourceId" : null,
  "firstClassParentId" : null,
  "parentPath" : null,
  "extractorRunId" : null,
  "name" : "HDFS-1",
  "description" : null,
  "tags" : null,
  "properties" : null,
  "clusterName" : "Cluster 1",
  "sourceUrl" : "hdfs://example.com:8020",
  "sourceType" : "HDFS",
  "sourceExtractIteration" : 10,
  "type" : "SOURCE",
  "internalType" : "source"
}, {
  "identity" : "4fbdadc6899638782fc8cb626176dc7b",
  "originalName" : "HIVE-1",
  "sourceId" : null,
  "firstClassParentId" : null,
  "parentPath" : null,
  "extractorRunId" : null,
  "name" : "HIVE-1",
  "description" : null,
  "tags" : null,
  "properties" : null,
  "clusterName" : "Cluster 1",
  "sourceUrl" : "thrift://example.com:9083",
  "sourceType" : "HIVE",
  "sourceExtractIteration" : 14,
  "type" : "SOURCE",
  "internalType" : "source"
} ]
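
The preregistration calls need the identity of the HDFS or Hive source. If a JSON tool such as jq is not available, a rough awk filter can pull the identity out of a listing like the one above. This is a minimal sketch with a hypothetical nav_source_id helper; it assumes the source objects contain no nested braces (true for the output shown, where properties is null), so a real JSON parser remains the safer choice. Note that the response reports the Hive source type as HIVE.

```shell
# nav_source_id SOURCE_TYPE   (reads a Sources listing on stdin)
# Prints the identity of each source whose sourceType equals SOURCE_TYPE.
# Assumption: objects are flat, so splitting records on "}" isolates each one.
nav_source_id() {
  awk -v want="$1" '
    BEGIN { RS = "}" }
    $0 ~ ("\"sourceType\" *: *\"" want "\"") {
      if (match($0, /"identity" *: *"[^"]*"/)) {
        id = substr($0, RSTART, RLENGTH)
        sub(/^"identity" *: *"/, "", id)   # drop the key and opening quote
        sub(/"$/, "", id)                  # drop the closing quote
        print id
      }
    }'
}
```

Piping the listing above into nav_source_id HDFS prints a09b0233cc58ff7d601eaa68673a20c6, and nav_source_id HIVE prints 4fbdadc6899638782fc8cb626176dc7b.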

Let's create business metadata for a file that doesn't exist yet. To identify the file, we refer to the source it will belong to (use the identity from the HDFS source found via the previous API call), the parent directory, and the filename.

$ curl http://localhost:7187/api/v2/entities/ \
  -u <username>:<password> \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "sourceId":"a09b0233cc58ff7d601eaa68673a20c6",
    "parentPath":"/user/admin",
    "originalName":"newChild",
    "name":"awesomeFile",
    "description":"This is going to be an awesome file.",
    "tags":["fav"],
    "properties":{"priority":"medium"}
  }'
{
  "identity" : "d36f222528518ed955b9d8d8255e53e0",
  "originalName" : "newChild",
  "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
  "firstClassParentId" : null,
  "parentPath" : "/user/admin",
  "extractorRunId" : null,
  "name" : "awesomeFile",
  "description" : "This is going to be an awesome file.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "medium" },
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
}
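
Given a full HDFS path, the parentPath and originalName fields are its directory and its final component. The hypothetical helper nav_prereg_hdfs_path splits the path with dirname and basename and sends the POST request shown above (without tags or properties); it assumes the values need no JSON escaping.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_prereg_hdfs_path SOURCE_ID HDFS_PATH NAME DESCRIPTION
# Preregisters business metadata for a path that has not been extracted yet.
nav_prereg_hdfs_path() {
  local parent base
  parent=$(dirname "$2")   # e.g. /user/admin
  base=$(basename "$2")    # e.g. newChild
  curl -fsS -X POST "$NAV_URL/entities/" \
    -u "$NAV_USER:$NAV_PASS" \
    -H "Content-Type: application/json" \
    -d "$(printf '{"sourceId":"%s","parentPath":"%s","originalName":"%s","name":"%s","description":"%s"}' \
      "$1" "$parent" "$base" "$3" "$4")"
}
```

For example: nav_prereg_hdfs_path a09b0233cc58ff7d601eaa68673a20c6 /user/admin/newChild awesomeFile 'This is going to be an awesome file.'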

Let's create business metadata for a Hive database that doesn't exist yet. To identify the database, we refer to the source it will belong to (use the identity of the Hive source from the source listing above) and the database name. In this case, we leave parentPath out, since a database has no entity above it.

$ curl http://localhost:7187/api/v2/entities/ \
  -u <username>:<password> \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
    "originalName":"newDatabase",
    "name":"awesomeDatabase",
    "description":"This is going to be an awesome database.",
    "tags":["fav"],
    "properties":{"priority":"medium"}
  }'
{
  "identity" : "1f29ad529c31f0cda977dfccc3186c3a",
  "originalName" : "newDatabase",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : null,
  "extractorRunId" : null,
  "name" : "awesomeDatabase",
  "description" : "This is going to be an awesome database.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "medium" },
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
}
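
Since the file and database requests differ only in whether parentPath is present, one body builder can cover both. A minimal sketch with a hypothetical nav_prereg_json helper: an empty parent path omits the field, as for a Hive database. It does no JSON escaping, so the values must not contain double quotes, backslashes, or newlines.

```shell
# nav_prereg_json SOURCE_ID PARENT_PATH ORIGINAL_NAME NAME DESCRIPTION
# Prints a preregistration body; an empty PARENT_PATH leaves the field out.
# No JSON escaping is done.
nav_prereg_json() {
  if [ -n "$2" ]; then
    printf '{"sourceId":"%s","parentPath":"%s","originalName":"%s","name":"%s","description":"%s"}\n' \
      "$1" "$2" "$3" "$4" "$5"
  else
    printf '{"sourceId":"%s","originalName":"%s","name":"%s","description":"%s"}\n' \
      "$1" "$3" "$4" "$5"
  fi
}
```

For the database above, call nav_prereg_json 4fbdadc6899638782fc8cb626176dc7b "" newDatabase awesomeDatabase 'This is going to be an awesome database.' and pass the result to curl with -d.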

Let's look at a list of all of the preregistered metadata entries we have added so far. Quote the URL; otherwise the shell may try to expand ? and * as filename patterns.

$ curl 'http://localhost:7187/api/v2/entities/?query=-internalType:*' \
  -u <username>:<password> \
  -X GET
[ {
  "identity" : "d36f222528518ed955b9d8d8255e53e0",
  "originalName" : "newChild",
  "sourceId" : "a09b0233cc58ff7d601eaa68673a20c6",
  "firstClassParentId" : null,
  "parentPath" : "/user/admin",
  "extractorRunId" : null,
  "name" : "awesomeFile",
  "description" : "This is going to be an awesome file.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "medium" },
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
}, {
  "identity" : "1f29ad529c31f0cda977dfccc3186c3a",
  "originalName" : "newDatabase",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : null,
  "extractorRunId" : null,
  "name" : "awesomeDatabase",
  "description" : "This is going to be an awesome database.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "medium" },
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
} ]
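
Another way to sidestep shell globbing and manual URL encoding is to let curl assemble the query string. A minimal sketch with a hypothetical nav_list_preregistered helper: the single-quoted 'query=-internalType:*' argument reaches curl unexpanded, and -G with --data-urlencode appends it to the URL in encoded form.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_list_preregistered
# Lists entities matched by -internalType:*, the preregistration query above.
nav_list_preregistered() {
  curl -fsS -G "$NAV_URL/entities/" \
    --data-urlencode 'query=-internalType:*' \
    -u "$NAV_USER:$NAV_PASS"
}
```
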

Let's create business metadata for a Hive column that doesn't exist yet. This time, for the parentPath we refer to the database name and table name that the column will eventually belong to.

$ curl http://localhost:7187/api/v2/entities/ \
  -u <username>:<password> \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
    "parentPath":"myDatabase/myTable",
    "originalName":"newColumn",
    "name":"awesomeColumn",
    "description":"This is going to be an awesome column.",
    "tags":["fav"],
    "properties":{"priority":"medium"}
  }'
{
  "identity" : "1989a647141f86f01400603f53f831c0",
  "originalName" : "newColumn",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "myDatabase/myTable",
  "extractorRunId" : null,
  "name" : "awesomeColumn",
  "description" : "This is going to be an awesome column.",
  "tags" : [ "fav" ],
  "properties" : { "priority" : "medium" },
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
}
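
For a column, parentPath joins the database and table names with a slash, following the example above. The hypothetical nav_prereg_column_json helper builds that body from separate arguments; like the other sketches, it assumes the values need no JSON escaping.

```shell
# nav_prereg_column_json SOURCE_ID DATABASE TABLE COLUMN NAME DESCRIPTION
# Prints a column preregistration body with parentPath "DATABASE/TABLE".
# No JSON escaping is done.
nav_prereg_column_json() {
  printf '{"sourceId":"%s","parentPath":"%s/%s","originalName":"%s","name":"%s","description":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}
```

For the column above: nav_prereg_column_json 4fbdadc6899638782fc8cb626176dc7b myDatabase myTable newColumn awesomeColumn 'This is going to be an awesome column.'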

To remove business metadata, POST the same identifying fields (sourceId, parentPath, and originalName) with empty values for the business metadata fields. The response shows those fields as null.

$ curl http://localhost:7187/api/v2/entities/ \
  -u <username>:<password> \
  -X POST -H "Content-Type: application/json" \
  -d '{
    "sourceId":"4fbdadc6899638782fc8cb626176dc7b",
    "parentPath":"myDatabase/myTable",
    "originalName":"newColumn",
    "name":"",
    "description":"",
    "tags":[],
    "properties":{}
  }'
{
  "identity" : "1989a647141f86f01400603f53f831c0",
  "originalName" : "newColumn",
  "sourceId" : "4fbdadc6899638782fc8cb626176dc7b",
  "firstClassParentId" : null,
  "parentPath" : "myDatabase/myTable",
  "extractorRunId" : null,
  "name" : null,
  "description" : null,
  "tags" : null,
  "properties" : null,
  "type" : null,
  "sourceType" : null,
  "internalType" : "UNDEFINED"
}
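
The clearing request can be scripted the same way. A minimal sketch with hypothetical helpers nav_clear_json and nav_post_entity; it assumes an entity that has a parent path (a database would omit parentPath) and values that need no JSON escaping.

```shell
# Hypothetical connection settings; override them in your environment.
: "${NAV_URL:=http://localhost:7187/api/v2}"
: "${NAV_USER:=admin}"   # placeholder credentials
: "${NAV_PASS:=admin}"

# nav_clear_json SOURCE_ID PARENT_PATH ORIGINAL_NAME
# Prints a body that empties name, description, tags, and properties.
nav_clear_json() {
  printf '{"sourceId":"%s","parentPath":"%s","originalName":"%s","name":"","description":"","tags":[],"properties":{}}\n' \
    "$1" "$2" "$3"
}

# nav_post_entity JSON_BODY
nav_post_entity() {
  curl -fsS -X POST "$NAV_URL/entities/" \
    -u "$NAV_USER:$NAV_PASS" \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

For the column above: nav_post_entity "$(nav_clear_json 4fbdadc6899638782fc8cb626176dc7b myDatabase/myTable newColumn)".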