Hue Browsers power the Data Catalog. They let you easily search, glance at, and perform actions on data or jobs in cloud or on-premises clusters.
The browsers can be “enriched” with Search and Tagging by metadata services.
The Table Browser enables you to manage the databases, tables, and partitions of the metastore shared by Hive and Impala. You can perform the following operations:
Search and display metadata like tags and additional description from Catalog backends.
Databases
Tables
The goal of the importer is to allow ad-hoc queries on data not yet in the clusters and to simplify self-service analytics.
If you want to import your own data instead of installing the sample
tables, open the importer from the left menu or from the little +
in the left assist.
To learn more, watch the video on Data Import Wizard.
Note: Files can be dragged & dropped or selected from HDFS or S3 (if configured), and their formats are automatically detected. The wizard also assists with advanced features like table partitioning, Kudu tables, and nested types.
Import data from relational databases into an HDFS file or a Hive table using Apache Sqoop. This brings large amounts of data into the cluster in just a few clicks via an interactive UI. The imports run on YARN and are scheduled by Oozie.
Learn more about it on the ingesting data from traditional databases post.
In the past, indexing data into Solr to then explore it with a Dynamic Dashboard has been quite difficult. The task involved writing a Solr schema and a Morphlines file, then submitting a job to YARN to do the indexing. Getting this correct for non-trivial imports could often take a few days of work. Now, with Hue’s new feature, you can start your YARN indexing job in minutes.
Dashboards are an interactive way to explore your data quickly and easily. No programming is required and the analysis is done by drag & drops and clicks.
Read more about Dashboards.
Simply drag & drop widgets that are interconnected together. This is great for exploring new datasets or monitoring without having to type.
The search box supports live prefix filtering of field data and comes with Solr syntax autocomplete to make querying intuitive and quick. Any field can be inspected for its top values or statistics. This analysis is very fast because the data is indexed.
The top search bar offers a full autocomplete on all the values of the index.
The “More like This” feature lets you select the fields you would like to use to find similar records. This is a great way to find similar issues, customers, people… with regard to a list of attributes.
The File Browser application lets you interact with these file systems: HDFS, S3, or ADLS.
Hue is fully compatible with HDFS and is handy for browsing, peeking at file content, and uploading or downloading data.
Hue can be set up to read and write to a configured S3 account. Users get autocomplete capabilities and can directly query from and save data to S3 without any intermediate moving or copying to HDFS.
Learn more about it on the ADLS integration post.
Note: ADLS Gen2 is currently not supported.
Google file system is currently not supported.
We’ll take a look at the HBase Browser App.
Note: With just a few changes in the Python API, the HBase browser could be compatible with Apache Kudu or Google Big Table.
The smartview is the view that you land on when you first enter a table. On the left-hand side are the row keys, and hovering over a row reveals a list of controls on the right. Click a row to select it; once selected, you can perform batch operations, sort columns, or do any number of standard database operations. To explore a row, simply scroll to the right. As you scroll, the row continues to lazily load cells until the end.
To initially populate the table, you can insert a new row or bulk upload data in CSV, TSV, or similar formats into your table.
On the right-hand side of a row is a ‘+’ sign that lets you insert columns into your row.
To edit a cell, simply click to edit inline.
If you need more control or data about your cell, click “Full Editor” to edit.
In the full editor, you can view cell history or upload binary data to the cell. Binary data of certain MIME Types are detected, meaning you can view and edit images, PDFs, JSON, XML, and other types directly in your browser!
Hovering over a cell also reveals some more controls (such as the delete button or the timestamp). Click the title to select a few and do batch operations:
If you need some sample data to get started and explore, check out the how-to tutorial on creating an HBase table.
The “Smart Searchbar” is a sophisticated tool that helps you zero-in on your data. The smart search supports a number of operations. The most basic ones include finding and scanning row keys. Here I am selecting two row keys with:
domain.100, domain.200
Submitting this query gives me the two rows I was looking for. If I want to fetch rows after one of these, I have to do a scan. This is as easy as writing a ‘+’ followed by the number of rows you want to fetch.
domain.100, domain.200 +5
This fetches domain.100 and domain.200, followed by the next 5 rows. If you’re ever confused about your results, you can look below the query bar and click into it to edit your query.
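To make the row syntax above concrete, here is a minimal Python sketch that splits a query such as the one shown into row keys and scan lengths. It is not Hue's actual parser: the function name `parse_row_query` is hypothetical, and it assumes that a trailing `+N` applies only to the row key it follows and that row keys contain no spaces, commas, or brackets.

```python
import re
from typing import List, Tuple


def parse_row_query(query: str) -> List[Tuple[str, int]]:
    """Split a comma-separated row query into (row_key, scan_length) pairs.

    scan_length is 0 for a plain key lookup and N when the key is
    followed by '+N'. Hypothetical helper, not part of Hue.
    """
    results = []
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue
        # A key without whitespace, optionally followed by '+<number>'.
        match = re.fullmatch(r"(?P<key>\S+?)(?:\s*\+(?P<scan>\d+))?", part)
        if match is None:
            raise ValueError(f"Unsupported row expression: {part!r}")
        scan = int(match.group("scan")) if match.group("scan") else 0
        results.append((match.group("key"), scan))
    return results


print(parse_row_query("domain.100, domain.200 +5"))
```

Under these assumptions, the first key is a plain lookup and the second key starts a scan of 5 additional rows.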
The Smart Search also supports column filtering. On any row, I can specify the specific columns or families I want to retrieve. With:
domain.100[column_family:]
I can select a bare family, or mix columns from different families like so:
domain.100[family1:, family2:, family3:column_a]
Doing this will restrict my results from one row key to the columns I specified. If you want to restrict column families only, the same effect can be achieved with the filters on the right. Just click to toggle a filter.
Finally, let’s try some more complex column filters. I can query for bare columns:
domain.100[column_a]
This will multiply my query over all column families. I can also do prefixes and scans:
domain.100[family: prefix* +3]
This will fetch me all columns that start with the given prefix, limited to 3 results. Finally, I can filter on range:
domain.100[family: column1 to column100]
This will fetch me all columns in ‘family:’ that are lexicographically >= column1 but <= column100. The first column (‘column1’) must be a valid column, but the second can just be any string for comparison.
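Lexicographic ordering can be surprising when column names end in numbers. The following Python sketch only illustrates the comparison rule described above; it is not how the HBase Browser implements the filter. The helper `columns_in_range` and the sample column names are hypothetical, and plain ASCII names are assumed so that string comparison matches byte comparison.

```python
from typing import Iterable, List


def columns_in_range(columns: Iterable[str], start: str, stop: str) -> List[str]:
    """Return the columns that sort between start and stop, inclusive.

    Comparison is lexicographic (character by character), not numeric.
    """
    return [name for name in sorted(columns) if start <= name <= stop]


sample = ["column1", "column2", "column10", "column11", "column100"]
print(columns_in_range(sample, "column1", "column100"))
```

Here `column2` and `column11` fall outside the range because they compare greater than `column100` character by character, even though 2 and 11 are numerically smaller than 100.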
The Smart Search also supports prefix filtering on rows. To select a prefixed row, simply type the row key followed by a star *. The prefix should be highlighted like any other searchbar keyword. A prefix scan is performed exactly like a regular scan, but with a prefixed row.
domain.10* +10
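A prefix scan over sorted row keys is commonly expressed as a range scan whose stop row is the prefix with its last byte incremented. The Python sketch below shows that general technique on an in-memory list; it is not taken from Hue's code. The names `prefix_stop_row` and `prefix_scan` are hypothetical, and treating the number after `+` as the maximum number of rows returned is an assumption.

```python
from typing import List, Optional


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Smallest key greater than every key starting with prefix.

    Returns None when no such bound exists (empty or all-0xff prefix).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys: List[bytes], prefix: bytes, limit: int) -> List[bytes]:
    """Return up to `limit` keys from sorted_keys that start with prefix."""
    stop = prefix_stop_row(prefix)
    matches: List[bytes] = []
    for key in sorted_keys:
        if len(matches) >= limit:
            break
        if key < prefix:
            continue  # Still before the start of the prefix range.
        if stop is not None and key >= stop:
            break  # Past the end of the prefix range.
        matches.append(key)
    return matches


keys = sorted([b"domain.1", b"domain.10", b"domain.100",
               b"domain.101", b"domain.11", b"domain.2"])
print(prefix_scan(keys, b"domain.10", 10))
```

Because the keys are sorted, the scan can stop as soon as it reaches the stop row instead of testing every key for the prefix.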
Finally, as a new feature, you can also take full advantage of the HBase filtering language by typing your filter string between curly braces. HBase Browser autocompletes your filters for you so you don’t have to look them up every time. You can apply filters to rows or scans.
domain.1000 {ColumnPrefixFilter('100-') AND ColumnCountGetFilter(3)}
This doc only covers a few basic features of the Smart Search. You can take advantage of the full querying language by referring to the help menu when using the app. These include column prefix, bare columns, column range, etc. Remember that if you ever need help with the searchbar, you can use the help menu that pops up while typing, which will suggest next steps to complete your query.
Solr indexes can be created and are listed in the interface.
Sentry roles and privileges can directly be edited in the Security interface.
Note: Sentry is going to be replaced by Apache Ranger in HUE-8748.
Solr privileges can be edited directly via the interface.
For listing collections, querying, and creating collections:
Admin=*->action=*
Collection=*->action=*
Schema=*->action=*
Config=*->action=*
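Each privilege above is a chain of `key=value` pairs joined by `->`. As a minimal sketch for reading or validating such strings in a script, the Python helper below splits one into ordered pairs. The function `parse_privilege` is hypothetical and is not part of Hue or Sentry; it assumes that keys and values never contain `->` or `=`.

```python
from typing import List, Tuple


def parse_privilege(privilege: str) -> List[Tuple[str, str]]:
    """Split 'Key=value->key=value' into an ordered list of (key, value)."""
    pairs = []
    for segment in privilege.split("->"):
        key, separator, value = segment.partition("=")
        key, value = key.strip(), value.strip()
        if not separator or not key or not value:
            raise ValueError(f"Malformed privilege segment: {segment!r}")
        pairs.append((key, value))
    return pairs


for line in ["Admin=*->action=*", "Collection=*->action=*",
             "Schema=*->action=*", "Config=*->action=*"]:
    print(parse_privilege(line))
```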
Kafka topics can be listed.
Note: This is currently an experimental feature.
The Job Browser application lets you examine multiple types of jobs running in the cluster. Job Browser presents the job and its tasks in layers for quick access to the logs and for troubleshooting.
Any job running on the Resource Manager is automatically listed. If the job has been moved to one of the history servers, its information is fetched from there.
There are three ways to access the Query browser:
Query capabilities
Read more about it on Browsing Impala Query Execution within the SQL Editor.
List submitted workflows, schedules and bundles.
List Livy sessions and submitted statements.