Development

This section goes into greater detail on how to build and reuse the components of Hue.

Building

Dependencies

  • The OS-specific install instructions are listed in the install guide
  • Python 2.7+ (Python 3 support tracked in HUE-8737)
  • Django (1.11 already included in the distribution)
  • Java (Java 1.8) (should go away after HUE-8740)
  • npm (6.4+)
  • Mako is the templating language
  • Bootstrap
  • Knockout js

Quick Start

Build once:

make apps

Then start the dev server (which will auto reload):

./build/env/bin/hue runserver

If you are changing Javascript or CSS files, also start:

npm run dev

It is then recommended to use MySQL or PostgreSQL as the database (see http://cloudera.github.io/hue/latest/).

Open the hue.ini file in a text editor. Directly below the [[database]] line, add the following options (and modify accordingly for your MySQL setup):

host=localhost
port=3306
engine=mysql
user=hue
password=secretpassword
name=hue
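
As a rough illustration of what these options mean, the sketch below maps them onto the shape of Django's DATABASES setting. The helper name to_django_databases and the ENGINES lookup table are assumptions made for this example; Hue's own conversion code may differ.

```python
# Hypothetical helper, not part of Hue: it only illustrates how the
# [[database]] options relate to Django's DATABASES setting.
ENGINES = {
    'mysql': 'django.db.backends.mysql',
    'sqlite3': 'django.db.backends.sqlite3',
}


def to_django_databases(options):
    """Build a Django-style DATABASES dict from hue.ini-style options."""
    return {
        'default': {
            'ENGINE': ENGINES[options['engine']],
            'NAME': options['name'],
            'USER': options.get('user', ''),
            'PASSWORD': options.get('password', ''),
            'HOST': options.get('host', ''),
            'PORT': str(options.get('port', '')),
        }
    }


# The same values as in the hue.ini fragment above.
hue_ini_options = {
    'host': 'localhost',
    'port': 3306,
    'engine': 'mysql',
    'user': 'hue',
    'password': 'secretpassword',
    'name': 'hue',
}
DATABASES = to_django_databases(hue_ini_options)
print(DATABASES['default']['ENGINE'])
```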

Javascript

The JavaScript files are currently being migrated to webpack bundles. During this process some files will live under src/desktop/static/ and some will live under src/desktop/js.

For changes to the files under src/desktop/js the following applies:

First make sure all third-party dependencies defined in package.json are installed into node_modules/

npm install

Also run this after making changes to package.json, adding new third-party dependencies etc.

To generate the js bundles run:

npm run webpack
npm run webpack-workers
npm run webpack-login

During development the bundles can be regenerated automatically whenever changes to the .js files are detected. For this, run:

npm run dev

Before sending a review with changes to the bundles run:

npm run lint-fix

and possibly fix any issues it might report.

Documentation

New website:

Install Hugo: https://gohugo.io/getting-started/quick-start/.

Develop:

hugo serve

Release:

Make sure the base URL is correct:

cd docs/docs-site
vim config.toml
baseURL='http://cloudera.github.io/hue/docs-4.4.0/'

Build the doc website:

hugo

Add it to the github page and push:

git checkout gh-pages
cp ~/docs/docs-site/public docs-4.4.0 -r
rm latest
ln -s docs-4.4.0 latest

git push origin HEAD:gh-pages

Old documentation:

make docs

CSS / LESS

After changing the CSS in a .less file, rebuild with:

make css

SQL Autocomplete

Install a patched jison:

git clone https://github.com/JohanAhlen/jison
cd jison
npm install -g .

Then run:

make sql-all-parsers

Ace Editor

After modifying files under tools/ace-editor run the following to build ace.js

npm install
make ace

Internationalization

How to update all the messages and compile them:

make locales

How to update and compile the messages of one app:

cd apps/beeswax
make compile-locale

How to create a new locale for an app:

cd $APP_ROOT/src/$APP_NAME/locale
$HUE_ROOT/build/env/bin/pybabel init -D django -i en_US.pot -d . -l fr

Dev environment

Debugging with PyCharm

The first step is to configure PyCharm to use the Hue virtual environment at ./build/env/env. The second step is to set up the debug configuration.

Debugging with Eclipse

The first step is to configure Eclipse to use the Hue virtual environment at ./build/env/env. The second step is to set up the debug configuration, including its arguments and interpreter.

API Server

From 30,000 feet

Hue, as a “container” web application, sits in between your Hadoop installation and the browser. It hosts all the Hue Apps, including the built-in ones, and ones that you may write yourself.

The Hue Server

Hue is a web application built on the Django Python web framework. Django, running on a WSGI container/web server (typically CherryPy), manages the URL dispatch, executes application logic code, and puts together the views from their templates. Django uses a database (typically sqlite) to manage session data, and Hue applications can use it as well for their “models”. (For example, the Editor stores saved queries in the database.)

In addition to the web server, some Hue applications run daemon processes “on the side”. Examples include the Celery Task Server and Celery Beat.

Interacting with external services

Hue provides some APIs for interacting with external services like databases or file storage. These APIs work by making REST API or Thrift calls to the Hadoop daemons.

An Architectural View

A Hue application may span three tiers: (1) the UI and user interaction in the client’s browser, (2) the core application logic in the Hue web server, and (3) external services with which applications may interact.

The absolute minimum that you must implement (besides boilerplate) is a “Django view” function that processes the request, plus the associated template that renders the response into HTML.

Many apps will evolve to have a bit of custom JavaScript and CSS styles. Apps that need to talk to an external service will pull in the code necessary to talk to that service.

File Layout

The Hue “framework” is in desktop/core/ and contains the Web components. desktop/libs/ is the API for talking to various Hadoop services. The installable apps live in apps/. Please place third-party dependencies in the app’s ext-py/ directory.

The typical directory structure inside an application includes:

  src/
    for Python/Django code
      models.py
      urls.py
      views.py
      forms.py
      settings.py

  conf/
    for configuration (.ini) files to be installed

  static/
    for static HTML/js resources and help doc

  templates/
    for data to be put through a template engine

  locales/
    for localizations in multiple languages

For the URLs within your application, you should make your own urls.py which will be automatically rooted at /yourappname/ in the global namespace. See apps/about/src/about/urls.py for an example.

Configurations

File

Hue uses a typed configuration system that reads configuration files (in an ini-style format). By default, Hue loads all *.ini files in the build/desktop/conf directory. The configuration files have the following format:

# This is a comment
[ app_name ]          # Same as your app's name
app_property = "Pink Floyd"

[[ section_a ]]         # The double brackets start a section under [ app_name ]
a_weight = 80         # that is useful for grouping
a_height = 180

[[ filesystems ]]       # Sections are also useful for making a list
[[[ cluster_1 ]]]       # All list members are sub-sections of the same type
namenode_host = localhost
# User may define more:
# [[[ cluster_2 ]]]
# namenode_host = 10.0.0.1
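
To make the nesting rules concrete, here is a minimal sketch of a parser for this bracket-depth format. It is not Hue's loader: the function name parse_nested_ini is hypothetical, values are kept as plain strings, and comment handling is simplified (everything after a # is dropped).

```python
import re


def parse_nested_ini(text):
    """Parse ini text where [x], [[x]] and [[[x]]] mark nesting depth 1, 2, 3."""
    root = {}
    # stack[d] is the dict of the section currently open at depth d
    # (stack[0] is the root, which holds the [app_name] sections).
    stack = [root]
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        header = re.match(r'^(\[+)\s*([^\[\]]+?)\s*(\]+)$', line)
        if header:
            depth = len(header.group(1))
            if depth != len(header.group(3)) or depth > len(stack):
                raise ValueError('bad section header: %r' % raw)
            section = {}
            del stack[depth:]                      # close deeper sections
            stack[-1][header.group(2)] = section   # attach to the parent
            stack.append(section)
        else:
            key, _, value = line.partition('=')
            stack[-1][key.strip()] = value.strip().strip('"')
    return root


sample = '''
[ app_name ]
app_property = "Pink Floyd"

[[ section_a ]]
a_weight = 80

[[ filesystems ]]
[[[ cluster_1 ]]]
namenode_host = localhost
'''
conf = parse_nested_ini(sample)
print(conf['app_name']['filesystems']['cluster_1']['namenode_host'])
```

Each additional pair of brackets opens a child of the most recent section one level up, which is why cluster_1 ends up inside filesystems.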

Variables

Your application’s conf.py is special. It provides access to the configuration file (and even default configurations not specified in the file). Using the above example, your conf.py should define the following:

A desktop.lib.conf.Config object for app_property, such as:

MY_PROPERTY = Config(key='app_property', default='Beatles', help='blah')

You can access its value by MY_PROPERTY.get().

A desktop.lib.conf.ConfigSection object for section_a, such as:

SECTION_A = ConfigSection(key='section_a',
      help='blah',
      members=dict(
        AWEIGHT=Config(key='a_weight', type=int, default=0),
        AHEIGHT=Config(key='a_height', type=int, default=0)))

You can access the values by SECTION_A.AWEIGHT.get().

A desktop.lib.conf.UnspecifiedConfigSection object for filesystems, such as:

FS = UnspecifiedConfigSection(
    key='filesystems',
    each=ConfigSection(members=dict(
        nn_host=Config(key='namenode_host', required=True))))

An UnspecifiedConfigSection is useful when the children of the section are not known. When Hue loads your application’s configuration, it binds all sub-sections. You can access the values by:

cluster1_val = FS['cluster_1'].nn_host.get()
all_clusters = FS.keys()
for cluster in all_clusters:
    val = FS[cluster].nn_host.get()
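
The get() pattern used throughout can be pictured with a small stand-in class. This is only a sketch: SketchConfig and its bind() method are hypothetical names invented for illustration and do not reproduce desktop.lib.conf. It shows the two behaviors described above, namely falling back to a default and coercing the raw value to a declared type.

```python
class SketchConfig(object):
    """Hypothetical stand-in for a typed configuration variable."""

    def __init__(self, key, default=None, type=str, help=''):
        self.key = key
        self.default = default
        self.type = type
        self.help = help
        self._raw = None  # raw string read from the configuration file, if any

    def bind(self, raw_value):
        """Attach the raw string found in the configuration file."""
        self._raw = raw_value
        return self

    def get(self):
        """Return the configured value coerced to its type, else the default."""
        if self._raw is None:
            return self.default
        return self.type(self._raw)


MY_PROPERTY = SketchConfig(key='app_property', default='Beatles', help='blah')
A_WEIGHT = SketchConfig(key='a_weight', type=int, default=0)

print(MY_PROPERTY.get())  # nothing bound yet, so the default is returned
A_WEIGHT.bind('80')
print(A_WEIGHT.get())     # the raw string '80' is coerced to the int 80
```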

Applications can automatically detect configuration problems and alert the admin. To take advantage of this feature, create a config_validator function in your conf.py:

def config_validator(user):
  """
  config_validator(user) -> [(config_variable, error_msg)] or None
  Called by core check_config() view.
  """
  res = [ ]
  if not REQUIRED_PROPERTY.get():
    res.append((REQUIRED_PROPERTY, "This variable must be set"))
  if MY_INT_PROPERTY.get() < 0:
    res.append((MY_INT_PROPERTY, "This must be a non-negative number"))
  return res

You should specify the help="..." argument for all configuration-related objects in your conf.py. The examples omit some for the sake of space. You and your application's users can view all the configuration variables by running:

build/env/bin/hue config_help

Saving documents

Each app used to have its own model to store its data (e.g. saving a SQL query, query history…). All the models have been unified into a single Document2 model in the desktop app:

desktop/core/src/desktop/models.py.

The Document2 model automatically provides creation, sharing and saving. It persists the document data into a JSON field, which limits the need for database migrations and simplifies the interaction with the frontend.

Document2 is based on Django Models, which are Django’s Object-Relational Mapping framework.
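
The benefit of keeping document data in one JSON field can be sketched without Django. SketchDocument below is a hypothetical stand-in, not the Document2 model, and the type and field values are made-up examples. The point it illustrates is that a new attribute goes into the serialized text rather than into a new database column.

```python
import json


class SketchDocument(object):
    """Hypothetical stand-in for a row with a single JSON text column."""

    def __init__(self, name, doc_type, data=None):
        self.name = name
        self.type = doc_type
        self.data = json.dumps(data or {})  # what would be stored in the column

    def as_dict(self):
        """Deserialize the stored JSON text."""
        return json.loads(self.data)

    def update(self, changes):
        """Merge new keys into the stored JSON; no schema change is needed."""
        merged = self.as_dict()
        merged.update(changes)
        self.data = json.dumps(merged)


doc = SketchDocument('top customers', 'query', {'statement': 'SELECT 1'})
doc.update({'is_favorite': True})  # a brand new attribute, same column
print(doc.as_dict())
```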

Walk-through of a Django View

Django is an MVC framework, except that the controller is called a “view” and the “view” is called a “template”. For an application developer, the essential flow to understand is how the “urls.py” file provides a mapping between URLs (expressed as a regular expression, optionally with captured parameters) and view functions. These view functions typically use their arguments (for example, the captured parameters) and their request object (which has, for example, the POST and GET parameters) to prepare dynamic content to be rendered using a template.
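
The URL-to-view flow described above can be sketched in plain Python. This is not Django's resolver: the dispatch function, the URL_PATTERNS list and the dict standing in for the request object are all assumptions made for illustration.

```python
import re


def list_users(request):
    return 'all users'


def show_user(request, username):
    # The view combines a captured URL parameter with a GET parameter.
    return 'user=%s page=%s' % (username, request['GET'].get('page', '1'))


# Plays the role of urls.py: regular expressions mapped to view functions.
URL_PATTERNS = [
    (re.compile(r'^users/$'), list_users),
    (re.compile(r'^users/(?P<username>\w+)/$'), show_user),
]


def dispatch(path, request):
    """Call the first view whose pattern matches, passing captured groups."""
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    raise LookupError('no view for %r' % path)


response = dispatch('users/alice/', {'GET': {'page': '2'}, 'POST': {}})
print(response)
```

Named groups in the regular expression become keyword arguments of the view, which is the same idea as captured parameters in a real urls.py.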

Templates: Django and Mako

In Hue, the typical pattern for rendering data through a template is:

from desktop.lib.django_util import render

def view_function(request):
  return render('view_function.mako', request, dict(greeting="hello"))

The render() function chooses a template engine (either Django or Mako) based on the extension of the template file (“.html” or “.mako”). Mako templates are more powerful, in that they allow you to run arbitrary code blocks quite easily, and are more strict (some would say finicky); Django templates are simpler, but are less expressive.
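
The extension-based choice can be pictured as follows. This is a sketch of the idea only: choose_engine is a hypothetical function and the returned strings are placeholders, not what render() does internally.

```python
import os


def choose_engine(template_name):
    """Pick a template engine name from the template file extension."""
    extension = os.path.splitext(template_name)[1].lower()
    if extension == '.mako':
        return 'mako'
    if extension == '.html':
        return 'django'
    raise ValueError('unsupported template extension: %r' % extension)


print(choose_engine('view_function.mako'))
print(choose_engine('index.html'))
```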

Authentication Backends

Hue exposes a configuration flag (“auth”) to configure a custom authentication backend. See http://docs.djangoproject.com/en/dev/topics/auth/#writing-an-authentication-backend for writing such a backend.

In addition to that, backends may support a manages_passwords_externally() method, returning True or False, to tell the user manager application whether or not changing passwords within Hue is possible.
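
A backend with this extra method might look roughly like the sketch below. It follows the general shape Django expects (authenticate and get_user) but uses an in-memory table and plain dicts instead of Django user objects, so it runs without Django. The class name, the user table and the plain-text password are illustrative assumptions only.

```python
class SketchExternalBackend(object):
    """Hypothetical backend whose passwords live in an external system."""

    # Stand-in for an external directory; real backends would query a service.
    _users = {1: {'id': 1, 'username': 'alice', 'password': 'secret'}}

    def authenticate(self, request, username=None, password=None):
        """Return the matching user, or None if the credentials are wrong."""
        for user in self._users.values():
            if user['username'] == username and user['password'] == password:
                return user
        return None

    def get_user(self, user_id):
        """Return the user with this id, or None."""
        return self._users.get(user_id)

    def manages_passwords_externally(self):
        """Tell the user manager that passwords cannot be changed in Hue."""
        return True


backend = SketchExternalBackend()
user = backend.authenticate(None, username='alice', password='secret')
print(user['username'], backend.manages_passwords_externally())
```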

Authorization

Applications may define permission sets for different actions. Administrators can assign permissions to user groups in the UserAdmin application. To define custom permission sets, modify your app’s settings.py to create a list of (identifier, description) tuples:

PERMISSION_ACTIONS = [
  ("delete", "Delete really important data"),
  ("email", "Send email to the entire company"),
  ("identifier", "Description of the permission")
]

Then you can use this decorator on your view functions to enforce permission:

import desktop.decorators

@desktop.decorators.hue_permission_required("delete", "my_app_name")
def delete_financial_report(request):
  ...
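
For readers unfamiliar with decorator-based checks, the sketch below shows the general technique. It is not the implementation of hue_permission_required: the decorator name, the PermissionDenied exception and the dict-based request with a user_permissions set are all hypothetical.

```python
import functools


class PermissionDenied(Exception):
    """Raised when the current user lacks the required permission."""


def sketch_permission_required(action, app):
    """Return a decorator that only runs the view if (app, action) is granted."""
    def decorator(view_func):
        @functools.wraps(view_func)
        def wrapper(request, *args, **kwargs):
            if (app, action) not in request['user_permissions']:
                raise PermissionDenied('%s.%s is required' % (app, action))
            return view_func(request, *args, **kwargs)
        return wrapper
    return decorator


@sketch_permission_required('delete', 'my_app_name')
def delete_financial_report(request):
    return 'deleted'


allowed = {'user_permissions': {('my_app_name', 'delete')}}
print(delete_financial_report(allowed))
```

The check runs before the view body, so a view never has to repeat the permission logic itself.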

Using and Installing Thrift

Right now, we check in the generated thrift code. To generate the code, you’ll need the thrift binary version 0.9.0. Please download from http://thrift.apache.org/.

The modules using Thrift have some helper scripts like regenerate_thrift.sh for regenerating the code from the interfaces.

Profiling

Hue has a profiling system built in, which can be used to analyze server-side performance of applications. To enable profiling:

build/env/bin/hue runprofileserver

Then, access the page that you want to profile. This will create files like /tmp/useradmin.users.000072ms.2011-02-21T13:03:39.745851.prof. The format for the file names is /tmp/....prof.

Hue uses the hotshot profiling library for instrumentation. The documentation for this library is located at: http://docs.python.org/library/hotshot.html.

You can use kcachegrind to view the profiled data graphically:

hotshot2calltree /tmp/xyz.prof > /tmp/xyz.trace
kcachegrind /tmp/xyz.trace

More generally, you can programmatically inspect a trace:

#!/usr/bin/python
import hotshot.stats
import sys

stats = hotshot.stats.load(sys.argv[1])
stats.sort_stats('cumulative', 'calls')
stats.print_stats(100)

This script takes in a .prof file, and orders function calls by the cumulative time spent in that function, followed by the number of times the function was called, and then prints out the top 100 time-wasters. For information on the other stats available, take a look at this website: http://docs.python.org/library/profile.html#pstats.Stats
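
The hotshot module only exists in Python 2. As a sketch of the same kind of analysis with modules that are also available in Python 3, the example below profiles a small function with cProfile and inspects the result with pstats. The function slow_sum and the output path are made up for the example; this is not how runprofileserver produces its files.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    """Deliberately simple workload to have something to profile."""
    return sum(i * i for i in range(n))


# Collect a profile and write it to a .prof file.
prof_path = os.path.join(tempfile.mkdtemp(), 'example.prof')
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()
profiler.dump_stats(prof_path)

# Load the file and print the top entries, ordered as in the script above.
stats = pstats.Stats(prof_path)
stats.sort_stats('cumulative', 'calls')
stats.print_stats(10)
```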

Upgrades

After upgrading the version of Hue, running these two commands will make sure the database has the correct tables and fields.

./build/env/bin/hue syncdb
./build/env/bin/hue migrate

Debugging Tips and Tricks

  • Set DESKTOP_DEBUG=1 as an environment variable if you want logs to go to stderr as well as to the respective log files.
  • Use runserver. If you want to set a CLI breakpoint, just insert __import__("ipdb").set_trace() into your code.
  • Django tends to restart its server whenever it notices a file change. For certain things (like configuration changes), this is not sufficient. Restart the server whole-heartedly.
  • We recommend developing with the Chrome console.

Web interface

Developing applications for Hue requires a minimal amount of CSS (and potentially JavaScript) to use existing functionality.

In a nutshell, front-end development in Hue uses Bootstrap and Knockout js to lay out your app and script the custom interactions.

Node.js

During development the bundles can be regenerated automatically whenever changes to the .js files are detected. For this, run:

npm run dev

CSS Styles

Hue uses Bootstrap version 2.0 CSS styles and layouts. They are highly reusable and flexible. Your app doesn’t have to use these styles, but if you do, it’ll save you some time and make your app look at home in Hue.

On top of the standard Bootstrap styles, Hue defines a small set of custom styles in desktop/core/static/css/jhue.css.

Defining Styles

When you create your application it will provision a CSS file for you in the static/css directory. For organization purposes, your styles should go here (and any images you have should go in static/art). Your app’s name will be a class that is assigned to the root of your app in the DOM. So if you created an app called “calculator” then every window you create for your app will have the class “calculator”. Every style you define should be prefixed with this to prevent you from accidentally colliding with the framework style. Examples:

/* the right way: */
.calculator p {
  /* all my paragraphs should have a margin of 8px */
  margin: 8px;
  /* and a background from my art directory */
  background: url(../art/paragraph.gif);
}
/* the wrong way: */
p {
  /* woops; we're styling all the paragraphs on the page, affecting
     the common header! */
  margin: 8px;
  background: url(../art/paragraph.gif);
}

Icons

You should create an icon for your application that is a transparent png sized 24px by 24px. Your settings.py file should point to your icon via the ICON variable. The create_desktop_app command creates a default icon for you.

If you do not define an application icon, your application will not show up in the navigation bar.

Hue ships with Twitter Bootstrap and Font Awesome 3 (http://fortawesome.github.io/Font-Awesome/) so you have plenty of scalable icons to choose from. You can style your elements to use them like this (in your mako template):

<!-- show a trash icon in a link -->
<a href="#something"><i class="icon-trash"></i> Trash</a>

Static files

For better performance, Hue uses the Django staticfiles app. In production mode, after editing static files you need to run the command below or make apps. No action is needed in development mode.

./build/env/bin/hue collectstatic

Testing

The short story

Install the mini cluster (only once):

./tools/jenkins/jenkins.sh slow

Run all the tests:

build/env/bin/hue test all

Or just some parts of the tests, e.g.:

build/env/bin/hue test specific impala
build/env/bin/hue test specific impala.tests:TestMockedImpala
build/env/bin/hue test specific impala.tests:TestMockedImpala.test_basic_flow

Jasmine tests:

npm run test

Longer story

The test management command prepares the arguments (test app names) and passes them to nose (django_nose.nose_runner). Nose will then magically find all the tests to run.

Tests themselves should be named *_test.py. These will be found as long as they’re in packages covered by django. You can use the unittest frameworks, or you can just name your method with the word “test” at a word boundary, and nose will find it. See apps/hello/src/hello/hello_test.py for an example.
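
The two styles mentioned above (a plain function with “test” in its name, or a unittest class) could look like the sketch below. The word_count function and the test names are invented for illustration; this is not the content of hello_test.py.

```python
import unittest


def word_count(text):
    """Tiny function under test."""
    return len(text.split())


def test_word_count_plain_function():
    # A plain function whose name contains "test" at a word boundary.
    assert word_count('hello hue world') == 3


class WordCountTest(unittest.TestCase):
    # A classic unittest-style test case.
    def test_empty_string(self):
        self.assertEqual(word_count(''), 0)

    def test_extra_whitespace(self):
        self.assertEqual(word_count('  hue   rocks '), 2)
```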

Helpful command-line tricks

To run tests that do not depend on Hadoop, use:

build/env/bin/hue test fast

To run all tests, use:

build/env/bin/hue test all

To run only tests of a particular app, use:

build/env/bin/hue test specific <app>

E.g. build/env/bin/hue test specific filebrowser

To run a specific test, use:

build/env/bin/hue test specific <test_func>

E.g. build/env/bin/hue test specific useradmin.tests:test_user_admin

Start up pdb on test failures:

build/env/bin/hue test <args> --pdb --pdb-failure -s

Point to an Impalad and trigger the Impala tests:

build/env/bin/hue test impala impalad-01.gethue.com

Create and run the Jasmine tests

Add them in a “spec” subfolder relative to the file under test. The filename of the test has to end with “Spec.js”.

someFile.js              <- File under test
├── spec/
│   ├── someFileSpec.js  <- File containing tests

Run all the tests once with:

npm run test

Optionally, to use Karma and headless Chrome for the tests, run:

npm run test-karma

See desktop/core/src/desktop/js/spec/karma.config.js for the available options.

Special environment variables

DESKTOP_LOGLEVEL=<level>
  level can be DEBUG, INFO, WARN, ERROR, or CRITICAL

  When specified, the console logger is set to the given log level. A console
  logger is created if one is not defined.

DESKTOP_DEBUG
  A shorthand for DESKTOP_LOGLEVEL=DEBUG. Also turns on output HTML
  validation.

DESKTOP_PROFILE
  Turn on Python profiling. The profile data is saved in a file. See the
  console output for the location of the file.

DESKTOP_LOG_DIR=$dir
  Specify the HUE log directory. Defaults to ./log.

DESKTOP_DB_CONFIG=<db engine>:<db name>:<test db name>:<username>:<password>:<host>:<port>
  Specify alternate DB connection parameters for HUE to use. Useful for
  testing your changes against, for example, MySQL instead of sqlite. String
  is a colon-delimited list.

TEST_IMPALAD_HOST=impalad-01.gethue.com
  Point to an Impalad and trigger the Impala tests.
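
The colon-delimited layout of DESKTOP_DB_CONFIG can be illustrated with a short parsing sketch. The function name, the field names and the sample value are assumptions for the example, and Hue's own parsing may differ (for instance, this sketch does not handle a password that contains a colon).

```python
# Field order as listed in the DESKTOP_DB_CONFIG description above.
FIELDS = ('engine', 'name', 'test_name', 'user', 'password', 'host', 'port')


def parse_db_config(value):
    """Split a colon-delimited DB config string into named fields."""
    parts = value.split(':')
    if len(parts) != len(FIELDS):
        raise ValueError('expected %d fields, got %d' % (len(FIELDS), len(parts)))
    return dict(zip(FIELDS, parts))


db = parse_db_config('mysql:hue:test_hue:hue:secretpassword:localhost:3306')
print(db['engine'], db['host'], db['port'])
```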

Writing tests that depend on Hadoop

Use pseudo_hdfs4.py! You should tag such tests with “requires_hadoop”, as follows:

from nose.plugins.attrib import attr

@attr('requires_hadoop')
def your_test():
  ...

Jenkins Configuration

Because building Hadoop (for the tests that require it) is slow, we’ve separated the Jenkins builds into “fast” and “slow”. Both are run via tools/jenkins/jenkins.sh, which should be kept updated with the latest and greatest in build technologies.

Release

Update the versions to the next release (current release +1):

:100644 100644 4db6d5f... f907d04... M  VERSION
:100644 100644 9332f95... 45b28ad... M  desktop/libs/librdbms/java/pom.xml
:100644 100644 551f62f... 694f021... M  maven/pom.xml

How to count the number of commits since the last release and add them and the authors to the release notes:

git log --oneline --since=2018-01-01 | grep 'release'
git log --oneline --since=2018-01-01 | grep -n '6df64e3'
git log --oneline -449 > scratch.txt

git log --pretty="%an" | sort | uniq > scratch.txt

Pushing the release branch:

git push origin HEAD:branch-4.4.0

Tagging the release:

git tag -a release-4.4.0 -m "release-4.4.0"
git push origin release-4.4.0

Building the tarball release:

make prod

Source of the release: https://github.com/cloudera/hue/archive/release-4.4.0.zip

Other things to update:

Instructions:

docker build https://github.com/cloudera/hue.git#release-4.4.0 -t gethue/hue:4.4.0 -f tools/docker/hue/Dockerfile
docker tag gethue/hue:4.4.0 gethue/hue:latest
docker images
docker login
docker push gethue/hue
docker push gethue/hue:4.4.0

Then send release notes to hue-user, https://twitter.com/gethue!