Scheduler

The application lets you build workflows and schedule them to run automatically at regular intervals. A monitoring interface shows progress and logs, and allows actions such as pausing or stopping jobs.

The Oozie Editor/Dashboard application allows you to define Oozie workflow, coordinator, and bundle applications, run workflow, coordinator, and bundle jobs, and view the status of jobs. For information about Oozie, see Oozie Documentation.

A workflow application is a collection of actions arranged in a directed acyclic graph (DAG). It includes two types of nodes:

  • Control flow - start, end, fork, join, decision, and kill
  • Action - Jobs
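
As an illustration of the DAG structure described above, here is a minimal Python sketch that models a workflow as a dependency graph and derives a valid execution order. The node names (`action_a`, `action_b`) and the `execution_order` helper are hypothetical. This is not how Oozie or the editor represents workflows internally. It only shows why a workflow must contain no cycles.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each key is a node, each value is the set of
# nodes that must complete before it can run (its predecessors).
workflow = {
    "start": set(),
    "fork": {"start"},
    "action_a": {"fork"},
    "action_b": {"fork"},
    "join": {"action_a", "action_b"},
    "end": {"join"},
}


def execution_order(graph):
    """Return one valid node order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


order = execution_order(workflow)
print(order)
```

The two forked actions may appear in either order, but both always come after the fork and before the join.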

A coordinator application allows you to define and execute recurrent and interdependent workflow jobs. The coordinator application defines the conditions under which the execution of workflows can occur.

A bundle application allows you to batch a set of coordinator applications.

Workflows

In the Workflow Editor you can easily perform operations on Oozie action and control nodes.

Action Nodes

The Workflow Editor supports dragging and dropping action nodes. As you move an action over other actions and forks, highlights indicate the active areas. If the workflow already contains actions, the active areas are the actions themselves and the areas above and below them. If you drop an action on an existing action, a fork and a join are added to the workflow.

  • Add an action to the workflow by clicking the action button and dropping the action on the workflow. The Edit Node screen displays.
    1. Set the action properties and click Done. Each action in a workflow must have a unique name.
  • Copy an action by clicking the Copy button.
    1. The action opens in the Edit Node screen.
    2. Edit the action properties and click Done. The action is added to the end of the workflow.
  • Delete an action by clicking the Trash button.
  • Edit an action by clicking the Edit button.
  • Change the position of an action by left-clicking and dragging the action to a new location.

Control Nodes

  • Create a fork and join by dropping an action on top of another action.
  • Remove a fork and join by dragging a forked action and dropping it above the fork.
  • Convert a fork to a decision by clicking the Fork button.
  • To edit a decision:
    1. Click the Edit button.
    2. Fill in the predicates that determine which action to perform and select the default action from the drop-down list.
    3. Click Done.

Note: workflow.xml files and their job.properties files can also be selected and executed directly from the File Browser.

Schedules

In Coordinator Manager you create Oozie coordinator applications and submit them for execution.

Editing a Coordinator

In the Coordinator Editor, a wizard steps you through screens where you specify the coordinator properties and the datasets that the scheduled workflow operates on. You can also jump ahead to a particular step or revisit one by clicking the Step “tabs” above the screens. The following instructions walk you through the wizard.

  1. Type a name, select the workflow, check the Is shared checkbox to share the job, and click Next. If the Coordinator Editor was opened after scheduling a workflow, the workflow is already selected.
  2. Select how many times the coordinator will run for each specified unit, the start and end times of the coordinator, the timezone of the start and end times, and click Next. Times must be expressed as UTC times. For example, to run at 10 pm PST, specify a start time of 6 am UTC of the following day (+8 hours) and set the Timezone field to America/Los_Angeles.
  3. Click Add to select an input dataset and click Next. If no datasets exist, follow the procedure in Creating a Dataset.
  4. Click Add to select an output dataset. Click Save coordinator or click Next to specify advanced settings.
  5. To share the coordinator with all users, check the Is shared checkbox.
  6. Fill in parameters to pass to Oozie, properties that determine how long a coordinator will wait before timing out, how many coordinators can run and wait concurrently, and the coordinator execution policy.
  7. Click Save coordinator.
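
The UTC conversion in step 2 can be checked with a short Python sketch. The `local_to_utc` helper and the sample dates are assumptions made for illustration and are not part of the application. The sketch uses the standard `zoneinfo` module (Python 3.9+), which needs time zone data to be available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(local_time: str, tz_name: str) -> datetime:
    """Interpret 'YYYY-MM-DD HH:MM' as wall-clock time in tz_name and convert it to UTC."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


# 10 pm on a winter date in Los Angeles (PST, UTC-8) is 6 am UTC the next day.
start_utc = local_to_utc("2024-01-15 22:00", "America/Los_Angeles")
print(start_utc.strftime("%Y-%m-%dT%H:%MZ"))  # 2024-01-16T06:00Z
```

The +8 hour offset applies only while Pacific Standard Time is in effect. During daylight saving time (PDT) the offset is +7 hours, so the same 10 pm local start corresponds to 5 am UTC.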

Creating a Dataset

  1. In the Coordinator Editor, do one of the following:
    • Click here in the Inputs or Outputs pane at the top of the editor.
    • In the pane at the left, click the Create new link. Proceed with Editing a Dataset.

Editing a Dataset

  1. Type a name for the dataset.
  2. In the Start and Frequency fields, specify when and how often the dataset will be available.
  3. In the URI field, specify a URI template for the location of the dataset. To construct URIs and URI paths containing dates and timestamps, you can use the variables ${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, and ${MINUTE}. For example: hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data.
  4. In the Instance field, click a button to choose the default instance, a single instance, or a range of data instances. For example, if frequency==DAY, a rolling window of the last 5 days (not including today) would be expressed as start: -5 and end: -1. Check the advanced checkbox to display a field where you can specify a coordinator EL function.
  5. Specify the timezone of the start date.
  6. In the Done flag field, specify the flag that identifies when an input dataset instance is complete and ready to be consumed.
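
To make the URI template and the instance range from steps 3 and 4 concrete, here is a minimal Python sketch. The `expand_uri` and `instance_window` helpers are hypothetical and only mimic the substitution. Oozie performs the real resolution, and the zero-padded month and day values shown here are an assumption about its output format.

```python
from datetime import datetime, timedelta


def expand_uri(template: str, when: datetime) -> str:
    """Replace the ${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, ${MINUTE} variables in a template."""
    values = {
        "YEAR": f"{when.year:04d}",
        "MONTH": f"{when.month:02d}",  # zero-padding is an assumption
        "DAY": f"{when.day:02d}",
        "HOUR": f"{when.hour:02d}",
        "MINUTE": f"{when.minute:02d}",
    }
    for name, value in values.items():
        template = template.replace("${" + name + "}", value)
    return template


def instance_window(today: datetime, start: int, end: int) -> list:
    """Return the daily instances from offset `start` to offset `end`, inclusive."""
    return [today + timedelta(days=offset) for offset in range(start, end + 1)]


template = "hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data"
print(expand_uri(template, datetime(2024, 3, 9)))
# hdfs://foo:9000/usr/app/stats/2024/03/data

# start: -5 and end: -1 cover the five days before "today", excluding today itself.
window = instance_window(datetime(2024, 3, 9), -5, -1)
print([day.strftime("%Y-%m-%d") for day in window])
# ['2024-03-04', '2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08']
```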

Bundles

A bundle consists of a collection of schedules.

Creating a Bundle

  1. Click the Create button at the top right.
  2. In the Name field, type a name.
  3. In the Kick off time field, choose a kick off time.
  4. Check the Is shared checkbox to allow all users to access the bundle.
  5. Click Save. The Bundle Editor opens. Proceed with Editing a Bundle.

Editing a Bundle

In the Bundle Editor, you specify properties by stepping through screens in a wizard. You can also advance to particular steps and revisit steps by clicking the Step “tabs” above the screens. The following instructions walk you through the wizard.

  1. Click Add to select a coordinator that the bundle will kick off.
  2. Choose the kick off time. The time must be expressed as a UTC time. For example, to run at 10 pm PST, specify a start time of 6 am UTC of the following day (+8 hours).
  3. To share the bundle with all users, check the Is shared checkbox.
  4. Click Next to specify advanced settings or click Save bundle.
  5. Fill in parameters to pass to Oozie.
  6. Click Save bundle.