Scheduler
The application lets you build workflows and schedule them to run automatically at regular intervals.
A monitoring interface shows progress and logs, and allows actions such as pausing or stopping jobs.
The Oozie Editor/Dashboard application allows you to define Oozie
workflow, coordinator, and bundle applications, run workflow,
coordinator, and bundle jobs, and view the status of jobs. For
information about Oozie, see Oozie
Documentation.
A workflow application is a collection of actions arranged in a directed
acyclic graph (DAG). It includes two types of nodes:
- Control flow - start, end, fork, join, decision, and kill
- Action - the jobs that the workflow runs
A coordinator application allows you to define and execute recurrent and
interdependent workflow jobs. The coordinator application defines the
conditions under which the execution of workflows can occur.
A bundle application allows you to batch a set of coordinator
applications.
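To make the DAG requirement concrete, here is a minimal Python sketch that models a workflow as a mapping from each node to the nodes it transitions to, and checks that the graph is acyclic. The node names and the `execution_order` helper are hypothetical illustrations, not part of Oozie or the editor.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a node, each value lists the nodes it
# transitions to. "fork" and "join" are control nodes; action_a and action_b
# are action nodes that run in parallel.
WORKFLOW = {
    "start": ["fork"],
    "fork": ["action_a", "action_b"],
    "action_a": ["join"],
    "action_b": ["join"],
    "join": ["end"],
    "end": [],
}


def execution_order(transitions):
    """Return one valid node ordering, or raise ValueError if there is a cycle."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {node: set() for node in transitions}
    for node, targets in transitions.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(node)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError(f"workflow is not acyclic: {exc.args[1]}") from exc


order = execution_order(WORKFLOW)
print(order)
```

The ordering always begins with start and finishes with end; the relative order of the two forked actions is not fixed, which reflects that they can run in parallel.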
Workflows
In the Workflow Editor you can easily perform operations on Oozie action
and control nodes.
Action Nodes
The Workflow Editor supports dragging and dropping action nodes. As you
move the action over other actions and forks, highlights indicate active
areas. If there are actions in the workflow, the active areas are the
actions themselves and the areas above and below the actions. If you
drop an action on an existing action, a fork and join is added to the
workflow.
- When you drop a new action on the workflow, the action opens in the
Edit Node screen.
- Edit the action properties and click Done. The action is added
to the end of the workflow.
- Delete an action by clicking the Trash button.
- Edit an action by clicking the Edit button.
- Change the position of an action by left-clicking and dragging an
action to a new location.
Control Nodes
- Create a fork and join by dropping an action on top of another
action.
- Remove a fork and join by dragging a forked action and dropping it
above the fork.
- Convert a fork to a decision by clicking the
Fork button.
- To edit a decision:
- Click the Edit button.
- Fill in the predicates that determine which action to perform
and select the default action from the drop-down list.
- Click Done.
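The way a decision picks its transition can be sketched as follows: predicates are checked in order, the first one that holds selects the action, and the default is used when none holds. In the editor the predicates are expressions typed into the form; this sketch stands in for them with plain Python functions, and every name in it (`choose_transition`, the action names, `input_size_gb`) is hypothetical.

```python
def choose_transition(cases, default, context):
    """Return the target of the first case whose predicate holds, else the default."""
    for predicate, target in cases:
        if predicate(context):
            return target
    return default


# Hypothetical predicates and action names, evaluated top to bottom.
cases = [
    (lambda ctx: ctx["input_size_gb"] > 100, "large_job"),
    (lambda ctx: ctx["input_size_gb"] > 10, "medium_job"),
]

print(choose_transition(cases, "small_job", {"input_size_gb": 50}))  # medium_job
```

Because evaluation stops at the first match, the order of the predicates matters: put the most specific condition first.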
Note: workflow.xml files and their job.properties
can also be selected and executed directly from the File Browser.
Schedules
In Coordinator Manager you create Oozie coordinator applications and
submit them for execution.
Editing a Coordinator
In the Coordinator Editor you specify coordinator properties and the
datasets on which the workflow scheduled by the coordinator will operate
by stepping through screens in a wizard. You can also advance to
particular steps and revisit steps by clicking the Step “tabs” above the
screens. The following instructions walk you through the wizard.
- Type a name, select the workflow, check the Is shared checkbox
to share the job, and click Next. If the Coordinator Editor was
opened after scheduling a workflow, the workflow will be set.
- Select how many times the coordinator will run for each specified
unit, the start and end times of the coordinator, the timezone of
the start and end times, and click Next. Times must be expressed
as UTC times. For example, to run at 10 pm PST, specify a start time
of 6 am UTC of the following day (+8 hours) and set the Timezone
field to America/Los_Angeles.
- Click Add to select an input dataset and click Next. If no
datasets exist, follow the procedure in Creating a
Dataset.
- Click Add to select an output dataset. Click Save
coordinator or click Next to specify advanced settings.
- To share the coordinator with all users, check the Is shared
checkbox.
- Fill in parameters to pass to Oozie, properties that determine how
long a coordinator will wait before timing out, how many
coordinators can run and wait concurrently, and the coordinator
execution policy.
- Click Save coordinator.
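The UTC conversion described in the scheduling step can be checked with a few lines of Python. This is a minimal sketch that assumes a fixed UTC-8 offset for PST and deliberately ignores daylight saving time; the date and the `to_utc` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# PST as a fixed UTC-8 offset. Daylight saving time (PDT, UTC-7) is
# deliberately ignored in this sketch.
PST = timezone(timedelta(hours=-8), "PST")


def to_utc(local_time):
    """Convert a timezone-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


# Hypothetical date: 10 pm PST on 15 January 2024.
local_start = datetime(2024, 1, 15, 22, 0, tzinfo=PST)
utc_start = to_utc(local_start)
print(utc_start.strftime("%Y-%m-%dT%H:%MZ"))  # 2024-01-16T06:00Z
```

The result lands on the following calendar day, which is why the start date entered in the wizard may differ from the local date you have in mind.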
Creating a Dataset
- In the Coordinator Editor, do one of the following:
- Click here in the Inputs or Outputs pane at the top of the
editor.
- In the pane at the left, click the Create new link. Proceed
with Editing a Dataset.
Editing a Dataset
- Type a name for the dataset.
- In the Start and Frequency fields, specify when and how often the
dataset will be available.
- In the URI field, specify a URI template for the location of the
dataset. To construct URIs and URI paths containing dates and
timestamps, you can specify the variables
${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, and ${MINUTE}. For example:
hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data.
- In the Instance field, click a button to choose a default instance, a
single instance, or a range of data instances. For example, if frequency==DAY, a window
of the last rolling 5 days (not including today) would be expressed
as start: -5 and end: -1. Check the advanced checkbox to display a
field where you can specify a coordinator EL
function.
- Specify the timezone of the start date.
- In the Done flag field, specify the flag file whose presence signals
that an input dataset is ready to be consumed.
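The URI template and the instance window can be illustrated together. The sketch below expands the date variables for each day in a rolling window of start: -5 and end: -1. It assumes zero-padded month, day, hour, and minute values; `resolve_uri`, `daily_window`, and the sample date are hypothetical helpers for illustration, not functions provided by Oozie.

```python
from datetime import datetime, timedelta
from string import Template


def resolve_uri(uri_template, instance_time):
    """Fill ${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, ${MINUTE} with zero-padded values."""
    return Template(uri_template).substitute(
        YEAR=f"{instance_time:%Y}",
        MONTH=f"{instance_time:%m}",
        DAY=f"{instance_time:%d}",
        HOUR=f"{instance_time:%H}",
        MINUTE=f"{instance_time:%M}",
    )


def daily_window(today, start, end):
    """Return the daily instances from offset `start` to offset `end`, inclusive."""
    return [today + timedelta(days=offset) for offset in range(start, end + 1)]


uri = "hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data"
today = datetime(2024, 3, 10)  # hypothetical "today"

# The last rolling 5 days, not including today.
window = daily_window(today, -5, -1)
for instance in window:
    print(instance.date(), resolve_uri(uri, instance))
```

With this template every instance in the window resolves to the same monthly directory; adding ${DAY} to the template would give each daily instance its own path.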
Bundles
A bundle consists of a collection of schedules.
Creating a Bundle
- Click the Create button at the top right.
- In the Name field, type a name.
- In the Kick off time field, choose a kick off time.
- Check the Is shared checkbox to allow all users to access the
bundle.
- Click Save. The Bundle Editor opens. Proceed with Editing a
Bundle.
Editing a Bundle
In the Bundle Editor, you specify properties by stepping through screens
in a wizard. You can also advance to particular steps and revisit steps
by clicking the Step “tabs” above the screens. The following
instructions walk you through the wizard.
- Click Add to select a coordinator that the bundle will kick off.
- Choose the kick off time. The time must be expressed as a UTC time.
For example, to run at 10 pm PST, specify a start time of 6 am UTC
of the following day (+8 hours).
- To share the bundle with all users, check the Is shared
checkbox.
- Click Next to specify advanced settings or click Save
bundle.
- Fill in parameters to pass to Oozie.
- Click Save bundle.