java.lang.Object
  com.cloudera.crunch.DoFn<S,T>
public abstract class DoFn<S,T>
Base class for all data processing functions in Crunch.
Note that all DoFn instances implement Serializable, and thus all of their non-transient member variables must implement Serializable as well. If your DoFn depends on non-serializable classes for data processing, they may be declared as transient and initialized in the DoFn's initialize method.
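
As a minimal sketch of the serialization note above, the following example keeps a non-serializable helper in a transient field and creates it in initialize(). The DoFn and Emitter types declared here are simplified stand-ins so that the snippet compiles without Crunch on the classpath. WordSplitter, SplitWordsFn, and roundTrip are hypothetical names, and the serialize/deserialize round trip only approximates how a function instance might be shipped to a task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class TransientFieldSketch {
  // Stand-ins (assumptions) for com.cloudera.crunch.Emitter and DoFn.
  interface Emitter<T> { void emit(T value); }

  abstract static class DoFn<S, T> implements Serializable {
    public void initialize() {}
    public abstract void process(S input, Emitter<T> emitter);
  }

  // Hypothetical helper that is deliberately NOT Serializable.
  static class WordSplitter {
    String[] split(String line) { return line.trim().split("\\s+"); }
  }

  static class SplitWordsFn extends DoFn<String, String> {
    private final boolean lowerCase;          // plain state: serialized with the function
    private transient WordSplitter splitter;  // skipped by serialization, null after a copy

    SplitWordsFn(boolean lowerCase) { this.lowerCase = lowerCase; }

    @Override public void initialize() {
      splitter = new WordSplitter();          // rebuilt on the task side
    }

    @Override public void process(String input, Emitter<String> emitter) {
      for (String word : splitter.split(input)) {
        emitter.emit(lowerCase ? word.toLowerCase() : word);
      }
    }
  }

  // Serializes and deserializes a function; fails if a non-transient field is not Serializable.
  static <F extends Serializable> F roundTrip(F fn) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(fn);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        F copy = (F) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  static List<String> run(String line) {
    SplitWordsFn fn = roundTrip(new SplitWordsFn(true));
    fn.initialize();                          // must run before process, or splitter is null
    List<String> out = new ArrayList<>();
    fn.process(line, out::add);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(run("Hello Crunch World"));
  }
}
```

Removing the transient keyword from splitter in this sketch would make roundTrip fail with a NotSerializableException, which is the situation the note warns about.
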
Constructor Summary | |
---|---|
DoFn() | |
Method Summary | |
---|---|
void | cleanup(Emitter<T> emitter): Called during the cleanup of the MapReduce job this DoFn is associated with. |
void | configure(org.apache.hadoop.conf.Configuration conf): Called during the job planning phase. |
protected org.apache.hadoop.conf.Configuration | getConfiguration() |
protected org.apache.hadoop.mapreduce.Counter | getCounter(Enum<?> counterName) |
protected org.apache.hadoop.mapreduce.Counter | getCounter(String groupName, String counterName) |
protected String | getStatus() |
protected org.apache.hadoop.mapreduce.TaskAttemptID | getTaskAttemptID() |
void | initialize(): Called during the setup of the MapReduce job this DoFn is associated with. |
abstract void | process(S input, Emitter<T> emitter): Processes the records from a PCollection. |
protected void | progress() |
float | scaleFactor(): Returns an estimate of how applying this function to a PCollection will cause it to change in size. |
void | setConfigurationForTest(org.apache.hadoop.conf.Configuration conf): Sets a Configuration instance to be used during unit tests. |
void | setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,?,?> context): Called during setup to pass the TaskInputOutputContext to this DoFn instance. |
protected void | setStatus(String status) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DoFn()
Method Detail |
---|
public void configure(org.apache.hadoop.conf.Configuration conf)
Called during the job planning phase.
Parameters:
conf - The Configuration instance for the Job.

public abstract void process(S input, Emitter<T> emitter)
Processes the records from a PCollection.
Parameters:
input - The input record
emitter - The emitter to send the output to

public void initialize()
Called during the setup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate initialization.

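
Because process receives an Emitter rather than returning a value, a single input record can produce zero, one, or several outputs. The sketch below illustrates this with a hypothetical ParseIntsFn that parses comma-separated integers and silently skips malformed fields. The DoFn and Emitter types are simplified stand-ins declared locally, and applyTo is a hypothetical driver loop, not part of Crunch.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProcessSketch {
  // Stand-ins (assumptions) for com.cloudera.crunch.Emitter and DoFn.
  interface Emitter<T> { void emit(T value); }

  abstract static class DoFn<S, T> implements Serializable {
    public abstract void process(S input, Emitter<T> emitter);
  }

  // Emits one Integer per well-formed field; blank or malformed fields emit nothing.
  static class ParseIntsFn extends DoFn<String, Integer> {
    @Override public void process(String input, Emitter<Integer> emitter) {
      for (String field : input.split(",")) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
          continue;
        }
        try {
          emitter.emit(Integer.parseInt(trimmed));
        } catch (NumberFormatException e) {
          // Not a number: skip this field instead of failing the record.
        }
      }
    }
  }

  // Hypothetical driver: feeds each record to the function and collects what it emits.
  static List<Integer> applyTo(List<String> records) {
    ParseIntsFn fn = new ParseIntsFn();
    List<Integer> out = new ArrayList<>();
    for (String record : records) {
      fn.process(record, out::add);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(applyTo(Arrays.asList("1, 2", "", "x, 3")));
  }
}
```

Here the first record yields two outputs, the empty record yields none, and the third yields one.
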
public void cleanup(Emitter<T> emitter)
Called during the cleanup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate cleanup.
Parameters:
emitter - The emitter that was used for output

public void setContext(org.apache.hadoop.mapreduce.TaskInputOutputContext<?,?,?,?> context)
Called during setup to pass the TaskInputOutputContext to this DoFn instance.

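
Since cleanup receives the emitter that was used for output, one possible use is to buffer state while processing and emit it once at the end. The sketch below assumes the call order initialize, then process for each record, then cleanup, which is how the descriptions above read. The DoFn and Emitter types are simplified local stand-ins, and LocalCountFn and run are hypothetical names.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CleanupSketch {
  // Stand-ins (assumptions) for com.cloudera.crunch.Emitter and DoFn.
  interface Emitter<T> { void emit(T value); }

  abstract static class DoFn<S, T> implements Serializable {
    public void initialize() {}
    public abstract void process(S input, Emitter<T> emitter);
    public void cleanup(Emitter<T> emitter) {}
  }

  // Counts records per key in memory and emits the totals only in cleanup.
  static class LocalCountFn extends DoFn<String, String> {
    private transient Map<String, Integer> counts;

    @Override public void initialize() {
      counts = new TreeMap<>();               // sorted, so output order is deterministic
    }

    @Override public void process(String input, Emitter<String> emitter) {
      counts.merge(input, 1, Integer::sum);   // nothing is emitted per record
    }

    @Override public void cleanup(Emitter<String> emitter) {
      for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        emitter.emit(entry.getKey() + "=" + entry.getValue());
      }
    }
  }

  // Hypothetical driver that mimics the assumed lifecycle order.
  static List<String> run(List<String> records) {
    LocalCountFn fn = new LocalCountFn();
    List<String> out = new ArrayList<>();
    Emitter<String> emitter = out::add;
    fn.initialize();
    for (String record : records) {
      fn.process(record, emitter);
    }
    fn.cleanup(emitter);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("b", "a", "b"))); // prints [a=1, b=2]
  }
}
```

Buffering like this trades memory for fewer emitted records, so it only suits cases where the buffered state stays small.
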
public void setConfigurationForTest(org.apache.hadoop.conf.Configuration conf)
Sets a Configuration instance to be used during unit tests.
Parameters:
conf - The Configuration instance.

public float scaleFactor()
Returns an estimate of how applying this function to a PCollection will cause it to change in size. The optimizer uses these estimates to decide where to break up dependent MR jobs into separate Map and Reduce phases in order to minimize I/O.
Subclasses of DoFn that will substantially alter the size of the resulting PCollection should override this method.

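
A sketch of overriding scaleFactor for a function that shrinks its input follows. It assumes the returned value can be read as an approximate output-to-input size ratio, which this page does not state explicitly. The DoFn and Emitter types are local stand-ins, the stand-in's default of 1.0f is a placeholder rather than Crunch's actual default, and ErrorLinesFn and estimateOutputSize are hypothetical names used only for illustration.

```java
import java.io.Serializable;

class ScaleFactorSketch {
  // Stand-ins (assumptions) for com.cloudera.crunch.Emitter and DoFn.
  interface Emitter<T> { void emit(T value); }

  abstract static class DoFn<S, T> implements Serializable {
    public abstract void process(S input, Emitter<T> emitter);

    // Placeholder default for this sketch only; not taken from Crunch.
    public float scaleFactor() { return 1.0f; }
  }

  // Keeps only lines containing "ERROR", so the output is expected to be much smaller.
  static class ErrorLinesFn extends DoFn<String, String> {
    @Override public void process(String input, Emitter<String> emitter) {
      if (input.contains("ERROR")) {
        emitter.emit(input);
      }
    }

    @Override public float scaleFactor() {
      return 0.25f; // a guess: roughly a quarter of the input survives the filter
    }
  }

  // Hypothetical illustration of how a planner might use the estimate.
  static long estimateOutputSize(long inputSize, DoFn<?, ?> fn) {
    return (long) (inputSize * fn.scaleFactor());
  }

  public static void main(String[] args) {
    System.out.println(estimateOutputSize(1000L, new ErrorLinesFn())); // prints 250
  }
}
```

The value returned is only an estimate, so a rough figure based on knowledge of the data is usually enough.
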
protected org.apache.hadoop.conf.Configuration getConfiguration()
protected org.apache.hadoop.mapreduce.Counter getCounter(Enum<?> counterName)
protected org.apache.hadoop.mapreduce.Counter getCounter(String groupName, String counterName)
protected void progress()
protected org.apache.hadoop.mapreduce.TaskAttemptID getTaskAttemptID()
protected void setStatus(String status)
protected String getStatus()