com.cloudera.crunch.impl.mr
Class MRPipeline

java.lang.Object
  extended by com.cloudera.crunch.impl.mr.MRPipeline
All Implemented Interfaces:
Pipeline

public class MRPipeline
extends Object
implements Pipeline


Constructor Summary
MRPipeline(Class<?> jarClass)
           
MRPipeline(Class<?> jarClass, org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
<T> SourceTarget<T>
createIntermediateOutput(PType<T> ptype)
           
 org.apache.hadoop.fs.Path createTempPath()
           
 void done()
          Runs any remaining jobs required to generate outputs, then cleans up any intermediate data files created in this run or in previous calls to run.
 void enableDebug()
          Turn on debug logging for jobs that are run from this pipeline.
 org.apache.hadoop.conf.Configuration getConfiguration()
          Returns the Configuration instance associated with this pipeline.
 int getNextAnonymousStageId()
           
<T> Iterable<T>
materialize(PCollection<T> pcollection)
          Creates the given PCollection and reads the data it contains into the returned Iterable for client use.
<S> PCollection<S>
read(Source<S> source)
          Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.
<K,V> PTable<K,V>
read(TableSource<K,V> source)
          A version of the read method for TableSource instances that map to PTables.
 PCollection<String> readTextFile(String pathName)
          A convenience method for reading a text file.
 void run()
          Constructs and executes a series of MapReduce jobs in order to write data to the output targets.
 void setConfiguration(org.apache.hadoop.conf.Configuration conf)
          Set the Configuration to use with this pipeline.
 void write(PCollection<?> pcollection, Target target)
          Write the given collection to the given target on the next pipeline run.
<T> void
writeTextFile(PCollection<T> pcollection, String pathName)
          A convenience method for writing a text file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MRPipeline

public MRPipeline(Class<?> jarClass)
           throws IOException
Throws:
IOException

MRPipeline

public MRPipeline(Class<?> jarClass,
                  org.apache.hadoop.conf.Configuration conf)
Method Detail

getConfiguration

public org.apache.hadoop.conf.Configuration getConfiguration()
Description copied from interface: Pipeline
Returns the Configuration instance associated with this pipeline.

Specified by:
getConfiguration in interface Pipeline

setConfiguration

public void setConfiguration(org.apache.hadoop.conf.Configuration conf)
Description copied from interface: Pipeline
Set the Configuration to use with this pipeline.

Specified by:
setConfiguration in interface Pipeline

run

public void run()
Description copied from interface: Pipeline
Constructs and executes a series of MapReduce jobs in order to write data to the output targets.

Specified by:
run in interface Pipeline

done

public void done()
Description copied from interface: Pipeline
Runs any remaining jobs required to generate outputs, then cleans up any intermediate data files created in this run or in previous calls to run.

Specified by:
done in interface Pipeline
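
The descriptions of write, run, and done imply a deferred lifecycle: write registers an output, run produces it, and done runs whatever is still pending before cleaning up intermediate files. The following is a toy model of that ordering only, not the MRPipeline implementation. The class LifecycleSketch, its fields, and the "tmp-run-N" names are hypothetical, and nothing here touches Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the write/run/done lifecycle. NOT the Crunch implementation:
// every name here is hypothetical and nothing touches Hadoop.
class LifecycleSketch {
    // Outputs registered by write() that no run has produced yet.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    // Outputs that a run has produced, keyed by a made-up target name.
    final Map<String, List<String>> written = new LinkedHashMap<>();
    // Made-up names standing in for intermediate data files.
    final List<String> tempFiles = new ArrayList<>();
    private int runs = 0;

    // Registers an output; nothing is produced until run() or done().
    void write(List<String> data, String target) {
        pending.put(target, new ArrayList<>(data));
    }

    // Produces every pending output and leaves one intermediate file behind.
    void run() {
        if (pending.isEmpty()) {
            return; // nothing registered since the last run
        }
        runs++;
        tempFiles.add("tmp-run-" + runs);
        written.putAll(pending);
        pending.clear();
    }

    // Runs whatever is still pending, then deletes intermediates from all runs.
    void done() {
        run();
        tempFiles.clear();
    }

    public static void main(String[] args) {
        LifecycleSketch p = new LifecycleSketch();
        p.write(Arrays.asList("a", "b"), "out1");
        System.out.println(p.written.keySet()); // still empty: write is deferred
        p.run();
        p.write(Arrays.asList("c"), "out2");
        p.done(); // produces out2, then clears intermediates from both runs
        System.out.println(p.written.keySet());
        System.out.println(p.tempFiles);
    }
}
```

In this model, calling done without a preceding run still produces every registered output, which matches the "run any remaining jobs" wording above. Whether the real class skips a run when nothing is pending is not stated on this page.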

read

public <S> PCollection<S> read(Source<S> source)
Description copied from interface: Pipeline
Converts the given Source into a PCollection that is available to jobs run using this Pipeline instance.

Specified by:
read in interface Pipeline
Parameters:
source - The source of the data
Returns:
A PCollection that references the given source

read

public <K,V> PTable<K,V> read(TableSource<K,V> source)
Description copied from interface: Pipeline
A version of the read method for TableSource instances that map to PTables.

Specified by:
read in interface Pipeline
Parameters:
source - The source of the data
Returns:
A PTable that references the given source

readTextFile

public PCollection<String> readTextFile(String pathName)
Description copied from interface: Pipeline
A convenience method for reading a text file.

Specified by:
readTextFile in interface Pipeline

write

public void write(PCollection<?> pcollection,
                  Target target)
Description copied from interface: Pipeline
Write the given collection to the given target on the next pipeline run.

Specified by:
write in interface Pipeline
Parameters:
pcollection - The collection
target - The output target

materialize

public <T> Iterable<T> materialize(PCollection<T> pcollection)
Description copied from interface: Pipeline
Creates the given PCollection and reads the data it contains into the returned Iterable for client use.

Specified by:
materialize in interface Pipeline
Parameters:
pcollection - The PCollection to materialize
Returns:
The data from the PCollection as a read-only Iterable
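
Because the declared return type is Iterable<T>, a caller that needs a size, index access, or a modifiable container has to copy the elements first. A minimal sketch of that step, assuming plain Java collections: toList and MaterializeSketch are hypothetical names, and an unmodifiable list stands in for the value materialize would return, so no pipeline is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class MaterializeSketch {
    // Hypothetical helper: copies any Iterable into a new, modifiable list.
    static <T> List<T> toList(Iterable<T> items) {
        List<T> copy = new ArrayList<>();
        for (T item : items) {
            copy.add(item);
        }
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for the result of materialize(...); no pipeline is involved.
        Iterable<String> materialized =
                Collections.unmodifiableList(Arrays.asList("x", "y", "z"));
        List<String> local = toList(materialized);
        local.add("extra"); // safe: only the local copy changes
        System.out.println(local.size());
    }
}
```

Copying loads every element into client memory, so this only suits results that are known to be small.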

createIntermediateOutput

public <T> SourceTarget<T> createIntermediateOutput(PType<T> ptype)

createTempPath

public org.apache.hadoop.fs.Path createTempPath()

writeTextFile

public <T> void writeTextFile(PCollection<T> pcollection,
                              String pathName)
Description copied from interface: Pipeline
A convenience method for writing a text file.

Specified by:
writeTextFile in interface Pipeline

getNextAnonymousStageId

public int getNextAnonymousStageId()

enableDebug

public void enableDebug()
Description copied from interface: Pipeline
Turn on debug logging for jobs that are run from this pipeline.

Specified by:
enableDebug in interface Pipeline


Copyright © 2012. All Rights Reserved.