com.cloudera.cdk.morphline.saxon
Class ConvertHTMLBuilder
java.lang.Object
com.cloudera.cdk.morphline.saxon.ConvertHTMLBuilder
- All Implemented Interfaces:
- CommandBuilder
public final class ConvertHTMLBuilder
- extends Object
- implements CommandBuilder
Command that converts HTML to XHTML using the TagSoup library.
Instead of parsing well-formed or valid XML, this command parses HTML as it is found in the wild:
poor, nasty and brutish, though quite often far from short. TagSoup (and hence this command) is
designed for people who have to process this stuff using some semblance of a rational application
design. This converter allows standard XML tools to be applied to even the
worst HTML.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ConvertHTMLBuilder
public ConvertHTMLBuilder()
getNames
public Collection<String> getNames()
- Description copied from interface: CommandBuilder
- Returns the names with which this command can be invoked.
The returned collection can contain synonyms to enable backwards-compatible name changes.
- Specified by:
getNames in interface CommandBuilder
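To illustrate how synonyms returned by getNames can keep old names working, here is a minimal sketch of a name lookup. Everything in it is an assumption made for illustration: NamedBuilder is a simplified stand-in for CommandBuilder, BuilderRegistry is a hypothetical registry, and the names "exampleCommand" and "legacyExampleCommand" are invented. It does not show how the morphline runtime actually resolves command names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CommandBuilder; only getNames() is modeled.
interface NamedBuilder {
    Collection<String> getNames();
}

// Hypothetical registry that maps every advertised name, synonyms included, to its builder.
final class BuilderRegistry {
    private final Map<String, NamedBuilder> byName = new HashMap<>();

    void register(NamedBuilder builder) {
        for (String name : builder.getNames()) {
            byName.put(name, builder);
        }
    }

    // Returns null when no registered builder advertises the given name.
    NamedBuilder lookup(String name) {
        return byName.get(name);
    }
}

final class SynonymDemo {
    // Illustrative builder that answers to a current name and an older synonym.
    static NamedBuilder exampleBuilder() {
        return () -> Arrays.asList("exampleCommand", "legacyExampleCommand");
    }
}
```

With this arrangement a configuration written against the older name still resolves to the same builder, which is the kind of backwards-compatible rename the description refers to.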
build
public Command build(com.typesafe.config.Config config,
Command parent,
Command child,
MorphlineContext context)
- Description copied from interface: CommandBuilder
- Creates and returns a command rooted at the given morphline config.
The command will feed records into child. The command will have parent as its parent.
Additional parameters can be passed via the morphline context.
- Specified by:
build in interface CommandBuilder
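The parent and child wiring described for build can be sketched with simplified stand-in types. The names Stage, CollectorStage, UpperCaseStage and UpperCaseStageBuilder are hypothetical, records are modeled as plain strings, and the config and context parameters are left out. The real Command, Config and MorphlineContext types are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for a morphline command.
interface Stage {
    Stage getParent();
    boolean process(String record);
}

// Terminal stage that collects every record that reaches it.
final class CollectorStage implements Stage {
    final List<String> received = new ArrayList<>();
    private final Stage parent;

    CollectorStage(Stage parent) {
        this.parent = parent;
    }

    public Stage getParent() {
        return parent;
    }

    public boolean process(String record) {
        received.add(record);
        return true;
    }
}

// Stage that transforms a record and feeds the result into its child.
final class UpperCaseStage implements Stage {
    private final Stage parent;
    private final Stage child;

    UpperCaseStage(Stage parent, Stage child) {
        this.parent = parent;
        this.child = child;
    }

    public Stage getParent() {
        return parent;
    }

    public boolean process(String record) {
        return child.process(record.toUpperCase(Locale.ROOT));
    }
}

// Hypothetical builder mirroring the shape of build(config, parent, child, context),
// with config and context omitted for brevity.
final class UpperCaseStageBuilder {
    Stage build(Stage parent, Stage child) {
        return new UpperCaseStage(parent, child);
    }
}
```

The builder only connects the new stage to the parent and child it is given, so the caller decides the shape of the chain.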
Copyright © 2013 Cloudera. All rights reserved.