com.cloudera.cdk.morphline.saxon
Class ConvertHTMLBuilder
java.lang.Object
com.cloudera.cdk.morphline.saxon.ConvertHTMLBuilder
All Implemented Interfaces: CommandBuilder
public final class ConvertHTMLBuilder
extends Object
implements CommandBuilder
Command that converts HTML to XHTML using the TagSoup library.
Instead of parsing well-formed or valid XML, this command parses HTML as it is found in the wild:
poor, nasty and brutish, though quite often far from short. TagSoup (and hence this command) is
designed for people who have to process this stuff using some semblance of a rational application
design. This converter allows standard XML tools to be applied to even the
worst HTML.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ConvertHTMLBuilder
public ConvertHTMLBuilder()
getNames
public Collection<String> getNames()
Description copied from interface: CommandBuilder
Returns the names with which this command can be invoked.
The returned collection can contain synonyms to enable backwards-compatible name changes.
Specified by: getNames in interface CommandBuilder
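The role of synonyms can be illustrated with a minimal, self-contained sketch. It does not use the real CDK classes: NamedBuilder, HtmlBuilderSketch and BuilderRegistrySketch are hypothetical stand-ins, and both command names ("convertHTML" and "htmlToXhtml") are assumptions made for illustration only. The idea is that a registry indexes each builder under every name it advertises, so that a configuration written against an old name still resolves to the same builder.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CommandBuilder; only getNames() is modelled here.
interface NamedBuilder {
    Collection<String> getNames();
}

// Hypothetical builder that advertises a current name plus an older synonym.
// Both names are assumptions for illustration, not taken from the library.
class HtmlBuilderSketch implements NamedBuilder {
    @Override
    public Collection<String> getNames() {
        return Arrays.asList("convertHTML", "htmlToXhtml");
    }
}

// Hypothetical registry that indexes a builder under every name it advertises.
class BuilderRegistrySketch {
    private final Map<String, NamedBuilder> byName = new HashMap<>();

    void register(NamedBuilder builder) {
        for (String name : builder.getNames()) {
            byName.put(name, builder);
        }
    }

    // Returns null when no registered builder advertises the given name.
    NamedBuilder lookup(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        BuilderRegistrySketch registry = new BuilderRegistrySketch();
        registry.register(new HtmlBuilderSketch());
        // Both the current name and the synonym resolve to the same builder.
        System.out.println(registry.lookup("convertHTML") == registry.lookup("htmlToXhtml"));
    }
}
```

With this arrangement, renaming a command only requires keeping the old name in the returned collection; existing configurations continue to work unchanged.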
build
public Command build(com.typesafe.config.Config config,
Command parent,
Command child,
MorphlineContext context)
Description copied from interface: CommandBuilder
Creates and returns a command rooted at the given morphline JSON config.
The command will feed records into child. The command will have parent as its parent.
Additional parameters can be passed via the morphline context.
Specified by: build in interface CommandBuilder
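The parent/child wiring described for build can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: SketchCommand, CollectorCommand, UpperCaseCommand and UpperCaseBuilderSketch are hypothetical types, records are reduced to plain strings, and the config and context parameters are omitted. The point is only that the built command keeps a reference to its parent and forwards each processed record to its child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Command; records are modelled as plain strings.
interface SketchCommand {
    SketchCommand getParent();
    boolean process(String record);
}

// Hypothetical terminal command that simply collects whatever it receives.
class CollectorCommand implements SketchCommand {
    final List<String> seen = new ArrayList<>();

    @Override
    public SketchCommand getParent() {
        return null; // the sketch leaves the collector without a parent
    }

    @Override
    public boolean process(String record) {
        seen.add(record);
        return true;
    }
}

// Hypothetical command that transforms a record and feeds it into its child.
class UpperCaseCommand implements SketchCommand {
    private final SketchCommand parent;
    private final SketchCommand child;

    UpperCaseCommand(SketchCommand parent, SketchCommand child) {
        this.parent = parent;
        this.child = child;
    }

    @Override
    public SketchCommand getParent() {
        return parent;
    }

    @Override
    public boolean process(String record) {
        // Transform first, then hand the result to the next command in the chain.
        return child.process(record.toUpperCase(Locale.ROOT));
    }
}

// Hypothetical builder; the real build method also takes a config and a context.
class UpperCaseBuilderSketch {
    SketchCommand build(SketchCommand parent, SketchCommand child) {
        return new UpperCaseCommand(parent, child);
    }

    public static void main(String[] args) {
        CollectorCommand sink = new CollectorCommand();
        SketchCommand command = new UpperCaseBuilderSketch().build(null, sink);
        command.process("html");
        System.out.println(sink.seen);
    }
}
```

Passing the child at construction time lets each command stay unaware of what follows it, so commands can be chained in any order a configuration specifies.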
Copyright © 2013 Cloudera. All rights reserved.