java.lang.Object
  com.cloudera.cdk.data.DatasetRepositories
public class DatasetRepositories
Convenience methods for working with DatasetRepository instances.
Constructor Summary | |
---|---|
DatasetRepositories() | |
Method Summary | |
---|---|
static DatasetRepository | open(String uri): Synonym for open(java.net.URI) for String URIs. |
static DatasetRepository | open(URI repositoryUri): Open a DatasetRepository for the given URI. |
static void | register(URIPattern pattern, OptionBuilder<DatasetRepository> builder): Registers a URIPattern and an OptionBuilder to create instances of DatasetRepository from the pattern's match options. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DatasetRepositories()
Method Detail |
---|
public static void register(URIPattern pattern, OptionBuilder<DatasetRepository> builder)

Registers a URIPattern and an OptionBuilder to create instances of DatasetRepository from the pattern's match options.

Parameters:
pattern - a URIPattern
builder - an OptionBuilder that expects options defined by pattern and builds DatasetRepository instances.

public static DatasetRepository open(String uri)

Synonym for open(java.net.URI) for String URIs.

Parameters:
uri - a String URI
Throws:
IllegalArgumentException - If the String cannot be parsed into a valid URI (java.net.URI).

public static DatasetRepository open(URI repositoryUri)

Open a DatasetRepository for the given URI.

This method provides a simpler way to connect to a DatasetRepository while providing information about the appropriate MetadataProvider and other options to use. For almost all cases, this is the preferred method of retrieving an instance of a DatasetRepository.

The format of a repository URI is as follows.

repo:[storage component]

The [storage component] indicates the underlying metadata and, in some cases, physical storage of the data, along with any options. The supported storage backends are:

file:[path] where [path] is a relative or absolute filesystem path to be used as the dataset repository root directory in which to store dataset data. When specifying an absolute path, the null authority form (e.g. file:///my/path) may be used. Alternatively, the authority section may be omitted entirely (e.g. file:/my/path). Either way, it is illegal to provide an authority (e.g. file://this-part-is-illegal/my/path). This storage backend will produce a DatasetRepository that stores both data and metadata on the local operating system filesystem. See FileSystemDatasetRepository for more information.

hdfs://[host]:[port]/[path] where [host] and [port] indicate the location of the Hadoop NameNode, and [path] is the dataset repository root directory in which to store dataset data. This form will load the Hadoop configuration information per the usual methods (i.e. searching the process's classpath for the various configuration files). This storage backend will produce a DatasetRepository that stores both data and metadata in HDFS. See FileSystemDatasetRepository for more information.

hive will connect to the Hive MetaStore. Dataset locations will be determined by Hive as managed tables.

hive:/[path] will also connect to the Hive MetaStore, but tables will be external and stored under [path]. The repository storage layout will be the same as hdfs and file repositories. HDFS connection options can be supplied by adding hdfs-host and hdfs-port query options to the URI (see examples).
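
The authority rule for file: storage components can be checked with java.net.URI alone. The sketch below is illustrative and not part of CDK: the class FileStorageUriCheck and its method are hypothetical names, and it assumes only that a provided authority is what makes a file: storage component illegal, as described above.

```java
import java.net.URI;

class FileStorageUriCheck {
    /**
     * Hypothetical helper: returns true when the file: storage component
     * carries no authority, which is the only legal form described above.
     */
    static boolean hasLegalAuthority(String storageComponent) {
        URI uri = URI.create(storageComponent);
        // java.net.URI reports a null authority for the relative form
        // (file:foo/bar), the omitted form (file:/my/path), and the
        // null authority form (file:///my/path).
        return uri.getAuthority() == null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:foo/bar",
            "file:/my/path",
            "file:///my/path",
            "file://this-part-is-illegal/my/path"
        };
        for (String sample : samples) {
            String verdict = hasLegalAuthority(sample) ? "ok" : "illegal authority";
            System.out.println(sample + " -> " + verdict);
        }
    }
}
```

Only the last sample carries an authority, so it is the only one the check rejects.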

Examples | |
---|---|
repo:file:foo/bar | Store data+metadata on the local filesystem in the directory ./foo/bar. |
repo:file:///data | Store data+metadata on the local filesystem in the directory /data. |
repo:hdfs://localhost:8020/data | Same as above, but stores data+metadata on HDFS. |
repo:hive | Connects to the Hive MetaStore and creates managed tables. |
repo:hive:/path?hdfs-host=localhost&hdfs-port=8020 | Connects to the Hive MetaStore and creates external tables stored in hdfs://localhost:8020/path. |
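
To see how the example URIs decompose, the following sketch uses only java.net.URI. It is not the CDK implementation: RepoUriParts and its helper methods are hypothetical names, and the sketch assumes that the repo: prefix shown in the examples wraps an ordinary URI as the storage component.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class RepoUriParts {
    /** Strips the outer repo: scheme and returns the storage component as its own URI. */
    static URI storageComponent(String repoUri) {
        URI outer = URI.create(repoUri);
        if (!"repo".equals(outer.getScheme())) {
            throw new IllegalArgumentException("Not a repository URI: " + repoUri);
        }
        // "repo:..." is an opaque URI, so everything after the first colon,
        // including any query string, is the scheme-specific part.
        return URI.create(outer.getSchemeSpecificPart());
    }

    /** Returns the backend name: the inner scheme, or the bare word for "repo:hive". */
    static String backend(URI storage) {
        return storage.getScheme() != null ? storage.getScheme() : storage.getPath();
    }

    /** Splits query options such as hdfs-host and hdfs-port into a map. */
    static Map<String, String> queryOptions(URI storage) {
        Map<String, String> options = new LinkedHashMap<>();
        String query = storage.getQuery();
        if (query == null) {
            return options;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                options.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        String[] examples = {
            "repo:hdfs://localhost:8020/data",
            "repo:hive",
            "repo:hive:/path?hdfs-host=localhost&hdfs-port=8020"
        };
        for (String example : examples) {
            URI storage = storageComponent(example);
            System.out.println(example + " -> backend=" + backend(storage)
                    + ", host=" + storage.getHost()
                    + ", port=" + storage.getPort()
                    + ", options=" + queryOptions(storage));
        }
    }
}
```

For URIs without an explicit host or port, java.net.URI returns null and -1 respectively, which is why the hive forms rely on the hdfs-host and hdfs-port query options.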

Parameters:
repositoryUri - The repository URI
Returns:
DatasetRepository
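
The relationship between register and the two open overloads can be pictured as a lookup table from storage backend to builder. The sketch below is a simplified stand-in, not CDK code: RegistrySketch, the string key used in place of a URIPattern, the Function used in place of an OptionBuilder, and the String returned in place of a DatasetRepository are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class RegistrySketch {
    // Hypothetical registry: backend name -> builder that turns options into a result.
    private static final Map<String, Function<Map<String, String>, String>> BUILDERS =
            new LinkedHashMap<>();

    /** Stand-in for register(pattern, builder); the key is a plain backend name. */
    static void register(String backend, Function<Map<String, String>, String> builder) {
        BUILDERS.put(backend, builder);
    }

    /** String overload: URI.create throws IllegalArgumentException on invalid syntax. */
    static String open(String uri) {
        return open(URI.create(uri));
    }

    /** Finds the builder for the storage component and hands it the extracted options. */
    static String open(URI repositoryUri) {
        if (!"repo".equals(repositoryUri.getScheme())) {
            throw new IllegalArgumentException("Not a repository URI: " + repositoryUri);
        }
        URI storage = URI.create(repositoryUri.getSchemeSpecificPart());
        String backend = storage.getScheme() != null ? storage.getScheme() : storage.getPath();
        Function<Map<String, String>, String> builder = BUILDERS.get(backend);
        if (builder == null) {
            throw new IllegalArgumentException("Unknown storage backend: " + backend);
        }
        Map<String, String> options = new LinkedHashMap<>();
        // Relative forms such as file:foo/bar are opaque, so their path is the
        // scheme-specific part rather than getPath().
        options.put("path", storage.isOpaque() ? storage.getSchemeSpecificPart() : storage.getPath());
        return builder.apply(options);
    }

    public static void main(String[] args) {
        register("file", options -> "local repository rooted at " + options.get("path"));
        System.out.println(open("repo:file:/data"));
    }
}
```

Delegating the String overload to the URI overload keeps a single lookup path, which matches the description of open(String) as a synonym for open(java.net.URI).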