java.lang.Object
  com.cloudera.cdk.data.filesystem.FileSystemDatasetRepository
public class FileSystemDatasetRepository
A DatasetRepository that stores data in a Hadoop FileSystem.
Given a FileSystem, a root directory, and a MetadataProvider, this DatasetRepository implementation can load and store Datasets on both local filesystems and the Hadoop Distributed FileSystem (HDFS). Users may instantiate this class directly with the three dependencies above and then perform dataset-related operations using any of the provided methods. The primary methods of interest are create(String, com.cloudera.cdk.data.DatasetDescriptor), get(String), and drop(String), which create a new dataset, load an existing dataset, and delete an existing dataset, respectively. Once a dataset has been created or loaded, users can invoke the appropriate Dataset methods to get a reader or writer as needed.
See Also: DatasetRepository, Dataset, DatasetDescriptor, PartitionStrategy, MetadataProvider
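The create, get, and drop contract described above can be sketched with a small in-memory stand-in. This is not the CDK implementation and uses none of its classes: `InMemoryRepositorySketch`, its `RepositoryException`, and the use of a plain `String` in place of a `DatasetDescriptor` are all hypothetical, chosen only so the example runs on a bare JDK. The exception types the real class throws are the ones named in the method details.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the documented create/get/drop contract.
// It is NOT the CDK implementation and touches no filesystem.
final class InMemoryRepositorySketch {

  /** Hypothetical exception standing in for the ones the real repository throws. */
  static final class RepositoryException extends RuntimeException {
    RepositoryException(String message) {
      super(message);
    }
  }

  // Dataset name -> descriptor. A String stands in for DatasetDescriptor (assumption).
  private final Map<String, String> descriptors = new HashMap<>();

  /** Creates a dataset; creating a second dataset with the same name is illegal. */
  String create(String name, String descriptor) {
    if (descriptors.containsKey(name)) {
      throw new RepositoryException("Dataset already exists: " + name);
    }
    descriptors.put(name, descriptor);
    return descriptor;
  }

  /** Loads a dataset; a name that does not exist is an error. */
  String get(String name) {
    String descriptor = descriptors.get(name);
    if (descriptor == null) {
      throw new RepositoryException("No such dataset: " + name);
    }
    return descriptor;
  }

  /** Drops a dataset; a name that does not exist is an error, otherwise returns true. */
  boolean drop(String name) {
    if (!descriptors.containsKey(name)) {
      throw new RepositoryException("No such dataset: " + name);
    }
    descriptors.remove(name);
    return true;
  }

  public static void main(String[] args) {
    InMemoryRepositorySketch repo = new InMemoryRepositorySketch();
    repo.create("users", "schema-v1");
    System.out.println("loaded: " + repo.get("users"));
    System.out.println("dropped: " + repo.drop("users"));
  }
}
```

A map keyed by dataset name keeps the duplicate-name and missing-name checks to a single lookup each; the real class delegates that bookkeeping to its MetadataProvider.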
Nested Class Summary | |
---|---|
static class | FileSystemDatasetRepository.Builder: A fluent builder to aid in the construction of FileSystemDatasetRepository instances. |
Constructor Summary | |
---|---|
FileSystemDatasetRepository(FileSystem fileSystem, Path rootDirectory) | Construct a FileSystemDatasetRepository on the given FileSystem and root directory, and a FileSystemMetadataProvider with the same FileSystem and root directory. |
FileSystemDatasetRepository(FileSystem fileSystem, Path rootDirectory, MetadataProvider metadataProvider) | Construct a FileSystemDatasetRepository on the given FileSystem and root directory, with the given MetadataProvider for metadata storage. |
FileSystemDatasetRepository(URI uri) | Construct a FileSystemDatasetRepository with a root directory at the given URI, and a FileSystemMetadataProvider with the same root directory. |
Method Summary | |
---|---|
Dataset | create(String name, DatasetDescriptor descriptor): Create a Dataset with the supplied descriptor. |
boolean | drop(String name): Drop the named Dataset. |
Dataset | get(String name): Get the latest version of a named Dataset. |
FileSystem | getFileSystem() |
MetadataProvider | getMetadataProvider() |
Path | getRootDirectory() |
static PartitionKey | partitionKeyForPath(Dataset dataset, URI partitionPath): Get a PartitionKey corresponding to a partition's filesystem path represented as a URI. |
protected Path | pathForDataset(String name): Implementations should return the fully-qualified path of the data directory for the dataset with the given name. |
String | toString() |
Dataset | update(String name, DatasetDescriptor descriptor): Update an existing Dataset to reflect the supplied descriptor. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public FileSystemDatasetRepository(FileSystem fileSystem, Path rootDirectory)
Construct a FileSystemDatasetRepository on the given FileSystem and root directory, and a FileSystemMetadataProvider with the same FileSystem and root directory.
Parameters:
fileSystem - the filesystem to store metadata and datasets in
rootDirectory - the root directory for metadata and datasets

public FileSystemDatasetRepository(URI uri)
Construct a FileSystemDatasetRepository with a root directory at the given URI, and a FileSystemMetadataProvider with the same root directory.
Parameters:
uri - the root directory for metadata and datasets

public FileSystemDatasetRepository(FileSystem fileSystem, Path rootDirectory, MetadataProvider metadataProvider)
Construct a FileSystemDatasetRepository on the given FileSystem and root directory, with the given MetadataProvider for metadata storage.
Parameters:
fileSystem - the filesystem to store datasets in
rootDirectory - the root directory for datasets
metadataProvider - the provider for metadata storage

Method Detail |
---|
public Dataset create(String name, DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Create a Dataset with the supplied descriptor. Depending on the underlying dataset storage, some schema types or configurations may not be supported. If an illegal schema is supplied, an exception will be thrown by the implementing class. It is illegal to create more than one dataset with a given name. If a duplicate name is provided, an exception is thrown.
Specified by:
create in interface DatasetRepository
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
public Dataset update(String name, DatasetDescriptor descriptor)
Description copied from interface: DatasetRepository
Update an existing Dataset to reflect the supplied descriptor. The common case is updating a dataset schema. Depending on the underlying dataset storage, some updates may not be supported, such as a change in format or partition strategy. Any attempt to make an unsupported or incompatible update will result in an exception being thrown and no change being made to the dataset.
Specified by:
update in interface DatasetRepository
Parameters:
name - The fully qualified dataset name
descriptor - A descriptor that describes the schema and other properties of the dataset
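The "exception thrown and no change made" guarantee for update can be illustrated by validating the new descriptor before replacing the stored one. The sketch below is a hedged stand-in, not the CDK code: `UpdateSketch` and its two-field `Descriptor` are hypothetical, and rejecting a format change is just one example of an unsupported update taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of validate-then-replace; NOT the CDK implementation.
final class UpdateSketch {

  /** Hypothetical, simplified descriptor: a schema string plus a storage format name. */
  static final class Descriptor {
    final String schema;
    final String format;

    Descriptor(String schema, String format) {
      this.schema = schema;
      this.format = format;
    }
  }

  private final Map<String, Descriptor> descriptors = new HashMap<>();

  void create(String name, Descriptor descriptor) {
    descriptors.put(name, descriptor);
  }

  Descriptor get(String name) {
    return descriptors.get(name);
  }

  /** Replaces the stored descriptor only if the update passes validation. */
  Descriptor update(String name, Descriptor updated) {
    Descriptor current = descriptors.get(name);
    if (current == null) {
      throw new IllegalArgumentException("No such dataset: " + name);
    }
    // Validate before mutating so a rejected update leaves the stored state untouched.
    if (!current.format.equals(updated.format)) {
      throw new UnsupportedOperationException(
          "Format change not supported: " + current.format + " -> " + updated.format);
    }
    descriptors.put(name, updated);
    return updated;
  }

  public static void main(String[] args) {
    UpdateSketch repo = new UpdateSketch();
    repo.create("users", new Descriptor("schema-v1", "avro"));
    repo.update("users", new Descriptor("schema-v2", "avro"));
    System.out.println("schema now: " + repo.get("users").schema);
  }
}
```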
public Dataset get(String name)
Description copied from interface: DatasetRepository
Get the latest version of a named Dataset. If no dataset with the provided name exists, a DatasetRepositoryException is thrown.
Specified by:
get in interface DatasetRepository
Parameters:
name - The name of the dataset.

public boolean drop(String name)
Description copied from interface: DatasetRepository
Drop the named Dataset. If no dataset with the provided name exists, a DatasetReaderException is thrown.
Specified by:
drop in interface DatasetRepository
Parameters:
name - The name of the dataset.
Returns:
true if the dataset was successfully dropped, false otherwise

public static PartitionKey partitionKeyForPath(Dataset dataset, URI partitionPath)
Get a PartitionKey corresponding to a partition's filesystem path represented as a URI. If the path is not a valid partition, then IllegalArgumentException is thrown. Note that the partition does not have to exist.
Parameters:
dataset - the filesystem dataset
partitionPath - a directory path where the partition data is stored
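Because the partition does not have to exist, mapping a path to a key can be done on the URI alone, without touching the filesystem. The following sketch shows one way that could work, under loudly stated assumptions: `PartitionPathSketch.partitionValues` is a hypothetical helper, it returns raw directory names rather than a PartitionKey, and it assumes each directory level below the dataset root holds exactly one partition value. The actual directory naming used by the library is not described here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; NOT the CDK implementation of partitionKeyForPath.
final class PartitionPathSketch {

  /**
   * Returns the directory names between datasetRoot and partitionPath, one per
   * partition level. Only the path components of the URIs are compared; scheme
   * and authority are ignored in this sketch. No filesystem access happens, so
   * the partition directory need not exist.
   *
   * @throws IllegalArgumentException if partitionPath is not strictly below
   *         datasetRoot or is deeper than partitionFields levels
   */
  static List<String> partitionValues(URI datasetRoot, URI partitionPath, int partitionFields) {
    String root = datasetRoot.normalize().getPath();
    String path = partitionPath.normalize().getPath();
    if (!root.endsWith("/")) {
      root += "/";
    }
    if (!path.startsWith(root)) {
      throw new IllegalArgumentException("Not under dataset root: " + partitionPath);
    }
    List<String> values = new ArrayList<>();
    for (String segment : path.substring(root.length()).split("/")) {
      if (!segment.isEmpty()) {
        values.add(segment);
      }
    }
    if (values.isEmpty() || values.size() > partitionFields) {
      throw new IllegalArgumentException("Not a valid partition path: " + partitionPath);
    }
    return values;
  }

  public static void main(String[] args) {
    URI root = URI.create("file:/data/events");
    URI partition = URI.create("file:/data/events/2013/05");
    System.out.println(partitionValues(root, partition, 2));
  }
}
```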
protected Path pathForDataset(String name)
Implementations should return the fully-qualified path of the data directory for the dataset with the given name.
This method is for internal use only and users should not call it directly.
public String toString()
Overrides:
toString in class Object

public Path getRootDirectory()

public FileSystem getFileSystem()
Returns:
the FileSystem on which datasets are stored.

public MetadataProvider getMetadataProvider()
Returns:
the MetadataProvider being used by this repository.