This software is licensed under the Apache Software License 2.0. A file named LICENSE.txt should have been included with the software.
HDFS provides byte-oriented input and output streams. Most developers prefer to think in terms of higher-level objects than files and directories, and frequently graft concepts of tables or datasets onto data stored in HDFS. This library aims to give you that, out of the box, in a pleasant format that works with the rest of the ecosystem, while still giving you efficient access to your data.
Further, we've found that picking from the myriad of file formats and compression options is a weird place from which to start one's Hadoop journey. Rather than say "it depends" and lead developers through a decision tree, we decided to create a set of APIs that does what you ultimately want some high percentage of the time. For the rest of the time, well, feature requests are happily accepted (double karma if they come with patches)!
Snappy-compressed, binary, Avro data files, according to Avro's object container file spec. Avro meets the criteria for sane storage and operation of data. Specifically, Avro:
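For a sense of what this storage format looks like at the lowest level, here is a minimal sketch that writes such a file using the standard Apache Avro Java API directly (this library layers its dataset abstractions on top of files like this). The schema and file name are illustrative, not part of this library's API.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroContainerExample {
  public static void main(String[] args) throws Exception {
    // An illustrative record schema with a single string field.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      // Snappy block compression; must be set before create().
      writer.setCodec(CodecFactory.snappyCodec());
      // Writes the container file header (schema + codec metadata).
      writer.create(schema, new File("users.avro"));

      GenericRecord user = new GenericData.Record(schema);
      user.put("name", "alice");
      writer.append(user);
    } // close() flushes the final block and sync marker
  }
}
```

Because the schema and codec are recorded in the file header, any Avro reader (or a MapReduce input format) can open the file and split it on block boundaries without external metadata.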
Why not store data as protocol buffers?
Protos do not define a standard for storing a set of protocol buffer encoded records in a file that supports compression and is also splittable by MapReduce.
See "Why not protocol buffers?"
See https://github.com/eishay/jvm-serializers/wiki. In other words, because it's terrible.
Absolutely! You're encouraged to read the How to Contribute docs included with the source code. In short, you must: