Add the following dependency to your project:
<dependency>
  <groupId>com.cloudera.llama</groupId>
  <artifactId>mini-llama</artifactId>
  <version>1.0.0-cdh5.1.3-SNAPSHOT</version>
  <scope>test</scope>
</dependency>
The Maven repository where SNAPSHOT artifacts are currently deployed is https://repository.cloudera.com/artifactory/libs-snapshot-local
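If your build does not already resolve artifacts from that repository, a repositories entry along these lines can be added to the POM (a sketch; the repository id is arbitrary):

<repositories>
  <repository>
    <!-- arbitrary id; the URL is the Cloudera snapshot repository listed above -->
    <id>cloudera-snapshots</id>
    <url>https://repository.cloudera.com/artifactory/libs-snapshot-local</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>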
IMPORTANT: By default, MiniLlama uses an ephemeral port bound to localhost only.
Using MiniLlama also starts a Hadoop HDFS/Yarn minicluster. You must specify how many nodes (DataNodes and NodeManagers) the minicluster should have.
You must have in your classpath (in a directory, not in a JAR) a fair-scheduler.xml file defining the FairScheduler queue configuration.
For example:
public class TestUsingMiniLlama {

  private Configuration createMiniLlamaConfiguration() {
    URL url = Thread.currentThread().getContextClassLoader().getResource(
        "fair-scheduler.xml");
    String fsallocationFile = url.toExternalForm();
    fsallocationFile = fsallocationFile.substring("file://".length());
    // one-node minicluster (one DataNode and one NodeManager)
    Configuration conf = MiniLlama.createMiniClusterConf(1);
    conf.set("yarn.scheduler.fair.allocation.file", fsallocationFile);
    conf.set(LlamaAM.INITIAL_QUEUES_KEY, "default");
    return conf;
  }

  @Test
  public void testMiniLlama() throws Exception {
    Configuration conf = createMiniLlamaConfiguration();
    // MiniLlama server created from the Configuration (constructor assumed)
    MiniLlama server = new MiniLlama(conf);
    try {
      server.start();
      // MiniLlama binds to an ephemeral port on localhost; query it after start()
      String serverHost = server.getAddressHost();
      int serverPort = server.getAddressPort();
      TTransport transport = new TSocket(serverHost, serverPort);
      transport.open();
      TProtocol protocol = new TBinaryProtocol(transport);
      LlamaAMService.Client client = new LlamaAMService.Client(protocol);
      ....
    } finally {
      server.stop();
    }
  }
}
Example of a minimal fair-scheduler.xml configuration file defining 3 queues:
<allocations>
  <queue name="queue1">
  </queue>
  <queue name="queue2">
  </queue>
  <queue name="queue3">
  </queue>
</allocations>
IMPORTANT: MiniLlama from the command line is available in the mini-llama tarball; it is not available in the llama-dist tarball.
MiniLlama requires a local Hadoop cluster installation to pick up the Hadoop JAR files. The HADOOP_HOME environment variable must be defined.
MiniLlama uses the following 2 configuration files from the installation directory:
To run MiniLlama, use the minillama script.
The minillama script supports the following commands and options:
usage:

 minillama help : display usage for all commands or specified command

 minillama minicluster <OPTIONS> : start embedded mini HDFS/Yarn cluster
  -hdfsnoformat          don't format mini HDFS
  -hdfswriteconf <arg>   file to write mini HDFS configuration
  -nodes <arg>           number of nodes (default: 1)

 minillama cluster : use external HDFS/Yarn cluster
If MiniLlama is started with the minicluster sub-command, it bootstraps an embedded HDFS/Yarn minicluster.
If MiniLlama is started with the cluster sub-command, it uses an existing HDFS/Yarn pseudo cluster.
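As a sketch of a typical invocation (the HADOOP_HOME path and the output file location are placeholders for your environment):

# HADOOP_HOME must point at the local Hadoop installation
$ export HADOOP_HOME=/usr/lib/hadoop

# start a 2-node embedded HDFS/Yarn minicluster and write its HDFS configuration to a file
$ minillama minicluster -nodes 2 -hdfswriteconf /tmp/mini-hdfs-site.xml

# or, run against an already-running HDFS/Yarn pseudo cluster
$ minillama cluster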
To run a MiniLlama instance against a multi-node cluster hosted on a single machine, the MiniLlama configuration must include the mapping between DataNode and NodeManager pairs. This mapping is of the form DataNodeHost:DataNodePort to NodeManagerHost:NodeManagerPort. It must be set in the llama-site.xml file used by the MiniLlama instance, in the llama.minicluster.node.mapper.mapping configuration property, using the JDK Properties String encoding. MiniLlama includes a CLI tool, minillamanodemapper, to help produce this encoding.
The minillamanodemapper usage syntax is:
Usage: minillamanodemapper [DNHOST:DNPORT=NMHOST:NMPORT]+
For example:
$ minillamanodemapper localhost:20001=localhost:20002 localhost:20003=localhost:20004
Picked up _JAVA_OPTIONS: -Djava.awt.headless=true
#
#Mon Jan 13 10:09:50 PST 2014
localhost\:20003=localhost\:20004
localhost\:20001=localhost\:20002
$
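The encoded mapping then goes in the llama-site.xml used by the MiniLlama instance. The following is a sketch only; it assumes the encoded pairs produced by minillamanodemapper (without the comment and timestamp lines) are used verbatim as the property value:

<property>
  <name>llama.minicluster.node.mapper.mapping</name>
  <value>localhost\:20001=localhost\:20002
localhost\:20003=localhost\:20004</value>
</property>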
The LlamaClient command-line tool lets you interact with a Llama AM running from a full Llama installation or from MiniLlama.
LlamaClient can run a client callback server listening for notifications, and it supports all Llama AM Thrift API operations: Register, Unregister, GetNodes, Reserve, and Release.
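As a sketch of a typical session (the Llama and callback host:port values, the queue, and the <UUID>/<NODE> placeholders are illustrative; the flags used are those listed in the usage further below):

# start a callback server for notifications (in a separate terminal)
$ llamaclient callbackserver -port 20000

# register a client against the Llama AM and note the returned handle
$ llamaclient register -llama localhost:15000 -clientid client-1 -callback localhost:20000

# make a reservation of 1 CPU and 1024 MB on one node
$ llamaclient reserve -llama localhost:15000 -handle <UUID> -queue queue1 \
    -locations <NODE> -cpus 1 -memory 1024

# release the reservation and unregister
$ llamaclient release -llama localhost:15000 -handle <UUID> -reservation <UUID>
$ llamaclient unregister -llama localhost:15000 -handle <UUID>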
IMPORTANT: The LlamaClient command-line tool is available in the mini-llama tarball; it is not available in the llama-dist tarball.
Install and configure MiniLlama as described above.
LlamaClient supports the following sub-commands and options:
 llamaclient help : display usage for all commands or specified command

 llamaclient uuid : generate an UUID

 llamaclient register <OPTIONS> : register client
  -callback <arg>   <HOST>:<PORT> of client's callback server
  -clientid <arg>   client ID
  -llama <arg>      <HOST>:<PORT> of llama
  -nolog            no logging
  -secure           uses kerberos

 llamaclient unregister <OPTIONS> : unregister client
  -handle <arg>   <UUID> from registration
  -llama <arg>    <HOST>:<PORT> of llama
  -nolog          no logging
  -secure         uses kerberos

 llamaclient getnodes <OPTIONS> : get cluster nodes
  -handle <arg>   <UUID> from registration
  -llama <arg>    <HOST>:<PORT> of llama
  -nolog          no logging
  -secure         uses kerberos

 llamaclient reserve <OPTIONS> : make a reservation
  -cpus <arg>        cpus required per location of reservation
  -handle <arg>      <UUID> from registration
  -llama <arg>       <HOST>:<PORT> of llama
  -locations <arg>   locations of reservation, comma separated
  -memory <arg>      memory (MB) required per location of reservation
  -nogang            no gang reservation
  -nolog             no logging
  -queue <arg>       queue of reservation
  -relaxlocality     relax locality
  -secure            uses kerberos

 llamaclient release <OPTIONS> : release a reservation
  -handle <arg>        <UUID> from registration
  -llama <arg>         <HOST>:<PORT> of llama
  -nolog               no logging
  -reservation <arg>   <UUID> from reservation
  -secure              uses kerberos

 llamaclient callbackserver <OPTIONS> : run callback server
  -nolog        no logging
  -port <arg>   <PORT> of callback server
  -secure       uses kerberos

 llamaclient load <OPTIONS> : run a load
  -allocationtimeout <arg>   allocation timeout, millisecs (default 10000)
  -callback <arg>            <HOST>:<PORT> of client's callback server
  -clients <arg>             number of clients
  -cpus <arg>                cpus required per location of reservation
  -holdtime <arg>            time to hold a reservation once allocated (-1 to not
                             wait for allocation and release immediately), millisecs
  -llama <arg>               <HOST>:<PORT> of llama
  -locations <arg>           locations of reservation, comma separated
  -memory <arg>              memory (MB) required per location of reservation
  -nogang                    no gang reservation
  -nolog                     no logging
  -queue <arg>               queue of reservation
  -relaxlocality             relax locality
  -rounds <arg>              reservations per client
  -sleeptime <arg>           time to sleep between reservations, millisecs

Load Testing

The llamaclient load sub-command allows you to simulate basic load in Llama. The main motivation for this load testing is to stress Llama and find contention points and critical paths in the Llama code.

For example, a run using 100 clients, each doing 1000 reservations/releases without waiting for the allocation notification (-holdtime -1) and with no sleep time between reservations, produces:
Number of clients         : 100
Reservations per client   : 1000
Hold allocations for      : 0 ms
Sleep between reservations: 0 ms
Allocation timeout        : 10000 ms

Wall time                 : 31220 ms

Timed out allocations     : 14

Reservation rate          : 3204.743738595719 per sec

Register time (mean)      : 6.87 ms
Reserve time (mean)       : 0.5457198443579767 ms
Allocate time (mean)      : 23.180933852140075 ms
Release time (mean)       : 0.48540856031128404 ms
Unregister time (mean)    : 82.76 ms
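A command along the following lines would produce such a run (a sketch only; the Llama and callback addresses, queue, locations, cpus, and memory values are placeholders, while -clients, -rounds, -holdtime, -sleeptime, and -allocationtimeout match the parameters reported above):

$ llamaclient load -llama localhost:15000 -callback localhost:20000 \
    -clients 100 -rounds 1000 -holdtime -1 -sleeptime 0 -allocationtimeout 10000 \
    -queue queue1 -locations <NODE> -cpus 1 -memory 1024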