Introduction
YARN launches an application through a “client” application. For Drill, this is the Drill-on-YARN client. The client can run on any machine that has both the Drill and Hadoop software installed. Any host from which you currently launch YARN jobs can be the client. The client is not required to be part of the YARN cluster.
When running Drill outside of YARN, you must install Drill on every node in the cluster. With YARN, you need to install Drill only on the client machine; Drill-on-YARN automatically deploys (“localizes”) Drill to the worker nodes.
When running Drill without YARN, many users place their configuration files and custom code within the Drill distribution directory. When running under YARN, all your configuration and custom code reside in the site directory; do not change anything in the Drill install. (This allows Drill-on-YARN to upload your original Drill install archive without rebuilding it.)
Drill-on-YARN Components
Drill-on-YARN uses the following components:
- Drill distribution archive: The original .tar.gz file for your Drill distribution. Drill-on-YARN uploads this archive to your distributed file system (DFS). YARN downloads it (localizes it) to each worker node.
- Drill site directory: A directory that contains your Drill configuration and custom jar files. Drill-on-YARN copies this directory to each worker node.
- Configuration: A configuration file that tells Drill-on-YARN how to manage your Drill cluster. This file is separate from your configuration files for Drill itself.
- Drill-on-YARN client: A command-line program to start, stop, and monitor your YARN-managed Drill cluster.
- Drill Application Master (AM): The software that works with YARN to request resources, launch Drillbits, and so on. The AM provides a web UI to manage your Drill cluster.
- Drillbit: The Drill daemon software that YARN runs on each node.
Steps to Create a Basic Drill Cluster
Create a Master Directory
To localize Drill files, the client tool requires a copy of the original Drill distribution archive and the location of your site directory. Assume all these components reside in a single “master directory” referred to as $MASTER_DIR. On the client machine, create the master directory, as shown:
export MASTER_DIR=/path/to/master/dir
mkdir $MASTER_DIR
cd $MASTER_DIR
The next sections cover the remaining steps:
- Unpack the archive to create $DRILL_HOME.
- Create the site directory with the required configuration files.
Install Drill
Follow the Drill Arm64 install directions to install Drill on your client host:
1. Select a Drill version. The name is used in multiple places below. For convenience, define an environment variable for the name:
export DRILL_NAME=apache-drill-x.y.z
2. Expand the Drill distribution archive in the master directory:
tar -xzf $DRILL_NAME.tar.gz
3. For ease of following the remaining steps, call your expanded Drill folder $DRILL_HOME:
export DRILL_HOME=$MASTER_DIR/$DRILL_NAME
Your master directory should now contain the original Drill archive along with an expanded copy of that archive.
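As a quick check of that layout, the following is a minimal sketch that tests for both the archive and its expanded copy. The helper name check_master_dir is hypothetical and not part of Drill; it only assumes the archive is named $DRILL_NAME.tar.gz, as in the steps above.

```shell
# Sketch: verify that a master directory holds both the Drill archive and
# its expanded copy. check_master_dir is a hypothetical helper name.
check_master_dir() {
    master_dir=$1
    drill_name=$2
    if [ ! -f "$master_dir/$drill_name.tar.gz" ]; then
        echo "missing archive: $master_dir/$drill_name.tar.gz" >&2
        return 1
    fi
    if [ ! -d "$master_dir/$drill_name" ]; then
        echo "missing expanded directory: $master_dir/$drill_name" >&2
        return 1
    fi
    echo "master directory looks complete"
}
```

It could be called as check_master_dir "$MASTER_DIR" "$DRILL_NAME" once the variables from the steps above are set.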
Create the Site Directory
The site directory contains your site-specific files for Drill. If you are converting an existing Drill install, see the “Site Directory” section.
Create the site directory within your master directory:
export DRILL_SITE=$MASTER_DIR/site
mkdir $DRILL_SITE
When you do a fresh install, Drill includes a conf directory under $DRILL_HOME. Use the files in that directory to create your site directory.
cp $DRILL_HOME/conf/drill-override-example.conf $DRILL_SITE/drill-override.conf
cp $DRILL_HOME/conf/drill-on-yarn-example.conf $DRILL_SITE/drill-on-yarn.conf
cp $DRILL_HOME/conf/drill-env.sh $DRILL_SITE
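To confirm that the copy step produced the expected files, a small check along these lines may help. The helper name missing_site_files is hypothetical; the file list simply mirrors the three files copied above.

```shell
# Sketch: print the name of each expected file that is missing from a site
# directory. Returns 0 when all three files exist, 1 otherwise.
# missing_site_files is a hypothetical helper name.
missing_site_files() {
    site_dir=$1
    missing=0
    for f in drill-override.conf drill-on-yarn.conf drill-env.sh; do
        if [ ! -f "$site_dir/$f" ]; then
            echo "$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, missing_site_files "$DRILL_SITE" prints nothing and succeeds when the site directory is complete.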
Drill Resource Configuration
Drill-on-YARN uses a different mechanism than a stand-alone Drill install to set Drillbit resource values such as heap and direct memory. You set the values in drill-on-yarn.conf, and Drill-on-YARN copies them into environment variables when launching each Drillbit.
drill-override.conf:
drill.exec: {
  cluster-id: "drillbits1"
  zk: {
    connect: "node1:2181,node2:2181,node3:2181",
    root: "drill",
    refresh: 500,
    timeout: 5000,
    retry: {
      count: 7200,
      delay: 500
    }
  }
}
drill-on-yarn.conf:
drill.yarn: {
  app-name: "Drill-on-YARN"
  dfs: {
    connection: "hdfs://node1:9000/"
    app-dir: "hdfs://node1:9000/users/drill"
  }
  yarn: {
    queue: "default"
  }
  drill-install: {
    client-path: "/home/admin/drill/apache-drill-1.15.0.tar.gz"
  }
  am: {
    heap: "450M"
    memory-mb: 512
  }
  http: {
    port: 12345
    auth-type: "simple"
    user-name: "admin"
    password: "admin"
    rest-key: ""
  }
  drillbit: {
    heap: "3G"
    max-direct-memory: "1G"
    code-cache: "1G"
    memory-mb: 4096
    vcores: 2
    # disks: 3
    classpath: ""
  }
  cluster: [
    {
      name: "drill-group1"
      type: "basic"
      count: 3
    }
  ]
}
Configuration Tips:
- Label configuration is disabled in YARN in this setup, so the 'type' field of each cluster group in drill-on-yarn.conf is set to 'basic' rather than to a label-based type.
- The default key user_name in drill-on-yarn.conf is not correct. Change it to 'user-name'.
- 'app-dir' in drill-on-yarn.conf should be an absolute path such as 'hdfs://node1:9000/users/drill'. Note that the Apache Drill documentation is not correct on this point.
- Disable the virtual memory check ('vmem-check') in yarn-site.xml by setting yarn.nodemanager.vmem-check-enabled to false.
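The memory fields in the drillbit section can be sanity-checked with simple arithmetic. The sketch below assumes, as a rule of thumb rather than a documented requirement, that the YARN container size (memory-mb) should cover at least heap plus direct memory plus code cache. The helper names to_mb and check_drillbit_memory are hypothetical, and only sizes with a G or M suffix are handled.

```shell
# Sketch: convert sizes like "3G" or "450M" to megabytes and compare the
# sum of heap, direct memory, and code cache with the container size.
# to_mb and check_drillbit_memory are hypothetical helper names.
to_mb() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Arguments: heap, max-direct-memory, code-cache, memory-mb.
# Prints the totals and succeeds only if the sum fits in the container.
check_drillbit_memory() {
    heap_mb=$(to_mb "$1") || return 1
    direct_mb=$(to_mb "$2") || return 1
    cache_mb=$(to_mb "$3") || return 1
    container_mb=$4
    total_mb=$(( heap_mb + direct_mb + cache_mb ))
    echo "jvm total: ${total_mb} MB, container: ${container_mb} MB"
    [ "$total_mb" -le "$container_mb" ]
}
```

With the example values above, check_drillbit_memory 3G 1G 1G 4096 reports a total of 5120 MB against a 4096 MB container, so under this assumption the sizing may be worth double-checking for your deployment.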
ZooKeeper Configuration
Drill uses ZooKeeper to coordinate between Drillbits.
When run under YARN, the Drill Application Master uses ZooKeeper to monitor Drillbit health.
Drill-on-YARN reads your $DRILL_SITE/drill-override.conf file for ZooKeeper settings.
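To see which ZooKeeper hosts a given site directory points at, the connect string can be extracted with plain text matching. This is only a sketch: it is not a HOCON parser, it assumes the connect value is a double-quoted string as in the example drill-override.conf, and the helper names zk_connect_string and zk_hosts are hypothetical.

```shell
# Sketch: pull the ZooKeeper connect string out of drill-override.conf
# with plain text matching (not a real HOCON parser), then list the hosts.
# zk_connect_string and zk_hosts are hypothetical helper names.
zk_connect_string() {
    # Print the first quoted value that follows a "connect:" key.
    sed -n 's/.*connect:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

zk_hosts() {
    # One host:port entry per line.
    zk_connect_string "$1" | tr ',' '\n'
}
```

For example, zk_hosts "$DRILL_SITE/drill-override.conf" lists one host:port pair per line, which can then be checked for reachability with whatever tool you normally use.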
Hadoop Location
You must tell Drill-on-YARN the location of your Hadoop install.
Set the HADOOP_HOME environment variable in $DRILL_SITE/drill-env.sh to point to your Hadoop installation:
export HADOOP_HOME=/path/to/hadoop-home
This assumes that Hadoop configuration is in the default location:
$HADOOP_HOME/etc/hadoop
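A small pre-flight check can catch a wrong HADOOP_HOME before launch. The sketch below only tests that the variable is set and that the default configuration directory exists; the helper name check_hadoop_home is hypothetical.

```shell
# Sketch: confirm HADOOP_HOME is set and that the default configuration
# directory exists under it. Prints the configuration path on success.
# check_hadoop_home is a hypothetical helper name.
check_hadoop_home() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$HADOOP_HOME/etc/hadoop" ]; then
        echo "no configuration directory at $HADOOP_HOME/etc/hadoop" >&2
        return 1
    fi
    echo "$HADOOP_HOME/etc/hadoop"
}
```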
Launch Drill Under YARN
Use the client tool to launch your new Drill cluster, as shown:
$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start
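If you run the client tool often, a thin wrapper can keep the --site argument consistent. The following is a sketch only: the wrapper name drill_on_yarn and the DRY_RUN variable are hypothetical, and the only action taken from this guide is start. Other actions, such as stop or status, are assumed to follow the same command shape and should be checked against your Drill version.

```shell
# Sketch: build the drill-on-yarn.sh command line and either print it
# (DRY_RUN=1) or run it. drill_on_yarn and DRY_RUN are hypothetical names.
drill_on_yarn() {
    action=$1
    if [ -z "${DRILL_HOME:-}" ] || [ -z "${DRILL_SITE:-}" ]; then
        echo "DRILL_HOME and DRILL_SITE must be set" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the command without contacting YARN.
        echo "$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE $action"
        return 0
    fi
    "$DRILL_HOME/bin/drill-on-yarn.sh" --site "$DRILL_SITE" "$action"
}
```

Running DRY_RUN=1 drill_on_yarn start prints the command that would be executed, which is a convenient way to confirm the paths before launching the cluster.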