Apache Drill on YARN for Arm64

Introduction


YARN works by launching an application using a “client” application. For Drill, this is the Drill-on-YARN client. The client can run on any machine that has both the Drill and Hadoop software. Any host from which you currently launch YARN jobs can be the client. The client is not required to be part of the YARN cluster.

When running Drill outside of YARN, you must install Drill on every node in the cluster. With YARN, you need to install Drill only on the client machine; Drill-on-YARN automatically deploys (“localizes”) Drill to the worker nodes.

When running Drill without YARN, many users place their configuration files and custom code within the Drill distribution directory. When running under YARN, all your configuration and custom code reside in the site directory; do not change anything in the Drill install. (This allows Drill-on-YARN to upload your original Drill install archive without rebuilding it.)



Drill-on-YARN Components 


Drill-on-YARN uses the following components:

  • Drill distribution archive: The original .tar.gz file for your Drill distribution. Drill-on-YARN uploads this archive to your distributed file system (DFS). YARN downloads it (localizes it) to each worker node.
  • Drill site directory: A directory that contains your Drill configuration and custom jar files. Drill-on-YARN copies this directory to each worker node.
  • Configuration: A configuration file that tells Drill-on-YARN how to manage your Drill cluster. This file is separate from your configuration files for Drill itself.
  • Drill-on-YARN client: A command-line program to start, stop, and monitor your YARN-managed Drill cluster.
  • Drill Application Master (AM): The software that works with YARN to request resources, launch Drillbits, and so on. The AM provides a web UI to manage your Drill cluster.
  • Drillbit: The Drill daemon software that YARN runs on each node.



Steps for Creating a Drill-on-YARN Cluster


Create a Master Directory

To localize Drill files, the client tool requires a copy of the original Drill distribution archive and the location of your site directory. This guide assumes that all of these components reside in a single “master directory”, referred to as $MASTER_DIR. On the client machine, create the master directory, as shown:


	export MASTER_DIR=/path/to/master/dir
	mkdir $MASTER_DIR
	cd $MASTER_DIR  

Then complete the following steps, described in the sections below:

  • Unpack the archive to create $DRILL_HOME.
  • Create the site directory with the required configuration files.

Install Drill

Follow the Drill Arm64 install directions to install Drill on your client host:

1.  Select a Drill version. The name is used in multiple places below. For convenience, define an environment variable for the name:

	export DRILL_NAME=apache-drill-x.y.z

2.  Place the Drill distribution archive in the master directory and expand it there:

	tar -xzf $DRILL_NAME.tar.gz

3.  For ease of following the remaining steps, call your expanded Drill folder $DRILL_HOME:

	export DRILL_HOME=$MASTER_DIR/$DRILL_NAME

Your master directory should now contain the original Drill archive along with an expanded copy of that archive.
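
As a quick check that $DRILL_NAME matches what the archive actually expands to, the sketch below prints the top-level directory name stored in a .tar.gz file. The function name archive_top_dir is hypothetical, and the sketch assumes the archive was built with a single top-level directory and no "./" prefix on its entries.

```shell
# archive_top_dir (hypothetical helper): print the first path component
# of the first entry listed in a gzip-compressed tar archive.
archive_top_dir() {
    # tar -tzf lists the entries; awk keeps the part before the first "/"
    # of the first entry and reads the rest of the listing to completion.
    tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}
```

For example, comparing "$(archive_top_dir "$MASTER_DIR/$DRILL_NAME.tar.gz")" with "$DRILL_NAME" should show whether $DRILL_HOME points at the directory the archive created.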

Create the Site Directory

The site directory contains your site-specific files for Drill. If you are converting an existing Drill install, see the “Site Directory” section.

Create the site directory within your master directory:

	export DRILL_SITE=$MASTER_DIR/site
	mkdir $DRILL_SITE

When you do a fresh install, Drill includes a conf directory under $DRILL_HOME. Use the files in that directory to create your site directory.

	cp $DRILL_HOME/conf/drill-override-example.conf $DRILL_SITE/drill-override.conf
	cp $DRILL_HOME/conf/drill-on-yarn-example.conf $DRILL_SITE/drill-on-yarn.conf
	cp $DRILL_HOME/conf/drill-env.sh $DRILL_SITE
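
Before going further, it can help to confirm that all three files arrived in the site directory. The following is a minimal sketch; the function name missing_site_files is hypothetical, and the file list simply mirrors the cp commands above.

```shell
# missing_site_files (hypothetical helper): print each expected
# configuration file that is absent from the given site directory.
# Returns non-zero if at least one file is missing.
missing_site_files() {
    local site_dir="$1" name status=0
    for name in drill-override.conf drill-on-yarn.conf drill-env.sh; do
        if [ ! -f "$site_dir/$name" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}
```

Calling missing_site_files "$DRILL_SITE" prints nothing and returns success when the site directory is complete.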

Drill Resource Configuration

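Outside of YARN, Drill memory settings such as heap size and maximum direct memory are set through environment variables in drill-env.sh. Drill-on-YARN uses a different mechanism: you set the values in drill-on-yarn.conf, and Drill-on-YARN copies them into the environment variables when launching each Drillbit.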

drill-override.conf:

drill.exec: {
  cluster-id: "drillbits1"
  zk: {
    connect: "node1:2181,node2:2181,node3:2181",
    root: "drill",
    refresh: 500,
    timeout: 5000,
    retry: {
      count: 7200,
      delay: 500
    }
  }
}


drill-on-yarn.conf:

drill.yarn: {
  app-name: "Drill-on-YARN"

  dfs: {
    connection: "hdfs://node1:9000/"
    app-dir: "hdfs://node1:9000/users/drill"
  }

  yarn: {
    queue: "default"
  }

  drill-install: {
    client-path: "/home/admin/drill/apache-drill-1.15.0.tar.gz"
  }

  am: {
    heap: "450M"
    memory-mb: 512
  }

  http: {
    port: 12345
    auth-type: "simple"
    user-name: "admin"
    password: "admin"
    rest-key: ""
  }

  drillbit: {
    heap: "3G"
    max-direct-memory: "1G"
    code-cache: "1G"
    memory-mb: 4096
    vcores: 2
    # disks: 3
    classpath: ""
  }

  cluster: [
    {
      name: "drill-group1"
      type: "basic"
      count: 3
    }
  ]
}
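
One rough way to sanity-check the drillbit block is to add up the three JVM regions (heap, max-direct-memory, code-cache) and compare the total with memory-mb. The sketch below does that arithmetic; the helper names to_mb and jvm_total_mb are hypothetical, only the "G" and "M" suffixes are handled, and the idea that the container request should cover the full sum is an assumption, not something this guide states.

```shell
# to_mb (hypothetical helper): convert a size such as "3G" or "450M"
# to whole megabytes.
to_mb() {
    local value="$1"
    case "$value" in
        *G) echo $(( ${value%G} * 1024 )) ;;
        *M) echo "${value%M}" ;;
        *)  echo "unsupported size: $value" >&2; return 1 ;;
    esac
}

# jvm_total_mb (hypothetical helper): sum heap, direct memory and
# code cache, each given as a size string, and print the total in MB.
jvm_total_mb() {
    local heap direct cache
    heap=$(to_mb "$1") || return 1
    direct=$(to_mb "$2") || return 1
    cache=$(to_mb "$3") || return 1
    echo $(( heap + direct + cache ))
}
```

With the sample values, jvm_total_mb 3G 1G 1G prints 5120, while memory-mb is 4096. Whether that gap matters depends on how much of each region a Drillbit actually uses, so treat it as a prompt to review the numbers rather than as an error.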


Configuration Tips:

  1. Node labels are disabled in this YARN cluster, so the 'type' field of the cluster group in drill-on-yarn.conf is set to 'basic' rather than a labeled type.
  2. The example drill-on-yarn.conf spells the web UI user key as 'user_name', which is not correct. Change it to 'user-name'.
  3. 'app-dir' in drill-on-yarn.conf should be an absolute path such as 'hdfs://node1:9000/users/drill'. Note that the Apache Drill documentation is not correct on this point.
  4. Disable the virtual memory check ('vmem-check') in yarn-site.xml:

      <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
      </property>
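
To confirm the setting took effect in the file that YARN actually reads, a small check such as the following can help. This is a sketch only: the function name vmem_check_disabled is hypothetical, and it assumes the <name> and <value> elements sit on separate lines, as in the snippet above.

```shell
# vmem_check_disabled (hypothetical helper): succeed if the given
# yarn-site.xml sets yarn.nodemanager.vmem-check-enabled to false.
vmem_check_disabled() {
    awk '
        # Remember that the property name was seen, then look at the
        # next <value> line that follows it.
        /<name>yarn\.nodemanager\.vmem-check-enabled<\/name>/ { found = 1; next }
        found && /<value>/ {
            if ($0 ~ /<value>false<\/value>/) ok = 1
            found = 0
        }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}
```

A typical call would be vmem_check_disabled "$HADOOP_HOME/etc/hadoop/yarn-site.xml" && echo "vmem check is off".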


ZooKeeper Configuration 

Drill uses ZooKeeper to coordinate between Drillbits.

When run under YARN, the Drill Application Master uses ZooKeeper to monitor Drillbit health.

Drill-on-YARN reads your $DRILL_SITE/drill-override.conf file for ZooKeeper settings.

Hadoop Location

You must tell Drill-on-YARN the location of your Hadoop install.

Set the HADOOP_HOME environment variable in $DRILL_SITE/drill-env.sh to point to your Hadoop installation:

	export HADOOP_HOME=/path/to/hadoop-home

This assumes that Hadoop configuration is in the default location:

	$HADOOP_HOME/etc/hadoop 

Hadoop and Drill Environment Variables

	export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-arm64
	export HADOOP_HOME=/usr/lib/hadoop-2.8.4
	export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
	export HADOOP_YARN_HOME=/usr/lib/hadoop-2.8.4
	export YARN_CONF_DIR=$HADOOP_YARN_HOME/etc/hadoop
	export PATH=$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
	export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/bin

	export MASTER_DIR=/home/linaro/drill-setup/drill-master
	export DRILL_HOME=$MASTER_DIR/apache-drill-1.15.0
	export DRILL_SITE=$MASTER_DIR/site
	export PROD_DRILL_HOME=/home/linaro/drill-setup/drill/distribution/target/apache-drill-1.15.0/apache-drill-1.15.0
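
Because a missing variable tends to surface only as a confusing failure at launch time, it may be worth checking the environment first. The sketch below reports any variable from a given list that is unset or empty; the function name require_vars is hypothetical.

```shell
# require_vars (hypothetical helper): print "not set: NAME" for every
# named variable that is unset or empty. Returns non-zero if any is missing.
require_vars() {
    local name value missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For the list above, a call might look like: require_vars JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR YARN_CONF_DIR MASTER_DIR DRILL_HOME DRILL_SITE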

Launch Drill Under YARN


Use the client tool to launch your new Drill cluster, as shown:

	$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start
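
Since every client invocation repeats the same script path and --site argument, a small wrapper can cut down on typing. This is a sketch: the function name drill_on_yarn is hypothetical, and it relies on $DRILL_HOME and $DRILL_SITE being set as described earlier.

```shell
# drill_on_yarn (hypothetical wrapper): run the Drill-on-YARN client
# against the configured site directory, passing through any command.
drill_on_yarn() {
    "$DRILL_HOME/bin/drill-on-yarn.sh" --site "$DRILL_SITE" "$@"
}
```

With this in place, drill_on_yarn start is equivalent to the command above. The upstream Drill-on-YARN documentation also describes status and stop commands for the same client, so drill_on_yarn status and drill_on_yarn stop should work the same way, assuming your Drill version provides them.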


Log in to the web UI: http://10.101.16.16:12345

Drill master:

Drillbits: http://10.101.16.7:8047

Yarn cluster: http://10.101.16.7:8088/cluster