Apache Arrow Enablement on AArch64


Sources


  • Upstream: https://github.com/apache/arrow
  • The master branch is used in this wiki.
  • Apache Arrow is a cross-language development platform. This wiki lists the build steps for three main modules: C++, C_glib, and Python.

Setup Environment


  • Ubuntu 16.04 64bit for Arm64
  • Dependencies: maven@v3.3.9, python>=2.7, python-dev, g++, cmake > 3.6.0
  • JDK 8, 1.8.0_171
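
Before starting, it can help to confirm that the host is AArch64 and that the tools listed above are on the PATH. The following is a minimal sketch: the helper name check_tools is hypothetical, and the command names (mvn, python, java) are assumptions about how the listed dependencies are installed on your system.

```shell
# check_tools TOOL...: report which commands are available on PATH.
# Returns non-zero if at least one is missing. (Hypothetical helper.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# uname -m is expected to print "aarch64" on a 64-bit Arm host.
echo "architecture: $(uname -m)"

# Command names below are assumptions about how the dependencies are installed.
check_tools g++ cmake mvn python java || echo "install the missing tools before building"
```

The check only looks for the commands; it does not verify their versions.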

Build Steps


Apache Arrow C++ codebase

This directory contains the code and build system for the Arrow C++ libraries, as well as for the C++ libraries for Apache Parquet.

Install CPP dependencies

Arrow uses CMake as a build configuration system. It currently supports both in-source and out-of-source builds; out-of-source builds are preferred.

  • A C++11-enabled compiler. On Linux, gcc 4.8 and higher should be sufficient.
  • CMake
  • Boost
sudo apt-get install g++ \
     libboost-all-dev \
     libboost-filesystem-dev \
     libboost-system-dev \
     libncurses5-dev wget curl libcurl4-openssl-dev \
     libtool flex bison pkg-config libssl-dev automake 

On Ubuntu 16.04, build CMake from source to get a version newer than 3.6.0. On Ubuntu 18.04, skip this step.

# Get source
wget https://cmake.org/files/v3.13/cmake-3.13.0-rc2.tar.gz


# Build & install
tar zxvf cmake-3.13.0-rc2.tar.gz
cd cmake-3.13.0-rc2
./bootstrap
make -j32
sudo make install

# Set env
export PATH=/usr/local/bin:$PATH
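
After updating PATH, it is worth checking that the cmake now found is newer than 3.6.0. A minimal sketch, assuming GNU sort (for its -V version sort) is available; the helper name version_gt is hypothetical.

```shell
# version_gt A B: succeeds when version A is strictly greater than version B.
# Relies on the -V (version sort) option of GNU sort.
version_gt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

required="3.6.0"
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_gt "$found" "$required"; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is too old; need > $required"
    fi
else
    echo "cmake not found on PATH"
fi
```

If the old version is still reported, the shell may have cached the previous location of cmake; running `hash -r` or opening a new shell usually resolves this.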

Build Arrow CPP

1. Simple debug build:

git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir debug
cd debug
cmake ..
make unittest


2. Release build:

git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir release
cd release
cmake .. -DCMAKE_BUILD_TYPE=Release
make unittest 


Detailed unit test logs will be placed in the build directory under build/test-logs.

On some Linux distributions, running the test suite might require setting an explicit locale. If you see any locale-related errors, try setting the environment variable (which requires the locales package or equivalent):

export LC_ALL="en_US.UTF-8"
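
Setting LC_ALL only helps if the locale is actually installed. The sketch below checks `locale -a` before exporting the variable. The helper names normalize_locale and has_locale are hypothetical, and the locale-gen hint assumes an Ubuntu system.

```shell
# normalize_locale NAME: lower-case NAME and drop hyphens, so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# has_locale NAME: succeeds if NAME is listed by `locale -a`.
has_locale() {
    want="$(normalize_locale "$1")"
    locale -a 2>/dev/null | while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$want" ]; then
            echo match
        fi
    done | grep -q match
}

if has_locale "en_US.UTF-8"; then
    export LC_ALL="en_US.UTF-8"
    echo "LC_ALL set to $LC_ALL"
else
    # Assumption: on Ubuntu, the locales package provides locale-gen.
    echo "en_US.UTF-8 is not installed; 'sudo locale-gen en_US.UTF-8' usually adds it"
fi
```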


3. Build benchmark

If GTest is missing, install it and build its static libraries:
sudo apt install libgtest-dev

cd /usr/src/gtest
sudo cmake CMakeLists.txt
sudo make
 
# Copy or symlink libgtest.a and libgtest_main.a to your /usr/lib folder
sudo cp *.a /usr/lib


Then configure and run the benchmarks from the release build directory:
cmake -DARROW_BUILD_BENCHMARKS=ON -DCMAKE_BUILD_TYPE=Release ..
make runbenchmark


Arrow GLib

Arrow GLib is a wrapper library for Arrow C++. Arrow GLib provides a C API.

Arrow GLib supports GObject Introspection. It means that you can create language bindings at runtime or compile time.

Install C_glib dependencies

sudo apt-get install -y libgtk2.0-dev libglib2.0-dev \
                        autoconf-archive libgirepository1.0-dev \
                        meson ninja-build                     

Build Arrow C_glib

git clone https://github.com/apache/arrow.git
cd arrow/c_glib
./autogen.sh
./configure --enable-gtk-doc
make
sudo make install
sudo ldconfig
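
One way to confirm that the install step worked is to ask pkg-config for the library. This is a sketch under the assumption that the pkg-config module installed by Arrow GLib is named arrow-glib; the helper name glib_binding_status is hypothetical.

```shell
# glib_binding_status: print the installed Arrow GLib version, or a hint if
# pkg-config cannot find it. ("arrow-glib" is the assumed module name.)
glib_binding_status() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists arrow-glib; then
        echo "arrow-glib $(pkg-config --modversion arrow-glib)"
    else
        echo "arrow-glib not found; check PKG_CONFIG_PATH (for example /usr/local/lib/pkgconfig)"
    fi
}

glib_binding_status
```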

Python library for Apache Arrow

This library provides a Python API for functionality provided by the Arrow C++ libraries, along with tools for Arrow integration and interoperability with pandas, NumPy, and other software in the Python ecosystem.

System Requirements

On Linux, for this guide, we recommend using gcc 4.9, or clang 3.7 or higher.

Build Arrow python

1. Environment variables:

export ARROW_BUILD_TYPE=release
export ARROW_HOME=$(pwd)/dist
export PARQUET_HOME=$(pwd)/dist
export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH                
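
When LD_LIBRARY_PATH is initially empty, the export above leaves a trailing colon. The dynamic loader treats an empty entry as the current directory, which is usually not intended. A minimal sketch of a variant that avoids this; the helper name prepend_path is hypothetical.

```shell
# prepend_path DIR CURRENT: print DIR, followed by ":CURRENT" only when
# CURRENT is non-empty, so no empty entry ends up in the search path.
prepend_path() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

ARROW_HOME="$(pwd)/dist"
LD_LIBRARY_PATH="$(prepend_path "$ARROW_HOME/lib" "${LD_LIBRARY_PATH:-}")"
export ARROW_HOME LD_LIBRARY_PATH

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Because the variables use $(pwd), run these exports from the directory that contains the arrow checkout, not from inside it.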


2. Rebuild the Arrow C++ libraries with a different cmake configuration:

mkdir arrow/cpp/build
pushd arrow/cpp/build

cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
      -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DARROW_PARQUET=on \
      -DARROW_PYTHON=on \
      -DARROW_PLASMA=on \
      -DARROW_BUILD_TESTS=OFF \
      ..
make -j4
make install
popd

If you don't want to build and install the Plasma in-memory object store, you can omit the -DARROW_PLASMA=on flag.
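
Before building pyarrow, it can be useful to confirm that the shared libraries were installed under $ARROW_HOME. The sketch below assumes the library base names arrow, arrow_python, parquet, and plasma, inferred from the cmake flags used above; the helper name check_libs is hypothetical.

```shell
# check_libs DIR NAME...: report whether DIR contains a file whose name starts
# with lib<NAME>.so for each NAME. Returns non-zero if any is missing.
check_libs() {
    dir="$1"; shift
    rc=0
    for name in "$@"; do
        if ls "$dir"/lib"$name".so* >/dev/null 2>&1; then
            echo "ok:      lib$name"
        else
            echo "missing: lib$name"
            rc=1
        fi
    done
    return "$rc"
}

# Library names are assumptions based on the cmake flags used above.
check_libs "${ARROW_HOME:-$(pwd)/dist}/lib" arrow arrow_python parquet plasma \
    || echo "some libraries were not installed; re-check the cmake flags"
```

If Plasma was left out of the build, drop plasma from the list.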


3. Build pyarrow:

cd arrow/python
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
       --with-parquet --with-plasma --inplace             
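
A quick import check can confirm the result of the build. Because the extension is built with --inplace, the check is expected to work from the arrow/python directory. This is a sketch; the helper name pyarrow_status is hypothetical.

```shell
# pyarrow_status: print the pyarrow version if the module can be imported
# from the current directory, otherwise print a hint.
pyarrow_status() {
    if python -c 'import pyarrow' >/dev/null 2>&1; then
        python -c 'import pyarrow; print("pyarrow " + pyarrow.__version__)'
    else
        echo "pyarrow not importable from $(pwd)"
    fi
}

pyarrow_status
```

If the import fails with a missing shared library, LD_LIBRARY_PATH is the first thing to check.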

Arrow C++ codebase Unit Tests 


make unittest

All test cases pass on AArch64:

 Test project /home/linux/arrow/cpp/bld-dbg
      Start  1: allocator-test
 1/39 Test  #1: allocator-test ...................   Passed    0.15 sec
      Start  2: array-test
 2/39 Test  #2: array-test .......................   Passed    5.01 sec
      Start  3: buffer-test
 3/39 Test  #3: buffer-test ......................   Passed    0.14 sec
      Start  4: memory_pool-test
 4/39 Test  #4: memory_pool-test .................   Passed    0.15 sec
      Start  5: pretty_print-test
 5/39 Test  #5: pretty_print-test ................   Passed    0.15 sec
      Start  6: public-api-test
 6/39 Test  #6: public-api-test ..................   Passed    0.14 sec
      Start  7: status-test
 7/39 Test  #7: status-test ......................   Passed    0.14 sec
      Start  8: stl-test
 8/39 Test  #8: stl-test .........................   Passed    0.14 sec
      Start  9: type-test
 9/39 Test  #9: type-test ........................   Passed    0.15 sec
      Start 10: table-test
10/39 Test #10: table-test .......................   Passed    0.14 sec
      Start 11: table_builder-test
11/39 Test #11: table_builder-test ...............   Passed    0.14 sec
      Start 12: tensor-test
12/39 Test #12: tensor-test ......................   Passed    0.14 sec
      Start 13: compute-test
13/39 Test #13: compute-test .....................   Passed    0.83 sec
      Start 14: feather-test
14/39 Test #14: feather-test .....................   Passed    0.34 sec
      Start 15: ipc-read-write-test
15/39 Test #15: ipc-read-write-test ..............   Passed    5.13 sec
      Start 16: ipc-json-test
16/39 Test #16: ipc-json-test ....................   Passed    0.22 sec
      Start 17: json-integration-test
17/39 Test #17: json-integration-test ............   Passed    0.14 sec
      Start 18: csv-chunker-test
18/39 Test #18: csv-chunker-test .................   Passed    0.15 sec
      Start 19: csv-column-builder-test
19/39 Test #19: csv-column-builder-test ..........   Passed    0.14 sec
      Start 20: csv-converter-test
20/39 Test #20: csv-converter-test ...............   Passed    0.14 sec
      Start 21: csv-parser-test
21/39 Test #21: csv-parser-test ..................   Passed    0.14 sec
      Start 22: io-buffered-test
22/39 Test #22: io-buffered-test .................   Passed    0.19 sec
      Start 23: io-compressed-test
23/39 Test #23: io-compressed-test ...............   Passed   13.47 sec
      Start 24: io-file-test
24/39 Test #24: io-file-test .....................   Passed    0.62 sec
      Start 25: io-hdfs-test
25/39 Test #25: io-hdfs-test .....................   Passed    0.15 sec
      Start 26: io-memory-test
26/39 Test #26: io-memory-test ...................   Passed    2.43 sec
      Start 27: io-readahead-test
27/39 Test #27: io-readahead-test ................   Passed    0.59 sec
      Start 28: bit-util-test
28/39 Test #28: bit-util-test ....................   Passed    0.58 sec
      Start 29: checked-cast-test
29/39 Test #29: checked-cast-test ................   Passed    0.14 sec
      Start 30: compression-test
30/39 Test #30: compression-test .................   Passed    0.67 sec
      Start 31: decimal-test
31/39 Test #31: decimal-test .....................   Passed    0.15 sec
      Start 32: key-value-metadata-test
32/39 Test #32: key-value-metadata-test ..........   Passed    0.14 sec
      Start 33: rle-encoding-test
33/39 Test #33: rle-encoding-test ................   Passed    0.42 sec
      Start 34: parsing-util-test
34/39 Test #34: parsing-util-test ................   Passed    0.14 sec
      Start 35: stl-util-test
35/39 Test #35: stl-util-test ....................   Passed    0.14 sec
      Start 36: thread-pool-test
36/39 Test #36: thread-pool-test .................   Passed    0.57 sec
      Start 37: task-group-test
37/39 Test #37: task-group-test ..................   Passed    0.32 sec
      Start 38: lazy-test
38/39 Test #38: lazy-test ........................   Passed    0.14 sec
      Start 39: logging-test
39/39 Test #39: logging-test .....................   Passed    0.56 sec

100% tests passed, 0 tests failed out of 39

Label Time Summary:
unittest    =  35.27 sec*proc (39 tests)
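
To see where the test time goes, the ctest output can be saved to a file and sorted by duration. The following is a minimal sketch: the helper name slowest_tests is hypothetical, and the sample lines are copied from the log above.

```shell
# slowest_tests FILE N: print the N slowest passed tests in a saved ctest log,
# as "<seconds> <test name>", slowest first.
slowest_tests() {
    awk '/Test +#[0-9]+:/ && /Passed/ { print $(NF-1), $4 }' "$1" \
        | sort -rn | head -n "$2"
}

# Sample input: a few lines copied from the unit test log above.
log="$(mktemp)"
cat > "$log" <<'EOF'
 2/39 Test  #2: array-test .......................   Passed    5.01 sec
15/39 Test #15: ipc-read-write-test ..............   Passed    5.13 sec
23/39 Test #23: io-compressed-test ...............   Passed   13.47 sec
38/39 Test #38: lazy-test ........................   Passed    0.14 sec
EOF

slowest_tests "$log" 2
rm -f "$log"
```

For a real run, the log could be captured with something like `make unittest 2>&1 | tee unittest.log` and passed to the function.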


Benchmark for Arrow C++ codebase on AArch64/x86


make runbenchmark

AArch64 benchmark

Test project /home/linux/arrow/cpp/bld
      Start 13: builder-benchmark
 1/13 Test #13: builder-benchmark ................   Passed   66.00 sec
      Start 14: column-benchmark
 2/13 Test #14: column-benchmark .................   Passed   10.96 sec
      Start 16: compute-benchmark
 3/13 Test #16: compute-benchmark ................   Passed  489.87 sec
      Start 21: ipc-read-write-benchmark
 4/13 Test #21: ipc-read-write-benchmark .........   Passed   31.79 sec
      Start 26: csv-converter-benchmark
 5/13 Test #26: csv-converter-benchmark ..........   Passed    4.87 sec
      Start 27: csv-parser-benchmark
 6/13 Test #27: csv-parser-benchmark .............   Passed   11.40 sec
      Start 34: io-file-benchmark
 7/13 Test #34: io-file-benchmark ................   Passed   18.72 sec
      Start 35: io-memory-benchmark
 8/13 Test #35: io-memory-benchmark ..............   Passed  124.33 sec
      Start 48: bit-util-benchmark
 9/13 Test #48: bit-util-benchmark ...............   Passed   27.60 sec
      Start 49: compression-benchmark
10/13 Test #49: compression-benchmark ............   Passed   35.03 sec
      Start 50: decimal-benchmark
11/13 Test #50: decimal-benchmark ................   Passed    2.24 sec
      Start 51: lazy-benchmark
12/13 Test #51: lazy-benchmark ...................   Passed  454.25 sec
      Start 52: number-parsing-benchmark
13/13 Test #52: number-parsing-benchmark .........   Passed    8.60 sec

100% tests passed, 0 tests failed out of 13

Label Time Summary:
benchmark    = 1285.65 sec*proc (13 tests)

Total Test time (real) = 1285.68 sec

x86 benchmark

Test project /home/builder/arrow/cpp/bld
      Start 13: builder-benchmark
 1/13 Test #13: builder-benchmark ................   Passed   33.47 sec
      Start 14: column-benchmark
 2/13 Test #14: column-benchmark .................   Passed    7.46 sec
      Start 16: compute-benchmark
 3/13 Test #16: compute-benchmark ................   Passed  217.33 sec
      Start 21: ipc-read-write-benchmark
 4/13 Test #21: ipc-read-write-benchmark .........   Passed   30.19 sec
      Start 26: csv-converter-benchmark
 5/13 Test #26: csv-converter-benchmark ..........   Passed    4.22 sec
      Start 27: csv-parser-benchmark
 6/13 Test #27: csv-parser-benchmark .............   Passed   12.18 sec
      Start 34: io-file-benchmark
 7/13 Test #34: io-file-benchmark ................   Passed   19.27 sec
      Start 35: io-memory-benchmark
 8/13 Test #35: io-memory-benchmark ..............   Passed   52.36 sec
      Start 48: bit-util-benchmark
 9/13 Test #48: bit-util-benchmark ...............   Passed   29.20 sec
      Start 49: compression-benchmark
10/13 Test #49: compression-benchmark ............   Passed   15.29 sec
      Start 50: decimal-benchmark
11/13 Test #50: decimal-benchmark ................   Passed    1.97 sec
      Start 51: lazy-benchmark
12/13 Test #51: lazy-benchmark ...................   Passed  191.99 sec
      Start 52: number-parsing-benchmark
13/13 Test #52: number-parsing-benchmark .........   Passed   10.36 sec

100% tests passed, 0 tests failed out of 13

Label Time Summary:
benchmark    = 625.30 sec*proc (13 tests)


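The benchmark results above show a substantial performance gap between AArch64 and x86: the full benchmark suite took 1285.65 sec on AArch64 versus 625.30 sec on x86, roughly a 2x difference. The largest gaps are in compute-benchmark, lazy-benchmark, io-memory-benchmark, and compression-benchmark, each more than 2x slower on AArch64, while a few benchmarks (csv-parser, io-file, bit-util, number-parsing) ran slightly faster on AArch64.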
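
The per-benchmark comparison can be computed directly from the two saved ctest logs. The sketch below prints the AArch64 time, the x86 time, and their ratio for each benchmark present in both logs. The helper name compare_times is hypothetical, and the sample lines are copied from the logs above.

```shell
# compare_times ARM_LOG X86_LOG: for every passed test found in both logs,
# print name, AArch64 seconds, x86 seconds, and the AArch64/x86 ratio.
compare_times() {
    awk '
        /Test +#[0-9]+:/ && /Passed/ {
            if (FNR == NR) { arm[$4] = $(NF-1); order[++n] = $4 }
            else           { x86[$4] = $(NF-1) }
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                if ((name in x86) && x86[name] > 0)
                    printf "%-26s %8.2f %8.2f %5.2fx\n", name, arm[name], x86[name], arm[name] / x86[name]
            }
        }
    ' "$1" "$2"
}

# Sample input: a few lines copied from the two benchmark logs above.
arm_log="$(mktemp)"; x86_log="$(mktemp)"
cat > "$arm_log" <<'EOF'
 3/13 Test #16: compute-benchmark ................   Passed  489.87 sec
 6/13 Test #27: csv-parser-benchmark .............   Passed   11.40 sec
12/13 Test #51: lazy-benchmark ...................   Passed  454.25 sec
EOF
cat > "$x86_log" <<'EOF'
 3/13 Test #16: compute-benchmark ................   Passed  217.33 sec
 6/13 Test #27: csv-parser-benchmark .............   Passed   12.18 sec
12/13 Test #51: lazy-benchmark ...................   Passed  191.99 sec
EOF

compare_times "$arm_log" "$x86_log"
rm -f "$arm_log" "$x86_log"
```

A ratio above 1 means the benchmark took longer on AArch64. These are wall-clock times for whole benchmark binaries, so they only indicate where to start profiling.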

As a next step, we plan to profile Arrow on AArch64 to find CPU hot spots and identify possible improvements.