Big Data & Data Science Home
About this project
Make ARM64 a first-class citizen in the Hadoop/Spark community and in scale-out analytics.
Get Involved
- Meetings/Calls - See project calendar
- Join the bdds-dev mailing list (archive)
- #linaro-bigdata on irc.freenode.net
Team:
Assignee, ARM
Member Engineer, ARM
Meetings
Scope of Work
Coming soon...
Backlog
The following items are on the project backlog but are not currently planned. If you are interested in contributing to any of them, please state your intention on the project's mailing list (listed above).
Health Checks
Coming soon...
Documentation
- ERP
- Big Data Components
- Apache Bigtop
- ODPi
- Big Data Core Components
- Big Data Operations
- Big Data Streaming Tools
- Big Data Data Warehousing and Database Tools
- Big Data Data Governance and Security
- Big Data File Formats
- Big Data Data Science Notebooks
- Big Data Analytics
- Big Data ML - Machine Learning
- Big Data component dependencies
ERP
ERP 16.12: Installing Hadoop 2.7.2, Spark 2.0 and Hive 2.0.1
ERP 17.08: Building ELK (Elasticsearch, Logstash and Kibana) on AArch64
ERP 18.06: Building and testing Big Data components using Bigtop on Debian 9 (AArch64)
Resources
Big Data Components
Apache Bigtop
ODPi
Big Data Core Components
Apache Hadoop
- Setup, Configure and Install ODPi Hadoop
- Building and Running Apache Hadoop
- Building Hadoop 2.7.2, Spark 2.0, Hive 2.0.1 using Apache Bigtop
- Building, Running, Configuring and Profiling Apache Hadoop
- Apache Hadoop Tuning Notes
- Apache Hadoop Map Reduce Notes
- OpenJDK javac NullPointerException when building Hadoop
- Patch 1/3 Introduce the HyperCrc32C Checksum class
- Patch 2/3 libhadoop: CRC: ARM NEON Support
- Patch 3/3 Modify CRC to target NEON routine
- hadoop-lca13
ELK - Elasticsearch, Logstash and Kibana
Apache Sqoop
Apache Arrow
Big Data Operations
Big Data Streaming Tools
Apache Spark
Apache Flink
Apache Beam
Apache Tez
Apache Flume
Apache Storm
Apache Tachyon
Apache Kafka
- Apache Kafka Streams
Apache NiFi
Apache MiNiFi
Big Data Data Warehousing and Database Tools
Apache Hive
Apache HBase
Apache Cassandra
Postgres
Memcached
MySQL
Redis
Apache Drill
Big Data Data Governance and Security
Apache Ranger
Apache Knox
Apache Atlas
Apache Sentry
Big Data File Formats
Apache Parquet
Apache Avro
Big Data Data Science Notebooks
Jupyter
Apache Zeppelin
Big Data Analytics
Apache Kudu
Big Data ML - Machine Learning
Big Data component dependencies
Tests
Smoke Tests
Integration Tests
ODPi Spec Tests
Benchmarking
- TeraSort
- Building, Running, Configuring and Profiling Apache Hadoop
- Spark Bench
- TPC-H
- TPCxHS
- Apache Bench
- BigBench
- Building and Running HiBench on AArch64 Platform
- HiveTestBench
HiBench
Build and Port
- Build Apache Ambari on AArch64
- HBase Enablement on AArch64
- Apache Flink on AArch64
- Zookeeper Enablement on AArch64
- NETLIB-JAVA AArch64 Natives Support
- Apache Ambari Install, Setup and Configuration
Machine Learning
Misc
- Onboarding info - Welcome to Team
- Big Data team work summary v1.0
- CRC32 vs Non-CRC32 Study
- Configuring archiva with Tomcat
Blogs/Presentations
State of Big Data on AArch64 - Apache Bigtop
Big Data Roadmap
Strategic Engineering
Big Data and OpenJDK Strategic Engineering - 2018
Big Data and OpenJDK Strategic Engineering - 2017
Big Data Epics
JIRA In Progress
Source Code
- Coming soon...
Linaro Ltd.