Delivered
Details
Assignee
Anmar Oueja
Reporter
Former user (Deactivated)
Labels
Due date
Aug 27, 2015
Priority
Created July 29, 2015 at 6:59 PM
Updated August 1, 2021 at 4:07 PM
Resolved August 27, 2015 at 1:38 PM
Rationale
Big Data is growing at a rate of 100% per year, as reported at the Hadoop Summit 2015 in Brussels. Big Data is also emerging as a top priority for ARM servers. Big Data is not just Hadoop but an ecosystem of projects, applications, patterns, components, and plug-ins around Hadoop. Spark is also the new hype in Big Data.
Hadoop TeraSort runs on ARMv8-A and is used as part of the nightly testing for OpenJDK (CARD-972). This proves that the map-reduce part is fully working on AArch64, at least in a single-node configuration.
The purpose of this card is to prototype and explore which features already work properly on ARMv8-A in a multi-node cluster configuration, and to identify the gaps. This effort shall also identify the test suites and benchmarking suites to be used at a later stage. It is also important to evaluate representative real-life use cases.
The focus shall be on Hadoop first, but Spark shall be covered as well.
The main output of this card shall be a status assessment and a solid plan towards full support of Big Data on ARM servers.
Deliverables
status assessment for Hadoop and Spark on ARM: what is working vs. what is failing
preliminary output from running basic test and benchmarking suites, plus a list of the more advanced suites for the next phases
documentation for the setup and any scripts or workarounds required
hardware requirements for the next steps, e.g. number of nodes, memory, switches, servers, traffic generators, etc.
plan for Big Data on ARM: follow-on cards to handle the scope, and an epic card to represent the long-term strategic targets for the Big Data on ARM lead project
Staffing
one senior engineer for 2-3 months
Acceptance Criteria and Closeout
Criteria | Status | Closeout Notes/Links
Documentation that outlines the feasibility study approach, targets, HW requirements and results should be made available via the wiki | | Feasibility of Big Data Task: https://wiki.linaro.org/Internal/LEG/Engineering/BigData/Feasibility
New scope should be made available through JIRA cards; at least 1 or 2 cards should be ready for LEG-SC review by the end of the feasibility phase | | Big Data is now part of the Lead project and a proposal has been sent to LEG-SC. Jira issues will be opened accordingly.