2018-10-05
OpenSM still causing issues when setting up IB on the D03s
- Best route is to enable it on the switch
- Subnet created with P_Key, but don't know how to add nodes to it
- LID changes when SM changes / restarts, switch should know
- @Pak will try to set it up
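A minimal sketch (not from the meeting) for recording a port's LID and the SM's LID before and after an SM restart, to confirm whether they really change. It assumes the standard Linux sysfs layout under /sys/class/infiniband; the device name "mlx5_0" and port 1 in the usage note are placeholders, not our actual configuration.

```python
import os


def parse_lid(text):
    """Parse a LID as printed by sysfs, e.g. '0x1f' -> 31."""
    return int(text.strip(), 0)


def read_port_lids(device, port, root="/sys/class/infiniband"):
    """Return (lid, sm_lid) for one HCA port.

    `root` is a parameter only so the function can be pointed at a
    fake directory tree when testing without IB hardware.
    """
    port_dir = os.path.join(root, device, "ports", str(port))
    with open(os.path.join(port_dir, "lid")) as f:
        lid = parse_lid(f.read())
    with open(os.path.join(port_dir, "sm_lid")) as f:
        sm_lid = parse_lid(f.read())
    return lid, sm_lid
```

Usage: call read_port_lids("mlx5_0", 1) on each node before and after restarting the SM and compare the two tuples.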
Looking at different binaries on Mellanox drivers
- To do with host names on RODATA
- Can also have v8.1 instructions for newer cores
- We have to be careful with older arches
Talking about benchmarks, noise and how to use perf to find issues
- Thinking about hwloc support for Arm cores
- INRIA has done some work, should upstream it
- Added issue in benchmark_harness to use it
IPoIB tests too slow (15GB) while pure IB are fast (45GB+)
- Second ports look open, may need to flip the cables next time in London
- @Baptiste is finishing the Jenkins job to automate it
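A possible pass/fail check for the automated job, sketched under assumptions: both numbers are in the same unit, and the 0.5 minimum ratio is a hypothetical threshold that was not agreed in the meeting.

```python
def ipoib_ratio(ipoib, native):
    """Fraction of native IB throughput that IPoIB achieves."""
    if native <= 0:
        raise ValueError("native IB throughput must be positive")
    return ipoib / native


def ipoib_too_slow(ipoib, native, min_ratio=0.5):
    """True if IPoIB falls below min_ratio of native IB (threshold is hypothetical)."""
    return ipoib_ratio(ipoib, native) < min_ratio
```

With the figures above (15 vs 45), IPoIB reaches about a third of native IB, so this check would flag it.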
When we get IB jobs running and stable, we'll look at OpenMPI's MTT
- Goal is to upload (some) results to OpenMPI's website
2018-08-31
Pak
Infiniband:
- Seems to be working on D05/D03 cluster, but due to the big difference between the two machines, it's not good for testing latency/bandwidth.
- If we had at least two D05s for cluster setup, it would be enough, but our other D05 runs Debian and benchmarks and doesn't have a Mellanox card.
- Action: to update HPC-294 with the tests and expected results to make sure IB is working and of good enough quality.
- Need to understand what we can do with our switch regarding the subnet manager, and what we will have to use opensm for
- Action: to work on HPC-292 trying to set up a subnet in the switch, and if not, listing the opensm setup during cluster provisioning
- Requested upstreaming for the feature needed for our clusters to work: socket direct and multi-host (see 2018-08-30), no response yet.
Lustre:
- Usually needs at least 4 servers for redundancy (two disks, metadata), but made it work on a single x86 machine, server and client working
- Client builds and installs on Arm, but fails to communicate with the server. May be card issues (ConnectX5 on Arm vs X3/4 on x86).
- Building the server on Arm has some build issues (platform not recognised), may be due to old autoconf scripts.
- Action: try different cards on the x86 side and try a newer autoconf script, update HPC-321
Renato
Replicated Pak's Amberwing setup with multi-node using MLNX_OFED drivers, works fine, but install process is cumbersome. Working to automate it.
Tried building Lustre server on an x86 VM and got some weird build errors (AVX512 on a 10-year-old server), may be auto-detect.
Baptiste said there's a way to copy the host CPU features into the VM, will try that next. If it doesn't work, try to force configure options to disable AVX512.
That work will be updated in HPC-322.
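A minimal sketch for checking which AVX512 features the VM actually advertises, to compare against the host before and after changing the VM's CPU model. It assumes the x86 Linux /proc/cpuinfo format with a "flags" line; the function names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_flags(cpuinfo_text):
    """Return the sorted list of flags whose names start with 'avx512'."""
    return sorted(f for f in cpu_flags(cpuinfo_text) if f.startswith("avx512"))
```

Usage: run avx512_flags(open("/proc/cpuinfo").read()) on both the host and the guest. An empty list on both would suggest the build errors come from the configure scripts rather than from the CPU the VM exposes.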
2018-08-30
Takahiro
Had to move back to help Post-K development, didn't have time to continue working on the upstream-reviewed patch.
Current patch doesn't help other loops under investigation, will need additional work for those later.
Takeharu
Having trouble with Infiniband setup, which has delayed adding support for IB configuration in the Ansible recipes.
Not getting full speed on Mellanox fabric. May help to use auxiliary card on a PCI lane managed by the second CPU. Will need Socket Direct support (only on closed source drivers).
Would prefer to upstream the Ansible recipes into another repository (Linaro, OpenHPC) instead of having his own being the upstream.
Post-K uses a custom Lustre client/server, so they don't have the same problems we do with the server's kernel modules.
Fujitsu will use the commercial version of the Mellanox drivers, but will also have the freedom to use the open source ones.
We may need special handling in the Ansible recipes to choose which ones to install, or to leave that aside (ie. not overwrite existing drivers).
Masaki
Progress on LLVM and HCQC work reported in his YVR18 slides. Will share the source, so that we can merge with other compiler work (Takahiro, Renato, TCWG?).
Renato
Infiniband progress in the lab:
- Huawei servers use ConnectX5 with two ports each: one to IB switch (for MPI), one to 100GB Eth switch (for Lustre)
- Qualcomm servers use ConnectX4 in multi-node: OSS drivers don't support it, so we need to use MLNX_OFED. Provisioning / orchestration not ready for that.
Following up with Mellanox to upstream required features:
- Socket Direct: needed to have aux card on second CPU working to maximise bandwidth
- Multi-node: needed for the Amberwing aux. riser to make ports visible on the second node
Testing Lustre:
- Client from whamcloud builds on Arm (both Huawei and Qualcomm) and packages install successfully
- Server needs kernel drivers that were removed from staging, so we will start with Intel server
- We don't have a spare x86_64 server, so we'll probably create a new VM on our admin server (really bad performance)
Slides