Weekly Sync Minutes

2018-08-30 Asia

Takahiro

Had to move back to help with Post-K development, so had no time to continue working on the patch that was reviewed upstream.

The current patch doesn't help the other loops under investigation; those will need additional work later.

Takeharu

Having trouble with the InfiniBand setup, which has delayed adding support for IB configuration to the Ansible recipes.

Not getting full speed on the Mellanox fabric. It may help to use an auxiliary card on PCIe lanes managed by the second CPU. That will need Socket Direct support (only available in the closed-source drivers).

Would prefer to upstream the Ansible recipes into another repository (Linaro, OpenHPC) instead of having his own repository be the upstream.

Post-K uses a custom Lustre client/server, so they don't have the same problems we do with the server's kernel modules.

Fujitsu wants support for the OSS Mellanox drivers, but also the freedom to use the closed-source ones.

We may need special handling in the Ansible recipes to choose which drivers to install, or to leave that aside (i.e. not overwrite existing drivers).
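
A minimal sketch of the decision such special handling would have to make, written in Python rather than Ansible for brevity. Every name here (`DriverAction`, `choose_driver_action`, the `"oss"` / `"mlnx_ofed"` preference values, the `overwrite` flag) is hypothetical and not taken from the existing recipes; it only illustrates the "install the preferred driver, but don't overwrite an existing one" option discussed above.

```python
from enum import Enum
from typing import Optional


class DriverAction(Enum):
    """Hypothetical outcomes a recipe could act on."""
    KEEP_EXISTING = "keep_existing"
    INSTALL_OSS = "install_oss"
    INSTALL_MLNX_OFED = "install_mlnx_ofed"


def choose_driver_action(preference: str,
                         installed: Optional[str],
                         overwrite: bool = False) -> DriverAction:
    """Decide what to do about the Mellanox drivers on one host.

    preference: "oss" or "mlnx_ofed" (assumed site-level setting).
    installed:  name of a driver stack already present, or None.
    overwrite:  if False, an existing driver stack is left untouched.
    """
    if preference not in ("oss", "mlnx_ofed"):
        raise ValueError(f"unknown driver preference: {preference!r}")

    # Leave existing drivers alone unless explicitly told to replace them.
    if installed is not None and not overwrite:
        return DriverAction.KEEP_EXISTING

    if preference == "mlnx_ofed":
        return DriverAction.INSTALL_MLNX_OFED
    return DriverAction.INSTALL_OSS
```

In a recipe, the same logic would most likely become a pair of variables (preferred driver, overwrite yes/no) guarding the install tasks; that mapping is also an assumption.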

Masaki

Progress on LLVM and HCQC work reported in his YVR18 slides. Will share the source, so that we can merge with other compiler work (Takahiro, Renato, TCWG?).

Renato

InfiniBand progress in the lab:

  • Huawei servers use ConnectX-5 with two ports each: one to the IB switch (for MPI), one to the 100Gb Ethernet switch (for Lustre)
  • Qualcomm servers use ConnectX-4 in multi-node: the OSS drivers don't support it, so we need to use MLNX_OFED. Provisioning / orchestration is not ready for that.

Following up with Mellanox to upstream required features:

  • Socket Direct: needed to get the aux card on the second CPU working, to maximise bandwidth
  • Multi-node: needed for the Amberwing aux. riser, to make ports visible on the second node

Testing Lustre:

  • Client from Whamcloud builds on Arm (both Huawei and Qualcomm) and the packages install successfully
  • Server needs kernel drivers that were removed from staging, so we will start with an Intel server
  • We don't have a spare x86_64 server, so we'll probably create a new VM on our admin server (expect really bad performance)

Slides
