2018-10-26

IB performance issues

  • Software issues are being resolved (HPC-341); the fixes need to be pushed upstream
    • Need to test on D03, D05, QDF, etc., to make sure the issues aren't TX2-specific
    • @Renato: check who can upstream the Mellanox patch (Ilias?)
  • Hardware timing issues will take time to resolve, and we can't fix them ourselves
    • We can identify them (by running on different hardware and investigating)
    • And report them back to the vendors, if they haven't seen them yet
  • Intel writes directly to cache (bypasses memory)
    • Can we do that, too? This would speed things up considerably
  • We're adding an IB performance job to Jenkins
    • We can use that to test changes in OFED drivers (Mellanox or Inbox)
    • OpenUCX performance tests can be done on a single-node system
    • OpenMPI seems to perform better on shared memory than UCX

Adding IB test job to Jenkins

  • We're only running dual-node for now; we could add single-node (loopback, shared mem)
  • Could also add UCX perf tests to the same job
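
A minimal sketch of how such a job could flag a performance regression, assuming each test reports a single bandwidth number and that a stored baseline exists. The function name, the test names, and the 5% tolerance are all hypothetical and are not taken from the actual Jenkins job:

```python
# Hypothetical sketch: names, values, and the tolerance are illustrative only.
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    measured: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return the tests whose measured bandwidth is more than
    `tolerance` (as a fraction) below the stored baseline."""
    regressions = []
    for name, base in baseline.items():
        value = measured.get(name)
        if value is None:
            continue  # test was not run in this job (e.g. single-node not enabled)
        if value < base * (1.0 - tolerance):
            regressions.append(name)
    return sorted(regressions)


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape.
    baseline = {"dual_node": 100.0, "loopback": 50.0}
    measured = {"dual_node": 90.0, "loopback": 49.0}
    print(find_regressions(baseline, measured))
```

The job could fail the build when the returned list is non-empty, which would make driver changes (Mellanox or inbox OFED) visible as soon as they affect throughput.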

2018-10-25

InfiniBand installation on OpenHPC, tracked in HPC-351:

  • Code mostly finished, will test next week
  • Will submit a pull request once finished

...