2018-10-26
IB performance issues
- Software issues are being resolved (HPC-341); we need to push the fixes upstream
- Need to test on D03, D05, QDF, etc., to make sure it's not TX2-specific
- @Renato: check who can upstream the Mellanox patch (Ilias?)
- Hardware timing issues will take time to resolve, and we can't fix them ourselves
- We can identify them (by running on different hardware and investigating)
- And report them back to the vendors, if they haven't seen them yet
- Intel writes directly to cache (bypasses memory)
- Can we do that, too? It would speed things up considerably
- We're adding an IB performance job to Jenkins
Adding IB test job to Jenkins
- We're only running dual-node for now; we could add single-node (loopback, shared mem)
- Could also add UCX perf tests to the same job
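
A minimal sketch of how the job's test matrix could be laid out, covering the current dual-node run plus the optional loopback and UCX cases. It assumes the job uses `ib_write_bw` from the perftest package and UCX's `ucx_perftest`; the notes do not say which tools are run. The host name `node01`, the case names and the `build_perf_cases` helper are hypothetical, and the sketch only builds command lines, it does not launch anything.

```python
from typing import List, NamedTuple


class PerfCase(NamedTuple):
    name: str
    server_cmd: List[str]  # command for the node acting as server
    client_cmd: List[str]  # command for the node acting as client


def build_perf_cases(server_host: str,
                     loopback: bool = False,
                     ucx: bool = False) -> List[PerfCase]:
    """Return the list of benchmark cases the job would run (hypothetical helper)."""
    # Dual-node bandwidth test: the only case described as running today.
    cases = [
        PerfCase("ib_write_bw-dual-node",
                 ["ib_write_bw"],
                 ["ib_write_bw", server_host]),
    ]
    if loopback:
        # Single-node variant: server and client both run on the same host.
        cases.append(
            PerfCase("ib_write_bw-loopback",
                     ["ib_write_bw"],
                     ["ib_write_bw", "localhost"]))
    if ucx:
        # UCX tag-matching bandwidth test between the same two nodes.
        cases.append(
            PerfCase("ucx-tag_bw-dual-node",
                     ["ucx_perftest"],
                     ["ucx_perftest", server_host, "-t", "tag_bw"]))
    return cases


if __name__ == "__main__":
    # "node01" is a placeholder host name, not one from the notes.
    for case in build_perf_cases("node01", loopback=True, ucx=True):
        print(case.name)
        print("  server:", " ".join(case.server_cmd))
        print("  client:", " ".join(case.client_cmd))
```

Keeping the cases in one list means a single Jenkins job can iterate over them, so adding loopback or UCX runs later is a matter of flipping a flag.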
2018-10-25
InfiniBand installation on OpenHPC, tracked in HPC-351:
- Code mostly finished, will test next week
- Will submit a pull request once finished
...