TensorFlow benchmarking

To chart the performance of TensorFlow over time, measurements must be taken at regular intervals. They should also cover a range of use cases and machine capabilities so that the results are as representative as possible.

We will use the MLCommons infrastructure to run some of the tests from MLPerf Inference r1.1.
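
As a rough illustration of what charting performance over time might involve, the sketch below times a workload and appends one timestamped row per run to a CSV log. It is not part of the MLCommons infrastructure and does not call MLPerf or TensorFlow. The function names (time_workload, append_measurement, placeholder_workload), the column layout, and the example version and host strings are all assumptions made for this example.

```python
import csv
import io
import statistics
import time
from datetime import datetime, timezone


def time_workload(workload, repeats=5):
    """Run `workload` `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def append_measurement(out, benchmark, framework_version, machine, seconds):
    """Append one row: UTC timestamp, benchmark, version, machine, seconds."""
    writer = csv.writer(out)
    writer.writerow([
        datetime.now(timezone.utc).isoformat(),
        benchmark,
        framework_version,
        machine,
        f"{seconds:.6f}",
    ])


def placeholder_workload():
    # Hypothetical stand-in for a real benchmark run; NOT an MLPerf workload.
    sum(i * i for i in range(10_000))


# An in-memory buffer keeps the sketch self-contained; a real setup would
# more likely open a file in append mode so rows accumulate across runs.
log = io.StringIO()
median_seconds = time_workload(placeholder_workload)
# "0.0.0" and "example-host" are placeholder values, not real measurements.
append_measurement(log, "placeholder", "0.0.0", "example-host", median_seconds)
```

Recording the framework version and a machine label with every row is what would allow results to be grouped later by use case and machine capability.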

Usable benchmarks

These benchmarks have all been run on recent versions of TensorFlow.

resnet50

ssd-resnet34

ssd-mobilenet

Unusable benchmarks

These benchmarks have problems that prevent their use on recent versions of TensorFlow or, in one case, on any version.

dlrm

3d-unet

bert