Mbed TLS running slow on OpenCI
Description
Activity
Glen Valante May 19, 2023 at 12:54 PM
To date, we are seeing reasonable performance. We are closing this issue.
Glen Valante April 5, 2023 at 1:12 AM
Update:
We have resized the executors to T3.medium/T3.large/T3.medium. The master is still C6a.4xlarge. We are getting consistent numbers and will continue to monitor.
After the TF-M/TF-A releases in April, we will look for an opportunity to migrate the tests from Stage back to Production.
Moving the issue to Blocked while we monitor and wait until next month, when we migrate it back to Production.
Arthur She March 29, 2023 at 6:20 PM
The ARM CC license server issue has been resolved by moving the flexnet server from AWS back to Scaleway.
Below is @Kelley Spoon's explanation:
Currently: I have moved flexnet back to the old server at Scaleway in order to get the tests running again.
The problem: When we set up DNS for 'flexnet.tf.o' to point to the public IP address, traffic from the mbed-tls instances and ci.staging is routed outside the AWS network, and thus outside the VPC security group, and comes back in through the public IP's interface. This means that traffic from any AWS asset is no longer granted the privileges of the security group and is subject to the same firewall policies as traffic from the rest of the internet, so it gets blocked.
If you attempt to reach the flexnet server by its internal AWS IP address from another AWS instance, it works just fine with our existing rules.
The solution: We need to find a way to keep AWS instances from using the server's public IP interface and ensure they are routed through the private IP. Something similar to adding an /etc/hosts entry to preempt the DNS settings.
Since I'm not sure what the right way to do this is, I decided to fall back to the old flexnet server, since we know it's operational. Arthur ran a test and it passed, so this should hold us for now.
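One way to confirm the routing problem described above from an affected instance is to check which addresses the license server's hostname resolves to locally. The following is a minimal sketch, not the team's actual tooling: the helper names are hypothetical, and it only inspects what the local resolver returns (including any /etc/hosts override). It does not test the firewall or the license daemon itself.

```python
import ipaddress
import socket


def is_private_ip(ip: str) -> bool:
    """Return True if ip falls in a private (non-internet-routable) range."""
    return ipaddress.ip_address(ip).is_private


def resolved_addresses(hostname: str) -> list:
    """Return the distinct IP addresses the local resolver gives for hostname.

    getaddrinfo honours /etc/hosts, so an override placed there shows up here.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def uses_private_route(hostname: str) -> bool:
    """Return True only if every resolved address is private.

    A False result from an AWS instance would suggest traffic is leaving via
    the public interface, as in the failure described above.
    """
    addrs = resolved_addresses(hostname)
    return bool(addrs) and all(is_private_ip(a) for a in addrs)
```

Run on an AWS instance with the license server's hostname as the argument, `uses_private_route` returning False would be consistent with the blocked-traffic symptom. After an /etc/hosts entry (or equivalent) is in place, it should return True.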
Arthur She March 24, 2023 at 3:33 AM
Bence merged a patch that resolved the performance issue on OpenCI.
We now have an ARM CC license server issue: the all_armcc-build_armcc test failed on all builds. The log showed "Unable to connect to the license server".
The Mbed TLS team is noticing that jobs are running extremely slowly. Examples:
https://github.com/Mbed-TLS/mbedtls/pull/5742
https://ci.trustedfirmware.org/view/TF-M/job/tf-m-nightly/922/
There is no indication of why this is happening.