Delivered
Details
Assignee
Leonardo Sandoval (Deactivated)
Reporter
Joanna Farley
Labels
Upstream
No
Share Visibility
Dave Pigott
Don Harbin
Maria Högberg
Original estimate
0m
Created September 16, 2021 at 10:51 AM
Updated February 15, 2022 at 10:32 PM
Resolved December 14, 2021 at 6:26 PM
On some of our OpenCI test runs I see LAVA errors like:
2021-09-15T14:44:45 lava-test-interactive-retry failed: 1 of 1 attempts. 'lava-test-interactive timed out after 900 seconds'
2021-09-15T14:44:45 lava-test-interactive timed out after 900 seconds
These appear in LAVA logs such as https://ci-builds.trustedfirmware.org/static-files/gRa8QQWPi_HnlWwULQFsICHMgv-DmRkioSu8HUb2IYcxNjMxNzIzNTYxNDk5OjE2OmpvYW5uYWZhcmxleS1hcm06am9iL3RmLWEtYnVpbGRlci80Mzk3MjYvYXJ0aWZhY3Q=/lava.log from an L2 run on a partner patch under review.
The L2 job for https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/9856 shows 4 failures of this type.
On a rerun of the 4 failed tests ( https://ci.trustedfirmware.org/job/tf-ci-gateway/13459/ ), 3 of them passed; the one that failed showed the same error as above.
Rerunning that single failed test, it failed again, failed once more on a further rerun, and finally passed on the rerun after that.
L1, L2 and main jobs all suffer this issue.
Do we know what’s going on in LAVA? This is not new and, as seen, can be worked around, but spending time rerunning failed tests because of what looks like a LAVA issue is time-consuming and annoying.
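To get a rough count of how often this happens across runs, a small script along these lines could scan downloaded lava.log text for the timeout lines quoted above. This is only a sketch: it assumes the log lines have exactly the shape shown in this ticket (timestamp, action name, "timed out after N seconds"), and the names TIMEOUT_RE, find_timeouts and summarize are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, copied from the log excerpt in this ticket:
#   2021-09-15T14:44:45 lava-test-interactive timed out after 900 seconds
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<action>[\w-]+) timed out after (?P<secs>\d+) seconds\s*$"
)


def find_timeouts(log_text):
    """Return (timestamp, action, seconds) for each action timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), match.group("action"), int(match.group("secs")))
            )
    return hits


def summarize(hits):
    """Count timeouts per (action, timeout value) pair."""
    return Counter((action, secs) for _, action, secs in hits)


# The two lines quoted in this ticket; only the second is an action timeout,
# the first is the retry wrapper reporting the same failure.
SAMPLE = (
    "2021-09-15T14:44:45 lava-test-interactive-retry failed: 1 of 1 attempts. "
    "'lava-test-interactive timed out after 900 seconds'\n"
    "2021-09-15T14:44:45 lava-test-interactive timed out after 900 seconds\n"
)

if __name__ == "__main__":
    for (action, secs), count in summarize(find_timeouts(SAMPLE)).items():
        print(f"{action}: {count} timeout(s) at {secs}s")
```

Run against the logs of the failed and rerun jobs, the per-action counts would show whether the timeouts cluster on particular runs, which may help confirm or rule out a load-related cause.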
In further discussion, Leonardo provided the following input:
Yes, I have also observed similar behaviour. What I suspect is that when an L1 or L2 job is launched, the LAVA lab gets a burst of jobs to process. LAVA in turn processes these 8 at a time on the same physical machine, and at some point the execution of each job slows down, causing timeouts. The naive approach is to increase the timeout value (currently 900 seconds, i.e. 15 minutes), but I am not sure this is the best solution. Another option is to reduce the number of concurrent jobs, so that each job would, in theory, run faster.
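The trade-off between the two options can be illustrated with a back-of-the-envelope model. This is purely a sketch of the hypothesis above, not a measurement: it assumes jobs run at full speed until the machine's parallel capacity is reached and slow down proportionally beyond that. The 8 concurrent jobs and the 900 second timeout come from this ticket; the standalone job time (600 s) and the capacity (4 jobs at full speed) are hypothetical numbers chosen only for the example.

```python
import math


def estimated_wall_time(standalone_s, concurrent_jobs, parallel_capacity):
    """Assumed model: no slowdown up to parallel_capacity, linear beyond it."""
    slowdown = max(1.0, concurrent_jobs / parallel_capacity)
    return standalone_s * slowdown


def max_safe_concurrency(standalone_s, timeout_s, parallel_capacity):
    """Largest job count whose estimated wall time stays within the timeout."""
    if standalone_s > timeout_s:
        return 0  # even a single job would time out under this model
    return math.floor(parallel_capacity * timeout_s / standalone_s)


if __name__ == "__main__":
    TIMEOUT_S = 900        # from the ticket
    CONCURRENT_JOBS = 8    # from the ticket
    STANDALONE_S = 600     # hypothetical: time for one job on an idle machine
    CAPACITY = 4           # hypothetical: jobs the machine runs at full speed

    wall = estimated_wall_time(STANDALONE_S, CONCURRENT_JOBS, CAPACITY)
    print(f"estimated wall time at {CONCURRENT_JOBS} jobs: {wall:.0f}s")
    print(f"times out at {TIMEOUT_S}s: {wall > TIMEOUT_S}")
    print("max concurrency within timeout:",
          max_safe_concurrency(STANDALONE_S, TIMEOUT_S, CAPACITY))
```

With these made-up inputs, either raising the timeout above the estimated wall time or lowering the concurrency would avoid the timeout. Real job durations from the lab would be needed before drawing any conclusion.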