TF-A CI / tf-l3-code-coverage tests random failures
Description
Activity
Paul Sokolovskyy October 5, 2023 at 11:35 AM
@Olivier Deprez :
This issue does not seem to happen any longer (after the LAVA upgrade and a few adjustments we did for this test group), so I believe we could conclude and close this ticket.
First of all, sorry for the lack of updates; this issue was backlogged by Linaro’s offsite meeting and then my vacation. I actually tried to look into this issue quickly before my vacation, but what I saw at that time looked exactly like an issue that could be attributed to the LAVA upgrade, so I put off looking into it as it seemed rather involved. It’s a miracle that it resolved itself during this time; I guess it’s actually the tweaks you guys made.
So, for completeness, looking at https://ci.trustedfirmware.org/job/tf-a-main/857/ , we have
| Code coverage | ( 15 min ) |
(15 mins! How cool is that!) And what’s important is that the actual code coverage stats are there and look sane: https://ci.trustedfirmware.org/job/tf-a-ci-gateway/50029/ (what I saw previously looked as if the trace data, the source for codecov, wasn’t produced on the LAVA side).
So, as long as you, @Olivier Deprez, checked it for sanity too, I guess we can indeed close it, thanks!
Olivier Deprez October 4, 2023 at 9:04 AM
This issue does not seem to happen any longer (after the LAVA upgrade and a few adjustments we did for this test group), so I believe we could conclude and close this ticket, @Paul Sokolovskyy, if you agree.
Olivier Deprez September 13, 2023 at 10:03 AM(edited)
The last couple of runs, since Sep 7th, no longer seem to have random failures, but the job is still failing, perhaps for another reason:
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/48076/console
00:12:38.441 Writing directory view page.
00:12:38.441 Overall coverage rate:
00:12:38.441 lines......: 69.0% (3128 of 4531 lines)
00:12:38.441 functions..: 58.9% (622 of 1056 functions)
00:12:38.441 branches...: 55.5% (826 of 1489 branches)
00:12:38.444 ++ generate_header /home/buildslave/workspace/tf-a-ci-gateway/report.html
00:12:38.444 ++ local cov_html=/home/buildslave/workspace/tf-a-ci-gateway/merge/outdir/lcov/index.html
00:12:38.444 ++ local out_report=/home/buildslave/workspace/tf-a-ci-gateway/report.html
00:12:38.444 ++ python3 -
00:12:38.479 Traceback (most recent call last):
00:12:38.479 File "<stdin>", line 17, in <module>
00:12:38.479 FileNotFoundError: [Errno 2] No such file or directory: '/home/buildslave/workspace/tf-a-ci-gateway/merge/outdir/lcov/index.html'
00:12:38.504 Build step 'Execute scripts' changed build result to FAILURE
00:12:38.504 Build step 'Execute scripts' marked build as failure
00:12:38.505 Archiving artifacts
00:12:45.947 Finished: FAILURE
Benjamin Copeland September 12, 2023 at 12:53 PM
@Paul Sokolovskyy FYI LAVA upgrade is done.
Paul Sokolovskyy September 1, 2023 at 9:55 AM
@Olivier Deprez :
Ok but let’s be careful from now on as we’ll start our pre-release activities for an Oct/Nov
Added a note to https://linaro.atlassian.net/browse/STG-4919 . The current plan is to upgrade next week, so it should be a sustainable plan.
Hi,
We have been experiencing random failures with the tf-l3-code-coverage test group since around mid-August.
This currently results in the TF-A CI main job failing.
It is not always the same test in this group that is affected, but the failure modes look similar.
It seems that the job builds, runs and produces results, but never ends and times out.
The LAVA log shows:
2023-08-31T01:29:17 covtrace-FVP_Base_RevC_2xAEMvA.cluster0.cpu0.log 88005504 13 4
2023-08-31T01:29:17 covtrace-FVP_Base_RevC_2xAEMvA.cluster0.cpu0.log 88005508 2 4
2023-08-31T01:29:17 covtrace-FVP_Base_RevC_2xAEMvA.cluster0.cpu0.log 8800550c 2 4
2023-08-31T01:29:17 covtrace-FVP_Base_RevC_2xAEMvA.cluster0.cpu0.log 88005510 3631 4
2023-08-31T01:29:17 Stopping container lava-1889970-2.1.2 from action run-fvp
2023-08-31T01:29:17 Calling: 'nice' 'docker' 'stop' 'lava-1889970-2.1.2'
2023-08-31T01:33:11 Failed to clean after action 'run-fvp': job timed out after 300 seconds
2023-08-31T01:33:11 Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 206, in cleanup
    child.cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/actions/boot/fvp.py", line 347, in cleanup
    super().cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/actions/boot/fvp.py", line 189, in cleanup
    return_value = self.run_cmd(["docker", "stop", self.container], allow_fail=True)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 674, in run_cmd
    proc.expect(pexpect.EOF)
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 343, in expect
    return self.expect_list(compiled_pattern_list,
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 372, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 169, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 500, in read_nonblocking
    if (timeout != 0) and select(timeout):
  File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 450, in select
    return select_ignore_interrupts([self.child_fd], [], [], timeout)[0]
  File "/usr/lib/python3/dist-packages/pexpect/utils.py", line 143, in select_ignore_interrupts
    return select.select(iwtd, owtd, ewtd, timeout)
  File "/usr/lib/python3/dist-packages/lava_common/timeout.py", line 76, in _timed_out
    raise self.exception("%s timed out after %s seconds" % (self.name, duration))
lava_common.exceptions.JobError: job timed out after 300 seconds
2023-08-31T01:33:11 Failed to clean after action 'boot-fvp-main': Failed to clean after job
2023-08-31T01:33:11 Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 206, in cleanup
    child.cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 844, in cleanup
    self.pipeline.cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 215, in cleanup
    raise InfrastructureError("Failed to clean after job")
lava_common.exceptions.InfrastructureError: Failed to clean after job
2023-08-31T01:33:11 Failed to clean after action 'boot-fvp': Failed to clean after job
2023-08-31T01:33:11 Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 206, in cleanup
    child.cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 844, in cleanup
    self.pipeline.cleanup(connection)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/action.py", line 215, in cleanup
    raise InfrastructureError("Failed to clean after job")
lava_common.exceptions.InfrastructureError: Failed to clean after job
2023-08-31T01:33:11 InfrastructureError: The Infrastructure is not working correctly. Please report this error to LAVA admins.
2023-08-31T01:33:11 {'case': 'job', 'definition': 'lava', 'error_msg': 'Failed to clean after job', 'error_type': 'Infrastructure', 'result': 'fail'}
The LAVA job page is not very explicit: https://tf.validation.linaro.org/scheduler/job/1889970
Examples of failing jobs from the last few days:
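To check whether other jobs show the same signature, the "Failed to clean after action" lines in the LAVA log can be picked out mechanically. A rough sketch, assuming the log has been saved as plain text with one entry per line; the function name find_cleanup_failures is hypothetical and not part of LAVA or the CI scripts:

```python
import re
from typing import List, Tuple

# Matches entries such as:
#   Failed to clean after action 'run-fvp': job timed out after 300 seconds
CLEANUP_FAILURE = re.compile(r"Failed to clean after action '([^']+)': (.+)$")


def find_cleanup_failures(log_text: str) -> List[Tuple[str, str]]:
    """Return (action, message) pairs for each cleanup failure in the log."""
    failures = []
    for line in log_text.splitlines():
        match = CLEANUP_FAILURE.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return failures
```

Run against the log excerpt above, this would report the run-fvp, boot-fvp-main and boot-fvp actions; a job whose first entry is a timeout on run-fvp cleanup is a likely instance of the same failure mode.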
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/47109/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/47046/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46966/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46898/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46720/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46544/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46390/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46352/
https://ci.trustedfirmware.org/job/tf-a-ci-gateway/46275/