AVA, Xen & kernel: hardware notes, debugging tips & tricks
An unsorted collection of small but useful notes about AVA’s hardware and firmware, plus debugging options for Xen and the Linux kernel when they are compiled as part of TRS targeting the AVA Developer Platform 32 machine.
Xen
Early printk via UART in Xen
The following options are required in the .config file of Xen when it is compiled as a standalone binary:
CONFIG_VERBOSE_DEBUG=y
CONFIG_EARLY_UART_CHOICE_PL011=y
CONFIG_EARLY_UART_PL011=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=0x100002600000
CONFIG_EARLY_UART_PL011_BAUD_RATE=0
CONFIG_EARLY_UART_PL011_MMIO32=y
CONFIG_EARLY_PRINTK_INC="debug-pl011.inc"
For TRS, these config options can be added to one of the *.cfg files under meta-ledge-secure/meta-ledge-secure/dynamic-layers/meta-virtualization/recipes-extended/xen/files/
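Before rebuilding, it can help to confirm that the generated .config really contains the options listed above. The following is a minimal sketch, not part of TRS or Xen: the function name check_xen_early_printk is made up for illustration, and the path to the .config file is whatever your build produces.

```shell
# Print every early-printk option from the list above that is missing
# from the given Xen .config. Returns non-zero if anything is missing.
# (check_xen_early_printk is a hypothetical helper name.)
check_xen_early_printk() {
    config=$1
    missing=0
    for opt in \
        'CONFIG_VERBOSE_DEBUG=y' \
        'CONFIG_EARLY_UART_CHOICE_PL011=y' \
        'CONFIG_EARLY_UART_PL011=y' \
        'CONFIG_EARLY_PRINTK=y' \
        'CONFIG_EARLY_UART_BASE_ADDRESS=0x100002600000' \
        'CONFIG_EARLY_UART_PL011_BAUD_RATE=0' \
        'CONFIG_EARLY_UART_PL011_MMIO32=y' \
        'CONFIG_EARLY_PRINTK_INC="debug-pl011.inc"'
    do
        # -x: match the whole line, -F: treat the option as a fixed string
        if ! grep -qxF "$opt" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_xen_early_printk with the path to the .config prints nothing and returns 0 when all eight options are present.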
Increasing verbosity of logs for Xen
The Xen command-line parameters loglvl=... and guest_loglvl=... can both be set to all.
For TRS, this can be achieved by editing the command line in the file meta-ledge-secure/meta-ledge-secure/dynamic-layers/meta-virtualization/recipes-extended/xen/files/xen.cfg.in:
options=noreboot dom0_mem=${DOM0_MEMORY_SIZE} bootscrub=0 iommu=on loglvl=all guest_loglvl=all
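After a rebuild and reboot, it is worth checking that the parameters really reached the hypervisor. The sketch below only shows the token check itself. The helper name cmdline_has is made up, and the sample command line uses a hypothetical dom0_mem value in place of ${DOM0_MEMORY_SIZE}.

```shell
# Succeed if the space-separated command line $1 contains the exact token $2.
# (cmdline_has is a hypothetical helper name.)
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Sample command line; dom0_mem=4096M is an assumed value for illustration.
xen_cmdline='noreboot dom0_mem=4096M bootscrub=0 iommu=on loglvl=all guest_loglvl=all'

for param in loglvl=all guest_loglvl=all; do
    if cmdline_has "$xen_cmdline" "$param"; then
        echo "$param: set"
    else
        echo "$param: NOT set"
    fi
done
```

On a running dom0, and assuming the xl toolstack is installed, the real command line should appear in the xen_commandline field of the xl info output, and the hypervisor log itself can be read with xl dmesg.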
Logs location (xen, qemu)
On the booted machine, the useful QEMU and Xen logs are located under /var/log/xen/*
UEFI/BIOS
The UEFI firmware on AVA does not deal well with stale boot entries in the EFI boot manager, that is, entries no longer associated with any disk or partition physically present in the machine. Such an entry cannot be deleted or updated and, for instance, the Ubuntu installer fails to finish the installation when it cannot add a boot entry because of a name duplication. This situation was observed in practice and probably arose after a disk with an installed OS was physically removed from the system.
Hence, before removing an NVMe disk from an AVA machine, it makes sense to delete any EFI boot entry associated with the disk to be removed.
If this does happen, the way to recover is to create an empty (fake) UEFI partition and change its UUID using fdisk to match the entry in the EFI boot manager. The entry in the EFI boot manager then becomes valid and can be deleted successfully.
UEFI/BIOS sometimes freezes or hangs. This may be related to certain USB devices being attached to the machine when booting into the UEFI/BIOS interface.
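To spot stale boot entries before they cause trouble, the partition UUID embedded in each boot entry can be compared with the partition UUIDs present in the machine. The sketch below assumes the usual efibootmgr -v output format, where disk-backed entries contain HD(<partition>,GPT,<uuid>,...). The helper name stale_boot_entries and the two input file names are made up for illustration.

```shell
# List EFI boot entries whose GPT partition UUID is not among the
# partitions currently present.
#   $1: file holding the output of `efibootmgr -v`
#   $2: file holding one partition UUID per line
# (stale_boot_entries is a hypothetical helper name.)
stale_boot_entries() {
    awk '
        # First file: remember every partition UUID that is present.
        FILENAME == ARGV[1] { present[tolower($1)] = 1; next }
        # Second file: pull the UUID out of HD(n,GPT,<uuid>,...) entries.
        match($0, /HD\([0-9]+,GPT,[0-9a-fA-F-]+/) {
            uuid = tolower(substr($0, RSTART, RLENGTH))
            sub(/.*,/, "", uuid)
            if (!(uuid in present)) print $1, uuid
        }
    ' "$2" "$1"
}
```

A possible way to feed it, assuming efibootmgr and lsblk are available: save efibootmgr -v to boot.txt and lsblk -no PARTUUID to uuids.txt, then run stale_boot_entries boot.txt uuids.txt. Before pulling a disk, the entries that reference its partitions can be removed with efibootmgr -b XXXX -B, where XXXX is the boot number.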
PCIe
Buggy PCIe controller / PCIe write-combining issue
The PCIe controller on AVA has a hardware bug: some unaligned accesses or write-combining operations over PCIe device MMIO space mapped as normal memory corrupt the data being transferred. The most visible example is the Linux graphical environment, where graphics get corrupted under Xorg/X11. Wayland-native applications, surprisingly, do not have this issue.
The original thread is here:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9100
There are two patches for kernel 6.3 that work around this issue by remapping PCIe MMIO space as Device, non-gathering memory and handling the unaligned access faults that result.
PCIe AER events flood
Some PCIe cards have been observed to have issues negotiating power management policies and may produce AER events/messages under old UEFI firmware (1.x.y.z-something). The latest firmware does not have this issue. There is a BIOS option to handle such messages in firmware instead, but it is untested.
Alternatively, PCIe ASPM can be disabled via the Linux kernel command line:
pcie_aspm=off
The AER messages also could be suppressed using:
The firmware notes at https://docs.ipi.wiki/com-hpc/ava/firmware_A2Versions.html suggest that AER events can be caused by an incorrect payload size, which can be reconfigured in the BIOS.
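To judge how bad an AER flood is, and which device produces it, the AER lines in the kernel log can be counted per PCI address. This is a rough sketch that assumes the kernel tags such messages with "AER:" and that the first PCI address on the line is the reporting device. The helper name aer_counts is made up.

```shell
# Count kernel log lines that mention AER, grouped by the first PCI
# address (domain:bus:device.function) found on each line.
# Reads the files given as arguments, or stdin when there are none.
# (aer_counts is a hypothetical helper name.)
aer_counts() {
    awk '
        /AER:/ && match($0, /[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]\.[0-7]/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (dev in count) print count[dev], dev }
    ' "$@" | sort -rn
}
```

Typical use would be dmesg | aer_counts, run before and after a firmware update or a change to the kernel command line to see whether the number of messages drops.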
PCIe speed
By default, BIOS/UEFI sets the PCIe speed to Gen3. There are options in the BIOS/UEFI interface to enable Gen4 speed. It is not known whether anyone has tested this, but it should work in theory. If PCIe throughput turns out to limit performance, Gen4 speed is worth testing.
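Whether a device really trains at Gen3 or Gen4 can be read from the Linux sysfs attributes current_link_speed and max_link_speed, where 8.0 GT/s corresponds to Gen3 and 16.0 GT/s to Gen4. A minimal sketch follows. The helper name pcie_link_speeds is made up, and the optional directory argument exists only so the function can be pointed at a copy of the sysfs tree.

```shell
# Print the negotiated and maximum PCIe link speed of every device that
# exposes them. $1 is the sysfs PCI device directory (optional).
# (pcie_link_speeds is a hypothetical helper name.)
pcie_link_speeds() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        # Skip devices without a readable link speed attribute.
        [ -r "$dev/current_link_speed" ] || continue
        cur=$(cat "$dev/current_link_speed" 2>/dev/null)
        max=$(cat "$dev/max_link_speed" 2>/dev/null)
        echo "${dev##*/} current=$cur max=$max"
    done
}
```

A device that reports current=8.0 GT/s PCIe with max=16.0 GT/s PCIe is Gen4-capable but currently running at Gen3, which is the expected picture with the default BIOS/UEFI setting.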
PCIe-to-NVMe adapter card
Such a card was tested and it worked. At one point I was able to use three NVMe disks, with the third connected via the PCIe-to-NVMe card.
SATA ports on motherboard
They do not appear to work; there is probably no SATA controller on the carrier board.