AVA, Xen & kernel: hardware notes, debugging tips & tricks

An unsorted collection of small but useful notes about AVA’s hardware, firmware and debugging options for Xen and the Linux kernel compiled as part of TRS targeting the AVA Developer Platform 32 machine.

Xen

  • Early printk via UART in Xen

The following options are required in the .config file of Xen if it is compiled as a standalone binary:

CONFIG_VERBOSE_DEBUG=y
CONFIG_EARLY_UART_CHOICE_PL011=y
CONFIG_EARLY_UART_PL011=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=0x100002600000
CONFIG_EARLY_UART_PL011_BAUD_RATE=0
CONFIG_EARLY_UART_PL011_MMIO32=y
CONFIG_EARLY_PRINTK_INC="debug-pl011.inc"

In the case of TRS, these config options can be added to one of the *.cfg files under meta-ledge-secure/meta-ledge-secure/dynamic-layers/meta-virtualization/recipes-extended/xen/files/
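When iterating on a Xen build it is easy to lose one of these options after a reconfigure. The following is a minimal sketch of a check that compares a .config against the list above. The function names and the idea of checking the file this way are illustrative assumptions, not part of TRS or Xen tooling.

```python
# Early-printk options copied verbatim from the list above.
REQUIRED = {
    "CONFIG_VERBOSE_DEBUG": "y",
    "CONFIG_EARLY_UART_CHOICE_PL011": "y",
    "CONFIG_EARLY_UART_PL011": "y",
    "CONFIG_EARLY_PRINTK": "y",
    "CONFIG_EARLY_UART_BASE_ADDRESS": "0x100002600000",
    "CONFIG_EARLY_UART_PL011_BAUD_RATE": "0",
    "CONFIG_EARLY_UART_PL011_MMIO32": "y",
    "CONFIG_EARLY_PRINTK_INC": '"debug-pl011.inc"',
}


def parse_config(text):
    """Parse Kconfig-style KEY=VALUE lines, skipping comments and blanks."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def missing_options(text, required=REQUIRED):
    """Return {option: expected_value} for options absent or set differently."""
    found = parse_config(text)
    return {k: v for k, v in required.items() if found.get(k) != v}
```

For example, `missing_options(open(path).read())` returns an empty dict when every option is present with the expected value, where `path` points at the Xen .config of your build tree.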

  • Increasing verbosity of logs for Xen

The Xen cmdline parameters loglvl=... and guest_loglvl=... can both be set to all.

For TRS, this can be achieved by editing the cmdline in the file meta-ledge-secure/meta-ledge-secure/dynamic-layers/meta-virtualization/recipes-extended/xen/files/xen.cfg.in:

options=noreboot dom0_mem=${DOM0_MEMORY_SIZE} bootscrub=0 iommu=on loglvl=all guest_loglvl=all
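
If the options line is edited by a script rather than by hand, the change can be sketched as below. This is only an illustration: the helper name `set_xen_loglvl` is hypothetical, and it assumes the line has the `options=...` form shown above.

```python
def set_xen_loglvl(options_line, level="all"):
    """Return the options= line with loglvl and guest_loglvl set to `level`."""
    prefix, sep, cmdline = options_line.partition("=")
    if not sep or prefix.strip() != "options":
        raise ValueError("expected a line of the form 'options=...'")
    wanted = {"loglvl": level, "guest_loglvl": level}
    tokens = []
    for token in cmdline.split():
        key = token.partition("=")[0]
        if key in wanted:
            continue  # drop any previous setting; it is re-added below
        tokens.append(token)
    tokens += [f"{key}={value}" for key, value in wanted.items()]
    return "options=" + " ".join(tokens)
```

Existing loglvl/guest_loglvl settings are replaced rather than duplicated, and all other parameters are kept in their original order.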

  • Logs location (xen, qemu)

On the booted machine the useful qemu and xen logs will be located under /var/log/xen/*

UEFI/BIOS

  • UEFI firmware on AVA does not handle stale boot entries in the EFI boot manager well, i.e. entries that are not associated with any disk or partition physically present in the machine. Such an entry cannot be deleted or updated and, for instance, the Ubuntu installer fails to finish installation because it cannot add a boot entry due to name duplication. This situation was observed in practice and probably happened after a disk with an installed OS was physically removed from the system.
    Hence, before removing an NVMe disk from an AVA machine, it makes sense to delete any EFI boot entry associated with the disk to be removed.
    If this does happen, the way to recover is to create an empty/fake required UEFI partition and change its UUID using fdisk to match the one in the stale EFI boot manager entry. The entry then becomes valid and can be deleted successfully.

  • UEFI/BIOS sometimes freezes or hangs. This may be related to certain USB devices being attached to the machine when booting into the UEFI/BIOS interface.
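
For the stale boot entry problem described above, it can help to list which boot entries point at partitions that are no longer present before (or after) pulling a disk. The sketch below parses text in the form printed by `efibootmgr -v` and compares partition GUIDs against a set of known ones. The function names are illustrative, and the assumed line format (`BootNNNN* label<TAB>HD(part,GPT,guid,...)`) may differ between efibootmgr versions.

```python
import os
import re

# Matches lines such as:
#   Boot0001* ubuntu<TAB>HD(1,GPT,<partition-guid>,0x800,0x100000)/File(...)
ENTRY_RE = re.compile(
    r"^Boot(?P<num>[0-9A-Fa-f]{4})\*?\s+(?P<label>.*?)\s+"
    r"HD\(\d+,GPT,(?P<uuid>[0-9A-Fa-f-]{36}),"
)


def stale_entries(efibootmgr_output, present_partuuids):
    """Return (number, label, partuuid) for entries whose partition is absent."""
    present = {uuid.lower() for uuid in present_partuuids}
    stale = []
    for line in efibootmgr_output.splitlines():
        match = ENTRY_RE.match(line)
        if match and match.group("uuid").lower() not in present:
            stale.append(
                (match.group("num"), match.group("label"),
                 match.group("uuid").lower())
            )
    return stale


def present_partuuids(by_partuuid="/dev/disk/by-partuuid"):
    """Partition GUIDs currently known to udev (empty set if unavailable)."""
    try:
        return set(os.listdir(by_partuuid))
    except OSError:
        return set()
```

An entry reported here can normally be removed with `efibootmgr -b NNNN -B` while the disk is still installed. Entries that do not use an `HD(...)` device path (network boot, firmware apps) are ignored by this sketch.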

PCIe

  • buggy PCIe controller/PCIe write-combining issue
    The PCIe controller on AVA has a hardware bug: some unaligned accesses or write-combining operations on PCIe device MMIO space mapped as normal memory lead to data corruption. The most visible symptom is in the Linux graphical environment, for instance graphics corruption under Xorg/X11. Wayland-native applications, surprisingly, do not have this issue.
    The original thread is here:
    https://gitlab.freedesktop.org/mesa/mesa/-/issues/9100
    There are two patches for kernel 6.3 that work around this issue by remapping PCIe MMIO space as Device, non-gathering memory and handling the unaligned access faults that result.



  • PCIe AER events flood

It has been observed that some PCIe cards have issues negotiating power management policies and may produce a flood of AER events/messages under old UEFI firmware (1.x.y.z-something). The latest firmware does not have this issue. There is an option in BIOS to handle such messages in firmware instead, but it is untested.

However, PCIe ASPM can be disabled via the Linux kernel cmdline:

pcie_aspm=off
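
To confirm that the parameter actually reached the running kernel, the cmdline and the ASPM policy can be inspected from userspace. This is a rough sketch: it assumes the kernel exposes `/proc/cmdline` and `/sys/module/pcie_aspm/parameters/policy` (the latter lists the policies with the active one in square brackets), and the helper names are illustrative.

```python
import re


def read_text(path):
    """Return the file's contents, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def aspm_disabled_on_cmdline(cmdline):
    """True if pcie_aspm=off appears as a separate kernel parameter."""
    return "pcie_aspm=off" in cmdline.split()


def active_aspm_policy(policy_text):
    """Extract the bracketed entry, e.g. '[default] performance' -> 'default'."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline")
    if cmdline is not None:
        print("pcie_aspm=off on cmdline:", aspm_disabled_on_cmdline(cmdline))
    policy = read_text("/sys/module/pcie_aspm/parameters/policy")
    if policy is not None:
        print("active ASPM policy:", active_aspm_policy(policy))
```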

The AER messages could also be suppressed using:

The firmware notes at https://docs.ipi.wiki/com-hpc/ava/firmware_A2Versions.html suggest that AER events can be caused by an incorrect payload size, which can be reconfigured in BIOS.

  • PCIe speed

By default, BIOS/UEFI sets the PCIe speed to Gen3. There are options in the BIOS/UEFI interface to enable Gen4 speed. There is no information on whether anyone has tested this, but it should work in theory. If PCIe throughput appears to limit performance, Gen4 speed is worth testing.
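
After changing the BIOS/UEFI setting, the negotiated link speed can be checked from Linux. The sketch below assumes the kernel exposes `current_link_speed` and `max_link_speed` under `/sys/bus/pci/devices/<device>/` and maps the reported GT/s value to a PCIe generation (8 GT/s is Gen3, 16 GT/s is Gen4). The helper names are illustrative.

```python
import glob
import os
import re

# Per-lane transfer rate in GT/s mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def link_generation(speed_text):
    """Map a sysfs string such as '8.0 GT/s PCIe' to a generation, else None."""
    match = re.match(r"\s*([0-9.]+)\s*GT/s", speed_text)
    if not match:
        return None
    return GEN_BY_GTS.get(float(match.group(1)))


def read_links(root="/sys/bus/pci/devices"):
    """Return {device: (current_gen, max_gen)} for devices reporting a link."""
    links = {}
    for dev in sorted(glob.glob(os.path.join(root, "*"))):
        try:
            with open(os.path.join(dev, "current_link_speed")) as f:
                current = f.read()
            with open(os.path.join(dev, "max_link_speed")) as f:
                maximum = f.read()
        except OSError:
            continue  # not a PCIe device, or the attribute is unreadable
        links[os.path.basename(dev)] = (
            link_generation(current), link_generation(maximum)
        )
    return links


if __name__ == "__main__":
    for device, (current, maximum) in read_links().items():
        print(f"{device}: running at Gen{current}, capable of Gen{maximum}")
```

A device that reports a maximum of Gen4 but runs at Gen3 is a candidate for the BIOS/UEFI Gen4 option.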

  • PCIe-to-NVMe adapter card

Such a card was tested and it worked. At one point I was able to use 3 NVMe disks, with the 3rd NVMe connected via the PCIe-to-NVMe card.

SATA ports on motherboard

They do not appear to work; there is probably no SATA controller on the carrier board.