
  • UEFI firmware on AVA does not handle stale boot entries in the EFI boot manager well, that is, entries that are no longer associated with any disk or partition physically present in the machine. Such an entry cannot be deleted or updated and, for instance, the Ubuntu installer fails to finish the installation when it cannot add a boot entry because of a duplicate name. This situation was observed in practice and probably arose after a disk with an installed OS was physically removed from the system.
    Hence, before removing an NVMe disk from an AVA machine, it makes sense to delete any EFI boot entries associated with the disk to be removed.
    If this does happen, the way to recover is to create an empty/fake UEFI partition of the required kind and change its UUID using fdisk to match the entry in the EFI boot manager. The entry then becomes valid again and can be deleted successfully.

  • The UEFI/BIOS interface sometimes freezes or hangs. This may be related to certain unusual USB devices being attached to the machine while it boots into the UEFI/BIOS interface.
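
The stale boot entry check described in the first item can be sketched in Python. This is a minimal sketch, not a tested procedure for AVA: it assumes a Linux system where `efibootmgr -v` prints entries in the usual `BootXXXX* label HD(n,GPT,<partition GUID>,...)` form and where udev populates `/dev/disk/by-partuuid`. The names `find_stale_entries` and `collect_live_state`, and the sample GUIDs, are hypothetical.

```python
import os
import re
import subprocess

# Matches entry lines such as:
#   Boot0003* ubuntu  HD(1,GPT,<partition GUID>,0x800,0x100000)/File(...)
# Header lines like "BootCurrent:" and "BootOrder:" do not match.
ENTRY_RE = re.compile(
    r"^Boot(?P<num>[0-9A-Fa-f]{4})\*?\s+(?P<label>.*?)\s+"
    r"HD\(\d+,GPT,(?P<guid>[0-9A-Fa-f-]{36}),"
)


def find_stale_entries(efibootmgr_output, present_partuuids):
    """Return (boot number, label, GUID) for entries whose partition is absent."""
    present = {u.lower() for u in present_partuuids}
    stale = []
    for line in efibootmgr_output.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m and m.group("guid").lower() not in present:
            stale.append((m.group("num"), m.group("label"), m.group("guid").lower()))
    return stale


def collect_live_state():
    """Read boot entries and partition GUIDs from the running system (assumption:
    efibootmgr is installed and /dev/disk/by-partuuid exists)."""
    out = subprocess.run(
        ["efibootmgr", "-v"], capture_output=True, text=True, check=True
    ).stdout
    by_partuuid = "/dev/disk/by-partuuid"
    uuids = os.listdir(by_partuuid) if os.path.isdir(by_partuuid) else []
    return out, uuids


# Illustrative efibootmgr output; the GUIDs are made up for the example.
SAMPLE = """BootCurrent: 0001
BootOrder: 0001,0003
Boot0001* debian\tHD(1,GPT,11111111-2222-3333-4444-555555555555,0x800,0x100000)/File(\\EFI\\debian\\shimaa64.efi)
Boot0003* ubuntu\tHD(1,GPT,aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee,0x800,0x100000)/File(\\EFI\\ubuntu\\shimaa64.efi)
"""
```

To use it before pulling a disk, pass only the partition GUIDs that will remain in the machine as `present_partuuids`; the entries returned are the ones worth deleting first, for example with `efibootmgr -b 0003 -B`, while the disk is still present.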

PCIe

  • buggy PCIe controller/PCIe write-combining issue
    The PCIe controller on AVA has a hardware bug: some unaligned accesses or write-combining operations on PCIe device MMIO space that is mapped as normal memory corrupt the data. The most visible example is the Linux graphical environment, for instance graphics corruption under Xorg/X11. Wayland-native applications, surprisingly, do not have this issue.
    The original thread is here:
    https://gitlab.freedesktop.org/mesa/mesa/-/issues/9100
    There are two patches for kernel 6.3 that work around this issue by remapping PCIe MMIO space as Device, non-gathering memory and handling the unaligned access faults that result.

    9001-ampere-arm64-Add-a-fixup-handler-for-alignment-fault.patch
    9002-ampere-arm64-Work-around-Ampere-Altra-erratum-82288-.patch

  • PCIe AER events flood
    It has been observed that some PCIe cards may have trouble negotiating power management policies and may produce AER events/messages under old UEFI firmware (1.x.y.z-something). The latest firmware does not have this issue. There is a BIOS option to handle such messages in firmware instead, but it is untested.
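
To see which card is producing the AER flood, the kernel log can be summarized per device. The following is a minimal sketch in Python, assuming the usual Linux `PCIe Bus Error: severity=..., type=...` log lines (with or without the `AER:` prefix that newer kernels add). The function name `count_aer_events` and the sample log lines are hypothetical and only illustrate the format.

```python
import re
from collections import Counter

# Matches e.g.
#   nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, ...
#   pcieport 0000:00:01.0: AER: PCIe Bus Error: severity=Corrected, type=...
AER_RE = re.compile(
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"(?:AER: )?PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_events(log_text):
    """Count AER bus error lines per (device address, severity, error type)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = AER_RE.search(line)
        if m:
            key = (m.group("bdf"), m.group("severity").strip(), m.group("type").strip())
            counts[key] += 1
    return counts


# Illustrative log excerpt; device addresses and timestamps are made up.
SAMPLE_LOG = """\
[   12.30] pcieport 0000:00:01.0: AER: Corrected error received: 0000:01:00.0
[   12.30] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   12.40] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   12.50] pcieport 0000:00:01.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
"""
```

The text passed to `count_aer_events` can come from `dmesg` or `journalctl -k`; sorting the result with `most_common()` shows the noisiest device first.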
