Construction of AArch64Insn from a64.decode

Instruction format naming

tcg/aarch64/tcg-target.c.inc lists instructions in the AArch64Insn enum and groups them into numbered formats; for example, BR, BLR and RET all have the format “3207”. According to the comment above the enum, “Arm didn't provide us with nice names for the formats, so we use the section number of the architecture reference manual in which the instruction group is described.”

The format number ties each instruction to the function that emits it; hence tcg_out_insn_3207 is the function used to emit BR, BLR and RET. This pairing is checked by the preprocessor, so a mismatch is caught at build time.

We would like to replace these numbers with more meaningful names, ideally names that match conventions used by other code.

Other conventions

a64.decode

target/arm/tcg/a64.decode lists a set of instruction formats used for decoding. For example, AND_i, ORR_i and EOR_i all use the format @logic_imm_32 (@logic_imm_64 variants are also present).

In tcg-target.c.inc, I3207_BR maps to 0xd61f0000. This is the base encoding of the instruction; the remaining fields are ORed into this value to form the full instruction.

In a64.decode, BR is represented as:

BR 1101011 0000 11111 000000 rn:5 00000 &r

Substituting five zero bits for ‘rn:5’, which represents a register number, gives the value 0b11010110000111110000000000000000, or 0xd61f0000.
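The substitution above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of decodetree.py: the function name base_opcode is hypothetical, and the sketch handles only the token shapes visible in the BR line (literal bits, ‘name:len’ fields and a trailing reference).

```python
def base_opcode(line: str, width: int = 32) -> int:
    """Return the fixed bits of a decode line, with variable fields zeroed.

    Hypothetical helper: only literal bit strings, 'name:len' fields and
    trailing '&'/'@' references are understood.
    """
    _name, *tokens = line.split()
    value = 0
    nbits = 0
    for token in tokens:
        if token[0] in "&@%" or "=" in token:
            # References and field assignments do not occupy any bits.
            continue
        if ":" in token:
            # A named field such as 'rn:5' contributes 'len' zero bits.
            length = int(token.split(":", 1)[1].lstrip("s"))
            value <<= length
            nbits += length
            continue
        for ch in token:
            if ch not in "01.-":
                raise ValueError(f"unexpected token {token!r}")
            # '.' and '-' are unspecified bits; treat them as zero.
            value = (value << 1) | int(ch == "1")
            nbits += 1
    if nbits != width:
        raise ValueError(f"pattern covers {nbits} bits, expected {width}")
    return value


BR_LINE = "BR 1101011 0000 11111 000000 rn:5 00000 &r"
print(hex(base_opcode(BR_LINE)))  # 0xd61f0000
```

The result matches the value that tcg-target.c.inc gives for I3207_BR.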

a64.decode does not give a named instruction format for BR. The “&r” above is an argument set reference; it just tells us that one of the variable parts of the instruction is a single register. Other instructions have format names, signified by an '@' prefix on the last item of the line, as with @logic_imm_32 above.
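As a rough sketch of how a tool might tell the two kinds of reference apart, the following Python splits a decode line into its name, its ‘@’ format reference and its ‘&’ argument reference. The function name references is hypothetical, and the AND_i line is an illustrative line of the right general shape rather than a verbatim copy from a64.decode.

```python
def references(line: str) -> dict:
    """Split a decode line into name, '@' format and '&' argument reference."""
    name, *tokens = line.split()
    # A token starting with '@' names a format; one starting with '&'
    # names an argument set. Either may be absent.
    fmt = next((t[1:] for t in tokens if t.startswith("@")), None)
    args = next((t[1:] for t in tokens if t.startswith("&")), None)
    return {"name": name, "format": fmt, "args": args}


BR_LINE = "BR 1101011 0000 11111 000000 rn:5 00000 &r"
# Assumed shape only; the bit pattern here is not taken from a64.decode.
AND_LINE = "AND_i . 00 100100 . ...... ...... ..... ..... @logic_imm_32"

print(references(BR_LINE))
print(references(AND_LINE))
```

For the BR line this reports no format and the argument reference “r”; for the illustrative AND_i line it reports the format “logic_imm_32”.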

This gives us some instruction format names, but there are problems with this approach:

  • There are many more instructions in a64.decode than are used in AArch64Insn, so we would still need a list of instructions in tcg-target.c.inc.

    • We could generate enumerations and unimplemented stub functions for all the formats in a64.decode, but this would likely increase the code size significantly for a lot of encodings we don’t use.

  • Some instructions in a64.decode don’t have any instruction grouping, as per ‘BR’ above.

  • Some instructions don’t appear in a64.decode; for example ‘CBNZ’ used in tcg-target.c.inc is not present in a64.decode; instead, a64.decode only lists CBZ and has the ‘nz:1’ field in its opcode description. Presumably, nz=0 means CBZ and nz=1 means CBNZ. We could still use this to match CBNZ into the ‘cbz’ format, though.

  • Some instructions have multiple decodes. For example, only one form of NOP is listed in tcg-target.c.inc while five are listed in a64.decode. None of the NOPs has an instruction format class in a64.decode, presumably because they have no varying fields, but this may be a problem if other instructions map to several formats in a64.decode.

  • Some SIMD instructions have arguments which cannot be zero:

    • I3609 class instructions have an immh field which cannot be zero

    • I3611 class instructions have a size field which cannot be zero

    Hence, the base encoding given in tcg-target.c.inc for these instructions, which has those fields set to zero, will be rejected by the decoder. This is only a problem if we use the decoder generated by decodetree.py; the base encoding for these instructions is still readable in a64.decode and could be used if we parsed a64.decode with another tool.
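Two of the points above (deriving CBNZ from the CBZ line, and base encodings whose must-be-nonzero fields are zero) can be illustrated with a small Python sketch that reads bit patterns directly rather than going through the generated decoder. All names here are hypothetical. The CBZ pattern is assumed to have the shape described above, with an ‘nz:1’ field, and the shift-by-immediate pattern is written from the architectural encoding for illustration, not copied from a64.decode.

```python
def parse_pattern(pattern: str):
    """Return (base, fields); fields maps a name to (shift, length)."""
    base, pos, fields = 0, 32, {}
    for token in pattern.split():
        if token[0] in "&@%" or "=" in token:
            continue  # references and assignments occupy no bits
        if ":" in token:
            name, length = token.split(":", 1)
            pos -= int(length.lstrip("s"))
            fields[name] = (pos, int(length.lstrip("s")))
            continue
        for ch in token:
            if ch not in "01.-":
                raise ValueError(f"unexpected token {token!r}")
            pos -= 1
            if ch == "1":
                base |= 1 << pos
    if pos != 0:
        raise ValueError("pattern does not cover 32 bits")
    return base, fields


def encode(pattern: str, **values: int) -> int:
    """Base encoding with the given fields ORed in; other fields stay zero."""
    base, fields = parse_pattern(pattern)
    for name, value in values.items():
        shift, length = fields[name]
        if value >> length:
            raise ValueError(f"{name}={value} does not fit in {length} bits")
        base |= value << shift
    return base


def field(insn: int, pattern: str, name: str) -> int:
    """Extract the named field from an instruction word."""
    shift, length = parse_pattern(pattern)[1][name]
    return (insn >> shift) & ((1 << length) - 1)


# Assumed shape of the CBZ line: sf, opcode bits, nz, 19 offset bits, rt.
CBZ = "sf:1 011010 nz:1 " + "." * 19 + " rt:5"
# Illustrative shift-by-immediate pattern with an immh field.
SHIFT = "0 q:1 0 011110 immh:4 immb:3 00000 1 rn:5 rd:5"

print(hex(encode(CBZ)))        # 0x34000000
print(hex(encode(CBZ, nz=1)))  # 0x35000000
print(field(encode(SHIFT), SHIFT, "immh"))  # 0
```

Under these assumptions, setting nz=1 in the CBZ pattern yields a separate base value for CBNZ, and the base encoding of the shift pattern has immh equal to zero, which is the case a constraint-checking decoder would reject even though the pattern itself is still readable.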