Device Tree Feature (aka DT) Support

About

Goal: Split the current monolithic wperf-driver (kernel driver) into small, independent drivers with a well-defined kernel ↔ user-space protocol.

Relates to WPERF-709: Device Tree Support (Closed).

Introduction

Note: This is a draft. We may want to split the drivers in a different way, e.g. core PMU and uncore PMU as two separate drivers (not three as stated below). The list below is a draft of ideas and possible solutions for the DT architecture. Further chapters should describe the architecture we actually want to implement.

  • [ON HOLD] Split current wperf-driver into three separate independent drivers (core, DSU and DMC driver).

  • Introduce “driver type” for WindowsPerf drivers in DT.

    • “core PMU” driver type to support vanilla Arm PMUs.

    • “uncore PMU” driver(s):

      • Introduce “DSU” driver type to support uncore PMU counting.

      • Introduce “DMC” driver type to support Dynamic Memory Controllers.

  • [ON HOLD] Introduce driver “template” for each type.

    • Note: We may want to have one template for “PMU like devices”

  • [ON HOLD] Move current wperf-driver core PMU and uncore PMU implementations under new driver types.

  • Add DT driver enumeration to user space.

    • Adapt wperf and drivers to communicate with N drivers installed.

    • Adapt wperf-devgen to support new driver types enumeration and installation / removal.

Design And Architecture

This chapter contains the design and architectural decisions we agree to implement with the DT feature.

All architectural changes below form a roadmap; they are listed in no particular implementation order.

DMC driver prototype

  • Files at feature-WPERF-729 in linaro / WindowsPerf / WindowsPerf (GitLab)

  • Comments from @Matthew Sykes, received via Slack and archived here:

    • This is the IOCTL handler. The concept of 'queue' is a bit old-fashioned to be honest, I don't know why MSFT used it in KMDF, but they have, and the 'queue init' code sets the IOCTL handler entry point.

    • Device create sets the File Open and Close entry points, so those handlers are in device.c. WDM drivers are much cleaner in this respect: open, close and ioctl are just IRP requests, all set in the same place and often handled in the same file.

    • Then you have power and plug and play handlers, they also have their own files. IOCTL handlers by their nature can be huge.

    • Especially so since often the IRP needs handling on the way back up from a lower device, so there is twice as much code: one section sends it down, the other handles it on the way up, as the lower driver, usually a PDO, sets some values in the IRP.

    • But yeah, it's a bit of a moot point whether to put queue and ioctl into two files or one, as they are actually the same thing in KMDF.

    • [Screenshot: image-20240314-135153.png]

    • Linaro Dummy is the driver, and the yellow bang is saying:

    • [Screenshot: image-20240314-135259.png]

    • "The device cannot find enough free resources", yet clearly in the previous screenshot there is nothing else using this range. So it's almost as if the system has reserved it and won't let the DMC have it.

Kernel - user-space Interface

The interface between user space (the wperf application) and the kernel driver, aka wperf-driver (and other wperf-driver-* drivers), is divided into logical sections and is defined by IOCTLs as follows:

Counting Interface

This supports generic Arm PMU hardware.

Counting model, for obtaining aggregate counts of occurrences of special events.

| IOCTL | Description / Notes |
|---|---|
| IOCTL_PMU_CTL_START | |
| IOCTL_PMU_CTL_STOP | |
| IOCTL_PMU_CTL_RESET | |
| IOCTL_PMU_CTL_QUERY_SUPP_EVENTS | |
| IOCTL_PMU_CTL_ASSIGN_EVENTS | |
| IOCTL_PMU_CTL_READ_COUNTING | |
| IOCTL_DSU_CTL_INIT | |
| IOCTL_DSU_CTL_READ_COUNTING | |
| IOCTL_DMC_CTL_INIT | |
| IOCTL_DMC_CTL_READ_COUNTING | |

Sampling Interface

This supports generic Arm PMU hardware and other devices (such as SPE?) which can perform sampling and deliver the same type of information to user space: PC values with hit counts.

Sampling model, for determining the frequencies of event occurrences produced by program locations at the function, basic block, and/or instruction levels.

| IOCTL | Description / Notes |
|---|---|
| IOCTL_PMU_CTL_SAMPLE_SET_SRC | |
| IOCTL_PMU_CTL_SAMPLE_START | |
| IOCTL_PMU_CTL_SAMPLE_STOP | |
| IOCTL_PMU_CTL_SAMPLE_GET | |

Interoperability Interface

This interface is common to all drivers. It is used to query, detect, and fetch hardware information from each driver.

Note: ALL DRIVERS MUST IMPLEMENT THIS INTERFACE CORRECTLY AND IN THE SAME WAY (SAME ABI).

| IOCTL | Description / Notes |
|---|---|
| IOCTL_PMU_CTL_QUERY_HW_CFG | |
| IOCTL_PMU_CTL_QUERY_VERSION | |
| IOCTL_PMU_CTL_LOCK_ACQUIRE | |
| IOCTL_PMU_CTL_LOCK_RELEASE | |
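Since every driver must expose the interoperability IOCTLs listed above, a user-space tool could check for their presence before using a driver. The following is a minimal sketch only: the set of IOCTLs handled by the example driver is hypothetical, and the check covers names only, not ABI compatibility of the payloads.

```python
# IOCTL names taken from the Interoperability Interface list above.
INTEROP_IOCTLS = (
    "IOCTL_PMU_CTL_QUERY_HW_CFG",
    "IOCTL_PMU_CTL_QUERY_VERSION",
    "IOCTL_PMU_CTL_LOCK_ACQUIRE",
    "IOCTL_PMU_CTL_LOCK_RELEASE",
)


def missing_interop_ioctls(handled):
    """Return the interoperability IOCTLs that a driver does not handle."""
    handled = set(handled)
    return [name for name in INTEROP_IOCTLS if name not in handled]


# Hypothetical DMC-only driver that forgot the lock IOCTLs (illustration only).
dmc_driver_ioctls = {
    "IOCTL_DMC_CTL_INIT",
    "IOCTL_DMC_CTL_READ_COUNTING",
    "IOCTL_PMU_CTL_QUERY_HW_CFG",
    "IOCTL_PMU_CTL_QUERY_VERSION",
}

missing = missing_interop_ioctls(dmc_driver_ioctls)
print(missing)
```

A driver would be considered usable by wperf only when the returned list is empty.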

Device_ID string pattern

See:

Introduction

<dev_type>.<dev_func>=<event_prefix_list> - one entry for each supported capability, with entries separated by a semicolon (;).

Where:

  • <dev_type> - supported device type, e.g.:

    • core (Arm PMU),

    • dsu

    • dmc (Arm DDR controller), and

    • spe (Arm Statistical Profiling Extension).

  • <dev_func>

    • stat for counting with wperf stat (also timelines)

    • sample or record for sampling with wperf sample / wperf record.

  • <event_prefix_list> - list of event prefixes supported (with wperf ... -e <events>) that will indicate which device to select for the events.

Examples

Example Device_ID string for pristine wperf-driver:

core.stat=core;core.sample=core;dsu.stat=dsu;dmc.stat=dmc_clk,dmc_clkdiv2
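To illustrate the pattern, here is a minimal parsing sketch for the example string above. The function name `parse_device_id` and the returned data layout are assumptions made for illustration; they are not part of wperf or any driver.

```python
from typing import Dict, List, Tuple


def parse_device_id(device_id: str) -> Dict[Tuple[str, str], List[str]]:
    """Parse '<dev_type>.<dev_func>=<prefix>[,<prefix>...]' entries separated by ';'.

    Returns a mapping {(dev_type, dev_func): [event_prefix, ...]}.
    """
    capabilities: Dict[Tuple[str, str], List[str]] = {}
    for entry in device_id.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing semicolon
        key, eq, prefixes = entry.partition("=")
        dev_type, dot, dev_func = key.partition(".")
        prefix_list = [p for p in prefixes.split(",") if p]
        if not (eq and dot and dev_type and dev_func and prefix_list):
            raise ValueError(f"malformed Device_ID entry: {entry!r}")
        capabilities[(dev_type, dev_func)] = prefix_list
    return capabilities


caps = parse_device_id(
    "core.stat=core;core.sample=core;dsu.stat=dsu;dmc.stat=dmc_clk,dmc_clkdiv2"
)
for (dev_type, dev_func), prefixes in caps.items():
    print(dev_type, dev_func, prefixes)
```

Keying the result by (dev_type, dev_func) keeps each capability separate, so a driver that advertises both core.stat and core.sample yields two entries.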

Example Device_ID for each device

  • core.stat=core - events starting with /core/ (or with no prefix!) will be used to count events over time. They are selected with:

    • wperf stat -e <event> or

    • wperf stat -e /core/<event>

  • core.sample=core - events starting with /core/ (or with no prefix!) will be used to sample events over time. They are selected with:

    • wperf sample -e <event> or

    • wperf sample -e /core/<event>

  • dmc.stat=dmc_clk,dmc_clkdiv2 - events starting with /dmc_clk/ or /dmc_clkdiv2/ will be used to count events over time. They are selected with:

    • wperf stat -e /dmc_clk/<event> or

    • wperf stat -e /dmc_clkdiv2/<event>

Assumptions:

  1. One driver can have many capabilities in its Device_ID string.

  2. A driver yields data for each capability listed in its Device_ID string.
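With N drivers installed, user space needs to decide which driver serves a given event. The sketch below shows one possible way to do that from the Device_ID strings. The driver names, the two-driver split, and the placeholder event name `ev0` are hypothetical; only the string pattern and the "no prefix means core" rule come from the text above.

```python
def build_prefix_table(device_ids):
    """Map (dev_func, event_prefix) -> driver name for every advertised capability."""
    table = {}
    for driver, device_id in device_ids.items():
        for entry in filter(None, device_id.split(";")):
            key, _, prefixes = entry.partition("=")
            _dev_type, _, dev_func = key.partition(".")
            for prefix in filter(None, prefixes.split(",")):
                slot = (dev_func, prefix)
                if table.get(slot, driver) != driver:
                    raise ValueError(f"prefix {prefix!r} claimed by two drivers")
                table[slot] = driver
    return table


def select_driver(table, dev_func, event):
    """Return (driver, event_name) for an event given as '/prefix/name' or 'name'."""
    if event.startswith("/"):
        prefix, _, name = event[1:].partition("/")
    else:
        prefix, name = "core", event  # events without a prefix default to core
    driver = table.get((dev_func, prefix))
    if driver is None:
        raise LookupError(f"no driver supports {dev_func!r} for prefix {prefix!r}")
    return driver, name


# Hypothetical installation: two drivers, each with its own Device_ID string.
device_ids = {
    "wperf-driver": "core.stat=core;core.sample=core;dsu.stat=dsu",
    "wperf-driver-dmc": "dmc.stat=dmc_clk,dmc_clkdiv2",
}
table = build_prefix_table(device_ids)
print(select_driver(table, "stat", "/dmc_clkdiv2/ev0"))
print(select_driver(table, "sample", "ev0"))
```

Rejecting a prefix claimed by two different drivers is a design choice of this sketch; the document does not state how such a conflict should be handled.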

 


Notes

  • @Przemyslaw Wirkus: Changes that we may have to implement:

    • A new IOCTL carrying WindowsPerf-like detection information, with a payload such as:

      • Capability string (ASCIIZ), e.g. "spe" or "core;dsu;dmc", where "core" or "spe" are devices defined by us which can e.g. count and sample. Multiple capabilities are separated with a semicolon (;).

        • Note: "core;spe" - we would allow one driver to have more than one capability.

      • Functionality string (ASCIIZ), e.g. "counting" or "sampling", where we state e.g. that the "pmu" capability comes with counting and sampling (like wperf-driver does). Functionalities are listed per capability, separated with a semicolon (;). Example: for the capability string "core;dsu;dmc" (which would be the wperf-driver capability string) we can generate the functionality string below:

        • "counting,sampling;counting;counting" and it maps like this:

          • core → counting,sampling - wperf’s stat and sample / record work with events specified with -e and /core/<event> (or -e <event>).

            • Note: /core/ will also remain the default (as it is now) for events without a prefix.

          • dsu → counting - wperf’s stat works with events specified with -e and /dsu/<events>.

          • dmc → counting - wperf’s stat works with events specified with -e and /dmc/<events>.

    • IOCTL_PMU_CTL_QUERY_HW_CFG - we may have to split this IOCTL into at least two parts: one would provide hardware information (like register values) as it does now, and the second would perhaps be interface-specific.
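The pairing of capability and functionality strings described in these notes is positional: the n-th semicolon-separated group of functionalities belongs to the n-th capability. A minimal sketch of that mapping follows; the function name `map_functionality` is hypothetical and the strings are the example values from the notes.

```python
def map_functionality(capability: str, functionality: str):
    """Pair each capability with its comma-separated functionality list.

    Both strings are semicolon-separated and must have the same number of groups.
    """
    caps = capability.split(";")
    funcs = functionality.split(";")
    if len(caps) != len(funcs):
        raise ValueError("capability and functionality strings do not line up")
    return {cap: func.split(",") for cap, func in zip(caps, funcs)}


mapping = map_functionality("core;dsu;dmc", "counting,sampling;counting;counting")
for cap, funcs in mapping.items():
    print(cap, "->", ",".join(funcs))
```

Checking that both strings have the same number of groups catches a driver that reports a capability without stating what it can do.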