[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e5a06b5b-1281-452e-ab9e-83ac0edf2418@amd.com>
Date: Tue, 27 Aug 2024 12:20:51 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Perry Yuan <perry.yuan@....com>, hdegoede@...hat.com,
ilpo.jarvinen@...ux.intel.com, Borislav.Petkov@....com,
kprateek.nayak@....com
Cc: Alexander.Deucher@....com, Xinmei.Huang@....com,
bharathprabhu.perdoor@....com, poonam.aggrwal@....com, Li.Meng@....com,
platform-driver-x86@...r.kernel.org, linux-kernel@...r.kernel.org,
Xiaojian.Du@....com
Subject: Re: [PATCH 01/11] Documentation: x86: Add AMD Hardware Feedback
Interface documentation
On 8/27/2024 04:36, Perry Yuan wrote:
> From: Perry Yuan <Perry.Yuan@....com>
>
> Introduce a new documentation file, `amd_hfi.rst`, which delves into the
> implementation details of the AMD Hardware Feedback Interface and its
> associated driver, `amd_hfi`. This documentation describes how the
> driver provides hint to the OS scheduling which depends on the capability
> of core performance and efficiency ranking data.
>
> This documentation describes
> * The design of the driver
> * How the driver provides hints to the OS scheduling
> * How the driver interfaces with the kernel for efficiency ranking data.
>
> Signed-off-by: Perry Yuan <Perry.Yuan@....com>
Reviewed-by: Mario Limonciello <mario.limonciello@....com>
> ---
> Documentation/arch/x86/amd-hfi.rst | 116 +++++++++++++++++++++++++++++
> Documentation/arch/x86/index.rst | 1 +
> 2 files changed, 117 insertions(+)
> create mode 100644 Documentation/arch/x86/amd-hfi.rst
>
> diff --git a/Documentation/arch/x86/amd-hfi.rst b/Documentation/arch/x86/amd-hfi.rst
> new file mode 100644
> index 000000000000..351641ce2821
> --- /dev/null
> +++ b/Documentation/arch/x86/amd-hfi.rst
> @@ -0,0 +1,116 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======================================================================
> +Hardware Feedback Interface For Hetero Core Scheduling On AMD Platform
> +======================================================================
> +
> +:Copyright (C) 2024 Advanced Micro Devices, Inc. All Rights Reserved.
> +
> +:Author: Perry Yuan <perry.yuan@....com>
> +
> +Overview
> +--------
> +
> +AMD Heterogeneous Core implementations are comprised of more than one
> +architectural class and CPUs are comprised of cores of various efficiency
> +and power capabilities. Power management strategies must be designed to accommodate
> +the complexities introduced by incorporating different core types.
> +Heterogeneous systems can also extend to more than two architectural classes as well.
> +The purpose of the scheduling feedback mechanism is to provide information to
> +the operating system scheduler in real time such that the scheduler can direct
> +threads to the optimal core.
> +
> +``Classic cores`` are generally more performant and ``Dense cores`` are generally more
> +efficient.
> +The goal of AMD's heterogeneous architecture is to attain power benefit by sending
> +background thread to the dense cores while sending high priority threads to the classic
> +cores. From a performance perspective, sending background threads to dense cores can free
> +up power headroom and allow the classic cores to optimally service demanding threads.
> +Furthermore, the area optimized nature of the dense cores allows for an increasing
> +number of physical cores. This improved core density will have positive multithreaded
> +performance impact.
> +
> +AMD Heterogeneous Core Driver
> +-----------------------------
> +
> +The ``amd_hfi`` driver delivers the operating system a performance and energy efficiency
> +capability data for each CPU in the system. The scheduler can use the ranking data
> +from the HFI driver to make task placement decisions.
> +
> +Thread Classification and Ranking Table Interaction
> +----------------------------------------------------
> +
> +The thread classification is used to select into a ranking table that describes
> +an efficiency and performance ranking for each classification.
> +
> +Threads are classified during runtime into enumerated classes. The classes represent
> +thread performance/power characteristics that may benefit from special scheduling behaviors.
> +The below table depicts an example of thread classification and a preference where a given thread
> +should be scheduled based on its thread class. The real time thread classification is consumed
> +by the operating system and is used to inform the scheduler of where the thread should be placed.
> +
> +Thread Classification Example Table
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ++----------+----------------+-------------------------------+---------------------+---------+
> +| class ID | Classification | Preferred scheduling behavior | Preemption priority | Counter |
> ++----------+----------------+-------------------------------+---------------------+---------+
> +| 0 | Default | Performant | Highest | |
> ++----------+----------------+-------------------------------+---------------------+---------+
> +| 1 | Non-scalable | Efficient | Lowest | PMCx1A1 |
> ++----------+----------------+-------------------------------+---------------------+---------+
> +| 2 | I/O bound | Efficient | Lowest | PMCx044 |
> ++----------+----------------+-------------------------------+---------------------+---------+
> +
> +
> +AMD Hardware Feedback Interface
> +--------------------------------
> +
> +The Hardware Feedback Interface provides to the operating system information
> +about the performance and energy efficiency of each CPU in the system. Each
> +capability is given as a unit-less quantity in the range [0-255]. A higher
> +performance value indicates higher performance capability, and a higher
> +efficiency value indicates more efficiency. Energy efficiency and performance
> +are reported in separate capabilities in the shared memory based ranking table.
> +
> +These capabilities may change at runtime as a result of changes in the
> +operating conditions of the system or the action of external factors.
> +Power Management FW is responsible for detecting events that would require
> +a reordering of the performance and efficiency ranking. Table updates would
> +happen relatively infrequently and occur on the time scale of seconds or more.
> +
> +The mechanism used to trigger a table update like below events:
> + * Thermal Stress Events
> + * Silent Compute
> + * Extreme Low Battery Scenarios
> +
> +The kernel or a userspace policy daemon can use these capabilities to modify
> +task placement decisions. For instance, if either the performance or energy
> +capabilities of a given logical processor becomes zero, it is an indication that
> +the hardware recommends to the operating system to not schedule any tasks on
> +that processor for performance or energy efficiency reasons, respectively.
> +
> +Implementation details for Linux
> +--------------------------------
> +
> +The implementation of threads scheduling consists of the following steps:
> +
> +1. A thread is spawned and scheduled to the ideal core using the default
> + heterogeneous scheduling policy.
> +2. The processor profiles thread execution and assigns an enumerated classification ID.
> + This classification is communicated to the OS via logical processor scope MSR.
> +3. During the thread context switch out the operating system consumes the workload(WL)
> + classification which resides in a logical processor scope MSR.
> +4. The OS triggers the hardware to clear its history by writing to an MSR,
> + after consuming the WL classification and before switching in the new thread.
> +5. If due to the classification, ranking table, and processor availability,
> + the thread is not on its ideal processor, the OS will then consider scheduling
> + the thread on its ideal processor (if available).
> +
> +Ranking Table update
> +---------------------------
> +The power management firmware issues an platform interrupt after updating the ranking
> +table and is ready for the operating system to consume it. CPUs receive such interrupt
> +and read new ranking table from shared memory which PCCT table has provided, then
> +``amd_hfi`` driver parse the new table to provide new consume data for scheduling decisions.
> +
> +
> diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
> index 8ac64d7de4dc..7f47229f3104 100644
> --- a/Documentation/arch/x86/index.rst
> +++ b/Documentation/arch/x86/index.rst
> @@ -43,3 +43,4 @@ x86-specific Documentation
> features
> elf_auxvec
> xstate
> + amd_hfi
Powered by blists - more mailing lists