[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b69bee17-6a84-4cb2-ab8a-2793c2fe7c49@intel.com>
Date: Mon, 31 Mar 2025 09:14:55 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>
CC: <linux-kernel@...r.kernel.org>, <patches@...ts.linux.dev>
Subject: Re: [PATCH v2 01/16] x86/rectrl: Fake OOBMSM interface
Hi Tony,
(nit in shortlog: rectrl -> resctrl)
On 3/21/25 4:15 PM, Tony Luck wrote:
> Real version is coming soon ... this is here so the remaining parts
> will build (and run ... assuming a 2 socket system that supports RDT
> monitoring ... only missing part is that the event counters just
> report fixed values).
>
> Real version of this would just add the INTEL_AET_RESCTRL Kconfig
> option with dependency checks on
> INTEL_VSEC=y && INTEL_AET_TELEMETRY=y && INTEL_AET_DISCOVERY=y
>
> Just for RFC discussion.
Would appreciate a bit more detail about what OOBMSM provides
to be able to understand this series.
Even though changelog mentions "event counters" I am not able to
recognize any unique events provided by this interface. Instead it
just seems to provide a memory region that is entirely up to resctrl
to interpret based on the "unique identifier" hinted to in cover letter.
I could not find any description that connects the "unique identifier"
to the "guid" used in following patches. I think it will be helpful to
upfront connect the high level "unique identifier" with the "guid" that
the patches use to make this obvious.
Closest information to something that can be used by resctrl is
"num_rmids". Could you please add information on how "num_rmids" relates
to the memory region that is only specified via an addr and size?
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
> .../cpu/resctrl/fake_intel_aet_features.h | 73 +++++++++++++++++++
> .../cpu/resctrl/fake_intel_aet_features.c | 65 +++++++++++++++++
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
> drivers/platform/x86/intel/pmt/Kconfig | 3 +
> 5 files changed, 143 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.h
> create mode 100644 arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.c
>
> diff --git a/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.h b/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.h
> new file mode 100644
> index 000000000000..c835c4108abc
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/* Bits stolen from OOBMSM VSEC discovery code */
> +
> +enum pmt_feature_id {
> + FEATURE_INVALID = 0x0,
> + FEATURE_PER_CORE_PERF_TELEM = 0x1,
> + FEATURE_PER_CORE_ENV_TELEM = 0x2,
> + FEATURE_PER_RMID_PERF_TELEM = 0x3,
> + FEATURE_ACCEL_TELEM = 0x4,
> + FEATURE_UNCORE_TELEM = 0x5,
> + FEATURE_CRASH_LOG = 0x6,
> + FEATURE_PETE_LOG = 0x7,
> + FEATURE_TPMI_CTRL = 0x8,
> + FEATURE_RESERVED = 0x9,
> + FEATURE_TRACING = 0xA,
> + FEATURE_PER_RMID_ENERGY_TELEM = 0xB,
> + FEATURE_MAX = 0xB,
> +};
> +
> +/**
> + * struct oobmsm_plat_info - Platform information for a device instance
> + * @cdie_mask: Mask of all compute dies in the partition
> + * @package_id: CPU Package id
> + * @partition: Package partition id when multiple VSEC PCI devices per package
> + * @segment: PCI segment ID
> + * @bus_number: PCI bus number
> + * @device_number: PCI device number
> + * @function_number: PCI function number
> + *
> + * Structure to store platform data for a OOBMSM device instance.
> + */
> +struct oobmsm_plat_info {
> + u16 cdie_mask;
> + u8 package_id;
> + u8 partition;
> + u8 segment;
> + u8 bus_number;
> + u8 device_number;
> + u8 function_number;
> +};
> +
> +enum oobmsm_supplier_type {
> + OOBMSM_SUP_PLAT_INFO,
> + OOBMSM_SUP_DISC_INFO,
> + OOBMSM_SUP_S3M_SIMICS,
> + OOBMSM_SUP_TYPE_MAX
> +};
> +
> +struct oobmsm_mapping_supplier {
> + struct device *supplier_dev[OOBMSM_SUP_TYPE_MAX];
> + struct oobmsm_plat_info plat_info;
> + unsigned long features;
> +};
> +
> +struct telemetry_region {
> + struct oobmsm_plat_info plat_info;
> + void __iomem *addr;
> + size_t size;
> + u32 guid;
> + u32 num_rmids;
> +};
Could there be some description of what a "telemetry_region" is and
how the members should be interpreted by resctrl?
> +
> +struct pmt_feature_group {
> + enum pmt_feature_id id;
> + int count;
> + struct kref kref;
> + struct telemetry_region regions[];
> +};
> +
> +struct pmt_feature_group *intel_pmt_get_regions_by_feature(enum pmt_feature_id id);
> +
> +void intel_pmt_put_feature_group(struct pmt_feature_group *feature_group);
> diff --git a/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.c b/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.c
> new file mode 100644
> index 000000000000..b537068d99fb
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/fake_intel_aet_features.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/cleanup.h>
> +#include <linux/minmax.h>
> +#include <linux/slab.h>
> +#include "fake_intel_aet_features.h"
> +#include <linux/intel_vsec.h>
> +#include <linux/resctrl.h>
> +
> +#include "internal.h"
> +
> +#define ENERGY_QWORDS ((576 * 2) + 3)
> +#define PERF_QWORDS ((576 * 7) + 3)
> +
> +static long pg[4 * ENERGY_QWORDS + 2 * PERF_QWORDS];
> +
> +static int __init fill(void)
> +{
> + u64 val = 0;
> +
> + for (int i = 0; i < sizeof(pg); i += sizeof(val)) {
> + pg[i / sizeof(val)] = BIT_ULL(63) + val;
> + val++;
> + }
> + return 0;
> +}
> +device_initcall(fill);
> +
> +#define PKG_REGION(_entry, _guid, _addr, _pkg) \
> + [_entry] = { .guid = _guid, .addr = (void __iomem *)_addr, .plat_info = { .package_id = _pkg }}
> +
> +static struct pmt_feature_group fake_energy = {
> + .count = 4,
> + .regions = {
> + PKG_REGION(0, 0x26696143, &pg[0 * ENERGY_QWORDS], 0),
> + PKG_REGION(1, 0x26696143, &pg[1 * ENERGY_QWORDS], 0),
> + PKG_REGION(2, 0x26696143, &pg[2 * ENERGY_QWORDS], 1),
> + PKG_REGION(3, 0x26696143, &pg[3 * ENERGY_QWORDS], 1)
> + }
> +};
> +
> +static struct pmt_feature_group fake_perf = {
> + .count = 2,
> + .regions = {
> + PKG_REGION(0, 0x26557651, &pg[4 * ENERGY_QWORDS + 0 * PERF_QWORDS], 0),
> + PKG_REGION(1, 0x26557651, &pg[4 * ENERGY_QWORDS + 1 * PERF_QWORDS], 1)
> + }
> +};
Could there be some guidance on how to interpret the hardcoded data? For example,
one group contains two regions and the other four. Was this just done for testing
to ensure implementation supports multiple regions per package or ...?
Is it expected that multiple feature groups could have different number of regions?
I also think initializing the id would be helpful to understand the example better.
> +
> +struct pmt_feature_group *
> +intel_pmt_get_regions_by_feature(enum pmt_feature_id id)
> +{
> + switch (id) {
> + case FEATURE_PER_RMID_ENERGY_TELEM:
> + return &fake_energy;
> + case FEATURE_PER_RMID_PERF_TELEM:
> + return &fake_perf;
> + default:
> + return ERR_PTR(-ENOENT);
> + }
> + return ERR_PTR(-ENOENT);
> +}
> +
> +void intel_pmt_put_feature_group(struct pmt_feature_group *feature_group)
> +{
> +}
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ea29d22a621f..6112cb6cad05 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -504,6 +504,7 @@ config X86_CPU_RESCTRL
> bool "x86 CPU resource control support"
> depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
> depends on MISC_FILESYSTEMS
> + select INTEL_AET_RESCTRL if X86_64
I expect something like this will stay (i.e. not part of the "Fake" code).
In this case, should this perhaps only be selected on Intel (CPU_SUP_INTEL)?
(nit: the tab is unexpected)
> select ARCH_HAS_CPU_RESCTRL
> select RESCTRL_FS
> select RESCTRL_FS_PSEUDO_LOCK
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index 909be78ec6da..2c3b09f95127 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
> obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o
> obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK) += pseudo_lock.o
> +obj-$(CONFIG_INTEL_AET_RESCTRL) += fake_intel_aet_features.o
>
> # To allow define_trace.h's recursive include:
> CFLAGS_pseudo_lock.o = -I$(src)
> diff --git a/drivers/platform/x86/intel/pmt/Kconfig b/drivers/platform/x86/intel/pmt/Kconfig
> index e916fc966221..6d3b1f64efe9 100644
> --- a/drivers/platform/x86/intel/pmt/Kconfig
> +++ b/drivers/platform/x86/intel/pmt/Kconfig
> @@ -38,3 +38,6 @@ config INTEL_PMT_CRASHLOG
>
> To compile this driver as a module, choose M here: the module
> will be called intel_pmt_crashlog.
> +
> +config INTEL_AET_RESCTRL
> + bool
I expect that this will also stay ... if so, could this be expanded to
have needed dependency and also be accompanied by some documentation. Something like
"Architecture selects this when ...." This will make it clear that this is
not something a user will select during kernel build while also explaining
what it is used for.
Reinette
Powered by blists - more mailing lists