[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <85c670f4-e34d-46da-8458-b7233a902a5f@intel.com>
Date: Thu, 8 May 2025 08:53:24 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghuay@...dia.com>, "Maciej
Wieczor-Retman" <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>, Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>,
Chen Yu <yu.c.chen@...el.com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v4 16/31] x86/resctrl: Add first part of telemetry event
enumeration
Hi Tony,
After working with this series for a while and needing to jump between
patches to understand how things fit together I would like to please ask
that you replace these "Add <num> part ..." and "Final steps ..." shortlogs
with something descriptive that reflects what the patch actually does.
This will make it much easier to jump between patches while trying
to understand how this all fits together.
On 4/28/25 5:33 PM, Tony Luck wrote:
> The OOBMSM VSEC discovery driver enumerates many different types
> of telemetry resources. Resctrl is only interested in the ones
> that are tied to an RMID value in the IA32_PQR_ASSOC MSR.
"RMID value in the IA32_PQR_ASSOC MSR" -> "RMID value that can be used
in the IA32_PQR_ASSOC MSR"?
>
> Make a request for each of the FEATURE_PER_RMID_ENERGY_TELEM and
"for each of the" -> "for the"
> FEATURE_PER_RMID_PERF_TELEM feature groups and scan the list
> of known event groups for matching guid values.
This is the first (apart from fake driver) mention of guid and it
is mentioned in a way that assumes reader knows what it means and what
the significance is. Please add more context about what guid means/represents.
>
> Configuration to follow in subsequent patches.
Please avoid "subsequent patches" in changelog.
>
> Hold onto references to any pmt_feature_groups that resctrl
How is reader expected to know what a "pmt_feature_group" is?
This work relies on a new separate feature that introduces new
data structures and itself introduces several data structures.
Please help reader to understand how this all fits together.
> uses until resctrl exit.
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 8 ++
> arch/x86/kernel/cpu/resctrl/core.c | 5 ++
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 113 ++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
> 4 files changed, 127 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 83b20e6b25d7..571db665eca6 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -167,4 +167,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
>
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>
> +#ifdef CONFIG_INTEL_AET_RESCTRL
> +bool intel_aet_get_events(void);
> +void __exit intel_aet_exit(void);
> +#else
> +static inline bool intel_aet_get_events(void) { return false; }
> +static inline void intel_aet_exit(void) { };
Extra semicolon?
> +#endif
> +
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 4d1556707c01..0103f577e4ca 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -724,6 +724,9 @@ void resctrl_arch_pre_mount(void)
>
> if (atomic_cmpxchg(&only_once, 0, 1))
> return;
> +
> + if (!intel_aet_get_events())
> + return;
> }
>
> enum {
> @@ -1076,6 +1079,8 @@ late_initcall(resctrl_arch_late_init);
>
> static void __exit resctrl_arch_exit(void)
> {
> + intel_aet_exit();
> +
> cpuhp_remove_state(rdt_online);
>
> resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..dda44baf75ae
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + * Tony Luck <tony.luck@...el.com>
> + */
> +
> +#define pr_fmt(fmt) "resctrl: " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/resctrl.h>
> +
> +/* Temporary - delete from final version */
> +#include "fake_intel_aet_features.h"
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.
> + * Some fields initialized with MMIO layout information
> + * gleaned from the XML files. Others are set from data
> + * retrieved from intel_pmt_get_regions_by_feature().
Please see "Structure, union, and enumeration documentation" in
Documentation/doc-guide/kernel-doc.rst on how a "brief description" is
separate from the full description.
The "some" and "others" is quite vague. The members themselves
can have snippet to indicate where it is initialized from.
> + * @pfg: The pmt_feature_group for this event group
This comment does not add any information.
"Points to the aggregated telemetry space information within the OOBMSM driver
that contains data for all telemetry regions ..."
> + * @guid: Unique number per XML description file
How can it be sure that it is clear to reader what "XML description file"
means here? Which XML file? The structure description can expand what is meant
by "the XML files" and then each member initialized from it can have a
"(initialized from XML file)" or "(from XML file)" snippet.
> + */
> +struct event_group {
> + struct pmt_feature_group *pfg;
> + int guid;
Should this be u32?
> +};
> +
> +/* Link: https://github.com/intel/Intel-PMT xml/CWF/OOBMSM/RMID-ENERGY *.xml */
404
> +static struct event_group energy_0x26696143 = {
> + .guid = 0x26696143,
> +};
> +
> +/* Link: https://github.com/intel/Intel-PMT xml/CWF/OOBMSM/RMID-PERF *.xml */
404
> +static struct event_group perf_0x26557651 = {
> + .guid = 0x26557651,
> +};
> +
> +static struct event_group *known_event_groups[] = {
> + &energy_0x26696143,
> + &perf_0x26557651,
> +};
One has to study this series further to understand where this is going. At this point
the data structure seems to be unnecessarily complex requiring (what appears to be)
a lot of unnecessary pointer wrangling. Reader can ask here ... why pointers, why
not just an array of structs?. Adding information to explain
this choice will help to understand this work and make this easier to review.
> +
> +#define NUM_KNOWN_GROUPS ARRAY_SIZE(known_event_groups)
> +
> +static bool configure_events(struct event_group *e, struct pmt_feature_group *p)
configure_events() is an unexpected name for a function that returns true/false.
There is no explanation for this and reader needs to read following patches to
understand what its purpose is.
> +{
> + return false;
> +}
> +
> +DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *, \
> + if (!IS_ERR_OR_NULL(_T)) \
> + intel_pmt_put_feature_group(_T))
> +
Please document get_pmt_feature().
> +static bool get_pmt_feature(enum pmt_feature_id feature)
> +{
> + struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
> + struct event_group **peg;
> + bool ret;
> +
> + p = intel_pmt_get_regions_by_feature(feature);
> +
> + if (IS_ERR_OR_NULL(p))
> + return false;
> +
> + for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
> + for (int i = 0; i < p->count; i++) {
> + if ((*peg)->guid == p->regions[i].guid) {
> + ret = configure_events((*peg), p);
Unnecessary parenthesis?
> + if (ret) {
This introduces too much confusion. It is custom for "0" to indicate success but this uses
boolean that is an unclear choice for a function that turns out (looking a few patches ahead)
to be complex. Please change "configure_events()" to have proper return codes. For example, it
allocates memory and when that fails it should return "-ENOMEM", not "false".
> + (*peg)->pfg = no_free_ptr(p);
> + return true;
> + }
> + break;
> + }
> + }
> + }
> +
> + return false;
> +}
> +
> +/*
> + * Ask OOBMSM discovery driver for all the RMID based telemetry groups
> + * that it supports.
This comment implies that get_pmt_feature() is an OOBMSM call?
> + */
> +bool intel_aet_get_events(void)
> +{
> + bool ret1, ret2;
> +
> + ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM);
> + ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM);
> +
> + return ret1 || ret2;
> +}
> +
> +void __exit intel_aet_exit(void)
> +{
> + struct event_group **peg;
> +
> + for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
> + if ((*peg)->pfg) {
> + intel_pmt_put_feature_group((*peg)->pfg);
> + (*peg)->pfg = NULL;
> + }
> + }
> +}
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index 28ae1c88b2ac..8b4603cad783 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -1,6 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
> obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o
> +obj-$(CONFIG_INTEL_AET_RESCTRL) += intel_aet.o
> obj-$(CONFIG_INTEL_AET_RESCTRL) += fake_intel_aet_features.o
> obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK) += pseudo_lock.o
>
Reinette
Powered by blists - more mailing lists