[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7d149e94-ab9d-4db7-abb6-83aaec1e6735@intel.com>
Date: Wed, 25 Jun 2025 16:38:41 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>, <tony.luck@...el.com>,
<Dave.Martin@....com>, <james.morse@....com>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <akpm@...ux-foundation.org>,
<rostedt@...dmis.org>, <paulmck@...nel.org>, <thuth@...hat.com>,
<ardb@...nel.org>, <gregkh@...uxfoundation.org>, <seanjc@...gle.com>,
<thomas.lendacky@....com>, <pawan.kumar.gupta@...ux.intel.com>,
<manali.shukla@....com>, <perry.yuan@....com>, <kai.huang@...el.com>,
<peterz@...radead.org>, <xiaoyao.li@...el.com>, <kan.liang@...ux.intel.com>,
<mario.limonciello@....com>, <xin3.li@...el.com>, <gautham.shenoy@....com>,
<xin@...or.com>, <chang.seok.bae@...el.com>, <fenghuay@...dia.com>,
<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v14 29/32] fs/resctrl: Introduce the interface to modify
assignments in a group
Hi Babu,
On 6/13/25 2:05 PM, Babu Moger wrote:
> Introduce the interface to modify assignments within a group when
nit: This cannot be an introduction since the "interface" (resctrl file)
already exists at this point so this patch enables it to support modifications.
Perhaps:
"Enable the mbm_l3_assignments resctrl file to be used to modify counter
assignments of CTRL_MON and MON groups when the "mbm_event" counter
assignment mode is enabled." (Please feel free to improve)
> "mbm_event" mode is enabled.
>
> The assignment modifications are done in the following format:
> <Event>:<Domain id>=<Assignment state>
>
> Event: A valid MBM event in the
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> Domain ID: A valid domain ID. When writing, '*' applies the changes
> to all domains.
>
> Assignment states:
>
> _ : Unassign the counter.
>
> e : Assign the counter exclusively.
>
> Examples:
>
> $ cd /sys/fs/resctrl
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on
> domain 0:
>
> $ echo "mbm_total_bytes:0=_" > mbm_L3_assignments
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=e
> mbm_local_bytes:0=e;1=e
>
> To unassign the counter associated with the mbm_total_bytes event on
> all the domains:
>
> $ echo "mbm_total_bytes:*=_" > mbm_L3_assignments
> $ cat /sys/fs/resctrl/mbm_L3_assignments
> mbm_total_bytes:0=_;1=_
> mbm_local_bytes:0=e;1=e
>
> Signed-off-by: Babu Moger <babu.moger@....com>
> ---
...
> ---
> Documentation/filesystems/resctrl.rst | 146 +++++++++++++++++++++++-
> fs/resctrl/internal.h | 9 ++
> fs/resctrl/rdtgroup.c | 157 +++++++++++++++++++++++++-
> 3 files changed, 310 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index a232a0b1356c..cd82c2966ed7 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -527,7 +527,8 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
> Event: A valid MBM event in the
> /sys/fs/resctrl/info/L3_MON/event_configs directory.
>
> - Domain ID: A valid domain ID.
> + Domain ID: A valid domain ID. When writing, '*' applies the changes
> + to all domains.
>
> Assignment states:
>
> @@ -544,6 +545,34 @@ When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
> mbm_total_bytes:0=e;1=e
> mbm_local_bytes:0=e;1=e
>
> + Assignments can be modified by writing to the interface.
> +
> + Example:
> + To unassign the counter associated with the mbm_total_bytes event on domain 0:
> + ::
> +
> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=e
> + mbm_local_bytes:0=e;1=e
> +
> + To unassign the counter associated with the mbm_total_bytes event on all the domains:
> + ::
> +
> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=_
> + mbm_local_bytes:0=e;1=e
> +
> + To assign the counter associated with the mbm_total_bytes event on all domains in
> + exclusive mode:
> + ::
> +
> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> Resource allocation rules
> -------------------------
>
> @@ -1579,6 +1608,121 @@ View the llc occupancy snapshot::
> # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
> 11234000
>
> +
> +Examples on working with mbm_assign_mode
> +========================================
> +
> +a. Check if MBM assign support is available
"MBM assign support"? I do not think this term has been used so far.
> +::
> +
> + #mount -t resctrl resctrl /sys/fs/resctrl/
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_event]
> + default
> +
> +mbm_event feature is detected and it is enabled.
> +
> +b. Check how many assignable counters are supported.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
> + 0=32;1=32
> +
> +c. Check how many assignable counters are available for assignment in each domain.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
> + 0=30;1=30
> +
> +d. To list the default group's assign states:
> +::
> +
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +e. To unassign the counter associated with the mbm_total_bytes event on domain 0:
> +::
> +
> + # echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=_;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +f. To unassign the counter associated with the mbm_total_bytes event on all domains:
> +::
> +
> + # echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignment
> + mbm_total_bytes:0=_;1=_
> + mbm_local_bytes:0=e;1=e
> +
> +g. To assign a counter associated with the mbm_total_bytes event on all domains i
> +nexclusive mode:
"in exclusive"
> +::
> +
> + # echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
> + # cat /sys/fs/resctrl/mbm_L3_assignments
> + mbm_total_bytes:0=e;1=e
> + mbm_local_bytes:0=e;1=e
> +
> +h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. There is
> +no change in reading the events with the assignment. If the event is unassigned when
> +reading, then the read will come back as "Unassigned".
> +::
> +
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> + 779247936
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + 765207488
> +
> +i. Check the default event configurations.
> +::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
> + local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
> + local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
> +
> + # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> + local_reads,local_non_temporal_writes,local_reads_slow_memory
> +
> +j. Change the event configuration for mbm_local_bytes.
> +::
> +
> + # echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" >
> + /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> +
> + # cat /sys/fs/resctrl/info/L3_MON/counter_configs/mbm_local_bytes/event_filter
> + local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads
Please let output match code wrt spacing.
> +
> +This will update all (across all domains of all monitor groups) counter assignments
> +associated with the mbm_local_bytes event.
> +
> +k. Now read the local event again. The first read may come back with "Unavailable"
> +status. The subsequent read of mbm_local_bytes will display only the read events.
(note comment about "read events" on duplicate text)
> +::
> +
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + Unavailable
> + # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> + 314101
> +
> +l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
> +done using the following command. Note that switching the mbm_assign_mode will reset all
> +the MBM counters (and thus all MBM events) of all the resctrl groups.
hmmm ... earlier documentation about mbm_assign_mode changes was careful to use
"may reset", and here is it switched to "will reset". I am still cautious to make any
strong commitments about resctrl behavior in user documentation.
> +::
> +
> + # echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + mbm_event
> + [default]
> +
> +m. Unmount the resctrl
> +::
> +
> + #umount /sys/fs/resctrl/
> +
> Intel RDT Errata
> ================
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index ed0e3b695ad5..14d99c723ea5 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -51,6 +51,15 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
> return container_of(kfc, struct rdt_fs_context, kfc);
> }
>
> +/*
> + * Assignment types for the monitor modes
> + */
> +enum {
> + ASSIGN_NONE = 0,
> + ASSIGN_EXCLUSIVE,
> + ASSIGN_INVALID,
> +};
I do not think this is necessary (more below)
> +
> /**
> * struct mon_evt - Description of a monitor event
> * @evtid: event id
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 18ec65801dbb..92bb8f3adfae 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2129,6 +2129,160 @@ static int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file
> return ret;
> }
>
> +/**
> + * mbm_get_mon_event_by_name() - Return the mon_evt entry for the matching
> + * event name.
> + */
> +static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r,
struct rdt_resource parameter seems to be unused ... but should be used to match
with mon_evt::rid?
> + char *name)
> +{
> + struct mon_evt *mevt;
> +
> + for (mevt = &mon_event_all[0]; mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++) {
(use macro)
> + if (mevt->enabled && !strcmp(mevt->name, name))
> + return mevt;
> + }
> +
> + return NULL;
> +}
This looks to be a utility that should be close to the data structure in
fs/resctrl/monitor.c. Please check if you can move monitoring code
to fs/resctrl/monitor.c.
> +
> +static unsigned int resctrl_get_assign_state(char *assign)
> +{
> + if (!assign || strlen(assign) != 1)
> + return ASSIGN_INVALID;
> +
> + switch (*assign) {
> + case 'e':
> + return ASSIGN_EXCLUSIVE;
I think this can be simplified by calling resctrl_assign_cntr_event()
(rdtgroup_assign_cntr_event()) directly.
> + case '_':
> + return ASSIGN_NONE;
Here resctrl_unassign_cntr_event() (rdtgroup_unassign_cntr_event())
can be called directly.
> + default:
> + return ASSIGN_INVALID;
> + }
With assign/unassign done the function can return proper error
> +}
> +
> +static int resctrl_process_assign(struct rdt_resource *r, struct rdtgroup *rdtgrp,
> + char *event, char *tok)
> +{
> + struct rdt_mon_domain *d;
> + unsigned long dom_id = 0;
> + char *dom_str, *id_str;
> + struct mon_evt *mevt;
> + int assign_state;
> + char domain[10];
> + bool found;
> + int ret;
> +
> + mevt = mbm_get_mon_event_by_name(r, event);
> + if (!mevt) {
> + rdt_last_cmd_printf("Invalid event %s\n", event);
> + return -ENOENT;
> + }
> +
> +next:
> + if (!tok || tok[0] == '\0')
> + return 0;
> +
> + /* Start processing the strings for each domain */
> + dom_str = strim(strsep(&tok, ";"));
> +
> + id_str = strsep(&dom_str, "=");
> +
> + /* Check for domain id '*' which means all domains */
> + if (id_str && *id_str == '*') {
> + d = NULL;
> + goto check_state;
Instead of "goto check_state" resctrl_get_assign_state() (with
more appropriate name after changes) can be called directly, its
return handled, possibly printing to last_cmd_status without needing
any sprintf() tricks, and exit from resctrl_process_assign().
Apart from simplifying the code an additional benefit is to avoid
(ab)use case where user/bot may write:
# echo "mbm_total_bytes:*=_;*=e;*=_" > /sys/fs/resctrl/mbm_L3_assignments
> + } else if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> + rdt_last_cmd_puts("Missing domain id\n");
> + return -EINVAL;
> + }
> +
> + /* Verify if the dom_id is valid */
> + found = false;
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (d->hdr.id == dom_id) {
Similarly, resctrl_get_assign_state() (new name TBD) can be
called directly and "found" can be dropped.
> + found = true;
> + break;
> + }
> + }
> +
> + if (!found) {
> + rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
> + return -EINVAL;
> + }
> +
> +check_state:
> + assign_state = resctrl_get_assign_state(dom_str);
> +
> + switch (assign_state) {
> + case ASSIGN_NONE:
> + ret = resctrl_unassign_cntr_event(r, d, rdtgrp, mevt);
> + break;
> + case ASSIGN_EXCLUSIVE:
> + ret = resctrl_assign_cntr_event(r, d, rdtgrp, mevt);
> + break;
> + case ASSIGN_INVALID:
> + ret = -EINVAL;
> + }
> +
> + if (ret)
> + goto out_fail;
> +
> + goto next;
> +
> +out_fail:
> + sprintf(domain, d ? "%ld" : "*", dom_id);
> +
> + rdt_last_cmd_printf("Assign operation '%s:%s=%s' failed\n", event, domain, dom_str);
> +
> + return ret;
> +}
> +
> +static ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
> + size_t nbytes, loff_t off)
> +{
> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> + struct rdtgroup *rdtgrp;
> + char *token, *event;
> + int ret = 0;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + rdtgrp = rdtgroup_kn_lock_live(of->kn);
> + if (!rdtgrp) {
> + rdtgroup_kn_unlock(of->kn);
> + return -ENOENT;
> + }
> + rdt_last_cmd_clear();
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + rdt_last_cmd_puts("mbm_event mode is not enabled\n");
> + rdtgroup_kn_unlock(of->kn);
> + return -EINVAL;
> + }
> +
> + while ((token = strsep(&buf, "\n")) != NULL) {
> + /*
> + * The write command follows the following format:
> + * “<Event>:<Domain ID>=<Assignment state>”
> + * Extract the event name first.
> + */
> + event = strsep(&token, ":");
> +
> + ret = resctrl_process_assign(r, rdtgrp, event, token);
> + if (ret)
> + break;
> + }
> +
> + rdtgroup_kn_unlock(of->kn);
> +
> + return ret ?: nbytes;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
> @@ -2269,9 +2423,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_L3_assignments",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = mbm_L3_assignments_show,
> + .write = mbm_L3_assignments_write,
> },
> {
> .name = "mbm_assign_mode",
Reinette
Powered by blists - more mailing lists