Message-ID: <2e488812-671e-4aa9-a292-c54b174f2dd7@intel.com>
Date: Thu, 13 Jun 2024 17:54:10 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>, <fenghua.yu@...el.com>,
<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<rdunlap@...radead.org>, <tj@...nel.org>, <peterz@...radead.org>,
<yanjiewtw@...il.com>, <kim.phillips@....com>, <lukas.bulwahn@...il.com>,
<seanjc@...gle.com>, <jmattson@...gle.com>, <leitao@...ian.org>,
<jpoimboe@...nel.org>, <rick.p.edgecombe@...el.com>,
<kirill.shutemov@...ux.intel.com>, <jithu.joseph@...el.com>,
<kai.huang@...el.com>, <kan.liang@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <pbonzini@...hat.com>,
<sandipan.das@....com>, <ilpo.jarvinen@...ux.intel.com>,
<peternewman@...gle.com>, <maciej.wieczor-retman@...el.com>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<eranian@...gle.com>, <james.morse@....com>
Subject: Re: [PATCH v4 00/19] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
>
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> to list and modify the group's assignment states.
There was a lot of discussion resulting in this centralized file. At first glance this
file appears to be very complicated and I believe any reasonable person would wonder if
all of this is necessary. I recommend that you add a motivation for why this file is needed.
Some items I recall: it makes it easier for user space to learn how counters are used (no
need to traverse resctrl and open()/close() many files), and on the resctrl side it makes
it possible to support counter re-assignment with a single IPI. There may be other motivations
that I am forgetting now.
Also, could the name just be "mbm_control"? What is enabled at this time are "assignable
counters" but in the future we may want to add support for other flags that have nothing to
do with "assignable counters".
>
> The list follows the following format:
>
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
"assignment_flags" -> "flags" ? (throughout)
>
>
> Format for specific type of groups:
>
> * Default CTRL_MON group:
> "//<domain_id>=<assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
> Assignment flags can be one of the following:
>
> t MBM total event is enabled
> l MBM local event is enabled
> tl Both total and local MBM events are enabled
> _ None of the MBM events are enabled
>
> Examples:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> There are four groups and all the groups have local and total
> event enabled on domain 0 and 1.
>
> =tl means both total and local events are enabled.
>
> "//" - This is a default CONTROL MON group
>
> "non_default_ctrl_mon_grp//" - This is non default CONTROL MON group
Be consistent with "non-default" (vs non default) as well as "CTRL_MON" (vs
CONTROL MON).
>
> "/child_default_mon_grp/" - This is Child MON group of the defult group
"Child" -> "child"
"defult" -> "default"
>
> "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
> MON group of the non default group
non-default
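To make the list format concrete, here is a minimal Python sketch of how user
space might parse one line of this file. The helper name parse_line() is
hypothetical and not part of this series. It assumes group names never contain
"/" and that "_" stands for "no events enabled".

```python
def parse_line(line):
    """Parse '<CTRL_MON group>/<MON group>/<domain_id>=<flags>;...'.

    Returns (ctrl_mon_group, mon_group, {domain_id: set_of_flags}).
    Empty group names denote the default group, as in "//0=tl;1=tl;".
    """
    # Only the first two "/" separate the group names from the domain list.
    ctrl, mon, domains = line.strip().split("/", 2)
    states = {}
    for entry in domains.split(";"):
        if not entry:  # tolerate the trailing ";" shown in the examples
            continue
        dom, flags = entry.split("=", 1)
        # "_" means none of the MBM events are enabled.
        states[int(dom)] = set() if flags == "_" else set(flags)
    return ctrl, mon, states


if __name__ == "__main__":
    print(parse_line("non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;"))
```

A single read of the file followed by one parse_line() call per line gives the
complete assignment state without walking the resctrl directory tree.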
>
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>
> The write format is similar to the above list format with addition of
> op-code for the assignment operation.
>
> * Default CTRL_MON group:
> "//<domain_id><op-code><assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><op-code><assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>
> Op-code can be one of the following:
>
> = Update the assignment to match the flags
> + Assign a new state
> - Unassign a new state
Looking here and at the implementation it seems that "+_" and "-_" are supported.
I think that should be invalid. Only "=_" seems appropriate to me.
Also please take care to not have a catchall "default" that does an
unassign. Doing something like that will prevent us from ever being
able to add any flags in the future.
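As a rough illustration of the semantics I have in mind, here is a small Python
sketch (not the kernel code, and apply_op() is a hypothetical name) of applying
one "<op-code><flags>" request to a domain's current state. It assumes "t" and
"l" are the only known flags, rejects "+_" and "-_", and has no catchall branch
that silently unassigns.

```python
KNOWN_FLAGS = {"t", "l"}  # assumption: only total and local exist today


def apply_op(current, op, flags):
    """Return the new flag set after applying op ('=', '+', '-') with flags."""
    if flags == "_":
        # "_" only makes sense as "set to nothing"; "+_" and "-_" are invalid.
        if op != "=":
            raise ValueError(f"'{op}_' is not a valid request")
        return set()
    requested = set(flags)
    # Unknown (or empty) flags are an error, never an implicit unassign,
    # so that new flags can still be added in the future.
    if not requested or requested - KNOWN_FLAGS:
        raise ValueError(f"invalid flags: {flags!r}")
    if op == "=":
        return requested
    if op == "+":
        return set(current) | requested
    if op == "-":
        return set(current) - requested
    raise ValueError(f"unknown op-code: {op!r}")


if __name__ == "__main__":
    print(sorted(apply_op({"t", "l"}, "-", "t")))
```

With these rules a request like "1-t" on a domain in state "tl" leaves "l", and
only the group named in the request is affected.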
>
>
> Initial group status:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the default group to enable only total event on domain 0:
> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the MON group child_default_mon_grp to remove total event on domain 1:
> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=l;
> /child_default_mon_grp/0=t;1=tl;
This does not look right. Why did domain #1 of the default CTRL_MON group change also?
>
> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> remove both local and total events on domain 1:
> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=t;1=l;
> /child_default_mon_grp/0=t;1=tl;
>
> To update the default group to add a total event domain 1.
> # echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
It is unclear where the "t" flag was removed.
> Assignment status after the update:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=t;1=tl;
> /child_default_mon_grp/0=t;1=tl;
>
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the evetns with ABMC. If the event is unassigned
"evetns" -> "events"
> when reading, then the read will come back as Unavailable.
Should this not rather be "Unassigned"? According to the docs the counters
will return "Unavailable" right after reconfigure so it seems that there
are scenarios where an "assigned" counter returns "Unavailable". It seems more
useful to return "Unassigned", which will have a new specific meaning, than to
overload the existing "Unavailable", which has the original meaning of "try again",
since in this case trying again will be futile.
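To show why the distinction matters to user space, here is a minimal Python
sketch of a reader that treats the two strings differently. Note that
"Unassigned" is only what I am proposing above, not existing behavior, and
classify_read() is a hypothetical helper name.

```python
def classify_read(text):
    """Classify the content read from an mbm_*_bytes file.

    Returns (status, value):
      ("ok", <int>)          a counter value was read
      ("retry", None)        "Unavailable": worth trying again later
      ("unassigned", None)   "Unassigned" (proposed): retrying is futile
                             until a counter is assigned
    Any other content raises ValueError via int().
    """
    value = text.strip()
    if value == "Unavailable":
        return ("retry", None)
    if value == "Unassigned":  # assumption: the string proposed in this review
        return ("unassigned", None)
    return ("ok", int(value))


if __name__ == "__main__":
    print(classify_read("779247936\n"))
```

If both cases return "Unavailable", user space cannot tell whether to retry the
read or to go assign a counter first.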
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> g. Users will have the option to go back to legacy_mbm mode if required.
> This can be done using the following command.
>
> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> abmc
> [mbm_legacy]
It is confusing for the value written by user space to be different from
the value displayed: "legacy_mbm" vs "mbm_legacy".
This is still missing information about what happens to the counters/events on
such a switch. Will events just keep counting? Will they be reset? ...?
I also think we should try to find a more generic name for this file.
"mbm_cntr_mode" or "mbm_mode" maybe?
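For reference, this is roughly how I would expect user space to consume such a
file: a Python sketch (active_mode() is a hypothetical name) that assumes the
file lists one mode per line with the active one in square brackets, as in the
output quoted above. The name it returns should be exactly what can be written
back.

```python
def active_mode(text):
    """Return the mode shown in [brackets] in the mode file's content."""
    modes = text.split()
    active = [m[1:-1] for m in modes if m.startswith("[") and m.endswith("]")]
    if len(active) != 1:
        raise ValueError("expected exactly one bracketed (active) mode")
    return active[0]


if __name__ == "__main__":
    print(active_mode("abmc\n[mbm_legacy]\n"))
```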
>
> h. Check the bandwidth configuration for the group. Note that bandwidth
> configuration has a domain scope. Total event defaults to 0x7F (to
> count all the events) and local event defaults to 0x15 (to count all
> the local numa events). The event bitmap decoding is available at
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x15;1=0x15
>
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
> Note that this change effects total events on the domain 0.
>
> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x33;1=0x7F
>
> k. Now read the total event again. The mbm_total_bytes should display
> only the read events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 314101
According to the docs, right after a BMEC change the counter will read "Unavailable".
Is this not the case here?
>
> l. Unmount the resctrl
>
> #umount /sys/fs/resctrl/
Reinette