Message-ID: <1c50b589-a738-4ae6-8362-bd1ce0d0dc98@amd.com>
Date: Wed, 17 Jul 2024 12:19:17 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Reinette,
On 7/12/24 17:03, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
>
> The implementation appears to be converging on an interface that can
> be generic enough to be used by other features discussed along the way.
> "Linux implementation" summary can thus add:
>
> Create a generic interface aimed to support user space assignment
> of scarce counters used for monitoring. First usage of interface
> is by ABMC with option to expand usage to "soft-RMID" and MPAM
> counters in future.
Sure.
>
>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> [abmc]
>> legacy
>>
>> Linux kernel detected ABMC feature and it is enabled.
>
> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> info file and be the final step to make this generic so that another
> architecture can more easily support assigning hardware counters without
> needing to call the feature AMD's "abmc".
I think we already settled this with "mbm_cntr_assignable".
For "soft-RMID" it will be "mbm_sw_assignable".
>
> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
> so that when user space enables that feature its properties can be found in
> "num_mbm_cntrs".
>
> The "abmc" kernel parameter remains but that does seem separate from this
> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
> making it architecture specific.
>
>>
>> b. Check how many ABMC counters are available.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>> 32
>
> This is now num_mbm_cntrs
Sure.
>
>>
>> c. Create few resctrl groups.
>>
>> # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>> # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control to list and modify the group's
>> monitoring states. File provides single place to list monitoring states
>> of all the resctrl groups. It makes it easier for user space to learn
>> about the counters are used without needing to traverse all the groups
>> thus reducing the number of filesystem calls.
>>
>> The list follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Format for specific type of groups:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>> Flags can be one of the following:
>>
>> t MBM total event is enabled.
>> l MBM local event is enabled.
>> tl Both total and local MBM events are enabled.
>> _ None of the MBM events are enabled
>
> The language needs to be changed here (and in the many copied places) to
> be specific about what setting the flag accomplishes. For example, in
> "legacy" mode user space can be expected to find all events enabled, no?
> Needing a new feature to set a flag to accomplish something that is
> possible in legacy mode can thus cause confusion.
Yes, it is possible to do that, but I feel it is unnecessary.
>
> If I understand the implementation reading "mbm_control" will fail
> if system is ABMC capable but it is disabled. Why can "mbm_control" not
> always be displayed to user space? For example, what if "mbm_control" is
> always available to user space and it can provide specific information to
> user space. For example:
> t MBM total event is enabled but may not always be counted.
> T MBM total event is enabled and being counted.
>
> On AMD systems resource groups will have "t" associated with monitor
> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
> On Intel systems monitor groups will always have "T".
I think more flags will add more confusion.
>
> For "soft-RMID" the flag could possibly continue to be "T"?
>
> I am trying to find ways to communicate to user space consistently
> and clearly and any insights will be appreciated. We really do not want
> to add this interface and then find that it just causes confusion.
>
> It is not quite obvious to me when the new files should be visible and
> what they should present to the user. "mbm_mode" is now always visible.
> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
> appears to be only associated to ABMC, should it not also, for example,
> be the file that "soft-RMID" may use to share how many counters are
> available? Its contents will thus be dynamic based on which "MBM mode" is
> active, begging the question, what should it contain when "legacy" mode is
> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
> "legacy" mode is active?
It's good that we are having this discussion.
How about we go with the simple way for now? The mbm_mode file will only be
available when ABMC or Soft_RMID (MPAM feature) is supported. The same goes
for num_mbm_cntrs.
>
>>
>> Examples:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> There are four groups and all the groups have local and total
>> event enabled on domain 0 and 1.
>
> "local and total event" is vague, can it be made specific with, for example,
> "local and total MBM events"
Sure.
>
>>
>> =tl means both total and local events are enabled.
>
> Same here (and all copied places in this series)
Sure.
>
>>
>> "//" - This is a default CTRL_MON group
>>
>> "non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
>>
>> "/child_default_mon_grp/" - This is Child MON group of the defult
>> group
>
> Same typos as in previous version of cover letter.
Oh no. Will fix it.
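For anyone scripting against the listing, here is a rough sketch of how user
space could split one line of mbm_control output into its parts. It is only
an illustration: parse_mbm_control_line is a hypothetical helper name (not
part of this series), and it assumes group names contain no "/" and that the
domain entries are separated by ";" as in the examples above.

```python
def parse_mbm_control_line(line):
    """Split '<CTRL_MON group>/<MON group>/<domain_id>=<flags>;...' into parts.

    Hypothetical helper for illustration only. An empty group name means
    the default CTRL_MON group (first field) or no child MON group (second).
    """
    # Only the first two '/' separate the group names from the domain list.
    ctrl_group, mon_group, domains = line.strip().split("/", 2)

    flags_by_domain = {}
    for entry in domains.split(";"):
        if not entry:  # skip the empty piece after the trailing ';'
            continue
        domain_id, flags = entry.split("=", 1)
        flags_by_domain[int(domain_id)] = flags
    return ctrl_group, mon_group, flags_by_domain


if __name__ == "__main__":
    sample = [
        "non_default_ctrl_mon_grp//0=tl;1=tl;",
        "non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;",
        "//0=tl;1=tl;",
        "/child_default_mon_grp/0=tl;1=tl;",
    ]
    for text in sample:
        print(parse_mbm_control_line(text))
```

Two empty group fields identify the default CTRL_MON group, and an empty
second field identifies a CTRL_MON group itself rather than one of its child
MON groups.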
>
>>
>> "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>> MON group of the non-default group
>>
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control.
>>
>> The write format is similar to the above list format with addition of
>> op-code for the assignment operation.
>>
>> * Default CTRL_MON group:
>> "//<domain_id><op-code><flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id><op-code><flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id><op-code><flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>
>> Op-code can be one of the following:
>>
>> = Update the assignment to match the flag.
>> + Assign a new state.
>> - Unassign a new state.
>
> Please be consistent with terminology. Above switches between "flag"
> and "state" while it then continues below using "event". Also,
> "Unassign a _new_ state" is unexpected, it should probably be an
> _existing_ (not "new") state/flag/event?
I will use "event" consistently.
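To make the intended op-code semantics concrete, here is a small user-space
model of how "=", "+" and "-" would change the events of one domain. This is
a sketch of my reading of the cover letter examples, not kernel code:
apply_op and EVENT_ORDER are made-up names, and the model ignores counter
exhaustion.

```python
EVENT_ORDER = "tl"  # total first, then local, matching the "tl" spelling


def apply_op(current, op, flags):
    """Return the new event string for one domain after '<op-code><flags>'.

    Illustrative model only. 'current' and 'flags' use the letters from the
    cover letter: 't', 'l', 'tl' or '_' (no events).
    """
    if "_" in flags and op != "=":
        raise ValueError("'_' only works with the '=' op-code")

    cur = set(current) - {"_"}
    new = set(flags) - {"_"}

    if op == "=":    # replace the assignment with exactly these events
        result = new
    elif op == "+":  # assign additional events
        result = cur | new
    elif op == "-":  # unassign existing events
        result = cur - new
    else:
        raise ValueError(f"unknown op-code: {op!r}")

    # Print events in a fixed order; '_' stands for "no events assigned".
    return "".join(e for e in EVENT_ORDER if e in result) or "_"


if __name__ == "__main__":
    print(apply_op("tl", "=", "t"))  # "//0=t"
    print(apply_op("tl", "-", "t"))  # "/child_default_mon_grp/1-t"
    print(apply_op("tl", "=", "_"))  # ".../child_non_default_mon_grp/1=_"
    print(apply_op("t", "+", "l"))   # "//0+l"
```

The four calls at the bottom correspond to the four write examples in the
cover letter.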
>
>>
>> Flags can be one of the following:
>>
>> t MBM total event.
>> l MBM local event.
>> tl Both total and local MBM events.
>> _ None of the MBM events. Only works with '=' op-code.
>>
>> Initial group status:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the default group to enable only total event on domain 0:
>> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the MON group child_default_mon_grp to remove total event
>> on domain 1:
>> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp
>> to remove both local and total events on domain 1:
>> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>> To update the default group to add a local event domain 0.
>> # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=l;
>>
>>
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>> There is no change in reading the events with ABMC. If the event is
>> unassigned when reading, then the read will come back as "Unassigned".
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> g. Users will have the option to go back to legacy mbm_mode if required.
>> This can be done using the following command. Note that switching the
>> mbm_mode will reset all the mbm counters of all resctrl groups.
>
> mbm -> MBM (throughout)
Sure.
>
>>
>> # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>> abmc
>> [legacy]
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>> configuration has a domain scope. Total event defaults to 0x7F (to
>> count all the events) and local event defaults to 0x15 (to count all
>> the local numa events). The event bitmap decoding is available at
>> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x7f;1=0x7f
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x15;1=0x15
>>
>> j. Change the bandwidth source for domain 0 for the total event to count
>> only reads.
>> Note that this change effects total events on the domain 0.
>>
>> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x33;1=0x7F
>>
>> k. Now read the total event again. The first read will come back with
>> "Unavailable" status. The subsequent read of mbm_total_bytes will display
>> only the read events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unavailable
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 314101
>>
>> l. Unmount the resctrl
>>
>> #umount /sys/fs/resctrl/
>>
>
> Reinette
>
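As a reading aid for the bandwidth configuration values in steps h and j,
the sketch below decodes a value such as 0x33 into its event types. The bit
descriptions are my paraphrase of the table in the resctrl documentation
referenced above, so please check that document for the exact wording.
EVENT_BITS, decode_config and parse_config are names invented for this
example.

```python
# Bit meanings paraphrased from the resctrl documentation (assumption:
# verify against the "mbm_total_bytes_config" section before relying on it).
EVENT_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to the local NUMA domain",
    3: "non-temporal writes to the non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QOS domain to all types of memory",
}


def decode_config(value):
    """List the event types selected by one configuration bitmask."""
    return [desc for bit, desc in EVENT_BITS.items() if value & (1 << bit)]


def parse_config(text):
    """Turn '0=0x33;1=0x7F' into {0: 0x33, 1: 0x7f} (domain id -> bitmask)."""
    config = {}
    for entry in text.strip().split(";"):
        if entry:
            domain_id, value = entry.split("=", 1)
            config[int(domain_id)] = int(value, 16)
    return config


if __name__ == "__main__":
    for domain_id, value in parse_config("0=0x33;1=0x7F").items():
        print(f"domain {domain_id}: {value:#x}")
        for desc in decode_config(value):
            print(f"  {desc}")
```

With this table, 0x33 selects only the read event types, 0x15 only the
local ones, and 0x7f all seven.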
--
Thanks
Babu Moger