lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c50b589-a738-4ae6-8362-bd1ce0d0dc98@amd.com>
Date: Wed, 17 Jul 2024 12:19:17 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
 fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
 tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
 kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
 jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
 rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
 jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
 daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
 ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
 maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Reinette,

On 7/12/24 17:03, Reinette Chatre wrote:
> Hi Babu,
> 
> On 7/3/24 2:48 PM, Babu Moger wrote:
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can enable a maximum
>> of 2 ABMC counters per group. User will also have the option to enable only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to disable an already
>> enabled counter to make space for new assignments.
> 
> The implementation appears to be converging on an interface that can
> be generic enough to be used by other features discussed along the way.
> "Linux implementation" summary can thus add:
> 
>     Create a generic interface aimed to support user space assignment
>     of scarce counters used for monitoring. First usage of interface
>     is by ABMC with option to expand usage to "soft-RMID" and MPAM
>     counters in future.

Sure.

> 
> 
>> # Examples
>>
>> a. Check if ABMC support is available
>>     #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     [abmc]
>>     legacy
>>
>>     Linux kernel detected ABMC feature and it is enabled.
> 
> How about renaming "abmc" to "mbm_cntrs"? This will match the num_mbm_cntrs
> info file and be the final step to make this generic so that another
> architecture
> can more easily support assignining hardware counters without needing to call
> the feature AMD's "abmc".

I think we aleady settled this with "mbm_cntr_assignable".

For soft-RMID" it will be mbm_sw_assignable.


> 
> Expanding on this it may be possible to add a new "sw_mbm_cntrs" feature that
> will be the "soft-RMID" feature while also reflecting the "mbm_cntrs" name
> so that when user space enables that feature its properties can be found in
> "num_mbm_cntrs".
> 
> The "abmc" kernel parameter remains but that does seem separate from this
> resctrl fs feature since it is explicitly tied to X86_FEATURE_ABMC surely
> making it architecture specific.
> 
>>
>> b. Check how many ABMC counters are available.
>>
>>     #cat /sys/fs/resctrl/info/L3_MON/num_cntrs
>>     32
> 
> This is now num_mbm_cntrs

Sure.

> 
>>
>> c. Create few resctrl groups.
>>
>>     # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
>>     # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
>>     # mkdir
>> /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
>>
>>
>> d. This series adds a new interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>     to list and modify the group's monitoring states. File provides
>> single place
>>     to list monitoring states of all the resctrl groups. It makes it
>> easier for
>>     user space to learn about the counters are used without needing to
>> traverse
>>     all the groups thus reducing the number of filesystem calls.
>>
>>     The list follows the following format:
>>
>>     "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>     Format for specific type of groups:
>>
>>     * Default CTRL_MON group:
>>      "//<domain_id>=<flags>"
>>
>>         * Non-default CTRL_MON group:
>>                 "<CTRL_MON group>//<domain_id>=<flags>"
>>
>>         * Child MON group of default CTRL_MON group:
>>                 "/<MON group>/<domain_id>=<flags>"
>>
>>         * Child MON group of non-default CTRL_MON group:
>>                 "<CTRL_MON group>/<MON group>/<domain_id>=<flags>"
>>
>>         Flags can be one of the following:
>>
>>          t  MBM total event is enabled.
>>          l  MBM local event is enabled.
>>          tl Both total and local MBM events are enabled.
>>          _  None of the MBM events are enabled
> 
> The language needs to be changed here (and in the many copied places) to
> be specific about what setting the flag accomplishes. For example, in
> "legacy" mode user space can be expected to find all events enabled, no?
> Needing a new feature to set a flag to accomplish something that is
> possible in legacy mode can thus cause confusion.

Yes. It is possible to do it. But I feel unnessassary.

> 
> If I understand the implementation reading "mbm_control" will fail
> if system is ABMC capable but it is disabled. Why can "mbm_control" not
> always be displayed to user space? For example, what if "mbm_control" is
> always available to user space and it can provide specific information to
> user space. For example:
>     t  MBM total event is enabled but may not always be counted.
>     T  MBM total event is enabled and being counted.
> 
> On AMD systems resource groups will have "t" associated with monitor
> groups when ABMC disabled, "T" when ABMC enabled and a counter assigned.
> On Intel systems monitor groups will always have "T".

I think more flags will add more confusion.

> 
> For "soft-RMID" the flag could possible continue to be "T"?
> 
> I am trying to find ways to communicate to user space consistently
> and clearly and any insights will be appreciated. We really do not want
> to add this interface and then find that it just causes confusion.
> 
> It is not quite obvious to me when the new files should be visible and
> what they should present to the user. "mbm_mode" is now always visible.
> Should "num_mbm_cntrs" not also always be visible? Right now "num_mbm_cntrs"
> appears to be only associated to ABMC, should it not also, for example,
> be the file that "soft-RMID" may use to share how many counters are
> available? Its contents will thus be dynamic based on which "MBM mode" is
> active, begging the question, what should it contain when "legacy" mode is
> enabled, should "num_mbm_cntrs" perhaps show "0" to user space when
> "legacy" mode is active?

Its good we have this discussion.

How about we go with simple way for now. The mbm_mode will only available
when ABMC or Soft_RMID(MPAM feature) is supported. Same way for the
num_mbm_cntrs.


> 
>>
>>     Examples:
>>
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>     
>>     There are four groups and all the groups have local and total
>>     event enabled on domain 0 and 1.
> 
> "local and total event" is vague, can it be made specific with, for example,
> "local and total MBM events"

Sure.

> 
>>
>>     =tl means both total and local events are enabled.
> 
> Same here (and all copied places in this series)

Sure.

> 
>>
>>     "//" - This is a default CTRL_MON group
>>
>>     "non_default_ctrl_mon_grp//" - This is non-default CTRL_MON group
>>
>>     "/child_default_mon_grp/"  - This is Child MON group of the defult
>> group
> 
> Same typos as in previous version of cover letter.

Oh. no. Will fix it.

> 
>>
>>     "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>>     MON group of the non-default group
>>
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_control.
>>
>>     The write format is similar to the above list format with addition of
>>     op-code for the assignment operation.
>>     
>>     * Default CTRL_MON group:
>>             "//<domain_id><op-code><flags>"
>>     
>>     * Non-default CTRL_MON group:
>>             "<CTRL_MON group>//<domain_id><op-code><flags>"
>>     
>>     * Child MON group of default CTRL_MON group:
>>             "/<MON group>/<domain_id><op-code><flags>"
>>     
>>     * Child MON group of non-default CTRL_MON group:
>>             "<CTRL_MON group>/<MON group>/<domain_id><op-code><flags>"
>>     
>>     Op-code can be one of the following:
>>     
>>     = Update the assignment to match the flag.
>>     + Assign a new state.
>>     - Unassign a new state.
> 
> Please be consistent with terminology. Above switches between "flag"
> and "state" while it then continues below using "event". Also,
> "Unassign a _new_ state" is unexpected, it should probably be an
> _existing_ (not "new") state/flag/event?

I will use event consistantly.

> 
>>
>>     Flags can be one of the following:
>>
>>          t  MBM total event.
>>          l  MBM local event.
>>          tl Both total and local MBM events.
>>          _  None of the MBM events. Only works with '=' op-code.
>>     
>>     Initial group status:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>
>>     To update the default group to enable only total event on domain 0:
>>     # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>
>>     To update the MON group child_default_mon_grp to remove total event
>> on domain 1:
>>     # echo "/child_default_mon_grp/1-t" >
>> /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     $ cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>     To update the MON group
>> non_default_ctrl_mon_grp/child_non_default_mon_grp to
>>     remove both local and total events on domain 1:
>>     # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
>>            /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>     //0=t;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>     To update the default group to add a local event domain 0.
>>     # echo "//0+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>
>>     Assignment status after the update:
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=l;
>>
>>
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>>     There is no change in reading the events with ABMC. If the event is
>> unassigned
>>     when reading, then the read will come back as "Unassigned".
>>     
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     779247936
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>     765207488
>>     
>> g. Users will have the option to go back to legacy mbm_mode if required.
>>     This can be done using the following command. Note that switching the
>>     mbm_mode will reset all the mbm counters of all resctrl groups.
> 
> mbm -> MBM (throughout)

Sure.

> 
>>
>>     # echo "legacy" > /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_mode
>>     abmc
>>     [legacy]
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>>     configuration has a domain scope. Total event defaults to 0x7F (to
>>     count all the events) and local event defaults to 0x15 (to count all
>>     the local numa events). The event bitmap decoding is available at
>>     https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>     in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x7f;1=0x7f
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>     0=0x15;1=0x15
>>     
>> j. Change the bandwidth source for domain 0 for the total event to count
>> only reads.
>>     Note that this change effects total events on the domain 0.
>>     
>>     #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x33;1=0x7F
>>     
>> k. Now read the total event again. The first read will come back with
>> "Unavailable"
>>     status. The subsequent read of mbm_total_bytes will display only the
>> read events.
>>     
>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     Unavailable
>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     314101
>>     
>> l. Unmount the resctrl
>>     
>>     #umount /sys/fs/resctrl/
>>
> 
> Reinette
> 

-- 
Thanks
Babu Moger

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ