Date: Tue, 18 Jun 2024 16:02:27 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
 fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
 tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
 kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
 jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
 rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
 jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
 daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
 ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
 maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v4 00/19] x86/resctrl : Support AMD Assignable Bandwidth
 Monitoring Counters (ABMC)

Hi Reinette,

Thanks for the feedback on the series.

On 6/13/24 19:54, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/24/24 5:23 AM, Babu Moger wrote:
>>
>>
>> d. This series adds a new interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     to list and modify the group's assignment states.
> 
> There was a lot of discussion resulting in this centralized file. At first
> glance this file appears to be very complicated and I believe any reasonable
> person would wonder if all of this is necessary. I recommend that you add a
> motivation for why this file is needed. Some items I recall are: it makes it
> easier for user space to learn how counters are used (no need to traverse
> resctrl and open()/close() many files), on the resctrl side it makes it
> possible to support counter re-assignment with a single IPI. There may be
> other motivations that I am forgetting now.

Sure. Will add those details.
> 
> Also, could the name just be "mbm_control"? What is enabled at this time
> are "assignable counters" but in the future we may want to add support for
> other flags that have nothing to do with "assignable counters".

Yes. Sure.

> 
>>
>>     The list follows the following format:
>>
>>     "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
> 
> "assignment_flags" -> "flags" ? (throughout)

Yes.

> 
>>
>>
>>     Format for specific type of groups:
>>
>>     * Default CTRL_MON group:
>>      "//<domain_id>=<assignment_flags>"
>>
>>         * Non-default CTRL_MON group:
>>                 "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>>         * Child MON group of default CTRL_MON group:
>>                 "/<MON group>/<domain_id>=<assignment_flags>"
>>
>>         * Child MON group of non-default CTRL_MON group:
>>                 "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>>         Assignment flags can be one of the following:
>>
>>          t  MBM total event is enabled
>>          l  MBM local event is enabled
>>          tl Both total and local MBM events are enabled
>>          _  None of the MBM events are enabled
>>
>>     Examples:
>>
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>
>>     There are four groups and all the groups have local and total
>>     event enabled on domain 0 and 1.
>>
>>     =tl means both total and local events are enabled.
>>
>>     "//" - This is a default CONTROL MON group
>>
>>     "non_default_ctrl_mon_grp//" - This is non default CONTROL MON group
> 
> Be consistent with "non-default" (vs non default) as well as "CTRL_MON" (vs
> CONTROL MON).

Sure.

> 
>>
>>     "/child_default_mon_grp/"  - This is Child MON group of the defult
>> group
> 
> "Child" -> "child"
> "defult" -> "default"

Yes.
> 
>>
>>     "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>>     MON group of the non default group
> 
> non-default

Sure.
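
For anyone scripting against this file, here is a rough user-space sketch
(not part of the series) of how one line of the list format could be split
into its parts. The helper name parse_assign_line is made up for
illustration, and the sketch assumes group names contain no "/".

```sh
# Sketch only: split one line of the list format
#   <CTRL_MON group>/<MON group>/<domain_id>=<flags>;...
# parse_assign_line is a hypothetical helper, not an existing tool.
parse_assign_line() {
    line=$1
    ctrl=${line%%/*}      # text before the first '/'; empty for the default group
    rest=${line#*/}       # drop "<CTRL_MON group>/"
    mon=${rest%%/*}       # text before the next '/'; empty if no child MON group
    domains=${rest#*/}    # remaining "<domain_id>=<flags>;..." list
    printf 'ctrl=%s mon=%s domains=%s\n' "$ctrl" "$mon" "$domains"
}
```

With the example listing above, parse_assign_line '/child_default_mon_grp/0=tl;1=tl;'
prints "ctrl= mon=child_default_mon_grp domains=0=tl;1=tl;". An empty ctrl
field means the default CTRL_MON group.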

> 
>>
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>>
>>     The write format is similar to the above list format with addition of
>>     op-code for the assignment operation.
>>     
>>     * Default CTRL_MON group:
>>             "//<domain_id><op-code><assignment_flags>"
>>     
>>     * Non-default CTRL_MON group:
>>             "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>>     
>>     * Child MON group of default CTRL_MON group:
>>             "/<MON group>/<domain_id><op-code><assignment_flags>"
>>     
>>     * Child MON group of non-default CTRL_MON group:
>>             "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>>     
>>     Op-code can be one of the following:
>>     
>>     = Update the assignment to match the flags
>>     + Assign a new state
>>     - Unassign a new state
> 
> Looking here and the implementation it seems that "+_" and "-_" is supported.
> I think that should be invalid. Only "=_" seems appropriate to me.
> Also please take care to not have a catchall "default" that does an
> unassign. Doing something like that will prevent us from ever being
> able to add any flags in the future.

Yes. Good catch. Will fix it.
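
To make the intended semantics concrete, here is a small user-space model
(only a sketch, not the kernel code) of how an op-code and flags could be
applied to a domain's current state. It assumes "+_" and "-_" are rejected
as suggested above, and that unknown flags are an error rather than falling
through to an unassign. The name apply_op is hypothetical.

```sh
# apply_op CURRENT OP FLAGS
# Prints the new flags ("t", "l", "tl" or "_"); returns 1 on invalid input.
# Hypothetical helper modelling the write semantics discussed in this thread.
apply_op() {
    cur=$1 op=$2 flags=$3
    # Accept only the documented flag strings; no catch-all that unassigns.
    case $flags in
        t|l|tl|_) ;;
        *) return 1 ;;
    esac
    # "_" is only meaningful with "="; reject "+_" and "-_".
    if [ "$flags" = "_" ] && [ "$op" != "=" ]; then
        return 1
    fi
    has_t=0 has_l=0 want_t=0 want_l=0
    case $cur in *t*) has_t=1 ;; esac
    case $cur in *l*) has_l=1 ;; esac
    case $flags in *t*) want_t=1 ;; esac
    case $flags in *l*) want_l=1 ;; esac
    case $op in
        =) has_t=$want_t; has_l=$want_l ;;
        +) [ "$want_t" -eq 1 ] && has_t=1; [ "$want_l" -eq 1 ] && has_l=1 ;;
        -) [ "$want_t" -eq 1 ] && has_t=0; [ "$want_l" -eq 1 ] && has_l=0 ;;
        *) return 1 ;;
    esac
    out=
    [ "$has_t" -eq 1 ] && out=t
    [ "$has_l" -eq 1 ] && out=${out}l
    printf '%s\n' "${out:-_}"
}
```

For example, "apply_op tl - t" prints "l", which is what a write of
"/child_default_mon_grp/1-t" should do to domain 1 of that group only.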

> 
>>     
>>     
>>     Initial group status:
>>     
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     non_default_ctrl_mon_grp//0=tl;1=tl;
>>     non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>     //0=tl;1=tl;
>>     /child_default_mon_grp/0=tl;1=tl;
>>     
>>      To update the default group to enable only total event on domain 0:
>>      # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     
>>      Assignment status after the update:
>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>      non_default_ctrl_mon_grp//0=tl;1=tl;
>>      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>      //0=t;1=tl;
>>      /child_default_mon_grp/0=tl;1=tl;
>>     
>>      To update the MON group child_default_mon_grp to remove total event on domain 1:
>>      # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     
>>      Assignment status after the update:
>>      $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>      non_default_ctrl_mon_grp//0=tl;1=tl;
>>      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>>      //0=t;1=l;
>>      /child_default_mon_grp/0=t;1=tl;
> 
> This does not look right. Why did domain #1 of the default CTRL_MON group
> change also?

Will correct it.

> 
>>     
>>      To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
>>      remove both local and total events on domain 1:
>>      # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     
>>      Assignment status after the update:
>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>      non_default_ctrl_mon_grp//0=tl;1=tl;
>>      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>      //0=t;1=l;
>>      /child_default_mon_grp/0=t;1=tl;
>>     
>>      To update the default group to add a total event domain 1.
>>      # echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>     
> 
> Unclear where "t" flag was removed.

Yes. Will correct.

> 
>>      Assignment status after the update:
>>     
>>      # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>      non_default_ctrl_mon_grp//0=tl;1=tl;
>>      non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>>      //0=t;1=tl;
>>      /child_default_mon_grp/0=t;1=tl;
>>     
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>>     There is no change in reading the evetns with ABMC. If the event is
>> unassigned
> 
> "evetns" -> "events"

Sure.

> 
>>     when reading, then the read will come back as Unavailable.
> 
> Should this not rather be "Unassigned"? According to the docs the counters
> will return "Unavailable" right after reconfigure so it seems that there
> are scenarios where an "assigned" counter returns "Unavailable". It seems
> more useful to return "Unassigned" that will have a new specific meaning than
> overloading existing "Unavailable" that has original meaning of "try
> again" .... but in this case trying again will be futile.

Hardware returns "Unavailable" in both cases, so I thought of reporting
the same value without any interpretation.
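
Since the hardware gives the same answer in both cases, user space would
have to consult the assignment flags to tell them apart. A minimal sketch of
that check, assuming the flags for the group and domain were already read
from the control file. interpret_unavailable is a hypothetical helper, not
an existing tool.

```sh
# interpret_unavailable FLAGS EVENT
#   FLAGS: assignment flags of the group/domain ("t", "l", "tl" or "_")
#   EVENT: "t" for mbm_total_bytes, "l" for mbm_local_bytes
# Prints how an "Unavailable" read could be interpreted by user space.
interpret_unavailable() {
    case $1 in
        *"$2"*) echo "assigned: retry the read" ;;
        *)      echo "unassigned: retrying is futile" ;;
    esac
}
```

For instance, "interpret_unavailable l t" reports the unassigned case,
because only the local event has a counter on that domain.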

> 
>>     
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     779247936
>>     # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>     765207488
>>     
>> g. Users will have the option to go back to legacy_mbm mode if required.
>>     This can be done using the following command.
>>
>>     # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>>     # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>>          abmc
>>          [mbm_legacy]
> 
> It is confusing for the value written by user space to be different from
> the value displayed: "legacy_mbm" vs "mbm_legacy".

My bad. Both should have been "legacy_mbm".

> 
> This is still missing information about what happens to the counters/events
> on such a switch. Will events just keep counting? Will they be reset? ...?

The counters and events will all be reset on such a switch.

> 
> I also think we should try to find a more generic name for this file.
> "mbm_cntr_mode" or "mbm_mode" maybe?

"mbm_mode" looks better.  Then I will change "legacy_mbm" to "mbm_legacy".

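A short sketch of how a script could pick out the active mode, assuming the
file keeps marking the current mode with square brackets as in the output
above. The helper name current_mode is made up, and the file and mode names
may still change in the next revision.

```sh
# current_mode: read the mode listing on stdin and print the entry that is
# enclosed in square brackets (the active mode). Hypothetical helper.
current_mode() {
    sed -n 's/^[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p'
}
```

It would be used as "current_mode < /sys/fs/resctrl/info/L3_MON/mbm_mode",
with the file name taken from the rename proposed above.
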
> 
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>>     configuration has a domain scope. Total event defaults to 0x7F (to
>>     count all the events) and local event defaults to 0x15 (to count all
>>     the local numa events). The event bitmap decoding is available at
>>     https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>     in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x7f;1=0x7f
>>     
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>     0=0x15;1=0x15
>>     
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>>     Note that this change affects total events on the domain 0.
>>     
>>     #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x33;1=0x7F
>>     
>> k. Now read the total event again. The mbm_total_bytes should display
>>     only the read events.
>>     
>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>     314101
> 
> According to doc, right after a BMEC change the counter will read
> "Unavailable". Is this not the case here?

Yes. The first read will come back as "Unavailable". Will add a line
about that here.
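
As an aside on the bitmap values in steps h and j, this sketch lists which
bits are set for one domain in a config string. It is only an illustration:
domain_bits is a hypothetical helper, and the meaning of each bit is in the
resctrl documentation referenced above.

```sh
# domain_bits CONFIG DOMAIN
# Prints the set bit numbers of DOMAIN's bitmap in a string such as
# "0=0x33;1=0x7F"; returns 1 if the domain is not listed.
domain_bits() {
    cfg=$1 dom=$2
    old_ifs=$IFS
    IFS=';'
    set -- $cfg                 # split the config string on ';'
    IFS=$old_ifs
    for entry in "$@"; do
        [ "${entry%%=*}" = "$dom" ] || continue
        val=$(( ${entry#*=} ))  # hex constant to integer
        bit=0 out=
        while [ "$val" -gt 0 ]; do
            [ $(( val & 1 )) -eq 1 ] && out="$out $bit"
            val=$(( val >> 1 ))
            bit=$(( bit + 1 ))
        done
        printf '%s\n' "${out# }"
        return 0
    done
    return 1
}
```

"domain_bits '0=0x33;1=0x7F' 0" prints "0 1 4 5", the "only reads"
selection from step j, while domain 1 still prints "0 1 2 3 4 5 6".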

> 
>>     
>> l. Unmount the resctrl
>>     
>>     #umount /sys/fs/resctrl/
> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger
