Message-ID: <6e297b6e-e39b-e358-7bb5-59add62f8b2b@amd.com>
Date: Tue, 18 Jun 2024 16:02:27 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v4 00/19] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Reinette,
Thanks for the feedback on the series.
On 6/13/24 19:54, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/24/24 5:23 AM, Babu Moger wrote:
>>
>>
>> d. This series adds a new interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> to list and modify the group's assignment states.
>
> There was a lot of discussion resulting in this centralized file. At first
> glance this file appears to be very complicated and I believe any reasonable
> person would wonder if all of this is necessary. I recommend that you add a
> motivation for why this file is needed. Some items I recall are: it makes it
> easier for user space to learn how counters are used (no need to traverse
> resctrl and open()/close() many files), on the resctrl side it makes it
> possible to support counter re-assignment with a single IPI. There may be
> other motivations that I am forgetting now.
Sure. Will add those details.
>
> Also, could the name just be "mbm_control"? What is enabled at this time
> are "assignable counters" but in the future we may want to add support for
> other flags that have nothing to do with "assignable counters".
Yes. Sure.
>
>>
>> The list follows the following format:
>>
>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
> "assignment_flags" -> "flags" ? (throughout)
Yes.
>
>>
>>
>> Format for specific type of groups:
>>
>> * Default CTRL_MON group:
>> "//<domain_id>=<assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id>=<assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>>
>> Assignment flags can be one of the following:
>>
>> t MBM total event is enabled
>> l MBM local event is enabled
>> tl Both total and local MBM events are enabled
>> _ None of the MBM events are enabled
>>
>> Examples:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> There are four groups and all the groups have local and total
>> event enabled on domain 0 and 1.
>>
>> =tl means both total and local events are enabled.
>>
>> "//" - This is a default CONTROL MON group
>>
>> "non_default_ctrl_mon_grp//" - This is non default CONTROL MON group
>
> Be consistent with "non-default" (vs non default) as well as "CTRL_MON" (vs
> CONTROL MON).
Sure.
>
>>
>> "/child_default_mon_grp/" - This is Child MON group of the defult
>> group
>
> "Child" -> "child"
> "defult" -> "default"
Yes.
>
>>
>> "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
>> MON group of the non default group
>
> non-default
Sure.
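
For illustration only (this is not part of the series): a minimal
user-space sketch of how one line of the list format above could be
parsed. The names GroupState and parse_list_line are hypothetical, and
the sketch assumes the read format stays as shown in this cover letter.

```python
from typing import Dict, NamedTuple


class GroupState(NamedTuple):
    ctrl_mon: str            # "" means the default CTRL_MON group
    mon: str                 # "" means the CTRL_MON group itself
    domains: Dict[int, str]  # domain id -> flags, e.g. {0: "tl", 1: "_"}


def parse_list_line(line: str) -> GroupState:
    """Parse one line such as 'grp/child/0=tl;1=_;'."""
    # Group names cannot contain '/', so the first two '/' are separators.
    ctrl_mon, mon, dom_part = line.strip().split("/", 2)
    domains = {}
    for item in dom_part.split(";"):
        if not item:         # the trailing ';' leaves an empty item
            continue
        dom_id, flags = item.split("=", 1)
        domains[int(dom_id)] = flags
    return GroupState(ctrl_mon, mon, domains)


sample = """\
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
"""
states = [parse_list_line(line) for line in sample.splitlines()]
```

A single read of the file is enough to build the whole table, which is
one of the motivations mentioned above for having a centralized file.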
>
>>
>> e. Update the group assignment states using the interface file
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>>
>> The write format is similar to the above list format with addition of
>> op-code for the assignment operation.
>>
>> * Default CTRL_MON group:
>> "//<domain_id><op-code><assignment_flags>"
>>
>> * Non-default CTRL_MON group:
>> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>>
>> * Child MON group of default CTRL_MON group:
>> "/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>> * Child MON group of non-default CTRL_MON group:
>> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>>
>> Op-code can be one of the following:
>>
>> = Update the assignment to match the flags
>> + Assign a new state
>> - Unassign a new state
>
> Looking here and the implementation it seems that "+_" and "-_" is supported.
> I think that should be invalid. Only "=_" seems appropriate to me.
> Also please take care to not have a catchall "default" that does an
> unassign. Doing something like that will prevent us from ever being
> able to add any flags in the future.
Yes. Good catch. Will fix it.
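
To make the intended semantics concrete, here is a rough model in
Python (not the kernel code; apply_op and VALID_FLAGS are hypothetical
names). It assumes the behaviour discussed above: "_" is accepted only
with "=", and unknown flags or op-codes are rejected explicitly rather
than falling into a catchall that unassigns.

```python
VALID_FLAGS = {"t", "l"}   # total and local, as listed above


def apply_op(current: str, op: str, flags: str) -> str:
    """Return the new flags string for one domain.

    current and flags use the list format: "t", "l", "tl" or "_".
    """
    if not flags:
        raise ValueError("empty flags")
    cur = set() if current == "_" else set(current)
    if flags == "_":
        # "+_" and "-_" carry no meaning, so only "=_" is accepted.
        if op != "=":
            raise ValueError("'_' is only valid with '='")
        new = set()
    else:
        req = set(flags)
        unknown = req - VALID_FLAGS
        if unknown:
            # No catchall: an unrecognised flag is an error.
            raise ValueError(f"unknown flag(s): {sorted(unknown)}")
        if op == "=":
            new = req
        elif op == "+":
            new = cur | req
        elif op == "-":
            new = cur - req
        else:
            raise ValueError(f"unknown op-code: {op!r}")
    # Emit in the fixed order "t" then "l"; "_" when nothing is set.
    return "".join(f for f in "tl" if f in new) or "_"
```

Under this model, "1-t" applied to a domain currently at "tl" yields
"l", and "1+t" applied to "l" yields "tl".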
>
>>
>>
>> Initial group status:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=tl;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the default group to enable only total event on domain 0:
>> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=tl;1=tl;
>>
>> To update the MON group child_default_mon_grp to remove total event
>> on domain 1:
>> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
>> //0=t;1=l;
>> /child_default_mon_grp/0=t;1=tl;
>
> This does not look right. Why did domain #1 of the default CTRL_MON group
> change also?
Will correct it.
>
>>
>> To update the MON group
>> non_default_ctrl_mon_grp/child_non_default_mon_grp to
>> remove both local and total events on domain 1:
>> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>> Assignment status after the update:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=t;1=l;
>> /child_default_mon_grp/0=t;1=tl;
>>
>> To update the default group to add a total event domain 1.
>> # echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>>
>
> Unclear where "t" flag was removed.
Yes. Will correct.
>
>> Assignment status after the update:
>>
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> non_default_ctrl_mon_grp//0=tl;1=tl;
>> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
>> //0=t;1=tl;
>> /child_default_mon_grp/0=t;1=tl;
>>
>> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
>> There is no change in reading the evetns with ABMC. If the event is
>> unassigned
>
> "evetns" -> "events"
Sure.
>
>> when reading, then the read will come back as Unavailable.
>
> Should this not rather be "Unassigned"? According to the docs the counters
> will return "Unavailable" right after reconfigure so it seems that there
> are scenarios where an "assigned" counter returns "Unavailable". It seems
> more useful to return "Unassigned" that will have a new specific meaning
> than overloading existing "Unavailable" that has original meaning of "try
> again" ... but in this case trying again will be futile.
Hardware returns "Unavailable" in both cases, so I thought of
reporting the same without any interpretation.
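
As a sketch of what this means for a user-space reader (hypothetical
helper, not from the series): with the current behaviour a tool cannot
tell the two cases apart from the string alone. The "Unassigned" branch
below only applies if the suggestion above is adopted; it is an
assumption, not existing behaviour.

```python
from typing import Optional, Tuple


def interpret_mbm_read(text: str) -> Tuple[str, Optional[int]]:
    """Classify the contents of mbm_total_bytes / mbm_local_bytes."""
    value = text.strip()
    if value.isdigit():
        return ("ok", int(value))
    if value == "Unavailable":
        # Either transient (right after a reconfigure) or, in this
        # version of the series, no counter is assigned to the event.
        return ("unavailable", None)
    if value == "Unassigned":
        # Hypothetical: only if the proposal in this thread is adopted.
        return ("unassigned", None)
    return ("unknown", None)
```

With a distinct "Unassigned" string a caller would know that retrying
is pointless; with "Unavailable" alone it has to consult the assignment
state separately.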
>
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> g. Users will have the option to go back to legacy_mbm mode if required.
>> This can be done using the following command.
>>
>> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
>> abmc
>> [mbm_legacy]
>
> It is confusing for the value written by user space to be different from
> the value displayed: "legacy_mbm" vs "mbm_legacy".
My bad. Both should have been "legacy_mbm".
>
> This is still missing information about what happens to the counters/events
> on such a switch. Will events just keep counting? Will they be reset? ...?
They will all be reset.
>
> I also think we should try to find a more generic name for this file.
> "mbm_cntr_mode" or "mbm_mode" maybe?
"mbm_mode" looks better. Then I will change "legacy_mbm" to "mbm_legacy".
>
>>
>> h. Check the bandwidth configuration for the group. Note that bandwidth
>> configuration has a domain scope. Total event defaults to 0x7F (to
>> count all the events) and local event defaults to 0x15 (to count all
>> the local numa events). The event bitmap decoding is available at
>> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x7f;1=0x7f
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x15;1=0x15
>>
>> j. Change the bandwidth source for domain 0 for the total event to count
>> only reads.
>> Note that this change affects total events on the domain 0.
>>
>> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x33;1=0x7F
>>
>> k. Now read the total event again. The mbm_total_bytes should display
>> only the read events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 314101
>
> According to doc, right after a BMEC change the counter will read
> "Unavailable". Is this not the case here?
Yes. The first read will come back as "Unavailable". Will add one
line about that here.
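
For reference, a small sketch that decodes the per-domain config
strings shown in steps h and j. The helper names are hypothetical, and
the bit descriptions are reproduced from memory of the resctrl
documentation for mbm_total_bytes_config / mbm_local_bytes_config, so
please treat the table as an assumption and check it against the docs.

```python
from typing import Dict, List

# Bit -> event type (assumed; verify against the resctrl documentation).
BMEC_BITS = {
    0: "reads to memory in the local NUMA domain",
    1: "reads to memory in the non-local NUMA domain",
    2: "non-temporal writes to the local NUMA domain",
    3: "non-temporal writes to the non-local NUMA domain",
    4: "reads to slow memory in the local NUMA domain",
    5: "reads to slow memory in the non-local NUMA domain",
    6: "dirty victims from the QOS domain to all types of memory",
}


def parse_config(text: str) -> Dict[int, int]:
    """Parse '0=0x33;1=0x7F' into {0: 0x33, 1: 0x7f}."""
    out = {}
    for item in text.strip().split(";"):
        if item:
            dom, mask = item.split("=", 1)
            out[int(dom)] = int(mask, 16)   # accepts the '0x' prefix
    return out


def describe(mask: int) -> List[str]:
    """List the event types selected by a config bitmask."""
    return [name for bit, name in BMEC_BITS.items() if mask & (1 << bit)]


cfg = parse_config("0=0x33;1=0x7F")
for dom, mask in sorted(cfg.items()):
    print(f"domain {dom}: {mask:#04x}")
    for line in describe(mask):
        print("   ", line)
```

With that table, 0x33 selects bits 0, 1, 4 and 5 (the read events) and
0x15 selects bits 0, 2 and 4 (the local NUMA events), which matches the
descriptions in steps h and j.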
>
>>
>> l. Unmount the resctrl
>>
>> #umount /sys/fs/resctrl/
>
> Reinette
>
>
--
Thanks
Babu Moger