Message-ID: <28080949-d8c2-4cbf-b971-705deb71ac4c@intel.com>
Date: Thu, 7 Nov 2024 16:21:42 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, Peter Newman <peternewman@...gle.com>
CC: "Yu, Fenghua" <fenghua.yu@...el.com>, "babu.moger@....com"
<babu.moger@....com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "Eranian,
Stephane" <eranian@...gle.com>, "hpa@...or.com" <hpa@...or.com>,
"james.morse@....com" <james.morse@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>,
"nert.pinx@...il.com" <nert.pinx@...il.com>, "tan.shaopeng@...itsu.com"
<tan.shaopeng@...itsu.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v2 2/2] x86/resctrl: Don't workqueue local event counter reads

Hi Tony,

On 11/7/24 3:30 PM, Luck, Tony wrote:
>>> E.g. read a counter:
>>>
>>> $ cat mbm_local_bytes
>>> 123456789
>>>
>>> H/w counter for this event/group assigned elsewhere.
>>>
>>> H/w counter assigned back to this event/group
>>>
>>> $ cat mbm_local_bytes
>>> 23456
>>>
>>> Bandwidth calculation sees traffic amount:
>>> (23456 - 123456789) = -123433333
>>> Oops. Negative!
>>
>> As I understand this is already an issue today on AMD systems without assignable counters
>> that may run out of counters. On these systems, any RMID that is no longer being tracked will
>> be reset to zero. [1]
>
> My understanding too.
>
>> The support for assignable counters gives user space control over this unexpected reset of
>> counters.
>>
>> The scenario you present seems to demonstrate how two independent user space systems
>> can trample on each other when interacting with the same resources. Is this something you expect
>> resctrl should protect against? I would expect that there would be a single user space system
>> doing something like the above and it would reset history after unassigning a counter.
>
> As we are discussing adding a new interface, I thought it worth considering adding
> a way for user space to be aware of the re-assignment of counters. IMHO it would be
> a nice to have feature. Not required if all users of resctrl are aware of each other's
> actions.

If this is indeed a requirement, it may be best to consider it as part of the current
work to enable assignable counters. For example, by adding the "generation" value to the
"mbm_assign_control" file so that independent user space apps can query it to get the current
counter state before parsing event data.

I am not familiar with a use case relying on independent user space applications interacting
with resctrl, so I would like to understand this requirement better before making the interface
more complicated.
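
To make the idea concrete, here is a minimal user space sketch of how a monitoring tool might use such a "generation" value when computing bandwidth. Everything in it is an assumption for illustration: no generation value exists in resctrl today, its format and location are undecided, and the names Sample and BandwidthTracker are hypothetical. The caller is assumed to have already read the generation, the byte count (for example from mbm_local_bytes) and a timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    generation: int   # hypothetical value that changes when counters are (re)assigned
    total_bytes: int  # byte count read from an event file such as mbm_local_bytes
    timestamp: float  # time of the read, in seconds


class BandwidthTracker:
    """Compute bytes/second between reads, discarding history after a reassignment."""

    def __init__(self) -> None:
        self._prev: Optional[Sample] = None

    def update(self, sample: Sample) -> Optional[float]:
        """Return bandwidth since the previous sample, or None if it cannot be trusted."""
        prev, self._prev = self._prev, sample  # the new sample always becomes the baseline
        if prev is None:
            return None  # first read: nothing to compare against
        if sample.generation != prev.generation:
            return None  # counter was reassigned in between: reset history
        if sample.total_bytes < prev.total_bytes:
            return None  # count went backwards: treat as a reset, never report a negative
        elapsed = sample.timestamp - prev.timestamp
        if elapsed <= 0:
            return None  # no usable time interval
        return (sample.total_bytes - prev.total_bytes) / elapsed


if __name__ == "__main__":
    tracker = BandwidthTracker()
    # Values from the example earlier in this thread, with made-up generations and times.
    print(tracker.update(Sample(generation=1, total_bytes=123456789, timestamp=0.0)))
    print(tracker.update(Sample(generation=2, total_bytes=23456, timestamp=1.0)))
    print(tracker.update(Sample(generation=2, total_bytes=33456, timestamp=2.0)))
```

With the values from the earlier example, the second read returns None instead of a negative traffic amount, and the third read is measured against the new baseline. The backwards-count check is a fallback that also works without any generation value, but it cannot catch a reassignment where the new count happens to be larger than the old one.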
>
>> This does indeed highlight that if resctrl does start to dynamically assign counters (which
>> has only been speculated in this thread and is not part of current [1] design) then it may cause
>> problems on user space side.
>
> Agreed. Dynamic assignment would break "the user knows what is happening" assumption.
> Seems like a bad idea.

I believe that is why Peter described it as a new "mode" that user space can select and thus
be aware of. This does not address how user space is expected to deal with event data reads that
may not increment when this "mode" is active, though.

Reinette