Message-ID: <fa78c5e6-582c-43fd-a0c0-5b6a4439b0e2@intel.com>
Date: Thu, 22 May 2025 09:32:42 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Peter Newman <peternewman@...gle.com>, "Luck, Tony" <tony.luck@...el.com>
CC: "Moger, Babu" <bmoger@....com>, "babu.moger@....com" <babu.moger@....com>,
"corbet@....net" <corbet@....net>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"james.morse@....com" <james.morse@....com>, "dave.martin@....com"
<dave.martin@....com>, "fenghuay@...dia.com" <fenghuay@...dia.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"paulmck@...nel.org" <paulmck@...nel.org>, "akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>, "thuth@...hat.com" <thuth@...hat.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>, "ardb@...nel.org"
<ardb@...nel.org>, "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"daniel.sneddon@...ux.intel.com" <daniel.sneddon@...ux.intel.com>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>, "alexandre.chartre@...cle.com"
<alexandre.chartre@...cle.com>, "pawan.kumar.gupta@...ux.intel.com"
<pawan.kumar.gupta@...ux.intel.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "perry.yuan@....com" <perry.yuan@....com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>, "kan.liang@...ux.intel.com"
<kan.liang@...ux.intel.com>, "Li, Xin3" <xin3.li@...el.com>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "xin@...or.com" <xin@...or.com>,
"Mehta, Sohil" <sohil.mehta@...el.com>, "andrew.cooper3@...rix.com"
<andrew.cooper3@...rix.com>, "mario.limonciello@....com"
<mario.limonciello@....com>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Wieczor-Retman, Maciej"
<maciej.wieczor-retman@...el.com>, "Eranian, Stephane" <eranian@...gle.com>,
"Xiaojian.Du@....com" <Xiaojian.Du@....com>, "gautham.shenoy@....com"
<gautham.shenoy@....com>
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Peter,
On 5/22/25 1:47 AM, Peter Newman wrote:
> Hi Tony, Reinette,
>
> On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@...el.com> wrote:
>>
>>>>>> There's also the mongroup-RMID overcommit use case I described
>>>>>> above[1]. On Intel we can safely assume that there are counters to
>>>>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>>>>> num_rmids.
>>>>>
>>>>> This is about the:
>>>>> There's now more interest in Google for allowing explicit control of
>>>>> where RMIDs are assigned on Intel platforms. Even though the number of
>>>>> RMIDs implemented by hardware tends to be roughly the number of
>>>>> containers they want to support, they often still need to create
>>>>> containers when all RMIDs have already been allocated, which is not
>>>>> currently allowed. Once the container has been created and starts
>>>>> running, it's no longer possible to move its threads into a monitoring
>>>>> group whenever RMIDs should become available again, so it's important
>>>>> for resctrl to maintain an accurate task list for a container even
>>>>> when RMIDs are not available.
>>>>>
>>>>> I see a monitor group as a collection of tasks that need to be monitored together.
>>>>> The "task list" is the group of tasks that share a monitoring ID that
>>>>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>>>>> written to the hardware. I intentionally tried to not use RMID since I believe
>>>>> this is required for all archs.
>>>>> I thus do not understand how a task can start running when it does not have
>>>>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>>>>> there can never be "unmonitored tasks", no? I think I am missing something here.
>
> You are correct. I did forget to mention something...
>
>>>>
>>>> In the AMD/RMID implementation this might be achieved with something
>>>> extra in the task structure to denote whether a task is in a monitored
>>>> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
>>>> Tasks in an unmonitored group retain their "task->rmid" (that's what
>>>> identifies them as a member of a group) but have task->rmid_valid set
>>>> to false. Context switch code would be updated to load "0" into the
>>>> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
>>>> would still be monitored, but activity would be bundled with all
>>>> tasks in the default resctrl group.
>>>>
>>>> Presumably something analogous could be done for ARM/MPAM.
>>>>
>>>
>>> I do not interpret this as an unmonitored task but instead a task that
>>> belongs to the default resource group. Specifically, any data accumulated by
>>> such a task is attributed to the default resource group. Having tasks
>>> in a separate group but their monitoring data accumulating in/contributed to
>>> the default resource group (that has its own set of tasks) sounds wrong to me.
>>> Such an implementation makes any monitoring data of default resource group
>>> invalid, and by extension impossible to use default resource group to manage
>>> an allocation for a group of monitor groups if user space needs insight
>>> in monitoring data across all these monitor groups. User space will need to
>>> interact with resctrl differently and individually query monitor groups instead
>>> of CTRL_MON group once.
>>
>> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
>> tasks. Populate a resctrl group named "unmonitored" that lists all the
>> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
>> for these tasks in normal looking "mon_data" directory.
>
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.
Whether the task struct contains a pointer (albeit accompanied by its
own complexities) does not address the issue that I am concerned about.
Looking at [1], I expect this new feature handles "unmonitored" groups by
placing them in the default monitoring group, following Tony's first [3]
suggestion.
When considering [1] by itself in the context of current resctrl, all tasks
should be members of resource groups that have valid HW monitoring IDs allocated.
Using the default resource group in this way seems like addressing edge cases
where the pointer is not yet valid (unclear what these scenarios may be) instead
of routing many tasks to the default group. I am not sure and I'll have to study
that change more closely to reason accurately.
From what I understand, the new proposal that builds on [1] involves creating
new monitor groups that are "unmonitored" for any length of time and, when backed
by the implementation in [1], this would mean these groups will actually
still be monitored but with the data attributed to the default resource group.
As I mentioned in response [4] to Tony this fundamentally changes the
behavior users can expect from the default resource group. In addition,
this breaks the first of the "Resource monitoring rules" from
Documentation/filesystems/resctrl.rst:
1) If a task is a member of a MON group, or non-default CTRL_MON group
then RDT events for the task will be reported in that group.
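
To make the concern concrete, here is a minimal user-space sketch of the
fallback Tony describes. It is not resctrl code: struct task_sketch,
effective_rmid(), account_events(), rmid_events[] and NUM_RMIDS are all
hypothetical names chosen for illustration, and the counter array only
stands in for the hardware counters. It assumes the default resource
group uses RMID 0, as in Tony's description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_RMID 0u   /* RMID of the default resource group */
#define NUM_RMIDS    4u   /* hypothetical, kept tiny for illustration */

/* Hypothetical stand-in for the per-task state Tony describes. */
struct task_sketch {
	uint32_t rmid;       /* identifies the group the task belongs to */
	bool     rmid_valid; /* false while the group is "unmonitored" */
};

/* Per-RMID event counts, standing in for the hardware counters. */
static uint64_t rmid_events[NUM_RMIDS];

/* Value the context switch path would load into IA32_PQR_ASSOC.RMID. */
static uint32_t effective_rmid(const struct task_sketch *t)
{
	return t->rmid_valid ? t->rmid : DEFAULT_RMID;
}

/* Account "n" events generated while task "t" is running. */
static void account_events(const struct task_sketch *t, uint64_t n)
{
	uint32_t rmid = effective_rmid(t);

	assert(rmid < NUM_RMIDS);
	rmid_events[rmid] += n;
}
```

With one task that really belongs to the default group and one task in an
"unmonitored" group, both sets of events land in rmid_events[DEFAULT_RMID]
and nothing is reported for the second task's own group, which is the
behavior that conflicts with the rule quoted above.
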
How does this fit with the ABMC work? I continue to think that I am missing
parts of the discussion, as it seems this new feature discussion is mixed in
with the ABMC work.
Reinette
>
> Also this is required for shared assignment so that changing a group's
> IDs in a domain only requires updating running tasks rather than
> needing to search the entire task list, which would lead to the same
> problem we encountered in mongroup rename[2].
>
> -Peter
>
> [1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
> [2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/
[3] https://lore.kernel.org/lkml/aC5lL_qY00vd8qp4@agluck-desk3/
[4] https://lore.kernel.org/lkml/a131e8ed-88b2-4fed-983b-5deea955a9a5@intel.com/