[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a131e8ed-88b2-4fed-983b-5deea955a9a5@intel.com>
Date: Wed, 21 May 2025 17:10:19 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: Peter Newman <peternewman@...gle.com>, "Moger, Babu" <bmoger@....com>,
<babu.moger@....com>, <corbet@....net>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>,
<james.morse@....com>, <dave.martin@....com>, <fenghuay@...dia.com>,
<x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<akpm@...ux-foundation.org>, <thuth@...hat.com>, <rostedt@...dmis.org>,
<ardb@...nel.org>, <gregkh@...uxfoundation.org>,
<daniel.sneddon@...ux.intel.com>, <jpoimboe@...nel.org>,
<alexandre.chartre@...cle.com>, <pawan.kumar.gupta@...ux.intel.com>,
<thomas.lendacky@....com>, <perry.yuan@....com>, <seanjc@...gle.com>,
<kai.huang@...el.com>, <xiaoyao.li@...el.com>, <kan.liang@...ux.intel.com>,
<xin3.li@...el.com>, <ebiggers@...gle.com>, <xin@...or.com>,
<sohil.mehta@...el.com>, <andrew.cooper3@...rix.com>,
<mario.limonciello@....com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>, <Xiaojian.Du@....com>, <gautham.shenoy@....com>
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)
Hi Tony,
On 5/21/25 4:43 PM, Luck, Tony wrote:
> On Wed, May 21, 2025 at 04:03:37PM -0700, Reinette Chatre wrote:
>> Hi Peter and Babu,
>>
>> On 5/21/25 2:18 AM, Peter Newman wrote:
..
>>> There's also the mongroup-RMID overcommit use case I described
>>> above[1]. On Intel we can safely assume that there are counters to
>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>> num_rmids.
>>
>> This is about the:
>> There's now more interest in Google for allowing explicit control of
>> where RMIDs are assigned on Intel platforms. Even though the number of
>> RMIDs implemented by hardware tends to be roughly the number of
>> containers they want to support, they often still need to create
>> containers when all RMIDs have already been allocated, which is not
>> currently allowed. Once the container has been created and starts
>> running, it's no longer possible to move its threads into a monitoring
>> group whenever RMIDs should become available again, so it's important
>> for resctrl to maintain an accurate task list for a container even
>> when RMIDs are not available.
>>
>> I see a monitor group as a collection of tasks that need to be monitored together.
>> The "task list" is the group of tasks that share a monitoring ID that
>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>> written to the hardware. I intentionally tried to not use RMID since I believe
>> this is required for all archs.
>> I thus do not understand how a task can start running when it does not have
>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>> there can never be "unmonitored tasks", no? I think I am missing something here.
>
> In the AMD/RMID implemenentation this might be achieved with something
> extra in the task structure to denote whether a task is in a monitored
> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> Tasks in an unmonitored group retain their "task->rmid" (that's what
> identifies them as a member of a group) but have task->rmid_valid set
> to false. Context switch code would be updated to load "0" into the
> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> would still be monitored, but activity would be bundled with all
> tasks in the default resctrl group.
>
> Presumably something analogous could be done for ARM/MPAM.
>
I do not interpret this as an unmonitored task but instead a task that
belongs to the default resource group. Specifically, any data accumulated by
such a task is attributed to the default resource group. Having tasks
in a separate group but their monitoring data accumulating in/contributed to
the default resource group (that has its own set of tasks) sounds wrong to me.
Such an implementation makes any monitoring data of default resource group
invalid, and by extension impossible to use default resource group to manage
an allocation for a group of monitor groups if user space needs insight
in monitoring data across all these monitor groups. User space will need to
interact with resctrl differently and individually query monitor groups instead
of CTRL_MON group once.
Reinette
Powered by blists - more mailing lists