Message-ID: <CALPaoCjh_NXQLtNBqei=7a6Jsr17fEnPO+kqMaNq4xNu2UPDJA@mail.gmail.com>
Date: Thu, 22 May 2025 10:47:08 +0200
From: Peter Newman <peternewman@...gle.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "Chatre, Reinette" <reinette.chatre@...el.com>, "Moger, Babu" <bmoger@....com>,
"babu.moger@....com" <babu.moger@....com>, "corbet@....net" <corbet@....net>,
"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "james.morse@....com" <james.morse@....com>,
"dave.martin@....com" <dave.martin@....com>, "fenghuay@...dia.com" <fenghuay@...dia.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"paulmck@...nel.org" <paulmck@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "thuth@...hat.com" <thuth@...hat.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>, "ardb@...nel.org" <ardb@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"daniel.sneddon@...ux.intel.com" <daniel.sneddon@...ux.intel.com>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>,
"alexandre.chartre@...cle.com" <alexandre.chartre@...cle.com>,
"pawan.kumar.gupta@...ux.intel.com" <pawan.kumar.gupta@...ux.intel.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "perry.yuan@....com" <perry.yuan@....com>,
"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, "Li, Xin3" <xin3.li@...el.com>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "xin@...or.com" <xin@...or.com>,
"Mehta, Sohil" <sohil.mehta@...el.com>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"mario.limonciello@....com" <mario.limonciello@....com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Wieczor-Retman, Maciej" <maciej.wieczor-retman@...el.com>, "Eranian, Stephane" <eranian@...gle.com>,
"Xiaojian.Du@....com" <Xiaojian.Du@....com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth
Monitoring Counters (ABMC)

Hi Tony, Reinette,

On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@...el.com> wrote:
>
> > >>> There's also the mongroup-RMID overcommit use case I described
> > >>> above[1]. On Intel we can safely assume that there are counters to
> > >>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
> > >>> num_rmids.
> > >>
> > >> This is about the:
> > >> There's now more interest in Google for allowing explicit control of
> > >> where RMIDs are assigned on Intel platforms. Even though the number of
> > >> RMIDs implemented by hardware tends to be roughly the number of
> > >> containers they want to support, they often still need to create
> > >> containers when all RMIDs have already been allocated, which is not
> > >> currently allowed. Once the container has been created and starts
> > >> running, it's no longer possible to move its threads into a monitoring
> > >> group whenever RMIDs should become available again, so it's important
> > >> for resctrl to maintain an accurate task list for a container even
> > >> when RMIDs are not available.
> > >>
> > >> I see a monitor group as a collection of tasks that need to be monitored together.
> > >> The "task list" is the group of tasks that share a monitoring ID that
> > >> is required to be a valid ID since when any of the tasks are scheduled that ID is
> > >> written to the hardware. I intentionally tried to not use RMID since I believe
> > >> this is required for all archs.
> > >> I thus do not understand how a task can start running when it does not have
> > >> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
> > >> there can never be "unmonitored tasks", no? I think I am missing something here.

You are correct. I did forget to mention something...

> > >
> > > In the AMD/RMID implementation this might be achieved with something
> > > extra in the task structure to denote whether a task is in a monitored
> > > group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
> > > Tasks in an unmonitored group retain their "task->rmid" (that's what
> > > identifies them as a member of a group) but have task->rmid_valid set
> > > to false. Context switch code would be updated to load "0" into the
> > > IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
> > > would still be monitored, but activity would be bundled with all
> > > tasks in the default resctrl group.
> > >
> > > Presumably something analogous could be done for ARM/MPAM.
> > >
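
A minimal sketch of the "rmid_valid" idea quoted above, as a plain userspace model rather than kernel code. The struct and helper names (task_sketch, effective_rmid, pqr_assoc_value) are hypothetical. The only hardware detail assumed is that IA32_PQR_ASSOC takes the RMID in its low 32 bits and the CLOSID in its upper 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-task resctrl state (not the kernel's task_struct). */
struct task_sketch {
	uint32_t closid;
	uint32_t rmid;       /* still identifies the group the task belongs to */
	bool     rmid_valid; /* false while the group has no RMID backing it */
};

/* RMID 0 belongs to the default resctrl group. */
#define DEFAULT_RMID 0u

/* RMID that the context switch path would load for this task. */
static uint32_t effective_rmid(const struct task_sketch *t)
{
	return t->rmid_valid ? t->rmid : DEFAULT_RMID;
}

/* Value to write to IA32_PQR_ASSOC: RMID in the low half, CLOSID in the high half. */
static uint64_t pqr_assoc_value(const struct task_sketch *t)
{
	return ((uint64_t)t->closid << 32) | effective_rmid(t);
}
```

With rmid_valid cleared, the task keeps its rmid field for membership purposes, but its activity is counted against RMID 0.
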
> >
> > I do not interpret this as an unmonitored task but instead a task that
> > belongs to the default resource group. Specifically, any data accumulated by
> > such a task is attributed to the default resource group. Having tasks
> > in a separate group but their monitoring data accumulating in/contributed to
> > the default resource group (that has its own set of tasks) sounds wrong to me.
> > Such an implementation makes any monitoring data of default resource group
> > invalid, and by extension impossible to use default resource group to manage
> > an allocation for a group of monitor groups if user space needs insight
> > in monitoring data across all these monitor groups. User space will need to
> > interact with resctrl differently and individually query monitor groups instead
> > of CTRL_MON group once.
>
> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
> tasks. Populate a resctrl group named "unmonitored" that lists all the
> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
> for these tasks in a normal-looking "mon_data" directory.

I needed to switch to an rdtgroup struct pointer rather than hardware
IDs in the task structure to indicate group membership[1]. Otherwise
it's not possible to determine which tasks are in a group when it
doesn't have a unique HW ID value.
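
To illustrate why the pointer matters, here is a small userspace sketch (not the code from [1]). The names group_sketch, member_task, count_by_pointer and count_by_rmid are made up for the example. It assumes two groups end up sharing one RMID because none are free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rdtgroup. */
struct group_sketch {
	const char *name;
	uint32_t rmid; /* may be shared with another group when RMIDs run out */
};

/* Hypothetical stand-in for the per-task membership field. */
struct member_task {
	int pid;
	struct group_sketch *grp; /* membership is the pointer, not the HW ID */
};

/* Count a group's tasks by comparing group pointers. */
static size_t count_by_pointer(const struct member_task *tasks, size_t n,
			       const struct group_sketch *g)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].grp == g)
			count++;
	return count;
}

/* Count tasks by HW ID only: ambiguous once two groups share an RMID. */
static size_t count_by_rmid(const struct member_task *tasks, size_t n,
			    uint32_t rmid)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].grp->rmid == rmid)
			count++;
	return count;
}
```

When groups "a" and "b" both carry RMID 0, count_by_rmid() lumps their tasks together, while count_by_pointer() still tells them apart.
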

This is also required for shared assignment, so that changing a group's
IDs in a domain only requires updating running tasks rather than
searching the entire task list, which would lead to the same
problem we encountered in mongroup rename[2].
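
A rough userspace model of that point, with hypothetical names (grp, tsk, cpu_sketch, switch_to, set_group_rmid) and a single flat CPU array standing in for one domain. It is only meant to show that sleeping tasks need no update when they reach their IDs through the group pointer. A real implementation would presumably need cross-CPU signalling and locking that this omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct grp { uint32_t rmid; };
struct tsk { struct grp *grp; };

/* Hypothetical per-CPU state. */
struct cpu_sketch {
	struct tsk *curr;     /* task currently running, NULL if idle */
	uint32_t loaded_rmid; /* models the RMID last written on this CPU */
};

/* Context switch: the RMID is read through the group pointer. */
static void switch_to(struct cpu_sketch *cpu, struct tsk *next)
{
	cpu->curr = next;
	cpu->loaded_rmid = next ? next->grp->rmid : 0;
}

/*
 * Change a group's RMID. Only CPUs currently running one of the group's
 * tasks are touched; returns how many were updated.
 */
static size_t set_group_rmid(struct cpu_sketch *cpus, size_t ncpus,
			     struct grp *g, uint32_t rmid)
{
	size_t updated = 0;

	g->rmid = rmid;
	for (size_t i = 0; i < ncpus; i++) {
		if (cpus[i].curr && cpus[i].curr->grp == g) {
			cpus[i].loaded_rmid = rmid;
			updated++;
		}
	}
	return updated;
}
```

Tasks that are not running pick up the new value the next time switch_to() schedules them, so the cost scales with the number of CPUs rather than the number of tasks.
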

-Peter

[1] https://lore.kernel.org/lkml/20240325172707.73966-5-peternewman@google.com/
[2] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@mail.gmail.com/