Message-ID: <b2e020b1-f6b2-e236-a042-4eb2fd27d8b0@intel.com>
Date: Fri, 7 Oct 2022 08:36:37 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Peter Newman <peternewman@...gle.com>,
Fenghua Yu <fenghua.yu@...el.com>
CC: Stephane Eranian <eranian@...gle.com>,
<linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
James Morse <james.morse@....com>,
Babu Moger <Babu.Moger@....com>,
"Luck, Tony" <tony.luck@...el.com>
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
+Tony
On 10/7/2022 3:39 AM, Peter Newman wrote:
> Hi Reinette, Fenghua,
>
> I'd like to talk about the tasks file interface in CTRL_MON and MON
> groups.
>
> For some background, we are using the memory-bandwidth monitoring and
> allocation features of resctrl to maintain QoS on external memory
> bandwidth for latency-sensitive containers to help enable batch
> containers to use up leftover CPU/memory resources on a machine. We
> also monitor the external memory bandwidth usage of all hosted
> containers to identify ones which are misusing their latency-sensitive
> CoS assignment and downgrade them to the batch CoS.
>
> The trouble is, container manager developers working with the tasks
> interface have complained that it's not usable for them because it takes
> many (or an unbounded number of) passes to move all tasks from a
> container over, as the list is always changing.
>
> Our solution for them is to remove the need for moving tasks between
> CTRL_MON groups. Because we are mainly using MB throttling to implement
> QoS, we only need two classes of service. Therefore we've modified
> resctrl to reuse existing CLOSIDs for CTRL_MON groups with identical
> configurations, allowing us to create a CTRL_MON group for every
> container. Instead of moving the tasks over, we only need to update
> their CTRL_MON group's schemata. Another benefit for us is that we do
> not need to also move all of the tasks over to a new monitoring group in
> the batch CTRL_MON group, and the usage counts remain intact.
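As a rough illustration of updating a group's schemata instead of moving its tasks, the C sketch below writes a new MB line into a CTRL_MON group's schemata file. The line follows resctrl's documented `MB:<domain_id>=<value>;...` format, and the values are percentages under the default mount options. The sketch assumes that each container has its own CTRL_MON directory under the resctrl mount and that the kernel accepts a schemata write naming only the MB resource. The helper names `format_mb_line` and `write_mb_schemata` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "MB:<id>=<val>;<id>=<val>...\n" into buf; returns its length or -1. */
static int format_mb_line(char *buf, size_t len, const unsigned *dom_ids,
			  const unsigned *vals, int n)
{
	size_t off;
	int w;

	assert(buf && dom_ids && vals && n > 0);
	w = snprintf(buf, len, "MB:");
	if (w < 0 || (size_t)w >= len)
		return -1;
	off = (size_t)w;
	for (int i = 0; i < n; i++) {
		w = snprintf(buf + off, len - off, "%s%u=%u", i ? ";" : "",
			     dom_ids[i], vals[i]);
		if (w < 0 || (size_t)w >= len - off)
			return -1; /* output would be truncated */
		off += (size_t)w;
	}
	w = snprintf(buf + off, len - off, "\n");
	if (w < 0 || (size_t)w >= len - off)
		return -1;
	return (int)(off + (size_t)w);
}

/* Hypothetical helper: rewrite the MB line of the group at group_dir. */
static int write_mb_schemata(const char *group_dir, const unsigned *dom_ids,
			     const unsigned *vals, int n)
{
	char path[4096], line[256];
	int w = snprintf(path, sizeof(path), "%s/schemata", group_dir);
	FILE *f;
	int ok;

	if (w < 0 || (size_t)w >= sizeof(path))
		return -1;
	if (format_mb_line(line, sizeof(line), dom_ids, vals, n) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fwrite(line, 1, strlen(line), f) == strlen(line);
	/* stdio buffers the write, so a rejected value may only surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, a container could be downgraded by calling `write_mb_schemata()` on its group directory (for example, a made-up `/sys/fs/resctrl/ctr-1234`) with `vals` set to the batch class's bandwidth values. The group's tasks file is never touched, so no task list needs to be walked.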
>
> The CLOSID management rules would roughly be:
>
> 1. If an update would cause a CTRL_MON group's config to match that of
> an existing group, the CTRL_MON group's CLOSID should change to that
> of the existing group, where "match" means that all control values
> match in all domains for all resources and that the CPU masks match.
>
> 2. If an update to a CTRL_MON group sharing a CLOSID with another group
> causes the updated group to no longer match any other group, a new
> CLOSID must be allocated for it.
>
> 3. An update to a CTRL_MON group with a non-shared CLOSID that still
> matches no other group follows the current resctrl behavior.
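The three CLOSID rules above can be modeled as bookkeeping over a table of CTRL_MON groups and per-CLOSID reference counts. The C sketch below is a hypothetical user-space model, not resctrl code. The sizes (`NUM_RES`, `NUM_DOM`, `MAX_CLOSID`), the struct layouts, and the function names are assumptions. The model only decides which CLOSID a group should use. It does not program hardware, and it does not update the CLOSID recorded for each task in the group, both of which a kernel implementation would also have to handle. As in resctrl, CLOSID 0 is left to the default group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RES    2  /* hypothetical: e.g. one cache resource plus MB */
#define NUM_DOM    2  /* hypothetical: domains per resource */
#define MAX_CLOSID 8  /* hypothetical hardware CLOSID limit */
#define MAX_GROUPS 16

struct ctrl_cfg {
	uint32_t ctrl[NUM_RES][NUM_DOM]; /* control value per resource/domain */
	uint64_t cpu_mask;               /* CPUs assigned to the group */
};

struct ctrlmon_group {
	bool in_use;
	int closid;
	struct ctrl_cfg cfg;
};

static struct ctrlmon_group groups[MAX_GROUPS];
static int closid_refs[MAX_CLOSID]; /* number of groups using each CLOSID */

/* "Match": all control values in all domains of all resources, plus CPU mask. */
static bool cfg_match(const struct ctrl_cfg *a, const struct ctrl_cfg *b)
{
	for (int r = 0; r < NUM_RES; r++)
		for (int d = 0; d < NUM_DOM; d++)
			if (a->ctrl[r][d] != b->ctrl[r][d])
				return false;
	return a->cpu_mask == b->cpu_mask;
}

/* Index of an in-use group other than skip with a matching config, or -1. */
static int find_match(int skip, const struct ctrl_cfg *cfg)
{
	for (int i = 0; i < MAX_GROUPS; i++)
		if (i != skip && groups[i].in_use && cfg_match(&groups[i].cfg, cfg))
			return i;
	return -1;
}

/* Lowest unused CLOSID; 0 stays with the default group. */
static int closid_alloc(void)
{
	for (int c = 1; c < MAX_CLOSID; c++)
		if (closid_refs[c] == 0)
			return c;
	return -1;
}

/* Create group in unused slot g: share a matching CLOSID, else allocate one. */
static int group_create(int g, const struct ctrl_cfg *cfg)
{
	int m = find_match(-1, cfg);
	int c = m >= 0 ? groups[m].closid : closid_alloc();

	if (c < 0)
		return -1; /* out of CLOSIDs */
	groups[g] = (struct ctrlmon_group){ .in_use = true, .closid = c, .cfg = *cfg };
	closid_refs[c]++;
	return c;
}

/* Apply new_cfg to existing group g; returns its CLOSID, or -1 if none free. */
static int group_update(int g, const struct ctrl_cfg *new_cfg)
{
	struct ctrlmon_group *grp = &groups[g];
	int old = grp->closid;
	int m = find_match(g, new_cfg);

	assert(grp->in_use && closid_refs[old] > 0);
	if (m >= 0) {
		/* Rule 1: join the matching group's CLOSID (old one may become free). */
		closid_refs[old]--;
		grp->closid = groups[m].closid;
		closid_refs[grp->closid]++;
	} else if (closid_refs[old] > 1) {
		/* Rule 2: leaving a shared CLOSID with no match needs a fresh one. */
		int c = closid_alloc();

		if (c < 0)
			return -1; /* group left unchanged */
		closid_refs[old]--;
		closid_refs[c]++;
		grp->closid = c;
	}
	/* Rule 3 (unshared, still no match): keep the CLOSID, take the new config. */
	grp->cfg = *new_cfg;
	return grp->closid;
}
```

In this model, identically configured groups always share one CLOSID, so rule 1 can take the first matching group it finds. Group creation goes through the same matching, so a new per-container group that copies an existing configuration uses no extra CLOSID.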
>
> Before I prepare any patches for review, I'm interested in any comments
> or suggestions on the use case and solution.
>
> Are there simpler strategies for reassigning a running container's tasks
> to a different CTRL_MON group that we should be considering first?
>
> Any concerns about the CLOSID-reusing behavior? The hope is existing
> users who aren't creating identically-configured CTRL_MON groups would
> be minimally impacted. Would it help if the proposed behavior were
> opt-in at mount-time?
>
> Thanks!
> -Peter