Date:   Wed, 12 Oct 2022 13:21:00 +0200
From:   Peter Newman <peternewman@...gle.com>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     Tony Luck <tony.luck@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Eranian, Stephane" <eranian@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        James Morse <james.morse@....com>,
        Babu Moger <Babu.Moger@....com>,
        Gaurang Upasani <gupasani@...gle.com>
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

[Adding Gaurang to CC]

On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre
<reinette.chatre@...el.com> wrote:
>
> On 10/7/2022 10:28 AM, Tony Luck wrote:
> > I don't know how complex it would be for the kernel to implement this,
> > or whether it would meet Google's needs.
> >
>
> How about moving monitor groups from one control group to another?
>
> Based on the initial description I got the impression that there is
> already a monitor group for every container. (Please correct me if I am
> wrong). If this is the case then it may be possible to create an interface
> that could move an entire monitor group to another control group. This would
> keep the benefit of usage counts remaining intact, tasks get a new closid, but
> keep their rmid. There would be no need for the user to specify process-ids.

Yes, Stephane also pointed out the importance of maintaining RMID
assignments, and I don't believe I put enough emphasis on it in my
original email.

We need to maintain accurate memory bandwidth usage counts for all
containers, so it's important that an RMID assignment and its event
counts survive a CoS downgrade. The solutions Tony suggested do solve
the races in moving the tasks, but the container would need to
temporarily join the default MON group in the new CTRL_MON group before
it could be moved to its replacement MON group.
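
Concretely, even with a race-free one-shot move interface like the
"C{containername}" syntax Tony suggests in the quoted mail below, the
sequence would have to be something like this (group names here are
hypothetical):

  # echo C{containername} > new_ctrl/tasks
        # the tasks land in new_ctrl's default MON group and are
        # counted against that group's RMID
  # echo C{containername} > new_ctrl/mon_groups/container1/tasks
        # the replacement MON group allocates a fresh RMID, so
        # bandwidth counted during the interval in between is
        # misattributed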

Being able to re-parent a MON group would allow us to change the CLOSID
independently of the RMID in a container and would address the issue.
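
One way to express the re-parenting could be to allow rename(2) to
move a MON group directory between CTRL_MON groups, e.g. (hypothetical;
resctrl does not permit this today):

  # mv old_ctrl/mon_groups/container1 new_ctrl/mon_groups/
        # every task in the group switches to new_ctrl's CLOSID but
        # keeps its RMID and accumulated event counts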

The only other point I can think of to differentiate it from the
automatic CLOSID management solution is whether the 1:1 CTRL_MON:CLOSID
approach will become too limiting going forward. For example, there may
be configurations where one resource has far fewer CLOSIDs than others,
in which case we would want to start assigning CLOSIDs on demand,
per-resource, to avoid wasting the other resources' available CLOSID
space. If we can foresee this becoming a concern, then automatic CLOSID
management would be inevitable.
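
The asymmetry is already visible in the per-resource CLOSID counts that
resctrl exposes (the values below are made up for illustration):

  # cat /sys/fs/resctrl/info/L3/num_closids
  16
  # cat /sys/fs/resctrl/info/MB/num_closids
  8

Because resctrl sizes the usable CLOSID space to the minimum across
resources, half of the L3 CLOSID space in this example can never be
used unless CLOSIDs were assigned per-resource.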

-Peter


On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre
<reinette.chatre@...el.com> wrote:
>
> On 10/7/2022 10:28 AM, Tony Luck wrote:
> > On Fri, Oct 07, 2022 at 08:44:53AM -0700, Yu, Fenghua wrote:
> >> Hi, Peter,
> >>
> >>> On 10/7/2022 3:39 AM, Peter Newman wrote:
> >
> >>>> The CLOSID management rules would roughly be:
> >>>>
> >>>>  1. If an update would cause a CTRL_MON group's config to match that of
> >>>>     an existing group, the CTRL_MON group's CLOSID should change to that
> >>>>     of the existing group, where the definition of "match" is: all
> >>>>     control values match in all domains for all resources, as well as
> >>>>     the cpu masks matching.
> >
> > So the micro steps are:
> >
> > # mkdir newgroup
> >       # New groups are created with maximum resources. So this might
> >       # match the root/default group (if the root schemata had not
> >       # been edited) ... so you could re-use CLOSID=0 for this, or
> >       # perhaps allocate a new CLOSID
> > # edit newgroup/schemata
> >       # if this update makes this schemata match some other group,
> >       # then update the CLOSID for this group to be same as the other
> >       # group.
> >>>>
> >>>>  2. If an update to a CTRL_MON group sharing a CLOSID with another group
> >>>>     causes that group to no longer match any others, a new CLOSID must
> >>>>     be allocated.
> >       # So you have reference counts for CLOSIDs for how many groups
> >       # share it. In above example the change to the schemata and
> >       # allocation of a new CLOSID would decrement the reference count
> >       # and free the old CLOSID if it goes to zero
> >>>>
> >>>>  3. An update to a CTRL_MON group using a non-shared CLOSID which
> >>>>     continues to not match any others follows the current resctrl
> >>>>     behavior.
> >       # An update to a CTRL_MON group that has a CLOSID reference
> >       # count > 1 would try to allocate a new CLOSID if the new
> >       # schemata doesn't match any other group. If all CLOSIDs are
> >       # already in use, the write(2) to the schemata file must fail
> >       # ... maybe -ENOSPC is the right error code?
> >
> > Note that if the root/default CTRL_MON had been edited you might not be
> > able to create a new group (even though you intend to make it match some
> > existing group and share a CLOSID). Perhaps we could change existing
> > semantics so that new groups copy the root group schemata instead of
> > being maximally permissive with all resources?
> >>>>
> >>>> Before I prepare any patches for review, I'm interested in any
> >>>> comments or suggestions on the use case and solution.
> >>>>
> >>>> Are there simpler strategies for reassigning a running container's
> >>>> tasks to a different CTRL_MON group that we should be considering first?
> >
> > Do tasks in a container share a "process group"? If they do, then a
> > simpler option would be some syntax to assign a group to a resctrl group
> > (perhaps as a negative task-id? or with a "G" prefix??).
> >
> > Or is there some other simple way to enumerate all the tasks in a
> > container with some syntax that is convenient for both the user and the
> > kernel? If there is, then add code to allow something like:
> >       # echo C{containername} > tasks
> > and have the resctrl code move all tasks en masse.
> >
> > Yet another option would be syntax to apply the move recursively to all
> > descendants of the given task id.
> >
> >       # echo R{process-id} > tasks
> >
> > I don't know how complex it would be for the kernel to implement this,
> > or whether it would meet Google's needs.
> >
>
> How about moving monitor groups from one control group to another?
>
> Based on the initial description I got the impression that there is
> already a monitor group for every container. (Please correct me if I am
> wrong). If this is the case then it may be possible to create an interface
> that could move an entire monitor group to another control group. This would
> keep the benefit of usage counts remaining intact, tasks get a new closid, but
> keep their rmid. There would be no need for the user to specify process-ids.
>
> Reinette
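
P.S. To illustrate rule 3 under the reference-counting scheme Tony
sketches above: when a schemata update needs a fresh CLOSID and none
are free, the write would fail, which the user would see as something
like this (group name and mask hypothetical):

  # echo "L3:0=ff0" > grp2/schemata
  echo: write error: No space left on device
        # -ENOSPC signals that no free CLOSID was available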
