Message-ID: <CAHk-=wgXZmRu762bjSeK80+T_LTo+UP9y5rP-uvym1vquSxmBw@mail.gmail.com>
Date: Sun, 10 Jan 2021 10:46:05 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alexey Gladkov <gladkov.alexey@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Linux Containers <containers@...ts.linux-foundation.org>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Alexey Gladkov <legion@...nel.org>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>,
Christian Brauner <christian@...uner.io>
Subject: Re: [RFC PATCH v2 0/8] Count rlimits in each user namespace
On Sun, Jan 10, 2021 at 9:34 AM Alexey Gladkov <gladkov.alexey@...il.com> wrote:
>
> To address the problem, we bind rlimit counters to each user namespace. The
> result is a tree of rlimit counters with the biggest value at the root (aka
> init_user_ns). The rlimit counter increment/decrement occurs in the current and
> all parent user namespaces.
I'm not seeing why this is necessary.
Maybe it's the right approach, but none of the patches (or this cover
letter email) really explain it to me.
I understand why you might want the _limits_ themselves to form a
tree like this - with the "master limit" limiting the limits in the
user namespaces under it.
But I don't understand why the _counts_ should do that. The 'struct
user_struct' should be shared across even user namespaces for the same
user.
IOW, the very example of the problem you quote seems to argue against this:
> For example, there are two containers (A and B) created by one user. The
> container A sets RLIMIT_NPROC=1 and starts one process. Everything is fine, but
> when container B tries to do the same it will fail because the number of
> processes is counted globally for each user and user has one process already.
Note how the problem was _not_ that the _count_ was global. That part
was fine and all good.
No, the problem was that the _limit_ in container A also ended up
affecting container B.
So to me, that says that it would make sense to continue to use the
resource counts in 'struct user_struct' (because if user A has a hard
limit of X, then creating a new namespace shouldn't expand that
limit), but then have the ability to make per-container changes to the
resource limits (as long as they are within the bounds of the parent
user namespace resource limit).
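To make that concrete, here is a rough toy model of the scheme described above. It is a sketch only, not kernel code: the names UserCounts, Namespace, effective_limit and try_fork are all hypothetical, and the only thing it assumes is what the paragraph above says, namely one shared count per user and a per-namespace limit that can never exceed the limit of any parent namespace.

```python
class UserCounts:
    """Hypothetical stand-in for the per-user counts in 'struct user_struct'.

    One instance is shared by every namespace the user runs in.
    """

    def __init__(self):
        self.processes = 0


class Namespace:
    """Hypothetical user namespace carrying only a limit, never a count."""

    def __init__(self, parent=None, limit=None):
        self.parent = parent
        self.limit = limit  # None means "no limit set here, inherit"

    def effective_limit(self):
        # The limit that applies is the smallest one set on this
        # namespace or on any of its ancestors, so a child can tighten
        # the limit but cannot expand it past its parent's.
        bounds = []
        ns = self
        while ns is not None:
            if ns.limit is not None:
                bounds.append(ns.limit)
            ns = ns.parent
        return min(bounds, default=float("inf"))


def try_fork(user, ns):
    """Check the shared count against the namespace's effective limit."""
    if user.processes >= ns.effective_limit():
        return False
    user.processes += 1
    return True


# One user, a root namespace with a limit of 4, and two containers.
user = UserCounts()
root = Namespace(limit=4)
container_a = Namespace(parent=root, limit=1)   # A restricts itself
container_b = Namespace(parent=root)            # B inherits root's limit

started_in_a = try_fork(user, container_a)      # allowed: count 0 < limit 1
started_in_b = try_fork(user, container_b)      # allowed: A's limit is not B's
```

In this model the limit that container A sets on itself does not leak into container B, while the count stays global, so the user as a whole still cannot get past the root namespace's limit by creating more namespaces.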
Maybe there is some reason for this ucounts approach, but if so, I
feel it was not explained at all.
Linus