[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whg4aVxA7LFAUFCzOn78_7TL1CPo+esPKgN5JTHy8H-Rg@mail.gmail.com>
Date: Wed, 10 Mar 2021 13:14:24 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alexey Gladkov <gladkov.alexey@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
io-uring <io-uring@...r.kernel.org>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Linux Containers <containers@...ts.linux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
Alexey Gladkov <legion@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Jann Horn <jannh@...gle.com>, Jens Axboe <axboe@...nel.dk>,
Kees Cook <keescook@...omium.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v8 3/8] Use atomic_t for ucounts reference counting
On Wed, Mar 10, 2021 at 4:01 AM Alexey Gladkov <gladkov.alexey@...il.com> wrote:
>
>
> +/* 127: arbitrary random number, small enough to assemble well */
> +#define refcount_zero_or_close_to_overflow(ucounts) \
> + ((unsigned int) atomic_read(&ucounts->count) + 127u <= 127u)
> +
> +struct ucounts *get_ucounts(struct ucounts *ucounts)
> +{
> + if (ucounts) {
> + if (refcount_zero_or_close_to_overflow(ucounts)) {
> + WARN_ONCE(1, "ucounts: counter has reached its maximum value");
> + return NULL;
> + }
> + atomic_inc(&ucounts->count);
> + }
> + return ucounts;
Side note: you probably should just make the limit be the "oh, the
count overflows into the sign bit".
The reason the page cache did that tighter thing is that it actually
has _two_ limits:
- the "try_get_page()" thing uses the sign bit as a "uhhuh, I've now
used up half of the available reference counting bits, and I will
refuse to use any more".
This is basically your "get_ucounts()" function. It's a "I want a
refcount, but I'm willing to deal with failures".
- the page cache has a _different_ set of "I need to unconditionally
get a refcount, and I can *not* deal with failures".
This is basically the traditional "get_page()", which is only used
in fairly controlled places, and should never be something that can
overflow.
And *that* special code then uses that
"zero_or_close_to_overflow()" case as a "doing a get_page() in this
situation is very very wrong". This is purely a debugging feature used
for a VM_BUG_ON() (that has never triggered, as far as I know).
For your ucounts situation, you don't have that second case at all, so
you have no reason to ever allow the count to even get remotely close
to overflowing.
A reference count being within 128 counts of overflow (when we're
talking a 32-bit count) is basically never a good idea. It means that
you are way too close to the limit, and there's a risk that lots of
concurrent people all first see an ok value, and then *all* decide to
do the increment, and then you're toast.
In contrast, if you use the sign bit as a "ok, let's stop
incrementing", the fact that your "overflow" test and the increment
aren't atomic really isn't a big deal.
(And yes, you could use a cmpxchg to *make* the overflow test atomic,
but it's often much much more expensive, so..)
Linus
Powered by blists - more mailing lists