Message-ID: <87ft2u2ss5.fsf@x220.int.ebiederm.org>
Date: Thu, 21 Jan 2021 09:50:34 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Alexey Gladkov <gladkov.alexey@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
io-uring <io-uring@...r.kernel.org>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Linux Containers <containers@...ts.linux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Jann Horn <jannh@...gle.com>, Jens Axboe <axboe@...nel.dk>,
Kees Cook <keescook@...omium.org>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [RFC PATCH v3 1/8] Use refcount_t for ucounts reference counting

Alexey Gladkov <gladkov.alexey@...il.com> writes:
> On Tue, Jan 19, 2021 at 07:57:36PM -0600, Eric W. Biederman wrote:
>> Alexey Gladkov <gladkov.alexey@...il.com> writes:
>>
>> > On Mon, Jan 18, 2021 at 12:34:29PM -0800, Linus Torvalds wrote:
>> >> On Mon, Jan 18, 2021 at 11:46 AM Alexey Gladkov
>> >> <gladkov.alexey@...il.com> wrote:
>> >> >
>> >> > Sorry about that. I thought that this code is not needed when switching
>> >> > from int to refcount_t. I was wrong.
>> >>
>> >> Well, you _may_ be right. I personally didn't check how the return
>> >> value is used.
>> >>
>> >> I only reacted to "it certainly _may_ be used, and there is absolutely
>> >> no comment anywhere about why it wouldn't matter".
>> >
>> > I have not found any examples where the overflow is checked after calling
>> > refcount_inc/refcount_add.
>> >
>> > For example, in kernel/fork.c:2298:
>> >
>> > current->signal->nr_threads++;
>> > atomic_inc(&current->signal->live);
>> > refcount_inc(&current->signal->sigcnt);
>> >
>> > $ semind search signal_struct.sigcnt
>> > def include/linux/sched/signal.h:83 refcount_t sigcnt;
>> > m-- kernel/fork.c:723 put_signal_struct if (refcount_dec_and_test(&sig->sigcnt))
>> > m-- kernel/fork.c:1571 copy_signal refcount_set(&sig->sigcnt, 1);
>> > m-- kernel/fork.c:2298 copy_process refcount_inc(&current->signal->sigcnt);
>> >
>> > It seems to me that the only way is to use __refcount_inc and then compare
>> > the old value with REFCOUNT_MAX.
>> >
>> > Since I have not seen examples of such checks, I thought that this is
>> > acceptable. Sorry once again. I have not tried to hide these changes.
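To make the difference concrete, here is a small userspace model of the saturating behavior discussed above. It is a sketch, not the kernel's refcount_t: the names model_refcount, model_refcount_inc, model_refcount_inc_hit_limit and MODEL_SATURATED are hypothetical, C11 atomics stand in for the kernel's atomic_t, and the saturation value is modeled after REFCOUNT_SATURATED (INT_MIN / 2). The increment itself never fails; a caller that wants an error has to look at the old value.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical name, modeled after the kernel's REFCOUNT_SATURATED. */
#define MODEL_SATURATED (INT_MIN / 2)

struct model_refcount {
	atomic_int refs;
};

/*
 * Saturating increment in the style of refcount_inc(): it never fails.
 * If the old value was already negative (saturated) or INT_MAX (about
 * to wrap), the counter is pinned at MODEL_SATURATED, so the object is
 * leaked instead of being freed too early. The old value is reported
 * through *oldp, loosely mirroring __refcount_inc(r, oldp). The kernel
 * additionally warns, and also treats an increment from zero as an
 * error; neither is modeled here.
 */
static void model_refcount_inc(struct model_refcount *r, int *oldp)
{
	/* C11 atomic arithmetic on signed types wraps instead of being UB. */
	int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

	if (oldp)
		*oldp = old;
	if (old < 0 || old == INT_MAX)
		atomic_store_explicit(&r->refs, MODEL_SATURATED,
				      memory_order_relaxed);
}

/* The caller-side check: compare the old value against the limit. */
static bool model_refcount_inc_hit_limit(int old)
{
	return old < 0 || old == INT_MAX;
}
```

In this model an increment at INT_MAX leaves the counter pinned rather than failing, which is the behavior Eric objects to below: nothing tells the caller to back out unless it checks the old value itself.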
>>
>> The current ucount code does check for overflow and fails the increment
>> in every case.
>>
>> So arguably it will be a regression and inferior error handling behavior
>> if the code switches to the ``better'' refcount_t data structure.
>>
>> I originally didn't use refcount_t because silently saturating and not
>> bothering to handle the error makes me uncomfortable.
>>
>> Not having to acquire the ucounts_lock every time seems nice. Perhaps
>> the path forward would be to start with stupid/correct code that always
>> takes the ucounts_lock for every increment of ucounts->count, and to
>> later replace it with something more optimal.
>>
>> Not impacting performance in the non-namespace cases and having good
>> performance in the other cases is a fundamental requirement of merging
>> code like this.
>
> Did I understand correctly that you suggest using spin_lock around
> atomic_read and atomic_inc?
>
> If so, then we are already incrementing the counter under ucounts_lock.
>
> ...
> 	if (atomic_read(&ucounts->count) == INT_MAX)
> 		ucounts = NULL;
> 	else
> 		atomic_inc(&ucounts->count);
> 	spin_unlock_irq(&ucounts_lock);
> 	return ucounts;
>
> something like this?

Yes. But without atomics. Something a bit more like:

> ...
> 	if (ucounts->count == INT_MAX)
> 		ucounts = NULL;
> 	else
> 		ucounts->count++;
> 	spin_unlock_irq(&ucounts_lock);
> 	return ucounts;
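The lock-only version can be modeled in userspace roughly as follows. This is a sketch under stated assumptions: a pthread mutex stands in for ucounts_lock and spin_lock_irq()/spin_unlock_irq(), the hash lookup is left out, and model_ucounts, model_get_ucounts and model_put_ucounts are hypothetical names. Because count is a plain int that is only touched with the lock held, the overflow check and the increment cannot race, and the increment can fail instead of saturating.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ucounts_lock (a spinlock in the kernel). */
static pthread_mutex_t model_ucounts_lock = PTHREAD_MUTEX_INITIALIZER;

struct model_ucounts {
	int count;	/* plain int: only accessed with model_ucounts_lock held */
};

/* Checked increment: returns NULL instead of letting count pass INT_MAX. */
static struct model_ucounts *model_get_ucounts(struct model_ucounts *ucounts)
{
	pthread_mutex_lock(&model_ucounts_lock);
	if (ucounts->count == INT_MAX)
		ucounts = NULL;
	else
		ucounts->count++;
	pthread_mutex_unlock(&model_ucounts_lock);
	return ucounts;
}

/*
 * Drop a reference. Returns true when the last reference went away,
 * which is where a real implementation would release the object.
 */
static bool model_put_ucounts(struct model_ucounts *ucounts)
{
	bool last;

	pthread_mutex_lock(&model_ucounts_lock);
	assert(ucounts->count > 0);	/* an underflow here would be a bug */
	last = (--ucounts->count == 0);
	pthread_mutex_unlock(&model_ucounts_lock);
	return last;
}
```

Every get and put serializes on one lock, which is the cost Eric expects to want to optimize later; the point of this version is only that its semantics are easy to check.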

I do believe that at some point we will want to say that using the
spin_lock for ucounts->count is cumbersome and suboptimal, and that we
want to change it to get a better-performing implementation.
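One possible shape for such a faster implementation, offered only as an illustration, is a lockless compare-and-exchange loop that still refuses to move past INT_MAX, similar in spirit to the kernel's atomic_add_unless(). The names below are hypothetical, C11 atomics stand in for kernel atomics, and relaxed ordering is assumed to be sufficient for taking an additional reference to an object the caller already holds.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_counter {
	atomic_int count;
};

/*
 * Lockless checked increment: succeeds and returns true unless the
 * counter is already INT_MAX, in which case the counter is left
 * unchanged and false is returned so the caller can fail the operation.
 */
static bool model_inc_below_max(struct model_counter *c)
{
	int old = atomic_load_explicit(&c->count, memory_order_relaxed);

	do {
		if (old == INT_MAX)
			return false;
		/* On failure, old is reloaded with the current value; retry. */
	} while (!atomic_compare_exchange_weak_explicit(&c->count, &old, old + 1,
						       memory_order_relaxed,
						       memory_order_relaxed));
	return true;
}
```

Unlike a saturating refcount_inc(), a failed increment here leaves the counter untouched and reports the failure. What the sketch leaves out is the harder part of going lockless: a put that drops the count to zero still has to coordinate with lookups done under ucounts_lock, for example with a dec-and-lock primitive such as refcount_dec_and_lock().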

Just for getting the semantics correct, we should be able to use only
ucounts_lock for locking. Then, once everything is working, we can
profile and optimize the code.

I just don't want figuring out what is needed to get hung up on little
details that we can change later.

Eric