lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whU3jhye3kZPq61S0tAkW2sKv2JzGpog1owrv8--ZABhQ@mail.gmail.com>
Date:   Tue, 29 Nov 2022 10:22:48 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-kernel@...r.kernel.org,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Ben Segall <bsegall@...gle.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>, Oleg Nesterov <oleg@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Valentin Schneider <vschneid@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).

On Tue, Nov 29, 2022 at 7:15 AM Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
>
> From: Thomas Gleixner <tglx@...utronix.de>
>
> The idea for this originates from the real time tree to make signal
> delivery for realtime applications more efficient.

NAK.

This was attempted once. It had unexplained problems, and very unclear locking.

And this new version is THE SAME CRAZY STUFF. That whole
sigqueue_cache_or_free() hasn't been improved, and is just a thousand
monkeys on crack typing wildly on a keyboard.

This updated patch seems to *attempt* to do everything under the
sighand->siglock, and several of the comments point to it.

But - exactly like the first version - that isn't actually always
true, so it then does *other* random tests instead.

And in at least one case - sigqueue_free() - this code *literally* does that

        spin_unlock_irqrestore(lock, flags);

        if (q)
                __sigqueue_cache_or_free(q);

where that "spin_unlock_irqrestore()" just released 'sighand->siglock'
which is what allegedly is protecting this all.

In other words, STOP DOING THIS.

Insanity is doing the same thing over and over again, expecting
different results. This is insanity.

We have a couple of alternatives here:

 (a) do it *properly* and obviously under that sighand->lock in EVERY
SINGLE PATH.

     None of these games with random code paths either having the lock
conditionally (the 'sigqueue_flags' in __sigqueue_alloc()).

     And none of these "I just dropped the lock, and then free the cache"

 (b) alternatively, do no locking at all, and depend on 'current'
being single-threaded, and _only_ do it in task-synchronous contexts
(so no interrupts)

 (c) use proper atomics that don't *need* locking (ie "xchg(p,NULL)"
kind of things)

but no, we do not repeat the mistake of mixing (a) and (b) with random code.

And exactly because we've already done this once - badly - I want the
new code to be *obviously* correct.

That very much means "don't try to revive a patch that was already
reverted because it was broken and didn't make sense". Throw this
patch AWAY. It's not worth reviving. Start from scratch, with that
"this needs to be _obviously_ correct" as the starting point.

Honestly, I suspect the proper approach is:

 - absolutely *no* "test if task is dead". Free the last left-over
cached entry as you free the 'struct task_struct".

   Stop doing these "test random other unrelated flags".

   If you cannot based it on just the cache field and the lock, DON'T
DO IT AT ALL.

 - absolutely no "test the 'sigqueue_flags' argument".

   Again, stop doing these "test random flags".

   Either only do the "do I have a cached entry" in the path that
unambiguously already holds the lock!

   Or do a cache-specific lock (ie "use atomics") so that it doesn't
*matter* whether the lock is held.

 - make every step *so* obvious that you don't even need a comment.

   And then add the comment about it anyway.

Ok? Because I never want to see this broken approach again.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ