linux-kernel - Re: [GIT PULL] signal/exit/ptrace changes for v5.17

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgS865kHU=4NO=AvK07fcK7M6C6EYGdk80R1tkPKTLyhQ@mail.gmail.com>
Date:   Mon, 17 Jan 2022 06:08:49 +0200
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <legion@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [GIT PULL] signal/exit/ptrace changes for v5.17

On Sat, Jan 15, 2022 at 2:00 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> I am currently investigating to figure out if wake_up_interruptible
> (instead of simply wake_up) ever makes sense outside of the signal code.

It may not be a huge deal, but it *absolutely* makes sense.

Any subsystem that uses "wait_event_interruptible()" (or variations of
that) should by default only use "wake_up_interruptible()" to wake the
wait queue.

The reason? Being (interruptibly) on one wait-queue does *NOT* make it
impossible that the very same process isn't waiting non-interruptibly
on another one.

It's not hugely common, but the Linux kernel wait-queues have very
much been designed for the whole "one process can be on multiple wait
queues for different reasons at the same time" model. That is *very*
core.

People sometimes think that is just a "poll/select()" thing, but
that's not at all true. It's quite valid to do things like

        add_wait_queue(..)
        for (.. some loop ..) {
                set_current_state(TASK_INTERRUPTIBLE);
                ... do various things, checking state etc ..
               schedule();
        }
        set_current_state(TASK_RUNNABLE);
        remove_wait_queue();

and part of that "do various things" is obviously checking for signals
and other "exit this wait loop", but other things can be things like
taking a page fault because you copied data from user space etc.

And that in turn may end up having a nested wait on another waitqueue
(for the IO), and the outer wait queue should basically strive to not
wake up unrelated "deeper" sleeps, because that is pointless and just
causes extra wakeups.

So the page fault event will cause a nested TASK_UNINTERRUPTIBLE
sleep, and when that IO has completed, it goes into TASK_RUNNABLE, so
the outer (interruptible) loop above will have a "dummy schedule()"
and loop around again to be put back into TASK_INTERRUPTIBLE sleep
next time around.

But note how it would be pointless to use a "wake_up()" on this outer
wait queue - it would wake up the deeper IO sleep too, and that would
just see "oh, the IO I'm waiting for hasn't completed, so I'll just
have to go to sleep again".

Would it still _work_? Yes. Is the above _common_? No. But it's a
really fundamnetal pattern in the kernel, and it's fundamentally how
wait queues work, and how they should work, and an interruptible sleep
should generally be seen as pairing with an interruptible wake,
because that's just how things are.

Why would you want to change something basic like that? Yes, using
"wake_up()" instead of "wake_up_interruptible()" would still result in
a working kernel, but it's just _pointless_.

             Linus