Message-ID: <87bl0aidjv.fsf@email.froward.int.ebiederm.org>
Date:   Mon, 17 Jan 2022 09:31:48 -0600
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Alexey Gladkov <legion@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Kees Cook <keescook@...omium.org>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [GIT PULL] signal/exit/ptrace changes for v5.17

Linus Torvalds <torvalds@...ux-foundation.org> writes:

> On Sat, Jan 15, 2022 at 2:00 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> I am currently investigating to figure out if wake_up_interruptible
>> (instead of simply wake_up) ever makes sense outside of the signal code.
>
> It may not be a huge deal, but it *absolutely* makes sense.
>
> Any subsystem that uses "wait_event_interruptible()" (or variations of
> that) should by default only use "wake_up_interruptible()" to wake the
> wait queue.
>
> The reason? Being (interruptibly) on one wait-queue does *NOT*
> preclude the very same process from simultaneously waiting
> non-interruptibly on another one.
>
> It's not hugely common, but the Linux kernel wait-queues have very
> much been designed for the whole "one process can be on multiple wait
> queues for different reasons at the same time" model. That is *very*
> core.
>
> People sometimes think that is just a "poll/select()" thing, but
> that's not at all true. It's quite valid to do things like
>
>         add_wait_queue(..)
>         for (.. some loop ..) {
>                 set_current_state(TASK_INTERRUPTIBLE);
>                 ... do various things, checking state etc ..
>                 schedule();
>         }
>         set_current_state(TASK_RUNNING);
>         remove_wait_queue();
>
> and part of that "do various things" is obviously checking for signals
> and other "exit this wait loop", but other things can be things like
> taking a page fault because you copied data from user space etc.
>
> And that in turn may end up having a nested wait on another waitqueue
> (for the IO), and the outer wait queue should basically strive to not
> wake up unrelated "deeper" sleeps, because that is pointless and just
> causes extra wakeups.
>
> So the page fault event will cause a nested TASK_UNINTERRUPTIBLE
> sleep, and when that IO has completed, it goes into TASK_RUNNING, so
> the outer (interruptible) loop above will have a "dummy schedule()"
> and loop around again to be put back into TASK_INTERRUPTIBLE sleep
> next time around.
>
> But note how it would be pointless to use a "wake_up()" on this outer
> wait queue - it would wake up the deeper IO sleep too, and that would
> just see "oh, the IO I'm waiting for hasn't completed, so I'll just
> have to go to sleep again".
>
> Would it still _work_? Yes. Is the above _common_? No. But it's a
> really fundamental pattern in the kernel, and it's fundamentally how
> wait queues work, and how they should work, and an interruptible sleep
> should generally be seen as pairing with an interruptible wake,
> because that's just how things are.
>
> Why would you want to change something basic like that? Yes, using
> "wake_up()" instead of "wake_up_interruptible()" would still result in
> a working kernel, but it's just _pointless_.

Thank you for the detailed reply.  I am going to have to spend a little
bit digesting it.

The reason is that I am digging into how to reliably test for and get
just the wake-ups needed in the coredump code.

As a first approximation, for coredumps written to files, changing
dump_interrupted from signal_pending to fatal_signal_pending worked
like a charm.  Unfortunately the pipe code is still performing
interruptible waits and io_uring causes truncated coredumps.

I would like to have a version of pipe_write that sleeps in
TASK_KILLABLE.  I want the I/O wake-ups and I want the SIGKILL wake-ups
but I don't want any other wake-ups.  Unfortunately the I/O wake-ups in
the pipe code are sent with wake_up_interruptible.  So a task sleeping
in TASK_KILLABLE won't get them.

Which means that the obvious solution of changing
wait_event_interruptible to wait_event_killable breaks coredump support
(as I found out the hard way).

I understand things well enough that I can change the signal code and
not make the coredump code worse.  I am still trying to figure out what
a clean maintainable solution for coredumps writing to pipes is going to
be.  So I will dig in and read the code that has the nested wait queues
and see if I can understand that logic, and keep thinking about the
coredump support.

Eric
