linux-kernel - Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whKBx_UUKagfyF72EJrpqNCupF4yeoPgapjEBe1bynGcw@mail.gmail.com>
Date:   Tue, 25 Jul 2023 09:38:32 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Fiona Ebner <f.ebner@...xmox.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>
Cc:     akpm@...ux-foundation.org,
        Thomas Lamprecht <t.lamprecht@...xmox.com>,
        Wolfgang Bumiller <w.bumiller@...xmox.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: segfaults of processes while being killed after commit "mm: make
 the page fault mmap locking killable"

On Tue, 25 Jul 2023 at 04:16, Fiona Ebner <f.ebner@...xmox.com> wrote:
>
> will end up without a vma and cause/log the segfault. Of course the
> process is already being killed, but I'd argue it is very confusing to
> users when apparent segfaults from such processes are being logged by
> the kernel.

Ahh. Yes, that wasn't the intent. A process that is being killed
should exit with the lethal signal, not SIGSEGV.

This is not new, btw - this situation is exactly the same as if you
use something like NFS where the process is killed and the IO is
interrupted and the page fault faults for that reason.

But I suspect *very* few people actually encounter that NFS situation
(you can get it on local filesystems too, but the IO race window is
then so small as to probably not be triggerable at all).

So the new killable() check is probably much easier to actually
trigger in practice, even though it's not a new situation per se.

What exactly made you notice? Is it just the logging from
'show_unhandled_signals' being set?

Because the actual signal itself, from the

        force_sig_fault(SIGSEGV, si_code, (void __user *)address);

in __bad_area_nosemaphore() should be overridden by the fact that a
lethal signal was already pending.

But let's add a couple of signal people rather than the mm people to
the participants. Eric, Oleg - would not an existing fatal signal take
precedence over a new SIGSEGV? I obviously thought it did, but looking
at 'get_signal()' and the signal delivery, I don't actually see any
code to that effect.

Fiona - that patch is easily reverted, and it was done as a separate
patch exactly because I was wondering if there was some subtle reason
we didn't already do that.

But before we revert it, would you mind trying out the attached
trivial patch instead?

I'd also still be interested if the symptoms were anything else than
'show_unhandled_signals' causing the show_signal_msg() dance, and
resulting in a message something like

    a.out[1567]: segfault at xyz ip [..] likely on CPU X

in dmesg...

                 Linus

View attachment "patch.diff" of type "text/x-patch" (536 bytes)