linux-kernel - Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"

Message-ID: <CAHk-=wivat-bcWsGnQOd3=ODx0zFnc7R82tiee=fSU+DF4tD5g@mail.gmail.com>
Date:   Wed, 26 Jul 2023 10:59:41 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Fiona Ebner <f.ebner@...xmox.com>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>, akpm@...ux-foundation.org,
        Thomas Lamprecht <t.lamprecht@...xmox.com>,
        Wolfgang Bumiller <w.bumiller@...xmox.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: segfaults of processes while being killed after commit "mm: make
 the page fault mmap locking killable"

On Wed, 26 Jul 2023 at 01:19, Fiona Ebner <f.ebner@...xmox.com> wrote:
>
> Checking the status from waitpid, it does show that the process was
> terminated by signal 9, even if the segfault was logged.

Thanks for verifying. That's what I thought, and I had just entirely
forgotten about the logging of failed page faults.
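The check Fiona describes (whether waitpid reports death by signal 9 even when a segfault line was logged) can be reproduced with a small POSIX program. This is a minimal userspace sketch. It kills a blocked child with SIGKILL and reports the terminating signal using WIFSIGNALED and WTERMSIG. The helper name killed_by_signal() is hypothetical and is not taken from the thread.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks forever, send it SIGKILL,
 * then inspect the wait status. Returns 1 if the child was terminated by
 * a signal (stored in *out_sig), 0 if it exited normally, -1 on error. */
int killed_by_signal(int *out_sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: sleep until a signal arrives; SIGKILL cannot be caught. */
        for (;;)
            pause();
    }

    kill(pid, SIGKILL);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status)) {
        *out_sig = WTERMSIG(status);   /* expected: SIGKILL (9) */
        return 1;
    }
    return 0;
}
```

A caller that sees a return of 1 with *out_sig equal to SIGKILL knows the process really died from the kill, regardless of any segfault message in the kernel log.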

This whole "fatal signals during IO can also cause a failed page
fault" situation has been true for a long, long time, but because that
case is detected later, by the actual VM code, we end up going through
"fault_signal_pending()" there and suppress the logging of the page
fault failure that way.
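The difference between the two paths can be illustrated with a simplified userspace model. This is a sketch, not kernel code. It assumes, as the subject line suggests, that the new killable mmap lock can fail early and fall into the ordinary "bad area" path, which logs the fault. A failure detected later by the VM code is filtered through a check shaped roughly like fault_signal_pending() and returns quietly. All type and function names below are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical, simplified outcomes of a page fault attempt. */
enum fault_result {
    FAULT_OK,           /* fault handled normally */
    FAULT_RETRY,        /* VM code asked for a retry (e.g. interrupted IO) */
    FAULT_LOCK_FAILED   /* killable mmap lock failed before the VM ran */
};

/* Hypothetical stand-in for the relevant task state. */
struct task_model {
    bool fatal_signal_pending;   /* e.g. SIGKILL already queued */
    bool signal_pending;         /* any signal pending */
};

/* Roughly the idea of fault_signal_pending(): a retried fault is
 * abandoned silently when a fatal signal is pending, or when any signal
 * is pending and the fault came from user mode. */
static bool fault_signal_pending_model(enum fault_result r,
                                       const struct task_model *t,
                                       bool user_mode)
{
    return r == FAULT_RETRY &&
           (t->fatal_signal_pending || (user_mode && t->signal_pending));
}

/* Would this path print a segfault line, before any logging fix? */
bool would_log_segfault(enum fault_result r, const struct task_model *t,
                        bool user_mode)
{
    if (fault_signal_pending_model(r, t, user_mode))
        return false;   /* late VM failure: quietly bail out */
    if (r == FAULT_LOCK_FAILED)
        return true;    /* early lock failure: bad-area path logs it */
    return false;       /* normal fault: nothing to log */
}
```

For a task that already has SIGKILL pending, the model logs on the early lock-failure path and stays silent on the late VM path, which matches the asymmetry described above.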

> > But before we revert it, would you mind trying out the attached
> > trivial patch instead?
>
> The patch works for me too :) (after adding the missing tsk argument
> like Thomas pointed out)

So it turns out that not only did I forget the argument, I also
decided that I had put that test for fatal signals in the wrong place.

The patch obviously does fix the problem on x86, and we could do the
same thing for all the other architectures that do this signal
logging.

But there's actually a much better place to put the fatal signal
check, which will take care of all architectures: just do it in the
'unhandled_signal()' function.
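A simplified userspace model of where that check goes is sketched below. It assumes the shape of unhandled_signal() in kernel/signal.c: init always counts as unhandled, a user-installed handler means "handled", and otherwise a ptraced task defers to its tracer. The fatal-signal test sits between those last two steps. Architecture fault code (for example x86's show_signal_msg()) skips the log message when unhandled_signal() returns false, so one check here suppresses the message everywhere. The struct and enum are hypothetical stand-ins for task_struct and the sigaction handler values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SIG_DFL, SIG_IGN and a user handler. */
enum handler_kind { HANDLER_DEFAULT, HANDLER_IGNORE, HANDLER_USER };

/* Hypothetical subset of task state relevant to the decision. */
struct task_sig_model {
    bool is_global_init;              /* the init process */
    enum handler_kind sigsegv_handler;
    bool fatal_signal_pending;        /* e.g. SIGKILL already queued */
    bool ptraced;
};

/* Sketch of unhandled_signal() with the fatal-signal check in place.
 * Returning false means "don't log this fault". */
bool unhandled_signal_model(const struct task_sig_model *t)
{
    if (t->is_global_init)
        return true;
    if (t->sigsegv_handler == HANDLER_USER)
        return false;    /* the program handles the signal itself */
    if (t->fatal_signal_pending)
        return false;    /* already dying: don't report this fault */
    return !t->ptraced;  /* if traced, let the tracer decide */
}
```

Putting the test in this shared helper, rather than in each architecture's fault handler, is what lets a single change cover every architecture that consults it before logging.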

So I fixed the missing argument and moved the test to a different
place, but I still added your (and Thomas') "Tested-by:" even though
you ended up testing something a bit different.

Oleg, I took your Acked-by too, despite the final patch being somewhat
different. Holler if you see something objectionable.

It's commit 5f0bc0b042fc ("mm: suppress mm fault logging if fatal
signal already pending") in my tree now.

And because it's a bit different from what you already tested, it
would be lovely to just get a confirmation that I didn't screw
anything up when I decided I needed to make a fix that covers more
than just x86.

Thanks,
                     Linus