Message-ID: <CAHk-=wiN-JcUh4uhDNmA4hp26Mg+c2DTuzgWY2fZ6hytDtOMCg@mail.gmail.com>
Date:   Mon, 12 Jun 2023 08:56:25 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Darrick J. Wong" <djwong@...nel.org>, Jens Axboe <axboe@...nel.dk>
Cc:     Dave Chinner <david@...morbit.com>, Zorro Lang <zlang@...hat.com>,
        linux-xfs@...r.kernel.org,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Mike Christie <michael.christie@...cle.com>,
        "Michael S. Tsirkin" <mst@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [6.4-rc5 regression] core dump hangs (was Re: [Bug report]
 fstests generic/051 (on xfs) hang on latest linux v6.4-rc5+)

On Mon, Jun 12, 2023 at 8:36 AM Darrick J. Wong <djwong@...nel.org> wrote:
>
> > Or maybe Darrick (who doesn't see the issue) is running on raw
> > hardware, and you and Zorro are running in a virtual environment?
>
> Ahah, it turns out that liburing-dev isn't installed on the test fleet,
> so fstests didn't get built with io_uring support.  That probably
> explains why I don't see any of these hangs.
>
> Oh.  I can't *install* the debian liburing-dev package because it has
> a versioned dependency on linux-libc-dev >= 5.1, which isn't compatible
> with me having a linux-libc-dev-djwong package that contains the uapi
> headers for the latest upstream kernel and Replaces: linux-libc-dev.
> So either I have to create a dummy linux-libc-dev with adequate version
> number that pulls in my own libc header package, or rename that package.
>
> <sigh> It's going to take me a while to research how best to split this
> stupid knot.

Oh, no, that's great. It explains why you don't see the problem, and
Dave and Zorro do. Perfect.

No need for you to install any liburing packages, at least for this
issue. You'll probably want it eventually just for test coverage, but
for now it's the smoking gun we wanted - I was looking at why vhost
would be impacted, because that commit so intentionally *tried* to not
do anything at all to io_uring.

But it obviously failed. Which then in turn explains the bug.

Not that I see exactly where it went wrong yet, but at least we're
looking at the right thing. Adding Jens to the participants, in case
he sees what goes wrong.

Jens, commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix
freezer/ps regression") seems to have broken core dumping with
io_uring threads, even though it tried very hard not to. See

  https://lore.kernel.org/all/20230611124836.whfktwaumnefm5z5@zlang-mailbox/

for the beginning of this thread.

Honestly, that "try to not change io_uring" was my least favorite part
of that patch, because I really think we want to try to aim for these
user helper threads having as much infrastructure in common as
possible. And when it comes to core dumps, I do not believe that
waiting for the io_uring thread adds anything to the end result,
because the only reason we wait for it is to put the thread's
register state into the core dump, and for kernel helper threads, that
information just isn't useful. It's going to be the state that caused
the thread to be created, not anything that is worth saving in a core
dump.

So I'd actually prefer to just simplify the logic entirely, and say
"PF_USER_WORKER tasks do not participate in core dumps, end of story".
io_uring didn't _care_, so including them wasn't a pain, but if the
vhost exit case can be delayed, I'd rather just say "let's do this
thing for both io_uring and vhost, and not split those two cases up".

Anyway, I don't see exactly what goes wrong, but I feel better just
from this having been narrowed down to io_uring threads. I suspect
Jens actually might even have a core-dumping test-case somewhere,
since core dumping was a thing that io_uring ended up having some
issues with at one point.

           Linus
