Message-ID: <87ldze2lsq.fsf@email.froward.int.ebiederm.org>
Date: Thu, 26 Sep 2024 15:37:57 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Vegard Nossum <vegard.nossum@...cle.com>, Kees Cook <kees@...nel.org>,
linux-kernel@...r.kernel.org, Allen Pais <apais@...ux.microsoft.com>,
Brian Mak <makb@...iper.net>, Jeff Xu <jeffxu@...omium.org>, Roman
Kisel <romank@...ux.microsoft.com>, regressions@...ts.linux.dev
Subject: Re: [GIT PULL] execve updates for v6.12-rc1
Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Thu, 26 Sept 2024 at 12:10, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>>
>> One of the common causes for coredump truncation is weird interactions
>> between io_uring and the coredump code. (AKA kernel bugs).
>>
>> That is something you can't ask your debugger to tell you.
>>
>> So from 10,000 feet I think the idea is sane.
>
> What? No. Adding printk's to chase kernel bugs is certainly a
> time-honored tradition. But we don't leave them in the kernel sources
> for posterity.
No argument from me there. We certainly don't leave them enabled by
default. Although in truth most of the failures the coredump code
can experience are cases that should never happen in normal operation.
> And none of the coredump failure reports had anything to do with
> io_uring bugs anyway. They were literally "print out when disk filled
> up or core dumps weren't enabled".
dump_interrupted was instrumented. That is what io_uring was
triggering. In fact, I think dump_interrupted still has problems with
dumping to a pipe.
> If you didn't get a core dump because the kernel didn't have core
> dumps configured, we shouldn't print out some babying kernel message
> about that.
Some of them are certainly silly, or excessive.
> None of this has anything to do with io_uring or kernel bugs.
I respectfully disagree.
A huge part of the problem is that when io_uring triggers
dump_interrupted it is so subtle people don't have a clue what is going
on. Not that I am saying it is necessarily io_uring; that is just the
one I have debugged and tried to sort out. Other kernel subsystems
could have similar weird interactions, but io_uring, where it plays with
TIF_NOTIFY_SIGNAL, has caused problems in the past.
I don't vouch for this implementation or think it is necessarily
the right way to get better information out, but the coredump code
is very much a black box that is quite difficult for people to work
with.
What I know is that recently truncated core dumps have been on people's
radar enough that we received two separate patches from two different
organizations to do something about them. That says to me that this is an
actual problem that people are experiencing, not some theoretical thing.
I am all for reverting code that doesn't work, and for looking for
better solutions, but simply telling people their pain is not a real
problem seems terribly wrong.
Eric