[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ee56e43r.fsf@email.froward.int.ebiederm.org>
Date: Mon, 17 Jan 2022 10:09:28 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Olivier Langlois <olivier@...llion01.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Heiko Carstens <hca@...ux.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"<linux-arch@...r.kernel.org>" <linux-arch@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
Alexey Gladkov <legion@...nel.org>,
Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
Kees Cook <keescook@...omium.org>,
Al Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>
Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit
special case
Olivier Langlois <olivier@...llion01.com> writes:
> On Fri, 2022-01-14 at 18:12 -0600, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@...ux-foundation.org> writes:
>>
>> > On Tue, Jan 11, 2022 at 10:51 AM Eric W. Biederman
>> > <ebiederm@...ssion.com> wrote:
>> > >
>> > > + while ((n == -ERESTARTSYS) &&
>> > > test_thread_flag(TIF_NOTIFY_SIGNAL)) {
>> > > + tracehook_notify_signal();
>> > > + n = __kernel_write(file, addr, nr, &pos);
>> > > + }
>> >
>> > This reads horribly wrongly to me.
>> >
>> > That "tracehook_notify_signal()" thing *has* to be renamed before
>> > we
>> > have anything like this that otherwise looks like "this will just
>> > loop
>> > forever".
>> >
>> > I'm pretty sure we've discussed that "tracehook" thing before - the
>> > whole header file is misnamed, and most of the functions in theer
>> > are
>> > too.
>> >
>> > As an ugly alternative, open-code it, so that it's clear that "yup,
>> > that clears the TIF_NOTIFY_SIGNAL flag".
>>
>> A cleaner alternative looks like to modify the pipe code to use
>> wake_up_XXX instead of wake_up_interruptible_XXX and then have code
>> that does pipe_write_killable instead of pipe_write_interruptible.
>
> Do not forget that the problem might not be limited to the pipe FS as
> Oleg Nesterov pointed out here:
>
> https://lore.kernel.org/io-uring/20210614141032.GA13677@redhat.com/
>
> This is why I did like your patch fixing __dump_emit. If the only
> problem is the tracehook_notify_signal() function unclear name, that
> should be addressed instead of trying to fix the problem in a different
> way.
It might be that the fix is to run a portion of the exit_to_userspace
loop that does:
if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
handle_signal_work(regs, ti_work);
I am deep in brainstorm mode trying to find something that comes out
clean.
Oleg is right that while to be POSIX compliant and otherwise compatible
with traditional unix behavior sleeps in filesystems need to be
uninterruptible. NFS has not always provided that compatibility.
>> There is also a question of how all of this should interact with the
>> freezer, as I think changing from interruptible to killable means
>> that
>> the coredumps became unfreezable.
>>
>> I am busily simmering this on my back burner and I hope I can come up
>> with something sensible.
>
> IMHO, fixing the problem on the emit function side has the merit of
> being future proof if something else than io_uring in the future would
> raise the TIF_NOTIFY_SIGNAL flag
>
> but I am wondering why no one commented anything about my proposal of
> cancelling io_uring before generating the core dump therefore stopping
> it to flip TIF_NOTIFY_SIGNAL while the core dump is generated.
>
> Is there something wrong with my proposed approach?
> https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/
>
> It did flawlessly created many dozens of io_uring app core dumps in the
> last months for me...
>From my perspective I am not at all convinced that io_uring is the only
culprit.
Beyond that the purpose of a coredump is to snapshot the process as it
is, before anything is shutdown so that someone can examine the coredump
and figure out what failed. Running around changing the state of the
process has a very real chance of hiding what is going wrong.
Further your change requires that there be a place for io_uring to clean
things up. Given that fundamentally that seems like the wrong thing to
me I am not interested in making it easy to what looks like the wrong
thing.
All of this may be perfection being the enemy of the good (especially as
your io_uring magic happens as a special case in do_coredump). My work
in this area is to remove hacks so I can be convinced the code works
100% of the time so unfortunately I am not interested in pick up a
change that is only good enough. Someone else like Andrew Morton might
be.
None of that changes the fact that tracehook_notify_signal needs to be
renamed. That effects your approach and my proof of concept approach.
So renaming tracehook_notify_signal just needs to be done.
Eric
Powered by blists - more mailing lists