[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87mtk2qf5s.fsf@email.froward.int.ebiederm.org>
Date: Tue, 11 Jan 2022 12:51:43 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Olivier Langlois <olivier@...llion01.com>
Cc: Heiko Carstens <hca@...ux.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"<linux-arch@...r.kernel.org>" <linux-arch@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
Alexey Gladkov <legion@...nel.org>,
Kyle Huey <me@...ehuey.com>, Oleg Nesterov <oleg@...hat.com>,
Kees Cook <keescook@...omium.org>,
Al Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>
Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit
special case
"Eric W. Biederman" <ebiederm@...ssion.com> writes:
> Olivier Langlois <olivier@...llion01.com> writes:
>
>> On Mon, 2022-01-10 at 15:11 -0600, Eric W. Biederman wrote:
>>>
>>>
>>> I have been able to confirm that changing wait_event_interruptible to
>>> wait_event_killable was the culprit. Something about the way
>>> systemd-coredump handles coredumps is not compatible with
>>> wait_event_killable.
>>
>> This is my experience too that systemd-coredump is doing something
>> unexpected. When I tested the patch:
>> https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/
>>
>> to make sure that the patch worked, sending coredumps to systemd-
>> coredump was making systemd-coredump, well, core dump... Not very
>> useful...
>
> Oh. Wow....
>
>> Sending the dumps through a pipe to anything else than systemd-coredump
>> was working fine.
>
> Interesting.
>
> I need to read through the pipe code and see how all of that works. For
> writing directly to disk only ignoring killable interruptions are the
> usual semantics. Ordinary pipe code has different semantics, and I
> suspect that is what is tripping things up.
>
> As for systemd-coredump it does whatever it does and I suspect some
> versions of systemd-coredump are simply not robust if a coredump stops
> unexpectedly.
>
> The good news is the pipe code is simple enough, it will be possible to
> completely read through that code.
My bug, obvious in hindsight is that "try_to_wait_up(TASK_INTERRUPTIBLE)"
does not work on a task that is in sleeping in TASK_KILLABLE.
That looks fixable in wait_for_dump_helpers it just won't be as easy
as changing wait_event_interruptible to wait_event_killable.
To prevent short pipe write from causing short writes during a coredump
I believe all we need to do handle -ERSTARTSYS with TIF_NOTIFY_SIGNAL.
Something like what I have below.
Until wait_for_dump_helpers is sorted out the coredump won't wait for
the dump helper the way it should, but otherwise things should work.
diff --git a/fs/coredump.c b/fs/coredump.c
index 7dece20b162b..0db1baf91420 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -796,6 +796,10 @@ static int __dump_emit(struct coredump_params *cprm, const void *addr, int nr)
if (dump_interrupted())
return 0;
n = __kernel_write(file, addr, nr, &pos);
+ while ((n == -ERESTARTSYS) && test_thread_flag(TIF_NOTIFY_SIGNAL)) {
+ tracehook_notify_signal();
+ n = __kernel_write(file, addr, nr, &pos);
+ }
if (n != nr)
return 0;
file->f_pos = pos;
Eric
Powered by blists - more mailing lists