linux-kernel - Re: [PATCH] coredump: Limit what can interrupt coredumps

Message-ID: <cbfa38bf5536fe37850823c68ea23f6c93dda154.camel@trillion01.com>
Date:   Wed, 11 Aug 2021 16:47:58 -0400
From:   Olivier Langlois <olivier@...llion01.com>
To:     Tony Battersby <tonyb@...ernetics.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Oleg Nesterov <oleg@...hat.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        io-uring <io-uring@...r.kernel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>,
        Pavel Begunkov <asml.silence@...il.com>
Subject: Re: [PATCH] coredump: Limit what can interrupt coredumps

On Tue, 2021-08-10 at 17:48 -0400, Tony Battersby wrote:
> > 
> I just ran into this problem also - coredumps from an io_uring program
> to a pipe are truncated.  But I am using kernel 5.10.57, which does NOT
> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") or
> commit 06af8679449d ("coredump: Limit what can interrupt coredumps").
> Kernel 5.4 works though, so I bisected the problem to commit
> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup
> properly") in kernel 5.9.  Note that my io_uring program uses only async
> buffered reads, which may be why this particular commit makes a
> difference to my program.
> 
> My io_uring program is a multi-purpose long-running program with many
> threads.  Most threads don't use io_uring but a few of them do.
> Normally, my core dumps are piped to a program so that they can be
> compressed before being written to disk, but I can also test writing the
> core dumps directly to disk.  This is what I have found:
> 
> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers a
> coredump, the core file is written correctly, whether it is written to
> disk or piped to a program, even if another thread is using io_uring at
> the same time.
> 
> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a
> coredump, the core file is truncated, whether written directly to disk
> or piped to a program.
> 
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is written directly to disk, then it
> is written correctly.
> 
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is piped to a program, then it is
> truncated.
> 
> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly,
> whether written directly to disk or piped to a program.
> 
> Tony Battersby
> Cybernetics
> 
Tony,

these are super interesting details. I'm leaving for a few days, so I
will not be able to look into it until I am back, but here is my
interpretation of your findings:

f38c7e3abfba makes it more likely that your task ends up in an fd read
wait queue. Previously, the io_uring request queuing was failing and
returning EAGAIN. Now it ends up using io_uring fast poll.

When the core dump gets written through a pipe, pipe_write must block
waiting for some event. If the task gets woken up by the io_uring wait
queue entry instead, that must somehow make pipe_write fail.

So the problem must be a mix of TIF_NOTIFY_SIGNAL and the fact that
io_uring wait queue entries aren't cleaned up while doing the core
dump.
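
To make that interpretation easier to follow, here is a toy user-space
model of it. It is not kernel code and it is not a reproducer: every
name in it (model_task, model_pipe, model_pipe_write, the
wake_by_uring_at parameter) is hypothetical, and the only assumption it
encodes is that a writer sleeping on a full pipe gives up with a short
write when it sees a signal-like notification instead of free space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the dumping task's pending-work state. */
struct model_task {
    bool notify_signal;   /* models a signal-like notification flag */
};

/* Hypothetical stand-in for a pipe whose reader drains it on request. */
struct model_pipe {
    size_t capacity;
    size_t used;
};

/* Models the core_pattern helper reading everything currently queued. */
static void model_reader_drain(struct model_pipe *p)
{
    p->used = 0;
}

/*
 * Models a blocking pipe write of len bytes.
 *
 * wake_by_uring_at is the 1-based number of the sleep during which a
 * leftover io_uring wait queue entry wakes the task and sets the
 * notification flag; 0 means this never happens.
 *
 * Returns the number of bytes written, which is less than len when the
 * write was cut short.
 */
static size_t model_pipe_write(struct model_task *t, struct model_pipe *p,
                               size_t len, unsigned wake_by_uring_at)
{
    size_t written = 0;
    unsigned sleeps = 0;

    while (written < len) {
        assert(p->used <= p->capacity);
        size_t room = p->capacity - p->used;

        if (room > 0) {
            size_t left = len - written;
            size_t n = left < room ? left : room;

            p->used += n;
            written += n;
            continue;
        }

        /* Pipe is full: the writer would go to sleep here. */
        sleeps++;
        if (sleeps == wake_by_uring_at)
            t->notify_signal = true;   /* stale io_uring entry fires */

        if (t->notify_signal)
            break;                     /* treated like a pending signal */

        model_reader_drain(p);         /* normal wakeup: reader made room */
    }
    return written;
}
```

With a 4096-byte model pipe and a 10000-byte "core", passing
wake_by_uring_at = 0 writes all 10000 bytes, while wake_by_uring_at = 1
stops after the first 4096, which is the truncation pattern described
for the piped case.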

I have a new modification to try out. I'll hopefully be able to submit
a patch to fix that once I come back (I cannot do it now, or else I'll
never leave ;-))