Message-ID: <654cb5de-a563-b812-a435-d9b435cee334@kernel.dk>
Date: Tue, 23 Aug 2022 12:27:06 -0600
From: Jens Axboe <axboe@...nel.dk>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
Olivier Langlois <olivier@...llion01.com>
Cc: Pavel Begunkov <asml.silence@...il.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
io-uring@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>,
Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with
io_uring
On 8/23/22 12:22 PM, Eric W. Biederman wrote:
> Olivier Langlois <olivier@...llion01.com> writes:
>
>> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote:
>>>
>>> What is stopping the task calling do_coredump() from being interrupted,
>>> with task_work_add() then called from the interrupt context?
>>>
>>> This is precisely what I was experiencing last summer when I did work
>>> on this issue.
>>>
>>> My understanding of how async I/O works with io_uring is that the
>>> task is added to a wait queue without being put to sleep, and when
>>> the io_uring callback is called from interrupt context,
>>> task_work_add() is called so that the next time an io_uring syscall
>>> is invoked, pending work is processed to complete the I/O.
>>>
>>> So if:
>>>
>>> 1. io_uring request is initiated AND the task is in a wait queue
>>> 2. do_coredump() is called before the I/O is completed
>>>
>>> IMHO, this is how you end up with task_work_add() being called while
>>> the coredump is being generated.
>>>
>> I forgot to add that I have experienced the issue with TCP/IP I/O.
>>
>> I suspect that with a TCP socket the race condition window is much
>> larger than with disk I/O, which might make it easier to reproduce
>> the issue this way...
>
> I was under the apparently mistaken impression that the io_uring
> task_work_add only comes from the io_uring userspace helper threads.
> Those are definitely suppressed by my change.
>
> Do you have any idea where in the code io_uring is being called from
> interrupt context? I would really like to trace that code path so I
> have a better grasp on what is happening.
>
> If task_work_add is being called from interrupt context then something
> additional from what I have proposed certainly needs to be done.
task_work may come from the helper threads, but generally it does not.
One example would be doing a read from a socket. There's no data there,
so poll is armed to trigger a retry. When we get the poll notification
that there's now data to be read, we kick off the retry with task_work.
Since it's from the poll handler, it can trigger from interrupt context.
See the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute()
-> io_req_task_work_add() -> task_work_add().
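
Roughly, and heavily simplified (this is a sketch of the idea, not the
actual io_uring code; the my_* names and the request struct below are
made up for illustration), that path has the following shape: a wait
queue wake callback running in interrupt context punts the retry back
to the submitting task via task_work_add().

/*
 * Simplified illustration only -- not the real io_uring implementation.
 * The actual path is io_poll_wake() -> __io_poll_execute() ->
 * io_req_task_work_add() -> task_work_add(); the my_* names are made up.
 */
#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/task_work.h>

struct my_request {
	struct wait_queue_entry	wqe;	/* hooked into the socket's waitqueue */
	struct callback_head	twork;	/* task_work node */
	struct task_struct	*task;	/* task that submitted the request */
};

/* Runs in the submitting task's context the next time it enters the
 * kernel or returns to userspace, and performs the actual retry. */
static void my_request_retry(struct callback_head *cb)
{
	struct my_request *req = container_of(cb, struct my_request, twork);

	/* ... retry the recv and post the completion for req ... */
	(void)req;
}

/* Waitqueue callback: runs from the socket's data-ready path, which can
 * be softirq/interrupt context. Sleeping isn't allowed here, so the
 * retry is punted to task_work on the submitting task instead. */
static int my_poll_wake(struct wait_queue_entry *wqe, unsigned mode,
			int sync, void *key)
{
	struct my_request *req = container_of(wqe, struct my_request, wqe);

	init_task_work(&req->twork, my_request_retry);
	/* TWA_SIGNAL makes the work run promptly -- including, as discussed
	 * in this thread, while the task is sitting in do_coredump(). */
	task_work_add(req->task, &req->twork, TWA_SIGNAL);
	return 1;
}
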
It can also happen for regular IRQ-based reads from regular files, where
the completion is actually done via task_work added from the potentially
IRQ-based completion path.
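
For completeness, the userspace side of the socket case is nothing
special. A minimal liburing sketch like the one below (sockfd is assumed
to be a connected TCP socket with no data pending yet; error handling is
mostly omitted) is enough to get the recv armed through poll, so the
eventual completion is driven by task_work in the submitting task:

/*
 * Minimal liburing sketch: queue a recv on a socket that has no data
 * yet. io_uring arms poll internally and retries when data arrives;
 * that retry is what gets delivered to this task via task_work.
 */
#include <liburing.h>
#include <stdio.h>

static int recv_one(int sockfd)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0)
		return ret;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, sockfd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/* Blocks until the poll-driven retry completes the recv. With the
	 * request still pending, a crash/coredump of this task is what
	 * opens the race discussed above. */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("recv result: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return ret;
}
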
--
Jens Axboe