Message-ID: <87lerdr810.fsf@email.froward.int.ebiederm.org>
Date: Wed, 24 Aug 2022 10:11:23 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Olivier Langlois <olivier@...llion01.com>,
Pavel Begunkov <asml.silence@...il.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
io-uring@...r.kernel.org, Alexander Viro <viro@...iv.linux.org.uk>,
Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring

Jens Axboe <axboe@...nel.dk> writes:
> On 8/23/22 12:22 PM, Eric W. Biederman wrote:
>> Olivier Langlois <olivier@...llion01.com> writes:
>>
>>> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote:
>>>>
>>>> What is stopping the task calling do_coredump() to be interrupted and
>>>> call task_work_add() from the interrupt context?
>>>>
>>>> This is precisely what I was experiencing last summer when I did work
>>>> on this issue.
>>>>
>>>> My understanding of how async I/O works with io_uring is that the
>>>> task
>>>> is added to a wait queue without being put to sleep and when the
>>>> io_uring callback is called from the interrupt context,
>>>> task_work_add()
>>>> is called so that the next time io_uring syscall is invoked, pending
>>>> work is processed to complete the I/O.
>>>>
>>>> So if:
>>>>
>>>> 1. io_uring request is initiated AND the task is in a wait queue
>>>> 2. do_coredump() is called before the I/O is completed
>>>>
>>>> IMHO, this is how you end up having task_work_add() called while the
>>>> coredump is generated.
>>>>
>>> I forgot to add that I have experienced the issue with TCP/IP I/O.
>>>
>>> I suspect that with a TCP socket, the race condition window is much
>>> larger than if it was disk I/O and this might make it easier to
>>> reproduce the issue this way...
>>
>> I was under the apparently mistaken impression that the io_uring
>> task_work_add only comes from the io_uring userspace helper threads.
>> Those are definitely suppressed by my change.
>>
>> Do you have any idea in the code where io_uring code is being called in
>> an interrupt context? I would really like to trace that code path so I
>> have a better grasp on what is happening.
>>
>> If task_work_add is being called from interrupt context then something
>> additional from what I have proposed certainly needs to be done.
>
> task_work may come from the helper threads, but generally it does not.
> One example would be doing a read from a socket. There's no data there,
> poll is armed to trigger a retry. When we get the poll notification that
> there's now data to be read, then we kick that off with task_work. Since
> it's from the poll handler, it can trigger from interrupt context. See
> the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute() ->
> io_req_task_work_add() -> task_work_add().

But that is a task_work to the helper thread, correct?

> It can also happen for regular IRQ based reads from regular files, where
> the completion is actually done via task_work added from the potentially
> IRQ based completion path.

I can see that.

Which leaves me with the question: do these task_works directly wake up
the thread that submitted the I/O request? Or is there likely to be
something for an I/O thread to do before an ordinary program thread is
notified?

I am asking because only the case of notifying ordinary program threads
is interesting in the case of a coredump. As I understand it, a
data-to-read notification would typically be handled by the io_uring
worker thread, which triggers reading the data before letting userspace
know that everything it asked to be done is complete.

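To make the question concrete, here is a toy userspace model of the pattern under discussion. It is only a sketch and not kernel code: every `toy_*` name is hypothetical, and the structs are simplified stand-ins rather than the real `callback_head` or `task_struct`. It shows a wakeup handler queueing deferred work onto whichever task it is handed, with that task draining the list later in its own context. Which task the real io_uring path targets is exactly the open question above, and the sketch does not answer it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a deferred-work item; NOT the kernel's callback_head. */
struct toy_work {
    struct toy_work *next;
    void (*func)(struct toy_work *);
};

/* Toy stand-in for a task: a name, a pending-work list, and a counter. */
struct toy_task {
    const char *name;
    struct toy_work *pending;
    int completed;              /* callbacks run by this task */
};

static struct toy_task *current_task;

/* Hypothetical analogue of task_work_add(): push work onto the target. */
static void toy_task_work_add(struct toy_task *t, struct toy_work *w)
{
    w->next = t->pending;
    t->pending = w;
}

/* Hypothetical analogue of task_work_run(): drain the list in task context. */
static int toy_task_work_run(struct toy_task *t)
{
    int ran = 0;

    current_task = t;
    while (t->pending) {
        struct toy_work *w = t->pending;

        t->pending = w->next;
        w->func(w);
        ran++;
    }
    return ran;
}

/* The deferred callback: models retrying the read once data has arrived. */
static void toy_retry_read(struct toy_work *w)
{
    (void)w;
    current_task->completed++;
    printf("%s: retrying read in task context\n", current_task->name);
}

/* Models a poll wakeup firing asynchronously; it only queues work. */
static void toy_poll_wake(struct toy_task *target, struct toy_work *w)
{
    w->func = toy_retry_read;
    toy_task_work_add(target, w);
}

/* Runs the scenario once and returns how many callbacks the target ran. */
static int toy_demo(void)
{
    struct toy_task submitter = { "submitter", NULL, 0 };
    struct toy_task worker = { "worker", NULL, 0 };
    struct toy_work w = { NULL, NULL };

    /* Assumption for the demo only: the wakeup targets the submitter. */
    toy_poll_wake(&submitter, &w);
    assert(worker.pending == NULL);     /* the other task saw nothing */

    toy_task_work_run(&submitter);
    assert(submitter.pending == NULL);  /* list fully drained */
    return submitter.completed;
}
```

Calling `toy_demo()` from a `main()` runs the scenario once. Passing `&worker` to `toy_poll_wake()` instead models the other possibility, where a helper thread handles the notification before any ordinary program thread is involved.
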
Eric