[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cb3419a7-988e-4133-8ec0-c27953c5da4a@gmail.com>
Date: Sun, 12 Jan 2025 01:21:22 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Bui Quang Minh <minhquangbui99@...il.com>, linux-kernel@...r.kernel.org
Cc: Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org,
syzbot+3c750be01dab672c513d@...kaller.appspotmail.com,
Li Zetao <lizetao1@...wei.com>
Subject: Re: [PATCH] io_uring: annotate sqd->thread access with data race in
cancel path
On 1/11/25 13:57, Bui Quang Minh wrote:
> On 1/11/25 19:02, Pavel Begunkov wrote:
>> On 1/11/25 10:59, Bui Quang Minh wrote:
>>> The sqd->thread access in io_uring_cancel_generic is just for debug check
>>> so we can safely ignore the data race.
>>>
>>> The sqd->thread access in io_uring_try_cancel_requests is to check if the
>>> caller is the sq threadi with the check ctx->sq_data->thread == current. In
>>> case this is called in a task other than the sq thread, we expect the
>>> expression to be false. And in that case, the sq_data->thread read can race
>>> with the NULL write in the sq thread termination. However, the race will
>>> still make ctx->sq_data->thread == current be false, so we can safely
>>> ignore the data race.
>>>
>>> Reported-by: syzbot+3c750be01dab672c513d@...kaller.appspotmail.com
>>> Reported-by: Li Zetao <lizetao1@...wei.com>
>>> Signed-off-by: Bui Quang Minh <minhquangbui99@...il.com>
>>> ---
>>> io_uring/io_uring.c | 15 ++++++++++++---
>>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index ff691f37462c..b1a116620ae1 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3094,9 +3094,18 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>>> ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
>>> }
>>> - /* SQPOLL thread does its own polling */
>>> + /*
>>> + * SQPOLL thread does its own polling
>>> + *
>>> + * We expect ctx->sq_data->thread == current to be false when
>>> + * this function is called on a task other than the sq thread.
>>> + * In that case, the sq_data->thread read can race with the
>>> + * NULL write in the sq thread termination. However, the race
>>> + * will still make ctx->sq_data->thread == current be false,
>>> + * so we can safely ignore the data race here.
>>> + */
>>> if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
>>> - (ctx->sq_data && ctx->sq_data->thread == current)) {
>>> + (ctx->sq_data && data_race(ctx->sq_data->thread) == current)) {
>>> while (!wq_list_empty(&ctx->iopoll_list)) {
>>> io_iopoll_try_reap_events(ctx);
>>> ret = true;
>>
>> data_race() is a hammer we don't want to use to just silence warnings,
>> it can hide real problems. The fact that it needs 6 lines of comments
>> to explain is also not a good sign.
>>
>> Instead, you can pass a flag, i.e. io_uring_cancel_generic() will have
>> non zero sqd IFF it's the SQPOLL task.
>
> At first, I think of using READ_ONCE here and WRITE_ONCE in the sq thread termination to avoid the data race. What do you think about this approach?
Same thing, that'd be complicating synchronisation when there
shouldn't be any races in the first place. Having no races is
easier than wrapping them into READ_ONCE and keeping in mind
what that's even fine.
Btw, the line you're changing doesn't even look right. SQPOLL
clears sqd->task right before starting with cancellations, so
sounds like it's mindlessly comparing NULL == current.
--
Pavel Begunkov
Powered by blists - more mailing lists