Message-ID: <YK5tyZNAFc8dh6ke@elver.google.com>
Date: Wed, 26 May 2021 17:48:25 +0200
From: Marco Elver <elver@...gle.com>
To: asml.silence@...il.com, axboe@...nel.dk
Cc: syzbot <syzbot+73554e2258b7b8bf0bbf@...kaller.appspotmail.com>,
io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller-bugs@...glegroups.com, dvyukov@...gle.com
Subject: Re: [syzbot] KCSAN: data-race in __io_uring_cancel /
io_uring_try_cancel_requests
On Wed, May 26, 2021 at 08:44AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: a050a6d2 Merge tag 'perf-tools-fixes-for-v5.13-2021-05-24'..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13205087d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=3bcc8a6b51ef8094
> dashboard link: https://syzkaller.appspot.com/bug?extid=73554e2258b7b8bf0bbf
> compiler: Debian clang version 11.0.1-2
[...]
> write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
> io_uring_clean_tctx fs/io_uring.c:9042 [inline]
> __io_uring_cancel+0x261/0x3b0 fs/io_uring.c:9136
> io_uring_files_cancel include/linux/io_uring.h:16 [inline]
> do_exit+0x185/0x1560 kernel/exit.c:781
> do_group_exit+0xce/0x1a0 kernel/exit.c:923
> get_signal+0xfc3/0x1610 kernel/signal.c:2835
> arch_do_signal_or_restart+0x2a/0x220 arch/x86/kernel/signal.c:789
> handle_signal_work kernel/entry/common.c:147 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> exit_to_user_mode_prepare+0x109/0x190 kernel/entry/common.c:208
> __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
> syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
> do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
> io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline]
> io_uring_try_cancel_requests+0x1ce/0x8e0 fs/io_uring.c:8933
> io_ring_exit_work+0x7c/0x1110 fs/io_uring.c:8736
> process_one_work+0x3e9/0x8f0 kernel/workqueue.c:2276
> worker_thread+0x636/0xae0 kernel/workqueue.c:2422
> kthread+0x1d0/0x1f0 kernel/kthread.c:313
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
I wasn't entirely sure whether io_wq is guaranteed to remain alive in
this case in io_uring_try_cancel_iowq(), but the comment there suggests
it does. If so, I think the patch below explains the situation better
and also proposes a fix.
Thoughts?
Thanks,
-- Marco
------ >8 ------
From: Marco Elver <elver@...gle.com>
Date: Wed, 26 May 2021 16:56:37 +0200
Subject: [PATCH] io_uring: fix data race to avoid potential NULL-deref
Commit ba5ef6dc8a82 ("io_uring: fortify tctx/io_wq cleanup") introduced
setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to
detect a data race between accesses to tctx->io_wq:
write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
io_uring_clean_tctx fs/io_uring.c:9042 [inline]
__io_uring_cancel fs/io_uring.c:9136
io_uring_files_cancel include/linux/io_uring.h:16 [inline]
do_exit kernel/exit.c:781
do_group_exit kernel/exit.c:923
get_signal kernel/signal.c:2835
arch_do_signal_or_restart arch/x86/kernel/signal.c:789
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
...
read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline]
io_uring_try_cancel_requests fs/io_uring.c:8933
io_ring_exit_work fs/io_uring.c:8736
process_one_work kernel/workqueue.c:2276
...
With the config used (CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=y), KCSAN
only reports data races if a value change was observed: this implies
that, in the race here, tctx->io_wq was non-NULL to begin with.
Therefore, depending on the interleaving, we may end up with:
  [CPU 0]                             | [CPU 1]
  io_uring_try_cancel_iowq()          | io_uring_clean_tctx()
    if (!tctx->io_wq) // false        |   ...
    ...                               |   tctx->io_wq = NULL
    io_wq_cancel_cb(tctx->io_wq, ...) |   ...
      -> NULL-deref                   |
Note: It is likely that we have so far gotten lucky and the compiler
happens to optimize the double read into a single read into a register
-- but this is never guaranteed, and can easily change with a different
compiler or config!
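For illustration (my sketch, not part of the patch below; it reuses the
call from io_uring_try_cancel_iowq()), compare the two access patterns:

	/* Plain accesses: the compiler may legally load tctx->io_wq twice. */
	if (!tctx->io_wq)			/* load #1 observes non-NULL ... */
		continue;
	cret = io_wq_cancel_cb(tctx->io_wq,	/* ... load #2 may observe NULL */
			       io_cancel_ctx_cb, ctx, true);

	/* Marked access: exactly one load, whose value is then reused. */
	io_wq = READ_ONCE(tctx->io_wq);
	if (!io_wq)
		continue;
	cret = io_wq_cancel_cb(io_wq, io_cancel_ctx_cb, ctx, true);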
Fix the data race by reading tctx->io_wq once with READ_ONCE() and
using the result for both the NULL-check and the cancellation call, and
by marking the NULL write with WRITE_ONCE(). Of course, this assumes
that a valid io_wq remains alive for the duration of
io_uring_try_cancel_iowq(), which should be the case per the comment
there.
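For reference, a condensed sketch of the cleanup path this relies on
(paraphrased from fs/io_uring.c at this commit, with the WRITE_ONCE()
from this patch already applied; unrelated details omitted):

	static void io_uring_clean_tctx(struct io_uring_task *tctx)
	{
		struct io_wq *wq = tctx->io_wq;
		struct io_tctx_node *node;
		unsigned long index;

		WRITE_ONCE(tctx->io_wq, NULL);
		xa_for_each(&tctx->xa, index, node)
			io_uring_del_task_file(index);	/* takes uring_lock */
		if (wq)
			io_wq_put_and_exit(wq);		/* only after nodes are gone */
	}

A racer holding uring_lock that still finds a ctx node can therefore
rely on the io_wq object staying alive; only the tctx->io_wq pointer
may be concurrently cleared.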
Reported-by: syzbot+bf2b3d0435b9b728946c@...kaller.appspotmail.com
Signed-off-by: Marco Elver <elver@...gle.com>
Cc: Jens Axboe <axboe@...nel.dk>
Cc: Pavel Begunkov <asml.silence@...il.com>
---
 fs/io_uring.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5f82954004f6..c7e27b464cb6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8903,12 +8903,16 @@ static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
 	mutex_lock(&ctx->uring_lock);
 	list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
 		struct io_uring_task *tctx = node->task->io_uring;
+		struct io_wq *io_wq;
 
+		if (!tctx)
+			continue;
 		/*
 		 * io_wq will stay alive while we hold uring_lock, because it's
 		 * killed after ctx nodes, which requires to take the lock.
 		 */
-		if (!tctx || !tctx->io_wq)
+		io_wq = READ_ONCE(tctx->io_wq);
+		if (!io_wq)
 			continue;
-		cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
+		cret = io_wq_cancel_cb(io_wq, io_cancel_ctx_cb, ctx, true);
 		ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
@@ -9039,7 +9043,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 	struct io_tctx_node *node;
 	unsigned long index;
 
-	tctx->io_wq = NULL;
+	WRITE_ONCE(tctx->io_wq, NULL);
 	xa_for_each(&tctx->xa, index, node)
 		io_uring_del_task_file(index);
 	if (wq)
--
2.31.1.818.g46aad6cb9e-goog