Message-ID: <CAJnrk1Zhku-_ayXqCisYOCWnDf31YDyiWWEHJeMU=BDYoQR9mA@mail.gmail.com>
Date: Tue, 16 Dec 2025 12:46:07 +0800
From: Joanne Koong <joannelkoong@...il.com>
To: Caleb Sander Mateos <csander@...estorage.com>
Cc: Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
syzbot@...kaller.appspotmail.com
Subject: Re: [PATCH v5 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
On Tue, Dec 16, 2025 at 4:10 AM Caleb Sander Mateos
<csander@...estorage.com> wrote:
>
> io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
> workloads. Even when only one thread pinned to a single CPU is accessing
> the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
> are very hot instructions. The mutex's primary purpose is to prevent
> concurrent io_uring system calls on the same io_ring_ctx. However, there
> is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one
> task will make io_uring_enter() and io_uring_register() system calls on
> the io_ring_ctx once it's enabled.
> So if the io_ring_ctx is setup with IORING_SETUP_SINGLE_ISSUER, skip the
> uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On
> other tasks acquiring the ctx uring lock, use a task work item to
> suspend the submitter_task for the critical section.
Does this open the pathway to various data corruption issues, since the
submitter task can be suspended while it's in the middle of executing a
section of logic that was previously protected by the mutex? With this
patch (if I'm understanding it correctly), there's now no guarantee that
the logic inside the mutexed section is executed atomically for
IORING_SETUP_SINGLE_ISSUER submitter tasks, so if the submitter gets
suspended between two state changes that need to be atomic / bundled
together, then I think the task doing the suspending would see corrupt
state.
I did a quick grep and I think one example of this race shows up in
io_uring/rsrc.c for buffer cloning: if the src_ctx has
IORING_SETUP_SINGLE_ISSUER set and the cloning happens at the same time
as the submitter task is unregistering the buffers, then this chain of
events occurs:
* submitter task is executing the logic in io_sqe_buffers_unregister()
-> io_rsrc_data_free(), and frees data->nodes but data->nr is not yet
updated
* submitter task gets suspended through io_register_clone_buffers() ->
lock_two_rings() -> mutex_lock_nested(&ctx2->uring_lock, ...)
* after the src ctx is suspended, io_clone_buffers() runs, which reads
the stale "nbufs = src_ctx->buf_table.nr;" value
* io_clone_buffers() then calls io_rsrc_node_lookup(), which dereferences
a NULL pointer
Thanks,
Joanne
> If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during
> io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task
> may be set concurrently, so acquire the uring_lock before checking it.
> If submitter_task isn't set yet, the uring_lock suffices to provide
> mutual exclusion.
>
> Signed-off-by: Caleb Sander Mateos <csander@...estorage.com>
> Tested-by: syzbot@...kaller.appspotmail.com
> ---
> io_uring/io_uring.c | 12 +++++
> io_uring/io_uring.h | 114 ++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 123 insertions(+), 3 deletions(-)
>