[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d05aa530-f0f5-4ec2-91ae-b193ae644395@kernel.dk>
Date: Fri, 3 May 2024 12:32:38 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Breno Leitao <leitao@...ian.org>, Pavel Begunkov <asml.silence@...il.com>
Cc: leit@...a.com, "open list:IO_URING" <io-uring@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] io_uring/io-wq: Use set_bit() and test_bit() at
worker->flags
On 5/3/24 11:37 AM, Breno Leitao wrote:
> Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
> to address potential data races.
>
> The structure io_worker->flags may be accessed through parallel data
> paths, leading to concurrency issues. When KCSAN is enabled, it reveals
> data races occurring in io_worker_handle_work and
> io_wq_activate_free_worker functions.
>
> BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker
> write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28:
> io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569)
> io_wq_worker (io_uring/io-wq.c:?)
> <snip>
>
> read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5:
> io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285)
> io_wq_enqueue (io_uring/io-wq.c:947)
> io_queue_iowq (io_uring/io_uring.c:524)
> io_req_task_submit (io_uring/io_uring.c:1511)
> io_handle_tw_list (io_uring/io_uring.c:1198)
>
> Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of
> git://git.kernel.org/pub/scm/virt/kvm/kvm").
>
> These races involve writes and reads to the same memory location by
> different tasks running on different CPUs. To mitigate this, refactor
> the code to use atomic operations such as set_bit(), test_bit(), and
> clear_bit() instead of basic "and" and "or" operations. This ensures
> thread-safe manipulation of worker flags.
Looks good, a few comments for v2:
> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> index 522196dfb0ff..6712d70d1f18 100644
> --- a/io_uring/io-wq.c
> +++ b/io_uring/io-wq.c
> @@ -44,7 +44,7 @@ enum {
> */
> struct io_worker {
> refcount_t ref;
> - unsigned flags;
> + unsigned long flags;
> struct hlist_nulls_node nulls_node;
> struct list_head all_list;
> struct task_struct *task;
This now creates a hole in the struct, maybe move 'lock' up after ref so
that it gets filled and the current hole after 'lock' gets removed as
well?
And then I'd renumber the flags, they take bit offsets, not
masks/values. Otherwise it's a bit confusing for someone reading the
code, using masks with test/set bit functions.
--
Jens Axboe
Powered by blists - more mailing lists