Message-ID: <225f0de7-8dc6-8470-3e8e-f6af58ef668a@kernel.dk>
Date: Wed, 14 Jun 2023 13:25:40 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Zorro Lang <zlang@...hat.com>, io-uring <io-uring@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] io_uring/io-wq: don't clear PF_IO_WORKER on exit
On 6/14/23 11:44 AM, Linus Torvalds wrote:
> On Tue, 13 Jun 2023 at 18:14, Jens Axboe <axboe@...nel.dk> wrote:
>>
>> + preempt_disable();
>> + current->worker_private = NULL;
>> + preempt_enable();
>
> Yeah, that preempt_disable/enable cannot possibly make a difference in
> any sane situation.
>
> If you want to make clear that it should be one single write, do it
> with WRITE_ONCE().
>
> But realistically, that won't matter either. There's just no way a
> sane compiler can make it do anything else, and just the plain
>
> current->worker_private = NULL;
>
> will be equivalent.
>
> If there are ordering concerns, then neither preemption nor
> WRITE_ONCE() matter, but "smp_store_release()" would.
>
> But then any readers should use "smp_load_acquire()" too.
>
> However, in this case, I don't think any of that matters.
Right, it's all 'current' stuff, at least any users of whatever hangs
off ->worker_private are. I've cut it down and added a comment as well.
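
A minimal userspace sketch of why a plain store is enough here. It is
not the io-wq code: model_task, model_worker and the model_* functions
are hypothetical stand-ins, and the only assumption carried over from
the discussion is that the pointer is read and written solely by the
task that owns it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task and worker structures. */
struct model_worker {
	int running_calls;	/* times the "running" hook did real work */
};

struct model_task {
	struct model_worker *worker_private;	/* touched only by its owner */
};

/* Models a scheduler hook: only ever invoked for the running task. */
static void model_worker_running(struct model_task *tsk)
{
	struct model_worker *worker = tsk->worker_private;

	if (!worker)		/* worker already exited: nothing to do */
		return;
	worker->running_calls++;
}

/*
 * Models worker exit. No preempt_disable(), WRITE_ONCE() or release
 * store: the only reader is the task doing the write.
 */
static void model_worker_exit(struct model_task *tsk)
{
	tsk->worker_private = NULL;
}

/* Usage: the hook becomes a no-op once the pointer has been cleared. */
static int model_demo(void)
{
	struct model_worker w = { 0 };
	struct model_task t = { &w };

	model_worker_running(&t);	/* counted */
	model_worker_exit(&t);
	model_worker_running(&t);	/* ignored */
	assert(t.worker_private == NULL);
	return w.running_calls;
}
```

If another task ever read the pointer, this reasoning would no longer
hold and the release/acquire pairing mentioned above would be needed.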
> The actual backing store is free'd with kfree_rcu(), so any ordering
> would be against the RCU grace period anyway. So the only ordering
> that matters is, I think, that you set it to NULL *before* that
> kfree_rcu() call, so that we know that "if somebody has seen a
> non-NULL worker_private, then you still have a full RCU grace period
> until it is gone".
>
> Of course, that all still assumes that any read of worker_private
> (from outside of 'current') is inside an RCU read-locked region. Which
> isn't actually obviously true.
>
> But at least for the case of io_wq_worker_running() and
> io_wq_worker_sleeping, the call is always just for the current task.
> So there are no ordering constraints at all. Not for preemption, not
> for SMP, not for RCU. It's all entirely thread-local.
>
> (That may not be obvious in the source code, since
> io_wq_worker_sleeping/running gets a 'tsk' argument, but in the
> context of the scheduler, 'tsk' is always just a cached copy of
> 'current').
>
> End result: just do it as a plain store. And I don't understand why
> the free'ing of that data structure is RCU-delayed at all. There does
> not seem to be any non-synchronous users of the worker_private pointer
> at all. So I *think* that
>
> kfree_rcu(worker, rcu);
>
> should just be
>
> kfree(worker);
>
> and I wonder if that rcu-freeing was there to try to hide the bug.
>
> But maybe I'm missing something.
It's for worker lookup, on activating a new worker for example, and has
a reference associated with it too. This is unrelated to
->worker_private, which is only ever used in the context of the worker
itself. Inside those lookups we need to ensure that 'worker' doesn't go
away, hence why it's freed by RCU. So we cannot get rid of that.
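
A rough single-threaded model of that lookup pattern, assuming a
reference that can only be taken while it is non-zero and a free that is
deferred until no lookup can still hold the pointer. All names
(sketch_worker, sketch_*) are hypothetical, and the explicit
sketch_grace_period_end() call merely stands in for an RCU grace
period; it is not how RCU is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not the io-wq implementation. */
struct sketch_worker {
	int refs;				/* 0 means the worker is exiting */
	struct sketch_worker *next_free;	/* link on the deferred-free list */
};

static struct sketch_worker *sketch_deferred_free;

/* Lookup side: take a reference only if the worker is still live. */
static bool sketch_worker_get(struct sketch_worker *w)
{
	if (w->refs == 0)
		return false;
	w->refs++;
	return true;
}

/* Exit side: on the last put, queue the memory instead of freeing it. */
static void sketch_worker_put(struct sketch_worker *w)
{
	if (--w->refs == 0) {
		w->next_free = sketch_deferred_free;
		sketch_deferred_free = w;
	}
}

/* Stand-in for the end of a grace period; returns how many were freed. */
static int sketch_grace_period_end(void)
{
	int freed = 0;

	while (sketch_deferred_free) {
		struct sketch_worker *w = sketch_deferred_free;

		sketch_deferred_free = w->next_free;
		free(w);
		freed++;
	}
	return freed;
}

/* Usage: a lookup that raced with exit can still inspect the worker. */
static int sketch_lookup_demo(void)
{
	struct sketch_worker *w = calloc(1, sizeof(*w));
	struct sketch_worker *seen;

	assert(w != NULL);
	w->refs = 1;
	seen = w;			/* a lookup found the worker ... */
	sketch_worker_put(w);		/* ... then the worker exited */
	assert(!sketch_worker_get(seen));	/* memory is still valid to read */
	return sketch_grace_period_end();	/* only now is it freed */
}
```

With an immediate free in sketch_worker_put(), the sketch_worker_get()
call in the demo would read freed memory, which is the situation the
deferred free is there to prevent.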
--
Jens Axboe