lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <225f0de7-8dc6-8470-3e8e-f6af58ef668a@kernel.dk>
Date:   Wed, 14 Jun 2023 13:25:40 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Zorro Lang <zlang@...hat.com>, io-uring <io-uring@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] io_uring/io-wq: don't clear PF_IO_WORKER on exit

On 6/14/23 11:44?AM, Linus Torvalds wrote:
> On Tue, 13 Jun 2023 at 18:14, Jens Axboe <axboe@...nel.dk> wrote:
>>
>> +       preempt_disable();
>> +       current->worker_private = NULL;
>> +       preempt_enable();
> 
> Yeah, that preempt_disable/enable cannot possibly make a difference in
> any sane situation.
> 
> If you want to make clear that it should be one single write, do it
> with WRITE_ONCE().
> 
> But realistically, that won't matter either. There's just no way a
> sane compiler can make it do anything else, and just the plain
> 
>         current->worker_private = NULL;
> 
> will be equivalent.
> 
> If there are ordering concerns, then neither preemption nor
> WRITE_ONCE() matter, but "smp_store_release()" would.
> 
> But then any readers should use "smp_load_acquire()" too.
> 
> However, in this case, I don't think any of that matters.

Right, it's all 'current' stuff, at least any users of whatever hangs
off ->worker_private is. I've cut it down and added a comment as well.

> The actual backing store is free'd with kfree_rcu(), so any ordering
> would be against the RCU grace period anyway. So the only ordering
> that matters is, I think, that you set it to NULL *before* that
> kfree_rcu() call, so that we know that "if somebody has seen a
> non-NULL worker_private, then you still have a full RCU grace period
> until it is gone".
> 
> Of course, that all still assumes that any read of worker_private
> (from outside of 'current') is inside an RCU read-locked region. Which
> isn't actually obviously true.
> 
> But at least for the case of io_wq_worker_running() and
> io_wq_worker_sleeping, the call is always just for the current task.
> So there are no ordering constraints at all. Not for preemption, not
> for SMP, not for RCU. It's all entirely thread-local.
> 
> (That may not be obvious in the source code, since
> io_wq_worker_sleeping/running gets a 'tsk' argument, but in the
> context of the scheduler, 'tsk' is always just a cached copy of
> 'current').
> 
> End result: just do it as a plain store.  And I don't understand why
> the free'ing of that data structure is RCU-delayed at all. There does
> not seem to be any non-synchronous users of the worker_private pointer
> at all. So I *think* that
> 
>         kfree_rcu(worker, rcu);
> 
> should just be
> 
>         kfree(worker);
> 
> and I wonder if that rcu-freeing was there to try to hide the bug.
> 
> But maybe I'm missing something.

It's for worker lookup, on activating a new worker for exmaple, and has
a reference associated with it too. This is unrelated to
->worker_private, that is only ever used in context of the the worker
itself. Inside those lookups we need to ensure that 'worker' doesn't go
away, hence why it's freed by rcu. So we cannot get rid of that.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ