linux-kernel - Re: RCU vs data

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210706141624.GA4397@paulmck-ThinkPad-P17-Gen-1>
Date:   Tue, 6 Jul 2021 07:16:24 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Marco Elver <elver@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: RCU vs data_race()

On Tue, Jul 06, 2021 at 10:37:49AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 21, 2021 at 06:37:57AM -0700, Paul E. McKenney wrote:
> > Because the static inline functions end up in RCU's tranlation units,
> > and they do write to other state.  See for example the line marked with
> > the asterisk at the end of this email, which is apparently between
> > __run_timers() and rcu_torture_reader().  But gdb says that this is
> > really between this statement in __run_timers():
> > 
> > 	base->running_timer = NULL;
> 
> (I'm finding that in expire_timers(), not in __run_timers(), and both
> are in kernel/time/timer.c)
> 
> > And this statement in try_to_del_timer_sync():
> > 
> > 	if (base->running_timer != timer)
> > 
> > Because of inline functions, running KCSAN on RCU can detect races in
> > other subsystems.  In this case, I could argue for a READ_ONCE() on the
> > "if" condition, but last I knew, the rules for timers are that C-language
> > writes do not participate in data races.  So maybe I should add that
> > READ_ONCE(), but running KCSAN on rcutorture would still complain.
> 
> That code is correct as is, AFAICT, so READ_ONCE() is not the right
> annotation.  Also, READ_ONCE() would not actively help the situation in
> any case, and arguably make the code worse and more confusing.
> 
> 
> What happens here is that we read base->running_timer while holding
> base->lock. We set base->running_timer to !0 holding that same
> base->lock from the timer IRQ context. We clear base->running_timer
> *without* base->lock, from the timer IRQ context.
> 
> So yes, there's a race between the locked load of base->running_timer
> and the timer IRQ on another CPU clearing it.
> 
> But since the comparison is an equality test and the only purpose of the
> clear is to destroy that equality, any partial clear serves that
> purpose.
> 
> Now, a partial clear could create a false positive match for another
> timer, which in turn could make try_to_del_timer_sync() fail spuriously
> I suppose, but any caller needs to be able to deal with failure of that
> function, so no harm done there.
> 
> 
> *HOWEVER* timer_curr_running() violates all that.. but that looks like
> it's only ever used as a diagnostic, so that should be fine too.

Maybe a data_race(), then?  Except that in current mainline (as opposed
to v5.13) timer_curr_running() seems to have been inlined.

> Anyway, AFAIU the problem is on the WRITE side, so any READ_ONCE() will
> not ever fix anything here. If we want to get rid of the possible
> spurious fail in try_to_del_timer_sync() we need to consistently
> WRITE/READ_ONCE() all of base->running_timer, if we don't care, we need
> a data_race() annotation somewhere, although I'm not currently seeing
> that inlining you mention that's needed for this problem to manifest in
> the first place.

Coming back to my original question, is my best strategy for RCU
to create data_race()-wrapped variants of the schedule_timeout*()
functions?

							Thanx, Paul