Message-ID: <20170314004702.6650857e@gmail.com>
Date:   Tue, 14 Mar 2017 00:47:02 -0700
From:   Lance Roy <ldr709@...il.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Andrey Konovalov <andreyknvl@...gle.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller <syzkaller@...glegroups.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Kostya Serebryany <kcc@...gle.com>
Subject: Re: srcu: BUG in __synchronize_srcu

I am not sure how the rcu_scheduler_active changes in __synchronize_srcu
work, but there seem to be a few problems in them. First, the
"if (done && likely(!driving))" test on line 453 never appears to
succeed, as driving doesn't get set to false when srcu_reschedule is
called. This looks like it could cause a race condition if another
thread notices that ->running is false, adds itself to the queue, sets
->running to true, and starts on its own grace period before the first
thread reacquires the lock on line 469. The first thread will then
acquire the lock, set ->running to false, and release the lock,
resulting in an invalid state where ->running is false but the second
thread is still trying to finish its grace period.
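
To check my reading, here is a toy user-space model of that
interleaving (all names are mine and nothing below is kernel code; the
semaphores just force the ordering described above deterministically):

/*
 * Toy model of the race: compile with -pthread.  first_caller plays
 * the thread at lines 453/469, second_caller the thread that slips
 * in after srcu_reschedule() has cleared ->running.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static bool running;		/* models sp->running */
static int gps_in_flight;	/* grace periods still being driven */
static sem_t a_finished_gp, b_driving_gp;

static void *first_caller(void *unused)
{
	pthread_mutex_lock(&queue_lock);
	running = true;			/* starts driving SRCU */
	pthread_mutex_unlock(&queue_lock);

	pthread_mutex_lock(&queue_lock);
	running = false;		/* GP done; "srcu_reschedule"
					   found the queues empty */
	pthread_mutex_unlock(&queue_lock);

	sem_post(&a_finished_gp);	/* window for the second caller */
	sem_wait(&b_driving_gp);

	pthread_mutex_lock(&queue_lock);
	running = false;		/* misses the early return, so it
					   clobbers ->running again */
	pthread_mutex_unlock(&queue_lock);
	return NULL;
}

static void *second_caller(void *unused)
{
	sem_wait(&a_finished_gp);
	pthread_mutex_lock(&queue_lock);
	if (!running) {			/* sees ->running == false */
		running = true;		/* takes over as the driver */
		gps_in_flight++;	/* its grace period begins */
	}
	pthread_mutex_unlock(&queue_lock);
	sem_post(&b_driving_gp);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	sem_init(&a_finished_gp, 0, 0);
	sem_init(&b_driving_gp, 0, 0);
	pthread_create(&a, NULL, first_caller, NULL);
	pthread_create(&b, NULL, second_caller, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	if (gps_in_flight > 0 && !running)
		printf("invalid state: grace period in flight with running == false\n");
	return 0;
}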

Second, the while loop on line 455 seems to violate the rule that
->running shouldn't be false when there are entries in the queue. If a
second thread adds itself to the queue while the first thread is
driving SRCU inside that loop, and the first thread then finishes its
own grace period and exits the loop, it will set ->running to false
even though there is still a thread on the queue.
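
If that reading is right, one invariant-preserving shape for the exit
path might be to recheck all four batches under ->queue_lock before
giving up ownership, roughly like this (untested sketch, not a patch):

	/* Untested sketch: only clear ->running if nothing is pending. */
	spin_lock_irq(&sp->queue_lock);
	if (rcu_batch_empty(&sp->batch_queue) &&
	    rcu_batch_empty(&sp->batch_check0) &&
	    rcu_batch_empty(&sp->batch_check1) &&
	    rcu_batch_empty(&sp->batch_done))
		sp->running = false;
	/* else leave ->running set so the queued waiter isn't stranded */
	spin_unlock_irq(&sp->queue_lock);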

The second issue requires rcu_scheduler_active to be RCU_SCHEDULER_INIT
in order to occur, and as I don't know what the assumptions during
RCU_SCHEDULER_INIT are, I don't know whether it is actually a problem.
The first issue, however, looks like it could occur at any time.

Thanks,
Lance

On Fri, 10 Mar 2017 14:26:09 -0800
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> On Fri, Mar 10, 2017 at 08:29:55PM +0100, Andrey Konovalov wrote:
> > On Fri, Mar 10, 2017 at 8:28 PM, Andrey Konovalov <andreyknvl@...gle.com>
> > wrote:
> > > Kernel panic - not syncing: Fatal exception
>
> So the theory is that if !sp->running, all of SRCU's queues must be empty.
> So if you are holding ->queue_lock (with irqs disabled) and you see
> !sp->running, and then you enqueue a callback on ->batch_check0, then
> that callback must be the first in the list.  And the code preceding
> the WARN_ON() you triggered does in fact check and enqueue while holding
> ->queue_lock with irqs disabled.
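
Schematically (paraphrasing that code path from memory, not quoting
it), the check-and-enqueue in question is:

	spin_lock_irq(&sp->queue_lock);
	if (!sp->running) {
		/* all four batches must be empty here... */
		sp->running = true;
		rcu_batch_queue(&sp->batch_check0, head);
		/* ...so head must now be first in ->batch_check0,
		   which is what the triggered WARN_ON() asserts */
	}
	spin_unlock_irq(&sp->queue_lock);
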
>
> And rcu_batch_queue() does operate FIFO as required.  (Otherwise,
> srcu_barrier() would not work.)
>
> There are only three calls to rcu_batch_queue(), and the one involved with
> the WARN_ON() enqueues to ->batch_check0.  The other two enqueue to
> ->batch_queue.  Callbacks move from ->batch_queue to ->batch_check0 to
> ->batch_check1 to ->batch_done, so nothing should slip in front.
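
For anyone following along without the source handy, the batch
machinery in that version of kernel/rcu/srcu.c is (from memory, modulo
details) a simple tail-pointer FIFO, which is why nothing can slip in
front:

	struct rcu_batch {
		struct rcu_head *head, **tail;
	};

	static inline void rcu_batch_queue(struct rcu_batch *b,
					   struct rcu_head *head)
	{
		*b->tail = head;	/* append at the tail... */
		b->tail = &head->next;	/* ...preserving FIFO order */
	}

	static inline void rcu_batch_move(struct rcu_batch *to,
					  struct rcu_batch *from)
	{
		if (!rcu_batch_empty(from)) {	/* splice 'from' onto 'to' */
			*to->tail = from->head;
			to->tail = from->tail;
			rcu_batch_init(from);
		}
	}
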
>
> Of course, if ->running were ever set to false with any of ->batch_check0,
> ->batch_check1, or ->batch_done non-empty, this WARN_ON() would trigger.
> But srcu_reschedule() sets it to false only if all four batches are
> empty (and checks and sets under ->queue_lock), and all other cases
> where it is set to false happen at initialization time, and also clear
> out the queues.  Of course, if someone raced an init_srcu_struct() with
> either a call_srcu() or synchronize_srcu(), all bets are off.  Now,
> mmu_notifier.c does invoke init_srcu_struct() manually, but it does
> so at subsys_initcall() time.  Which -might- be after other things are
> happening, so one "hail Mary" attempted fix is to remove mmu_notifier_init()
> and replace the "static struct srcu_struct srcu" with:
>
> 	DEFINE_STATIC_SRCU(srcu);
>
> But this might require changing the name -- I vaguely recall some
> strangeness where the names of statically defined per-CPU variables need
> to be globally unique even when static.  Easy enough to do, though.
> Might need a similar change to the "srcu" instances defined in vmd.c
> and kvm_host.h -- assuming that this change helps.
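
If I am reading mm/mmu_notifier.c correctly, that change would look
something like this (untested):

-static struct srcu_struct srcu;
+DEFINE_STATIC_SRCU(srcu);

-static int __init mmu_notifier_init(void)
-{
-	return init_srcu_struct(&srcu);
-}
-
-subsys_initcall(mmu_notifier_init);
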
>
> Another possibility is that something in SRCU is messing with either the
> queues or the ->running field without holding ->queue_lock.  And that does
> seem to be happening -- srcu_advance_batches() invokes rcu_batch_move()
> without holding anything.  Which seems like it could cause trouble
> if someone else was invoking synchronize_srcu() concurrently.  Those
> particular invocations might be safe due to access only by a single
> kthread/workqueue, given that all updates to ->batch_queue are protected
> by ->queue_lock (aside from initialization).
>
> But ->batch_check0 is updated by __synchronize_srcu(), though protected
> by ->queue_lock, and only if ->running is false, and with both the
> check and the update protected by the same ->queue_lock critical section.
> If ->running is false, then the workqueue cannot be running, so it remains
> to see if all other updates to ->batch_check0 are either with ->queue_lock
> held and ->running false on the one hand or from the workqueue handler
> on the other:
>
> srcu_collect_new() updates with ->queue_lock held, but does not check
> 	->running.  It is invoked only from process_srcu(), which in
> 	turn is invoked only as a workqueue handler.  The work is queued
> 	from:
>
> 	call_srcu(), which does so with ->queue_lock held having just
> 		set ->running to true.
>
> 	srcu_reschedule(), which invokes it if there are non-empty
> 		queues.  This is invoked from __synchronize_srcu()
> 		in the case where it has set ->running to true
> 		after finding the queues empty, which should imply
> 		no other instances.
>
> 		It is also invoked from process_srcu(), which is
> 		invoked only as a workqueue handler.  (Yay
> 		recursive inquiry!)
>
> srcu_advance_batches() updates without locks held.  It is invoked as
> 	follows:
>
> 	__synchronize_srcu() in the case where ->running was set, which
> 		as noted before excludes workqueue handlers.
>
> 	process_srcu() which as noted before is only invoked from
> 		a workqueue handler.
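
For concreteness, process_srcu() in that version is, if I remember it
right, just the four phases in sequence, so both functions above
funnel through the same handler:

	static void process_srcu(struct work_struct *work)
	{
		struct srcu_struct *sp;

		sp = container_of(work, struct srcu_struct, work.work);

		srcu_collect_new(sp);		/* ->batch_queue into check0 */
		srcu_advance_batches(sp, 1);	/* check0 -> check1 -> done */
		srcu_invoke_callbacks(sp);	/* run ->batch_done callbacks */
		srcu_reschedule(sp);		/* requeue or clear ->running */
	}
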
>
> So these batch updates happen only from a workqueue handler, or from
> some other task that transitioned ->running from false to true while
> holding ->queue_lock.  There should therefore only be one SRCU workqueue
> per srcu_struct, so this should be safe.  Though I hope that it can
> be simplified a bit.  :-/
>
> So the only suggestion I have at the moment is static definition of
> the "srcu" variable.  Lai, Josh, Steve, Mathieu, anything I missed?
>
> 						Thanx, Paul
>
