linux-kernel - Consolidating RCU-bh, RCU-preempt, and RCU-sched

Message-Id: <20180713000249.GA16907@linux.vnet.ibm.com>
Date:   Thu, 12 Jul 2018 17:02:49 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     josh@...htriplett.org, rostedt@...dmis.org,
        mathieu.desnoyers@...icios.com, jiangshanlai@...il.com
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        torvalds@...ux-foundation.org, peterz@...radead.org,
        oleg@...hat.com, edumazet@...gle.com, davem@...emloft.net,
        tglx@...utronix.de
Subject: Consolidating RCU-bh, RCU-preempt, and RCU-sched

Hello!

I now have a semi-reasonable prototype of changes consolidating the
RCU-bh, RCU-preempt, and RCU-sched update-side APIs in my -rcu tree.
There are likely still bugs to be fixed and probably other issues as well,
but a prototype does exist.

Assuming continued good rcutorture results and no objections, I am
thinking in terms of this timeline:

o	Preparatory work and cleanups are slated for the v4.19 merge window.

o	The actual consolidation and post-consolidation cleanup are slated
	for the merge window after v4.19 (v5.0?).  These cleanups include
	the replacements called out below within the RCU implementation
	itself (but excluding kernel/rcu/sync.c, see question below).

o	Replacement of now-obsolete update APIs is slated for the second
	merge window after v4.19 (v5.1?).  The replacements are currently
	expected to be as follows:

	synchronize_rcu_bh() -> synchronize_rcu()
	synchronize_rcu_bh_expedited() -> synchronize_rcu_expedited()
	call_rcu_bh() -> call_rcu()
	rcu_barrier_bh() -> rcu_barrier()
	synchronize_sched() -> synchronize_rcu()
	synchronize_sched_expedited() -> synchronize_rcu_expedited()
	call_rcu_sched() -> call_rcu()
	rcu_barrier_sched() -> rcu_barrier()
	get_state_synchronize_sched() -> get_state_synchronize_rcu()
	cond_synchronize_sched() -> cond_synchronize_rcu()
	synchronize_rcu_mult() -> synchronize_rcu()

	I have done light testing of these replacements with good results.
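
To make the mapping above concrete, here is a minimal before/after sketch
of one updater.  It is not taken from the -rcu tree: struct foo,
foo_reclaim(), and foo_retire() are hypothetical, and struct rcu_head and
call_rcu() are userspace stand-ins.  In particular, the stub runs the
callback immediately, whereas the real call_rcu() defers it until after
a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	void (*func)(struct rcu_head *head);
};

/*
 * Stub only: the real call_rcu() invokes func after a grace period.
 * Here it runs immediately so that the sketch stays self-contained.
 */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head))
{
	assert(head && func);
	head->func = func;
	head->func(head);
}

/* Hypothetical RCU-protected structure. */
struct foo {
	int data;
	struct rcu_head rh;
};

static int foo_reclaimed;	/* Number of objects freed so far. */

static void foo_reclaim(struct rcu_head *rh)
{
	/* Recover the enclosing struct foo from its rcu_head member. */
	struct foo *fp = (struct foo *)((char *)rh - offsetof(struct foo, rh));

	free(fp);
	foo_reclaimed++;
}

static void foo_retire(struct foo *fp)
{
	/* Before consolidation: call_rcu_bh(&fp->rh, foo_reclaim); */
	call_rcu(&fp->rh, foo_reclaim);
}
```

Only the name of the update-side call changes; the callback and the
structure layout stay as they were.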

Any objections to this timeline?

I also have some questions on the ultimate end point.  I have default
choices, which I will likely take if there is no discussion.

o	Currently, I am thinking in terms of keeping the per-flavor
	read-side functions.  For example, rcu_read_lock_bh() would
	continue to disable softirq, and would also continue to tell
	lockdep about the RCU-bh read-side critical section.  However,
	synchronize_rcu() will wait for all flavors of read-side critical
	sections, including those introduced by (say) preempt_disable(),
	so there will no longer be any possibility of mismatching (say)
	RCU-bh readers with RCU-sched updaters.

	I could imagine other ways of handling this, including:

	a.	Eliminate rcu_read_lock_bh() in favor of
		local_bh_disable() and so on.  Rely on lockdep
		instrumentation of these other functions to identify RCU
		readers, introducing such instrumentation as needed.  I am
		not a fan of this approach because of the large number of
		places in the Linux kernel where interrupts, preemption,
		and softirqs are enabled or disabled "behind the scenes".

	b.	Eliminate rcu_read_lock_bh() in favor of rcu_read_lock(),
		and require callers to also disable softirqs, preemption,
		or whatever as needed.  I am not a fan of this approach
		because it seems a lot less convenient to users of RCU-bh
		and RCU-sched.

	At the moment, I therefore favor keeping the RCU-bh and RCU-sched
	read-side APIs.  But are there better approaches?
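
The effect on updaters can be pictured with a toy model.  This is not
the kernel's implementation: struct toy_cpu and the toy_*() helpers are
all hypothetical, and the model ignores the fact that a real grace period
waits only for pre-existing readers.  Each reader flavor bumps its own
counter, and a consolidated grace period may end only when no CPU has
any of the three counters nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU reader state; these are not real kernel fields. */
struct toy_cpu {
	int rcu_nesting;	/* rcu_read_lock() depth */
	int bh_disabled;	/* rcu_read_lock_bh() / local_bh_disable() depth */
	int preempt_disabled;	/* rcu_read_lock_sched() / preempt_disable() depth */
};

static void toy_rcu_read_lock_bh(struct toy_cpu *c)
{
	c->bh_disabled++;
}

static void toy_rcu_read_unlock_bh(struct toy_cpu *c)
{
	assert(c->bh_disabled > 0);
	c->bh_disabled--;
}

static void toy_preempt_disable(struct toy_cpu *c)
{
	c->preempt_disabled++;
}

static void toy_preempt_enable(struct toy_cpu *c)
{
	assert(c->preempt_disabled > 0);
	c->preempt_disabled--;
}

/* True if this CPU is inside a reader of any flavor. */
static bool toy_cpu_in_reader(const struct toy_cpu *c)
{
	return c->rcu_nesting || c->bh_disabled || c->preempt_disabled;
}

/* A consolidated grace period may end only when every CPU is quiescent. */
static bool toy_gp_may_end(const struct toy_cpu *cpus, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_cpu_in_reader(&cpus[i]))
			return false;
	return true;
}
```

Because the single predicate covers all three counters, an updater in
this model cannot be paired with the wrong reader flavor.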

o	How should kernel/rcu/sync.c be handled?  Here are some
	possibilities:

	a.	Leave the full gp_ops[] array and simply translate
		the obsolete update-side functions to their RCU
		equivalents.

	b.	Leave the current gp_ops[] array, but only have
		the RCU_SYNC entry.  The __INIT_HELD field would
		be set to a function that was OK with being in an
		RCU read-side critical section, an interrupt-disabled
		section, etc.

		This allows for possible addition of SRCU functionality.
		It is also a trivial change.  Note that the sole user
		of sync.c uses RCU_SCHED_SYNC, and this would need to
		be changed to RCU_SYNC.

		But is it likely that we will ever add SRCU?

	c.	Eliminate that gp_ops[] array, hard-coding the function
		pointers into their call sites.

	I don't really have a preference.  Left to myself, I will be lazy
	and take option #a.  Are there better approaches?
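
For concreteness, option a might look roughly like the following sketch.
The array layout is reconstructed from memory of kernel/rcu/sync.c and
should be treated as an assumption, the held field is omitted, the three
RCU functions are empty userspace stubs so that the fragment compiles
outside the kernel, and toy_sync() is a hypothetical helper added only
to exercise the table.

```c
#include <assert.h>
#include <stddef.h>

struct rcu_head;	/* Opaque here; the kernel defines the real type. */
typedef void (*rcu_callback_t)(struct rcu_head *head);

/* Empty userspace stubs standing in for the kernel functions. */
static void synchronize_rcu(void)
{
}

static void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	(void)head;
	(void)func;
}

static void rcu_barrier(void)
{
}

enum rcu_sync_type { RCU_SYNC, RCU_SCHED_SYNC, RCU_BH_SYNC };

/* Option a: keep all three entries, but point each at the RCU functions. */
static const struct {
	void (*sync)(void);
	void (*call)(struct rcu_head *head, rcu_callback_t func);
	void (*wait)(void);
} gp_ops[] = {
	[RCU_SYNC] = {
		.sync = synchronize_rcu,
		.call = call_rcu,
		.wait = rcu_barrier,
	},
	[RCU_SCHED_SYNC] = {
		.sync = synchronize_rcu,	/* was synchronize_sched() */
		.call = call_rcu,		/* was call_rcu_sched() */
		.wait = rcu_barrier,		/* was rcu_barrier_sched() */
	},
	[RCU_BH_SYNC] = {
		.sync = synchronize_rcu,	/* was synchronize_rcu_bh() */
		.call = call_rcu,		/* was call_rcu_bh() */
		.wait = rcu_barrier,		/* was rcu_barrier_bh() */
	},
};

/* Hypothetical helper: wait for a grace period via the table. */
static void toy_sync(enum rcu_sync_type type)
{
	assert((size_t)type < sizeof(gp_ops) / sizeof(gp_ops[0]));
	gp_ops[type].sync();
}
```

Option b would keep only the [RCU_SYNC] entry, and option c would drop
the table and call the three functions directly.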

o	Currently, if a lock related to the scheduler's rq or pi locks is
	held across rcu_read_unlock(), that lock must be held across the
	entire read-side critical section in order to avoid deadlock.
	Now that the end of the RCU read-side critical section is
	deferred until sometime after interrupts are re-enabled, this
	requirement could be lifted.  However, because the end of the RCU
	read-side critical section is detected sometime after interrupts
	are re-enabled, this means that a low-priority RCU reader might
	remain priority-boosted longer than need be, which could be a
	problem when running real-time workloads.

	My current thought is therefore to leave this constraint in
	place.  Thoughts?
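
The constraint as it stands today can be expressed as a small checker.
This is a toy model with hypothetical names throughout: a single boolean
stands in for an rq or pi lock, and a violation is counted when the
outermost rcu_read_unlock() runs with the lock held but the lock was not
held continuously since the outermost rcu_read_lock().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task context; none of these are real kernel fields. */
struct toy_ctx {
	int rcu_nesting;	/* rcu_read_lock() depth */
	bool lock_held;		/* stands in for an rq or pi lock */
	bool held_throughout;	/* held since the outermost rcu_read_lock()? */
	int violations;		/* number of rule violations seen */
};

static void toy_lock(struct toy_ctx *c)
{
	c->lock_held = true;
}

static void toy_unlock(struct toy_ctx *c)
{
	c->lock_held = false;
	c->held_throughout = false;	/* any release breaks continuity */
}

static void toy_rcu_read_lock(struct toy_ctx *c)
{
	/* Only the outermost entry records whether the lock is held. */
	if (c->rcu_nesting++ == 0)
		c->held_throughout = c->lock_held;
}

static void toy_rcu_read_unlock(struct toy_ctx *c)
{
	assert(c->rcu_nesting > 0);
	/* Check the rule only when leaving the outermost critical section. */
	if (--c->rcu_nesting == 0 && c->lock_held && !c->held_throughout)
		c->violations++;
}
```

Acquiring the lock before rcu_read_lock() and releasing it after
rcu_read_unlock() passes the check; acquiring it inside the critical
section and still holding it at rcu_read_unlock() does not.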

Anything else that I should be worried about?  ;-)

							Thanx, Paul