Message-ID: <20111004231504.GG2223@linux.vnet.ibm.com>
Date:	Tue, 4 Oct 2011 16:15:04 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	linux-rt-users <linux-rt-users@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: Quick review of -rt RCU-related patches

On Wed, Oct 05, 2011 at 12:05:47AM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Tue, 4 Oct 2011, Paul E. McKenney wrote:
> > rcu-reduce-lock-section.patch
> > 
> > 	Prevents a wakeup within an irq-disabled raw spinlock.
> > 	But which wakeup?
> 
> The call comes from:
> 
>     synchronize_rcu_expedited()
> 	raw_spin_lock_irqsave(&rsp->onofflock, flags);
>         sync_rcu_preempt_exp_init()
> 	   rcu_report_exp_rnp()
>     
> 
> > 	o	rcu_report_exp_rnp() is only permitted to do a wakeup
> > 		if called with interrupts enabled.
> 
> That's not what the code does, obviously.
> 
> > 	o	All calls enable wakeups -except- for the call from
> > 		sync_rcu_preempt_exp_init(), but in that case, there
> > 		is nothing to wake up -- it is in fact the task doing
> > 		the initialization.
> 
> Right, though the problem is that rcu_report_exp_rnp() unconditionally
> calls wake_up() which takes the wait_queue lock (a "sleeping spinlock"
> on RT). So the patch merely suppresses the wakeup when called from
> synchronize_rcu_expedited() ...

OK, got it.
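
For reference, the pattern under discussion boils down to something like
the following sketch (a simplified illustration, not the actual tree-RCU
code; the "wake" flag is an assumed way of suppressing the wakeup on the
one call path that holds an irq-disabled raw spinlock):

```c
/*
 * Simplified sketch only.  On PREEMPT_RT the wait-queue lock inside
 * wake_up() is a sleeping spinlock, so it must not be acquired while
 * an irq-disabled raw spinlock is held.
 */
static void rcu_report_exp_rnp(struct rcu_state *rsp,
			       struct rcu_node *rnp, bool wake)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&rnp->lock, flags);
	/* ... propagate the expedited quiescent state up the tree ... */
	raw_spin_unlock_irqrestore(&rnp->lock, flags);
	if (wake)
		wake_up(&sync_rcu_preempt_exp_wq); /* no raw lock held here */
}

/*
 * sync_rcu_preempt_exp_init() would pass wake=false: it runs under
 * rsp->onofflock and, as noted above, has nothing to wake up anyway.
 */
```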

> > signal-fix-up-rcu-wreckage.patch
> > 
> > 	Makes __lock_task_sighand() do local_irq_save_nort() instead
> > 	of local_irq_save().  The RCU implementation should be OK with
> > 	this.  The signal implementation might or might not.
> 
> It does :)
> 
> > sched-might-sleep-do-not-account-rcu-depth.patch
> > 
> > 	Yow!!!  For CONFIG_PREEMPT_RT_FULL, this redefines
> > 	sched_rcu_preempt_depth() to always return zero.
> > 
> > 	This means that might_sleep() will never complain about
> > 	blocking in an RCU read-side critical section.  I guess that
> > 	this is necessary, though it would be better to have some
> > 	way to complain for general sleeping (e.g., waiting on
> > 	network receive) as opposed to blocking on a lock that is
> > 	subject to priority inheritance.
> 
> Well, there's always a remaining problem. We need that stuff fully
> preemptible on rt. Any ideas ?

Not yet.  We would have to classify context switches into two groups:

1.	Preemptions or blocking waiting for sleeping spinlocks.

2.	Everything else.

Given that classification, it would be straightforward: prohibit
group #2 context switches while in RCU-preempt read-side critical
sections.  I know, easy for me to say!  ;-)
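
Such a check could look roughly like the sketch below (purely
hypothetical; sched_switch_is_pi_block() is an invented helper standing
in for whatever mechanism would distinguish the two groups):

```c
/*
 * Hypothetical sketch only.  Assumes some way to tell whether the
 * current context switch is group #1 (preemption, or blocking on a
 * sleeping PI-aware spinlock) or group #2 (everything else) --
 * sched_switch_is_pi_block() is invented here for illustration.
 */
static void rcu_preempt_note_context_switch(void)
{
	if (rcu_preempt_depth() > 0 && !sched_switch_is_pi_block())
		WARN_ONCE(1,
			  "group #2 context switch inside RCU-preempt read-side critical section");
	/* group #1 switches remain legal on RT */
}
```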

> > rcu-force-preempt-rcu-for-rt.patch
> > 
> > 	This one should be obsoleted by commit 8008e129dc9, which is
> > 	queued in -tip, hopefully for 3.2.
> 
> Ok.
> 
> > 	Except for the RT_GROUP_SCHED change, which is unrelated.
> 
> Whoops, that should be in a different patch

I was wondering about that.

> > rcu-disable-the-rcu-bh-stuff-for-rt.patch
> > 
> > 	This implements RCU-bh in terms of RCU-preempt, but disables
> > 	BH around the resulting RCU-preempt read-side critical section.
> > 	May be vulnerable to network-based denial-of-service attacks,
> > 	which could OOM a system with this patch.
> > 
> > 	The implementation of rcu_read_lock_bh_held() is weak, but OK.  In
> > 	an ideal world, it would also complain if local_bh_disable() is
> > 	not in effect.
> 
> Well, I dropped that after our IRC conversation, but we still need to
> have some extra debugging for RT to diagnose situations where we break
> those rcu_bh assumptions. That _bh RCU stuff should never have been
> there; we should rather drop the softirq processing back to
> ksoftirqd in such an overload case (in mainline) and voluntarily
> schedule away from ksoftirqd until the situation is resolved.
> 
> I consider rcu_bh a bandaid for the nasty implicit semantics of BH
> processing, and I'm really looking forward to Peter's analysis of the
> modern CPU-local BKL constructs at RTLWS.
> 
> The patch stemmed from an earlier discussion about getting rid of
> those special rcu_bh semantics even in mainline, for the sake of not
> making a special case for a completely over(ab)used construct which
> originates from the historically grown softirq behaviour. I think that
> keeping the special-cased rcu_bh stuff around just takes away any
> incentive to move the whole softirq processing into a
> scheduler-controllable environment (i.e. threaded interrupts).

Between -rt and the guys pushing packets, I can tell that this is going
to be a fun one.  ;-)

I will see if I can come up with a way to make that patch safe to
apply.  Might not be all that hard.
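
The read-side mapping in that patch amounts to wrappers of roughly this
shape (a simplified sketch of the idea, not the patch verbatim):

```c
/*
 * Simplified sketch: on PREEMPT_RT_FULL, implement the RCU-bh read
 * side in terms of RCU-preempt plus BH disabling.  The update side
 * would likewise map call_rcu_bh()/synchronize_rcu_bh() onto their
 * RCU-preempt counterparts.
 */
static inline void rcu_read_lock_bh(void)
{
	local_bh_disable();
	rcu_read_lock();
}

static inline void rcu_read_unlock_bh(void)
{
	rcu_read_unlock();
	local_bh_enable();
}
```

The DoS concern noted above follows from this mapping: rcu_bh grace
periods then depend on RCU-preempt readers completing, rather than
being driven directly by softirq completion.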

> > _paul_e__mckenney_-eliminate_-_rcu_boosted.patch
> > 
> > 	Looks fine, but I might not be the best reviewer for this one.
> 
> :)
> 
> > peter_zijlstra-frob-rcu.patch
> > 
> > 	Looks OK.  Hmmm...  Should this one go to mainline?
> > 	Oh, looks equivalent, actually.  So why the change?
> 
> Peter ?
> 
> > Possible fixes from the 3.2-bound RCU commits in -tip:
> > 
> > 7eb4f455 (Make rcu_implicit_dynticks_qs() locals be correct size)
> > 
> > 	Symptoms: misbehavior on 32-bit systems after a given CPU went
> > 		through about 2^30 dyntick-idle transitions.  You would
> > 		have to work pretty hard to get this one to happen.
> 
> Definitely, and most of our test systems run with nohz=off
> 
> > 5c51dd73 (Prevent early boot set_need_resched() from __rcu_pending())
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Never seen that one
> 
> > 037067a1 (Prohibit grace periods during early boot)
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Ditto
> 
> > 06ae115a (Avoid having just-onlined CPU resched itself when RCU is idle)
> > 
> > 	Symptoms: boot-time hangs for rare configurations.
> 
> Ditto
> 
> > 5b61b0ba (Wire up RCU_BOOST_PRIO for rcutree)
> > 
> > 	Symptoms: The system uses RT priority 1 for boosting regardless
> > 		of the value of CONFIG_RCU_BOOST_PRIO.
> 
> Makes sense
> 
> > e90c53d3 (Remove rcu_needs_cpu_flush() to avoid false quiescent states)
> > 
> > 	Symptoms: Too-short grace periods for RCU_FAST_NO_HZ=y.
> > 		But simpler to set RCU_FAST_NO_HZ=n.
> 
> We probably don't have that on

I hope not.  ;-)

> > And the following is a patch to a theoretical problem with expedited
> > grace periods.
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Avoid RCU-preempt expedited grace-period botch
> 
> Were you able to actually trigger that?

Nope, by inspection only.  Which is a good thing, as this would likely
be very hard to reproduce and even harder to debug.

							Thanx, Paul

> Thanks,
> 
> 	tglx
> 
