linux-kernel - Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190627185103.GA8956@google.com>
Date:   Thu, 27 Jun 2019 14:51:03 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        rcu <rcu@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Triplett <josh@...htriplett.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs

On Thu, Jun 27, 2019 at 02:27:22PM -0400, Joel Fernandes wrote:
> On Thu, Jun 27, 2019 at 11:11:12AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 27, 2019 at 01:46:27PM -0400, Joel Fernandes wrote:
> > > On Thu, Jun 27, 2019 at 1:43 PM Joel Fernandes <joel@...lfernandes.org> wrote:
> > > >
> > > > On Thu, Jun 27, 2019 at 11:40 AM Sebastian Andrzej Siewior
> > > > <bigeasy@...utronix.de> wrote:
> > > > >
> > > > > On 2019-06-27 11:37:10 [-0400], Joel Fernandes wrote:
> > > > > > Sebastian it would be nice if possible to trace where the
> > > > > > t->rcu_read_unlock_special is set for this scenario of calling
> > > > > > rcu_read_unlock_special, to give a clear idea about whether it was
> > > > > > really because of an IPI. I guess we could also add additional RCU
> > > > > > debug fields to task_struct (just for debugging) to see where there
> > > > > > unlock_special is set.
> > > > > >
> > > > > > Is there a test to reproduce this, or do I just boot an intel x86_64
> > > > > > machine with "threadirqs" and run into it?
> > > > >
> > > > > Do you want to send me a patch or should I send you my kvm image which
> > > > > triggers the bug on boot?
> > > >
> > > > I could reproduce this as well just booting Linus tree with threadirqs
> > > > command line and running rcutorture. In 15 seconds or so it locks
> > > > up... gdb backtrace shows the recursive lock:
> > > 
> > > Sorry that got badly wrapped, so I pasted it here:
> > > https://hastebin.com/ajivofomik.shell
> > 
> > Which rcutorture scenario would that be?  TREE03 is thus far refusing
> > to fail for me when run this way:
> > 
> > $ tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --trust-make --configs "TREE03" --bootargs "threadirqs"
> 
> I built x86_64_defconfig with CONFIG_PREEMPT enabled, then I ran it with
> following boot params:
> rcutorture.shutdown_secs=60 rcutorture.n_barrier_cbs=4 rcutree.kthread_prio=2
> 
> and also "threadirqs"
> 
> This was not a TREE config, but just my simple RCU test using qemu.

Ah, it seems that the issue is reproducible in Linus tree only (which matches
the initial diff Sebastian posted). It cannot be reproduced with your /dev
branch. So perhaps the in_irq() check indeed works.

Looking further, in_irq() does also set the HARDIRQ_MASK in the preempt_count
courtesy of:
#define __irq_enter()                                   \
        do {                                            \
                account_irq_enter_time(current);        \
                preempt_count_add(HARDIRQ_OFFSET);      \
                trace_hardirq_enter();

I dumped the stack at this point as well even with "threadirqs" just to
double confirm that is the case.

So probably, the in_irq() check is sufficient. However I am still a bit
nervous about this issue manifesting in other paths of the scheduler
that don't execute from an interrupt handler, but still would have RCU
reader sections with spinlocks held - I am not sure if this is possible
though but it does make me nervous.

Thanks!

> 
> 
> I will try this diff and let you know.
> 
> > If it had failed, I would have tried the patch shown below.  I know that
> > Sebastian has some concerns about the bug happening anyway, but we have
> > to start somewhere!  ;-)
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 82c925df1d92..be7bafc2c0a0 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -624,25 +624,16 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >  		      (rdp->grpmask & rnp->expmask) ||
> >  		      tick_nohz_full_cpu(rdp->cpu);
> >  		// Need to defer quiescent state until everything is enabled.
> > -		if ((exp || in_irq()) && irqs_were_disabled && use_softirq &&
> > -		    (in_irq() || !t->rcu_read_unlock_special.b.deferred_qs)) {
> > -			// Using softirq, safe to awaken, and we get
> > -			// no help from enabling irqs, unlike bh/preempt.
> > -			raise_softirq_irqoff(RCU_SOFTIRQ);
> > -		} else {
> > -			// Enabling BH or preempt does reschedule, so...
> > -			// Also if no expediting or NO_HZ_FULL, slow is OK.
> > -			set_tsk_need_resched(current);
> > -			set_preempt_need_resched();
> > -			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> > -			    !rdp->defer_qs_iw_pending && exp) {
> > -				// Get scheduler to re-evaluate and call hooks.
> > -				// If !IRQ_WORK, FQS scan will eventually IPI.
> > -				init_irq_work(&rdp->defer_qs_iw,
> > -					      rcu_preempt_deferred_qs_handler);
> > -				rdp->defer_qs_iw_pending = true;
> > -				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > -			}
> > +		set_tsk_need_resched(current);
> > +		set_preempt_need_resched();
> > +		if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> > +		    !rdp->defer_qs_iw_pending && exp) {
> > +			// Get scheduler to re-evaluate and call hooks.
> > +			// If !IRQ_WORK, FQS scan will eventually IPI.
> > +			init_irq_work(&rdp->defer_qs_iw,
> > +				      rcu_preempt_deferred_qs_handler);
> > +			rdp->defer_qs_iw_pending = true;
> > +			irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> 
> Nice to see the code here got simplified ;-)
>