Message-Id: <20180417154357.GA24235@linux.vnet.ibm.com>
Date:   Tue, 17 Apr 2018 08:43:57 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: rcu_process_callbacks irqsoff latency caused by taking spinlock
 with irqs disabled

On Sun, Apr 08, 2018 at 02:06:18PM -0700, Paul E. McKenney wrote:
> On Sat, Apr 07, 2018 at 07:40:42AM +1000, Nicholas Piggin wrote:
> > On Thu, 5 Apr 2018 08:53:20 -0700
> > "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:

[ . . . ]

> > > > Note that rcu doesn't show up consistently at the top; this was
> > > > just one that looked *maybe* like it could be improved, so I don't
> > > > know how reproducible it is.
> > > 
> > > Ah, that leads me to wonder whether the hypervisor preempted whoever is
> > > currently holding the lock.  Do we have anything set up to detect that
> > > sort of thing?
> > 
> > In this case it was running on bare metal, so it was a genuine latency
> > event. It just hasn't been consistently at the top (scheduler has been
> > there, but I'm bringing that down with tuning).
> 
> OK, never mind about vCPU preemption, then!  ;-)
> 
> It looks like I will have other reasons to decrease rcu_node lock
> contention, so let me see what I can do.

And the intermittent contention behavior you saw is plausible given
the current code structure, which avoids contention in the common
case where grace periods follow immediately one after the other, but
does not in the less-likely case where RCU is idle and a bunch of CPUs
simultaneously see the need for a new grace period.  I have a fix in
the works which occasionally actually makes it through rcutorture.  ;-)
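
To make that contrast concrete, here is a minimal userspace sketch of
the contention pattern.  It is illustration only, not the actual
rcu_node code: the lock, the flag, and the names (fake_root_lock,
gp_in_progress, fake_cpu) are invented for the example, and pthreads
stand in for CPUs.  Build with "gcc -pthread".

/*
 * Illustration only -- a userspace sketch, not kernel code.  Threads
 * stand in for CPUs that all notice at once that a new "grace period"
 * is needed.  In the common back-to-back case the flag is already set
 * and the lockless fast path suffices; in the idle case every thread
 * piles onto the same lock at the same time, which is the contention
 * pattern described above.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_FAKE_CPUS 8

static pthread_spinlock_t fake_root_lock;	/* stands in for the root rcu_node lock */
static bool gp_in_progress;			/* "a grace period is already underway" */

static void *fake_cpu(void *arg)
{
	long cpu = (long)arg;

	/*
	 * Lockless fast path: if a grace period is already in progress
	 * (the common case of back-to-back grace periods), there is no
	 * need to touch the lock at all.
	 */
	if (__atomic_load_n(&gp_in_progress, __ATOMIC_ACQUIRE))
		return NULL;

	/*
	 * Slow path: "RCU" was idle, so every CPU that noticed the need
	 * for a grace period contends here.  Only one of them actually
	 * has anything useful to do once it gets the lock.
	 */
	pthread_spin_lock(&fake_root_lock);
	if (!gp_in_progress) {
		__atomic_store_n(&gp_in_progress, true, __ATOMIC_RELEASE);
		printf("cpu %ld starts the new grace period\n", cpu);
	} else {
		printf("cpu %ld waited on the lock for nothing\n", cpu);
	}
	pthread_spin_unlock(&fake_root_lock);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_FAKE_CPUS];
	long i;

	pthread_spin_init(&fake_root_lock, PTHREAD_PROCESS_PRIVATE);
	for (i = 0; i < NR_FAKE_CPUS; i++)
		pthread_create(&tid[i], NULL, fake_cpu, (void *)i);
	for (i = 0; i < NR_FAKE_CPUS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

In the kernel the analogous lock is of course taken with interrupts
disabled, so any time spent spinning in the slow path shows up
directly as irqsoff latency, which is what started this thread.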

I expect to have something robust enough to post to LKML by the end
of this week.

							Thanx, Paul
