Message-ID: <20170608212814.GD2553@templeofstupid.com>
Date: Thu, 8 Jun 2017 14:28:14 -0700
From: Krister Johansen <kjlx@...pleofstupid.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Krister Johansen <kjlx@...pleofstupid.com>,
linux-kernel@...r.kernel.org, mingo@...nel.org,
jiangshanlai@...il.com, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
josh@...htriplett.org, tglx@...utronix.de, peterz@...radead.org,
rostedt@...dmis.org, dhowells@...hat.com, edumazet@...gle.com,
fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com,
stable@...r.kernel.org, gregkh@...uxfoundation.org
Subject: Re: [PATCH tip/core/rcu 45/88] rcu: Add memory barriers for NOCB
leader wakeup
On Thu, Jun 08, 2017 at 01:55:00PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 08, 2017 at 01:11:48PM -0700, Krister Johansen wrote:
> > May I impose upon you to CC this patch to stable, and tag it as fixing
> > abedf8e241? I ran into this on a production 4.9 branch. When I
> > debugged it, I discovered that it went all the way back to 4.6. The
> > tl;dr is that at least for some environments, the missed wakeup
> > manifests itself as a series of hung-task warnings to console and if I'm
> > unlucky it can also generate a hang that can block interactive logins
> > via ssh.
>
> Interesting! This is the first that I have heard that this was anything
> other than a theoretical bug. To the comment in your second URL, it is
> wise to recall that a seismologist was in fact arrested for failing to
> predict an earthquake. Later acquitted/pardoned/whatever, but arrested
> nonetheless. ;-)
Point taken. I do realize that we all make mistakes, and certainly I do
too. Perhaps I should have said that my survey of current callers of
swake_up() was enough to convince me that I didn't have an immediate
problem elsewhere, but that I'm not familiar enough with the code base
to make that statement with a lot of authority. My concern is that, if
the patch came from RT-linux where the barrier was present in
swake_up(), there may be other places where swake_up() callers still
assume this is being handled on their behalf.
As part of this, I also pondered whether I should add a comment around
swake_up(), similar to what's already there for waitqueue_active().
I wasn't sure how subtle this is for other consumers, though.
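To make the ordering concern concrete, here is a minimal user-space sketch
using C11 atomics. It is an analogy under stated assumptions, not kernel
code: struct wake_state, waiter_prepare() and waker_signal() are
hypothetical names, has_waiter stands in for the check that swait_active()
or waitqueue_active() performs, and the seq_cst fences play the role that
smp_mb() plays in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wait queue plus the condition it guards. */
struct wake_state {
    atomic_bool cond;       /* "there is work to do" */
    atomic_bool has_waiter; /* analogous to a non-empty wait queue */
    atomic_int  wakeups;    /* number of wakeups issued, for illustration */
};

/*
 * Waiter side: publish that we are waiting, then re-check the condition.
 * Returns true if the caller should go to sleep.
 */
static bool waiter_prepare(struct wake_state *s)
{
    atomic_store_explicit(&s->has_waiter, true, memory_order_relaxed);
    /* Full barrier: the store above must be ordered before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&s->cond, memory_order_relaxed);
}

/*
 * Waker side: publish the condition, then check for a waiter.
 * Returns true if a wakeup was issued.
 */
static bool waker_signal(struct wake_state *s)
{
    atomic_store_explicit(&s->cond, true, memory_order_relaxed);
    /* Full barrier: this is the one the waker must not assume is implied. */
    atomic_thread_fence(memory_order_seq_cst);
    if (!atomic_load_explicit(&s->has_waiter, memory_order_relaxed))
        return false; /* nobody waiting, skip the wakeup */
    atomic_fetch_add(&s->wakeups, 1);
    return true;
}
```

With both fences in place, at least one side is guaranteed to observe the
other's store. If the fence in waker_signal() is dropped, the store to cond
and the load of has_waiter may be reordered, so the waker can see no waiter
while the waiter sees no condition and goes to sleep, which is the missed
wakeup.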
> Silliness aside, does my patch actually fix your problem in practice as
> well as in theory? If so, may I have your Tested-by?
Yes, it absolutely does. Consider it given:
Tested-by: Krister Johansen <kjlx@...pleofstupid.com>
> Impressive investigative effort, by the way!
Thanks!
-K