Message-ID: <CAHk-=wg_0v3t+mAdS2-sPWD6DTH3Y9aGoQUhx7Mk1MB8gm9xjw@mail.gmail.com>
Date: Thu, 27 Dec 2018 13:46:17 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Sargun Dhillon <sargun@...gun.me>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Xie XiuQi <xiexiuqi@...wei.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, xiezhipeng1@...wei.com,
huawei.libin@...wei.com,
linux-kernel <linux-kernel@...r.kernel.org>,
Dmitry Adamushko <dmitry.adamushko@...il.com>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
On Thu, Dec 27, 2018 at 1:09 PM Sargun Dhillon <sargun@...gun.me> wrote:
>
> This appears to be broken since October on 4.18.5. We've only noticed
> it recently with a workload which does ridiculously parallel compiles
> in cgroups that are rapidly churned.
Yeah, that's probably unusual enough that people will have missed it.
Because it really looks like the bug has been there since 4.13, unless
I'm mis-reading things. Other things have changed there since, so
maybe I am.
> It's also an awkward bug to catch, because none of the lockup
> detectors were catching it in our environment. The only reason we
> caught it was that it was blocking other cores, and those other cores
> were missing IPIs, resulting in catastrophic failure.
My gut feel is that we just need to revert that commit. It doesn't
revert cleanly, but it doesn't look hard to do manually.
Something like the attached?
But we do need Tejun and PeterZ to take a look, since there might be
something subtle going on.
Everybody is probably still on well-deserved vacations, so it might be
a while. But testing the attached patch is probably a good idea
regardless.
Linus
View attachment "patch.diff" of type "text/x-patch" (2945 bytes)