Message-ID: <5c289c5a-a120-a1d0-ca89-2724a1445fe8@linux.intel.com>
Date: Wed, 28 Apr 2021 14:05:15 +0800
From: Aubrey Li <aubrey.li@...ux.intel.com>
To: Aubrey Li <aubrey.intel@...il.com>, Josh Don <joshdon@...gle.com>
Cc: Don Hiatt <dhiatt@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Joel Fernandes <joel@...lfernandes.org>,
"Hyser,Chris" <chris.hyser@...cle.com>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...e.de>,
linux-kernel <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
On 4/28/21 9:03 AM, Aubrey Li wrote:
> On Wed, Apr 28, 2021 at 7:36 AM Josh Don <joshdon@...gle.com> wrote:
>>
>> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt <dhiatt@...italocean.com> wrote:
>>> Hi Josh and Peter,
>>>
>>> I've been running into soft lockups and hard lockups when running a script
>>> that just cycles setting the cookie of a group of processes over and over again.
>>>
>>> Unfortunately the only way I can reproduce this is by setting the cookies
>>> on qemu. I've tried sysbench, stress-ng but those seem to work just fine.
>>>
>>> I'm running Peter's branch and even tried the suggested changes here but
>>> still see the same behavior. I enabled panic on hard lockup and here below
>>> is a snippet of the log.
>>>
>>> Is there anything you'd like me to try or have any debugging you'd like me to
>>> do? I'd certainly like to get to the bottom of this.
>>
>> Hi Don,
>>
>> I tried to repro using qemu, but did not generate a lockup. Could you
>> provide more details on what your script is doing (or better yet,
>> share the script directly)? I would have expected you to potentially
>> hit a lockup if you were cycling sched_core being enabled and
>> disabled, but it sounds like you are just recreating the cookie for a
>> process group over and over?
>>
>
> I saw something similar on bare-metal hardware. I also tried the suggested
> patch here, with no luck. Panic stack attached with
> softlockup_all_cpu_backtrace=1.
> (sorry, my system has 192 cpus and somehow putting 184 cpus offline causes
> the system to hang without any message...)
Can you please try the following change and see whether the problem goes away on your side?

Thanks,
-Aubrey
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f732642e3e09..1ef13b50dfcd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -493,14 +493,17 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
 {
 	lockdep_assert_irqs_disabled();

-	if (rq1->cpu > rq2->cpu)
-		swap(rq1, rq2);
-
-	raw_spin_rq_lock(rq1);
-	if (__rq_lockp(rq1) == __rq_lockp(rq2))
-		return;
-
-	raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+	if (__rq_lockp(rq1) == __rq_lockp(rq2)) {
+		raw_spin_rq_lock(rq1);
+	} else {
+		if (__rq_lockp(rq1) < __rq_lockp(rq2)) {
+			raw_spin_rq_lock(rq1);
+			raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+		} else {
+			raw_spin_rq_lock(rq2);
+			raw_spin_rq_lock_nested(rq1, SINGLE_DEPTH_NESTING);
+		}
+	}
 }
 #endif
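
The diff above changes the ordering key in double_rq_lock() from the CPU
number to the address of the lock that __rq_lockp() returns, and takes a
single lock when both runqueues resolve to the same one. The following is a
minimal user-space sketch of that pattern, not kernel code: struct fake_rq,
fake_double_rq_lock() and fake_double_rq_unlock() are hypothetical names, and
pthread mutexes stand in for the raw spinlocks. It only illustrates ordering
two locks by address so that every caller acquires them in the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct rq. Each runqueue points at the lock it
 * uses, so two runqueues may share one lock (as SMT siblings do under a
 * core-wide lock).
 */
struct fake_rq {
	int cpu;
	pthread_mutex_t *lockp;
};

/*
 * Lock both runqueues. Returns the number of distinct locks taken (1 or 2).
 * The order depends only on the lock addresses, never on argument order or
 * on the cpu field.
 */
static int fake_double_rq_lock(struct fake_rq *a, struct fake_rq *b)
{
	assert(a && b);

	if (a->lockp == b->lockp) {
		/* Same underlying lock: take it once. */
		pthread_mutex_lock(a->lockp);
		return 1;
	}

	if ((uintptr_t)a->lockp < (uintptr_t)b->lockp) {
		pthread_mutex_lock(a->lockp);
		pthread_mutex_lock(b->lockp);
	} else {
		pthread_mutex_lock(b->lockp);
		pthread_mutex_lock(a->lockp);
	}
	return 2;
}

/* Release whatever fake_double_rq_lock() took for the same pair. */
static void fake_double_rq_unlock(struct fake_rq *a, struct fake_rq *b)
{
	pthread_mutex_unlock(a->lockp);
	if (a->lockp != b->lockp)
		pthread_mutex_unlock(b->lockp);
}
```

Because the comparison is on the lock pointers themselves, calling
fake_double_rq_lock(&x, &y) and fake_double_rq_lock(&y, &x) from two threads
acquires the two mutexes in the same order, which is the property that rules
out an ABBA deadlock between them.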