Message-ID: <CAERHkrsoCR7d3N2rhwKCeFDDBv4-S4HzD567mOaV_pngXn_Hkg@mail.gmail.com>
Date: Wed, 28 Apr 2021 18:57:21 +0800
From: Aubrey Li <aubrey.intel@...il.com>
To: Aubrey Li <aubrey.li@...ux.intel.com>
Cc: Josh Don <joshdon@...gle.com>, Don Hiatt <dhiatt@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
Joel Fernandes <joel@...lfernandes.org>,
"Hyser,Chris" <chris.hyser@...cle.com>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...e.de>,
linux-kernel <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
On Wed, Apr 28, 2021 at 2:05 PM Aubrey Li <aubrey.li@...ux.intel.com> wrote:
>
> On 4/28/21 9:03 AM, Aubrey Li wrote:
> > On Wed, Apr 28, 2021 at 7:36 AM Josh Don <joshdon@...gle.com> wrote:
> >>
> >> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt <dhiatt@...italocean.com> wrote:
> >>> Hi Josh and Peter,
> >>>
> >>> I've been running into soft lockups and hard lockups when running a script
> >>> that just cycles setting the cookie of a group of processes over and over again.
> >>>
> >>> Unfortunately the only way I can reproduce this is by setting the cookies
> >>> on qemu. I've tried sysbench and stress-ng, but those seem to work just fine.
> >>>
> >>> I'm running Peter's branch and even tried the suggested changes here, but I
> >>> still see the same behavior. I enabled panic on hard lockup; a snippet of
> >>> the log is below.
> >>>
> >>> Is there anything you'd like me to try, or any debugging you'd like me to
> >>> do? I'd certainly like to get to the bottom of this.
> >>
> >> Hi Don,
> >>
> >> I tried to repro using qemu, but did not generate a lockup. Could you
> >> provide more details on what your script is doing (or better yet,
> >> share the script directly)? I would have expected you to potentially
> >> hit a lockup if you were cycling sched_core between enabled and
> >> disabled, but it sounds like you are just recreating the cookie for a
> >> process group over and over?
> >>
> >
> > I saw something similar on bare metal hardware. I also tried the suggested
> > patch here, with no luck. Panic stack attached, captured with
> > softlockup_all_cpu_backtrace=1.
> > (sorry, my system has 192 cpus and somehow putting 184 cpus offline causes
> > the system to hang without any message...)
>
> Can you please try the following change to see if the problem is gone on your side?
>
Please ignore this patch; the change to double_rq_unlock() in Peter's
latest patch fixed the problem.
Thanks,
-Aubrey
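
For reference, the idea behind that fix is that double_rq_unlock() must
pair with the locking below, dropping rq2's lock separately only when the
two runqueues do not share an underlying lock. A rough sketch of such a
pairing, assuming the __rq_lockp()/raw_spin_rq_unlock() helpers from this
series (an illustration only, not necessarily the exact hunk from Peter's
patch):

static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
{
        /*
         * rq1 and rq2 may share a core-wide rq lock; only release rq2
         * separately when the underlying locks differ, otherwise
         * unlocking rq1 releases both.
         */
        if (__rq_lockp(rq1) != __rq_lockp(rq2))
                raw_spin_rq_unlock(rq2);
        else
                __release(rq2->lock);
        raw_spin_rq_unlock(rq1);
}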
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f732642e3e09..1ef13b50dfcd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -493,14 +493,17 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
>  {
>          lockdep_assert_irqs_disabled();
>
> -        if (rq1->cpu > rq2->cpu)
> -                swap(rq1, rq2);
> -
> -        raw_spin_rq_lock(rq1);
> -        if (__rq_lockp(rq1) == __rq_lockp(rq2))
> -                return;
> -
> -        raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
> +        if (__rq_lockp(rq1) == __rq_lockp(rq2)) {
> +                raw_spin_rq_lock(rq1);
> +        } else {
> +                if (__rq_lockp(rq1) < __rq_lockp(rq2)) {
> +                        raw_spin_rq_lock(rq1);
> +                        raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
> +                } else {
> +                        raw_spin_rq_lock(rq2);
> +                        raw_spin_rq_lock_nested(rq1, SINGLE_DEPTH_NESTING);
> +                }
> +        }
>  }
>  #endif
>