Date:   Mon, 18 Mar 2019 14:56:20 +0800
From:   Aubrey Li <aubrey.intel@...il.com>
To:     Subhra Mazumdar <subhra.mazumdar@...cle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling

On Tue, Mar 12, 2019 at 7:36 AM Subhra Mazumdar
<subhra.mazumdar@...cle.com> wrote:
>
>
> On 3/11/19 11:34 AM, Subhra Mazumdar wrote:
> >
> > On 3/10/19 9:23 PM, Aubrey Li wrote:
> >> On Sat, Mar 9, 2019 at 3:50 AM Subhra Mazumdar
> >> <subhra.mazumdar@...cle.com> wrote:
> >>> expected. Most of the performance recovery happens in patch 15 which,
> >>> unfortunately, is also the one that introduces the hard lockup.
> >>>
> >> After applied Subhra's patch, the following is triggered by enabling
> >> core sched when a cgroup is
> >> under heavy load.
> >>
> > It seems you are facing some other deadlock where printk is involved.
> > Can you
> > drop the last patch (patch 16 sched: Debug bits...) and try?
> >
> > Thanks,
> > Subhra
> >
> Never Mind, I am seeing the same lockdep deadlock output even w/o patch
> 16. Btw
> the NULL fix had something missing, following works.
>

Okay, here is another one. On my system, the CPUs brought up at boot don't
match the possible CPU map, so rq->core is never initialized for the CPUs that
were not onlined, which causes a NULL pointer dereference panic in
online_fair_sched_group().

And here is a quick fix.
-----------------------------------------------------------------------------------------------------
@@ -10488,7 +10493,8 @@ void online_fair_sched_group(struct task_group *tg)
        for_each_possible_cpu(i) {
                rq = cpu_rq(i);
                se = tg->se[i];
-
+               if (!rq->core)
+                       continue;
                raw_spin_lock_irq(rq_lockp(rq));
                update_rq_clock(rq);
                attach_entity_cfs_rq(se);
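
To illustrate the guard outside the kernel, here is a rough userspace sketch,
not kernel code. Everything in it (toy_rq, toy_boot_cpus, toy_online_group,
NR_POSSIBLE_CPUS) is a made-up stand-in, and it assumes that "not onlined"
simply means the core pointer was left NULL:

```c
#include <assert.h>
#include <stddef.h>

#define NR_POSSIBLE_CPUS 8  /* hypothetical size of the possible-CPU map */

/* Toy stand-in for struct rq; only the field the guard looks at. */
struct toy_rq {
    struct toy_rq *core;  /* NULL until the CPU has been brought up */
    int attached;         /* how many times the per-CPU work ran */
};

/* Initialize the first nr_booted CPUs; leave the rest with core == NULL. */
void toy_boot_cpus(struct toy_rq *rqs, int nr_booted)
{
    assert(rqs != NULL);
    for (int i = 0; i < NR_POSSIBLE_CPUS; i++) {
        rqs[i].core = (i < nr_booted) ? &rqs[i] : NULL;
        rqs[i].attached = 0;
    }
}

/*
 * Walk every possible CPU, as for_each_possible_cpu() would, but skip the
 * ones whose core pointer was never set. Returns the number of CPUs visited.
 */
int toy_online_group(struct toy_rq *rqs)
{
    int visited = 0;

    assert(rqs != NULL);
    for (int i = 0; i < NR_POSSIBLE_CPUS; i++) {
        struct toy_rq *rq = &rqs[i];

        if (!rq->core)
            continue;          /* same check as the quick fix */
        rq->core->attached++;  /* would dereference NULL without the check */
        visited++;
    }
    return visited;
}
```

With 4 of the 8 possible CPUs booted, toy_online_group() visits only those 4
and leaves the remaining entries untouched.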

Thanks,
-Aubrey
