linux-kernel - Re: [RFC][PATCH 00/16] sched: Core scheduling


Message-ID: <ac10b8fd-da1a-2cc2-afee-4e2b9b4278ee@oracle.com>
Date:   Mon, 11 Mar 2019 16:33:22 -0700
From:   Subhra Mazumdar <subhra.mazumdar@...cle.com>
To:     Aubrey Li <aubrey.intel@...il.com>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling


On 3/11/19 11:34 AM, Subhra Mazumdar wrote:
>
> On 3/10/19 9:23 PM, Aubrey Li wrote:
>> On Sat, Mar 9, 2019 at 3:50 AM Subhra Mazumdar
>> <subhra.mazumdar@...cle.com> wrote:
>>> expected. Most of the performance recovery happens in patch 15 which,
>>> unfortunately, is also the one that introduces the hard lockup.
>>>
>> After applying Subhra's patch, the following is triggered by enabling
>> core sched when a cgroup is under heavy load.
>>
> It seems you are facing some other deadlock where printk is involved.
> Can you drop the last patch (patch 16 sched: Debug bits...) and try?
>
> Thanks,
> Subhra
>
Never mind, I am seeing the same lockdep deadlock output even without
patch 16. Btw, the NULL fix was missing something; the following works.

--------->8------------

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d0dac4..27cbc64 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4131,7 +4131,7 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
          * Avoid running the skip buddy, if running something else can
          * be done without getting too unfair.
          */
-       if (cfs_rq->skip == se) {
+       if (cfs_rq->skip && cfs_rq->skip == se) {
                 struct sched_entity *second;

                 if (se == curr) {
@@ -4149,13 +4149,15 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
         /*
          * Prefer last buddy, try to return the CPU to a preempted task.
          */
-       if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
+       if (left && cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left)
+           < 1)
                 se = cfs_rq->last;

         /*
          * Someone really wants this to run. If it's not unfair, run it.
          */
-       if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
+       if (left && cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left)
+           < 1)
                 se = cfs_rq->next;

         clear_buddies(cfs_rq, se);
@@ -6958,6 +6960,9 @@ pick_task_fair(struct rq *rq)

                 se = pick_next_entity(cfs_rq, NULL);

+               if (!(se || curr))
+                       return NULL;
+
                 if (curr) {
                         if (se && curr->on_rq)
                                 update_curr(cfs_rq);
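
For illustration only, here is a user-space sketch of the guard pattern the
patch adds to the buddy checks. It is not kernel code: every toy_* name and
the TOY_GRAN constant are hypothetical stand-ins, and the comparison is only
a simplified model of wakeup_preempt_entity(). The assumption, inferred from
the added "left &&" checks, is that the leftmost entity can be NULL when
pick_next_entity() is called with curr == NULL on an empty runqueue.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_GRAN 1000	/* hypothetical wakeup granularity */

/* Toy stand-in for struct sched_entity; only vruntime is modeled. */
struct toy_entity {
	long long vruntime;
};

/* Toy stand-in for struct cfs_rq with its buddy pointers. */
struct toy_cfs_rq {
	struct toy_entity *leftmost;	/* NULL when nothing is queued */
	struct toy_entity *last;
	struct toy_entity *next;
};

/*
 * Simplified model of wakeup_preempt_entity(): returns -1, 0 or 1.
 * Both arguments are dereferenced, so neither may be NULL; the
 * assert stands in for the crash a NULL pointer would cause.
 */
static int toy_wakeup_preempt(const struct toy_entity *curr,
			      const struct toy_entity *se)
{
	long long vdiff;

	assert(curr && se);
	vdiff = curr->vruntime - se->vruntime;
	if (vdiff <= 0)
		return -1;
	return vdiff > TOY_GRAN ? 1 : 0;
}

/* Buddy selection with the same "left &&" guards as the patch. */
static struct toy_entity *toy_pick(struct toy_cfs_rq *rq)
{
	struct toy_entity *left = rq->leftmost;
	struct toy_entity *se = left;

	if (left && rq->last && toy_wakeup_preempt(rq->last, left) < 1)
		se = rq->last;

	if (left && rq->next && toy_wakeup_preempt(rq->next, left) < 1)
		se = rq->next;

	return se;	/* may be NULL; the caller has to check */
}
```

With the guards in place, an empty queue with stale buddy pointers simply
yields NULL instead of reaching the comparison, which is why the caller in
the sketch (like pick_task_fair() in the patch) has to handle a NULL result.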