linux-kernel - Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANaguZD_ZknCrnUA8TYs4rc0TLJZ9J2_FcWmW5cxEMWDTdL6hg@mail.gmail.com>
Date:   Thu, 14 May 2020 18:51:27 -0400
From:   Vineeth Remanan Pillai <vpillai@...italocean.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Aaron Lu <aaron.lwe@...il.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Turner <pjt@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Aaron Lu <aaron.lu@...ux.alibaba.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        "Li, Aubrey" <aubrey.li@...ux.intel.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison

Hi Peter,

On Thu, May 14, 2020 at 9:02 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> A little something like so, this syncs min_vruntime when we switch to
> single queue mode. This is very much SMT2 only, I got my head in twist
> when thikning about more siblings, I'll have to try again later.
>
Thanks for the quick patch! :-)

For SMT-n, would it work if sync vruntime if atleast one sibling is
forced idle? Since force_idle is for all the rqs, I think it would
be correct to sync the vruntime if atleast one cpu is forced idle.

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> -               if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> -                       rq_i->core_forceidle = true;
> +               if (is_idle_task(rq_i->core_pick)) {
> +                       if (rq_i->nr_running)
> +                               rq_i->core_forceidle = true;
> +               } else {
> +                       new_active++;
I think we need to reset new_active on restarting the selection.

> +               }
>
>                 if (i == cpu)
>                         continue;
> @@ -4476,6 +4473,16 @@ next_class:;
>                 WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
>         }
>
> +       /* XXX SMT2 only */
> +       if (new_active == 1 && old_active > 1) {
As I mentioned above, would it be correct to check if atleast one sibling is
forced_idle? Something like:
if (cpumask_weight(cpu_smt_mask(cpu)) == old_active && new_active < old_active)

> +               /*
> +                * We just dropped into single-rq mode, increment the sequence
> +                * count to trigger the vruntime sync.
> +                */
> +               rq->core->core_sync_seq++;
> +       }
> +       rq->core->core_active = new_active;
core_active seems to be unused.

> +bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
> +{
> +       struct sched_entity *se_a = &a->se, *se_b = &b->se;
> +       struct cfs_rq *cfs_rq_a, *cfa_rq_b;
> +       u64 vruntime_a, vruntime_b;
> +
> +       while (!is_same_tg(se_a, se_b)) {
> +               int se_a_depth = se_a->depth;
> +               int se_b_depth = se_b->depth;
> +
> +               if (se_a_depth <= se_b_depth)
> +                       se_b = parent_entity(se_b);
> +               if (se_a_depth >= se_b_depth)
> +                       se_a = parent_entity(se_a);
> +       }
> +
> +       cfs_rq_a = cfs_rq_of(se_a);
> +       cfs_rq_b = cfs_rq_of(se_b);
> +
> +       vruntime_a = se_a->vruntime - cfs_rq_a->core_vruntime;
> +       vruntime_b = se_b->vruntime - cfs_rq_b->core_vruntime;
Should we be using core_vruntime conditionally? should it be min_vruntime for
default comparisons and core_vruntime during force_idle?

Thanks,
Vineeth