linux-kernel - Re: [PATCH v3] sched/fair: filter out overloaded cpus in SIS

Message-ID: <CADjb_WRmXh0tj7nZZR3QybhLxtoxZBy6OMKRNygtKOx-wUPxZA@mail.gmail.com>
Date:   Mon, 9 May 2022 23:21:05 +0800
From:   Chen Yu <yu.chen.surf@...il.com>
To:     Abel Wu <wuyun.abel@...edance.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Josh Don <joshdon@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] sched/fair: filter out overloaded cpus in SIS

On Sun, May 8, 2022 at 1:50 AM Abel Wu <wuyun.abel@...edance.com> wrote:
>
> Hi Chen,
>
> On 5/8/22 12:09 AM, Chen Yu Wrote:
[cut]
> >> @@ -81,8 +81,20 @@ struct sched_domain_shared {
> >>          atomic_t        ref;
> >>          atomic_t        nr_busy_cpus;
> >>          int             has_idle_cores;
> >> +
> >> +       /*
> >> +        * Tracking of the overloaded cpus can be heavy, so start
> >> +        * a new cacheline to avoid false sharing.
> >> +        */
> > Although we put the following items into different cache line compared to
> > above ones, is it possible that there is still cache false sharing if
> > CPU1 is reading nr_overloaded_cpus while
> > CPU2 is updating overloaded_cpus?
>
> I think it's not false sharing, it's just cache contention. But yes,
> it is still possible if the two items mixed with others (by compiler)
> in one cacheline, which seems out of our control..
>
My understanding is that, since nr_overloaded_cpus starts a new cache
line, the overloaded_cpus mask is very likely to land in that same
cache line. Only if writes to the overloaded_cpus mask are infrequent
(a tick-based update may be infrequent enough) can the read of
nr_overloaded_cpus, which is mainly done by SIS, avoid cache false
sharing. A naive thought: could the overloaded_cpus mask and
nr_overloaded_cpus be placed in two separate cache lines?
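
To make the two-cache-line idea concrete, here is a rough userspace
sketch, not the patch itself. It assumes a 64-byte cache line, uses C11
alignas in place of the kernel's ____cacheline_aligned, and replaces the
flexible array with a fixed-size array so the layout can be checked at
compile time. The names sd_shared_sketch, CACHELINE_SIZE and
cacheline_of() are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most x86-64 parts. */
#define CACHELINE_SIZE 64

/* Hypothetical stand-in for struct sched_domain_shared. */
struct sd_shared_sketch {
	atomic_int	ref;
	atomic_int	nr_busy_cpus;
	int		has_idle_cores;

	/* Counter read by the wakeup path: starts its own cache line. */
	alignas(CACHELINE_SIZE) atomic_int nr_overloaded_cpus;

	/*
	 * Frequently written bitmap: pushed to the next cache line.
	 * Fixed size here (illustration only); the patch uses a
	 * flexible array member.
	 */
	alignas(CACHELINE_SIZE) unsigned long overloaded_cpus[4];
};

/* Index of the cache line that a given structure offset falls into. */
static size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_SIZE;
}

/* Compile-time check that both members begin on a line boundary. */
static_assert(offsetof(struct sd_shared_sketch, nr_overloaded_cpus) %
	      CACHELINE_SIZE == 0, "counter must start a cache line");
static_assert(offsetof(struct sd_shared_sketch, overloaded_cpus) %
	      CACHELINE_SIZE == 0, "mask must start a cache line");
```

The cost is one mostly empty cache line per structure for the counter,
so this trades memory for fewer invalidations on the reader side.
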
> >> +       atomic_t        nr_overloaded_cpus ____cacheline_aligned;
> > ____cacheline_aligned seems to put nr_overloaded_cpus into data section, which
> > seems to be unnecessary. Would ____cacheline_internodealigned_in_smp
> > be more lightweight?
>
> I didn't see the difference of the two macros, it would be appreciate
> if you can shed some light.
>
Sorry, I mistook ____cacheline_aligned for __cacheline_aligned, which
places the variable in a dedicated data section. Please ignore my
previous comment.
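
For reference, the distinction between the two macros can be mimicked
in userspace roughly as below. This is a sketch assuming GCC or Clang on
an ELF target and a 64-byte line; the sketch_* names and the section
name are invented for the example and are not the kernel definitions.
The alignment-only form is legal on a structure member, while the form
that also names a section only makes sense on a variable.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_BYTES 64	/* assumption: 64-byte cache line */

/* Alignment only, analogous in spirit to ____cacheline_aligned. */
#define sketch_aligned_only \
	__attribute__((__aligned__(SKETCH_CACHE_BYTES)))

/* Alignment plus a dedicated section, analogous to __cacheline_aligned. */
#define sketch_aligned_in_section \
	__attribute__((__aligned__(SKETCH_CACHE_BYTES), \
		       __section__(".data.sketch_cacheline_aligned")))

/* A member can be aligned, but cannot be assigned to a section. */
struct counters_sketch {
	int hot_flag;
	int nr_overloaded sketch_aligned_only;
};

/* A variable can carry both the alignment and the section. */
static int global_counter sketch_aligned_in_section = 1;

/* Offset of the aligned member inside the structure. */
static size_t member_offset(void)
{
	return offsetof(struct counters_sketch, nr_overloaded);
}

/* Non-zero if the sectioned variable sits on a cache line boundary. */
static int global_is_aligned(void)
{
	return ((uintptr_t)&global_counter % SKETCH_CACHE_BYTES) == 0;
}
```
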
> >> +       unsigned long   overloaded_cpus[]; /* Must be last */
> >>   };
> >>
[cut]
> >> +       /*
> >> +        * It's unlikely to find an idle cpu if the system is under
> >> +        * heavy pressure, so skip searching to save a few cycles
> >> +        * and relieve cache traffic.
> >> +        */
> >> +       if (weight - nro < (nr >> 4) && !has_idle_core)
> >> +               return -1;
> > In [1] we used util_avg to check if the domain is overloaded and quit
> > earlier, since util_avg would be
> > more stable and contains historic data. But I think nr_running in your
> > patch could be used as
> > complementary metric and added to update_idle_cpu_scan() in [1] IMO.
> >> +
> >>          cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >> +       if (nro > 1)
> >> +               cpumask_andnot(cpus, cpus, sdo_mask(sds));
> > If I understand correctly, this is the core of the optimization: SIS
> > filters out the busy cores. I wonder if it
> > is possible to save historic h_nr_running/idle_h_nr_running and use
> > the average value? (like the calculation
> > of avg_scan_cost).
>
> Yes, I have been already working on that for several days, and
> along with some improvement on load balance (group_has_spare).
> Ideally we can finally get rid out of the cache issues.
>
OK, could you please also Cc me on the next version? I'd like to give
it a try.
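
To illustrate the averaging idea mentioned in the quoted review
comments, here is a minimal sketch of a running average in the same
1/8-weight style that avg_scan_cost is maintained with. Everything here
is an assumption for discussion: the names update_avg_sketch(),
avg_after() and NR_SCALE are hypothetical, and the samples are
pre-scaled because small counts such as h_nr_running would otherwise be
truncated to zero by the integer division.

```c
#include <stdint.h>

/* Hypothetical fixed-point scale so small counts survive the division. */
#define NR_SCALE 1024

/* Move the average 1/8 of the way toward the new sample. */
static void update_avg_sketch(int64_t *avg, int64_t sample)
{
	int64_t diff = sample - *avg;

	*avg += diff / 8;
}

/* Feed the same raw count 'steps' times and return the scaled average. */
static int64_t avg_after(int64_t start, int64_t raw_count, int steps)
{
	int64_t avg = start;
	int i;

	for (i = 0; i < steps; i++)
		update_avg_sketch(&avg, raw_count * NR_SCALE);

	return avg;
}
```

With unscaled samples, a CPU going from 0 to 4 runnable tasks would
never move the average, since 4 / 8 truncates to 0. With the scale
applied, the average settles within a few units of 4 * NR_SCALE.
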

-- 
Thanks,
Chenyu