lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230216052105.GA20785@ranerica-svr.sc.intel.com>
Date:   Wed, 15 Feb 2023 21:21:05 -0800
From:   Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Valentin Schneider <vschneid@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ricardo Neri <ricardo.neri@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Len Brown <len.brown@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Ionela Voinescu <ionela.voinescu@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, "Tim C . Chen" <tim.c.chen@...el.com>
Subject: Re: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the
 current sched domain

On Mon, Feb 13, 2023 at 10:43:28PM -0800, Ricardo Neri wrote:
> On Mon, Feb 13, 2023 at 01:17:09PM +0100, Dietmar Eggemann wrote:
> > On 10/02/2023 19:31, Ricardo Neri wrote:
> > > On Fri, Feb 10, 2023 at 05:12:30PM +0000, Valentin Schneider wrote:
> > >> On 10/02/23 17:53, Peter Zijlstra wrote:
> > >>> On Fri, Feb 10, 2023 at 02:54:56PM +0000, Valentin Schneider wrote:
> > >>>
> > >>>> So something like have SD_PREFER_SIBLING affect the SD it's on (and not
> > >>>> its parent), but remove it from the lowest non-degenerated topology level?
> > >>>
> > >>> So I was rather confused about the whole moving it between levels things
> > >>> this morning -- conceptually, prefer siblings says you want to try
> > >>> sibling domains before filling up your current domain. Now, balancing
> > >>> between siblings happens one level up, hence looking at child->flags
> > >>> makes perfect sense.
> > >>>
> > >>> But looking at the current domain and still calling it prefer sibling
> > >>> makes absolutely no sense what so ever.
> > >>>
> > >>
> > >> True :-)
> > >>
> > >>> In that confusion I think I also got the polarity wrong, I thought you
> > >>> wanted to kill prefer_sibling for the assymetric SMT cases, instead you
> > >>> want to force enable it as long as there is one SMT child around.
> > > 
> > > Exactly.
> > > 
> > >>>
> > >>> Whichever way around it we do it, I'm thinking perhaps some renaming
> > >>> might be in order to clarify things.
> > >>>
> > >>> How about adding a flag SD_SPREAD_TASKS, which is the effective toggle
> > >>> of the behaviour, but have it be set by children with SD_PREFER_SIBLING
> > >>> or something.
> > >>>
> > >>
> > >> Or entirely bin SD_PREFER_SIBLING and stick with SD_SPREAD_TASKS, but yeah
> > >> something along those lines.
> > > 
> > > I sense a consesus towards SD_SPREAD_TASKS.
> > 
> > Can you not detect the E-core dst_cpu case on MC with:
> > 
> > +       if (child)
> > +               sds->prefer_sibling = child->flags & SD_PREFER_SIBLING;
> > +       else if (sds->busiest)
> > +               sds->prefer_sibling = sds->busiest->group_weight > 1;
> 
> Whose child wants the prefer_sibling setting? In update_sd_lb_stats(), it
> is set based on the flags of the destination CPU's sched domain. But when
> used in find_busiest_group() tasks are spread from the busiest group's
> child domain.
> 
> Your proposed code, also needs a check for SD_PREFER_SIBLING, no?

I tweaked the solution that Dietmar proposed:

-	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
+	if (sds->busiest)
+		sds->prefer_sibling = sds->busiest->flags & SD_PREFER_SIBLING;

This comes from the observation that the prefer_sibling setting acts on
busiest group. It then depends on whether the busiest group, not the local
group, has child sched sched domains. Today it works because in most cases
both the local and the busiest groups have child domains with the SD_
PREFER_SIBLING flag.

This would also satisfy sched domains with the SD_ASYM_CPUCAPACITY flag as
prefer_sibling would not be set in that case.

It would also conserve the current behavior at the NUMA level. We would
not need to implement SD_SPREAD_TASKS.

This would both fix the SMT vs non-SMT bug and be less invasive.

Thoughts?

Thanks and BR,
Ricardo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ