Date:   Wed, 4 Mar 2020 20:01:54 +0000
From:   Qais Yousef <qais.yousef@....com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/6] sched/rt: cpupri_find: Implement fallback
 mechanism for !fit case

On 03/04/20 13:54, Steven Rostedt wrote:
> > If we fix 1, then assuming found == -1 for all levels, we'll still have the
> > problem that the mask is stale.
> > 
> > We can do a full scan again as Tao was suggesting, the 2nd one without any
> > fitness check that is. But isn't this expensive?
> 
> I was hoping to try to avoid that, but it's not that expensive and will
> probably seldom happen. Perhaps we should run some test cases and trace the
> results to see how often that can happen.
> 
> > 
> > We risk the mask being stale anyway directly after selecting it. Or a priority
> > level might become the lowest level just after we dismissed it.
> 
> Sure, but that's still a better effort.

Okay, let me run some quick tests and send an updated series if they don't
turn up anything suspicious.

Are you happy with the rest of the series then?

> > There's another 'major' problem that I need to bring to your attention:
> > find_lowest_rq() always returns the first CPU in the mask.
> > 
> > See discussion below for more details
> > 
> > 	https://lore.kernel.org/lkml/20200219140243.wfljmupcrwm2jelo@e107158-lin/
> > 
> > In my test, because multiple tasks wake up together they all end up going
> > to CPU1 (the first fitting CPU in the mask), only to be pushed off again,
> > and not necessarily back to where they were running before.
> > 
> > Not sure if there are other situations where this could happen.
> > 
> > If we fix this problem then we can help reduce the effect of this raciness
> > in find_lowest_rq(), and end up with less ping-ponging if tasks happen to
> > wake up or sleep at the wrong time during the scan.
> 
> Hmm, I wonder if there's a fast way of getting the next CPU from the
> current CPU the task is on. Perhaps that will help with the ping-ponging.

I think there's for_each_cpu_wrap() or some variant of it that allows us to
start the scan from an arbitrary CPU.
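
Something like the below, as a completely untested sketch just to illustrate
the idea (pick_lowest_cpu() is a made-up helper, not an existing function):

/*
 * Illustration only: instead of taking the first CPU in lowest_mask,
 * prefer the task's previous CPU, and otherwise scan the mask starting
 * from it, wrapping around, so simultaneous wakeups don't all pile onto
 * the same (first) CPU.
 */
static int pick_lowest_cpu(struct task_struct *p, struct cpumask *lowest_mask)
{
        int prev_cpu = task_cpu(p);
        int cpu;

        /* Prefer the previous CPU if it's still a candidate. */
        if (cpumask_test_cpu(prev_cpu, lowest_mask))
                return prev_cpu;

        /* Otherwise walk the mask starting at prev_cpu, wrapping around. */
        for_each_cpu_wrap(cpu, lowest_mask, prev_cpu)
                return cpu;

        return -1;
}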

This won't help if there's a single CPU in the mask, or when
nr_waking_tasks > nr_cpus_in_lowest_rq. It's still an improvement over the
current behavior, though.

The other option is maybe to mark that CPU unavailable once it's been
selected, so the next search can't return it. But when do you mark it
available again?
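
A very rough and hypothetical sketch of that direction (recently_picked and
pick_and_reserve_cpu() are made up, and the hard part, when to clear a bit
again, is exactly the open question above):

/*
 * Hypothetical: remember CPUs handed out by recent searches so concurrent
 * wakeups don't all get the same answer. Clearing a bit again (e.g. once
 * the task is actually enqueued there) is the unsolved part.
 */
static struct cpumask recently_picked;

static int pick_and_reserve_cpu(struct cpumask *lowest_mask)
{
        int cpu;

        for_each_cpu(cpu, lowest_mask) {
                /* Skip CPUs another search has already claimed. */
                if (cpumask_test_and_set_cpu(cpu, &recently_picked))
                        continue;

                return cpu;
        }

        return -1;
}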

Thanks

--
Qais Yousef
