lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Oct 2021 12:23:46 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to
 nr_running

On Wed, Sep 22, 2021 at 06:38:53PM +0100, Mel Gorman wrote:

> 
> I'm not seeing an alternative suggestion that could be turned into
> an implementation. The current value for sched_wakeup_granularity
> was set 12 years ago was exposed for tuning which is no longer

(it has always been SCHED_DEBUG)

> the case. The intent was to allow some dynamic adjustment between
> sysctl_sched_wakeup_granularity and sysctl_sched_latency to reduce
> over-scheduling in the worst case without disabling preemption entirely
> (which the first version did).
> 
> Should we just ignore this problem and hope it goes away or just let
> people keep poking silly values into debugfs via tuned?

People are going to do stupid no matter what (and tuned is very much
included there -- poking random numbers in debug interfaces without
doing analysis on what the workload does and needs it just wasting
everybodies time. RHT really should know better than that).

I'm also thinking that adding more heuristics isn't going to improve the
situation lots.

For the past # of years people have been talking about extendng the task
model for SCHED_NORMAL, latency_nice was one such proposal.

If we really want to fix this proper, and not make a bigger mess of
things, we should look at all these various workloads and identify
*what* specifically they want and *why*.

Once we have this enumerated, we can look at what exactly we can provide
and how to structure the interface.

The extention must be hint only, we should be free to completely ignore
it.

The things I can think of off the top of my head are:

- tail latency; prepared to waste time to increase the odds of running
  sooner. Possible effect: have this task always do a full
  select_idle_sibling() scan.

  (there's also the anti case, which I'm not sure how to enumerate,
  basically they don't want select_idle_sibling(), just place the task
  wherever)

- non-interactive; doesn't much care about wakeup latency; can suffer
  packing?

- background; (implies non-interactive?) doesn't much care about
  completion time either, just cares about efficiency

- interactive; cares much about wakeup-latency; cares less about
  throughput.

- (energy) efficient; cares more about energy usage than performance


Now, we already have SCHED_BATCH that completely kills wakeup preemption
and SCHED_IDLE lets everything preempt. I know SCHED_IDLE is in active
use (because we see patches for that), I don't think people use
SCHED_BATCH much, should they?


Now, I think a number of contraints are going to be mutually exclusive,
and if we get such tasks combined there's just nothing much we can do
about it (and since it's all hints, we good).

We should also not make the load-balance condition too complicated, we
don't want it to try and (badly) solve a large set of constraints.
Again, if possible respect the hints, otherwise tough luck.

Powered by blists - more mailing lists