Message-ID: <ba60262d15891702cae0d59122388c6a18caaf53.camel@gmx.de>
Date:   Thu, 23 Sep 2021 03:47:30 +0200
From:   Mike Galbraith <efault@....de>
To:     Vincent Guittot <vincent.guittot@...aro.org>,
        Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to
 nr_running

On Wed, 2021-09-22 at 20:22 +0200, Vincent Guittot wrote:
> On Wed, 22 Sept 2021 at 19:38, Mel Gorman <mgorman@...hsingularity.net> wrote:
> >
> >
> > I'm not seeing an alternative suggestion that could be turned into
> > an implementation. The current value for sched_wakeup_granularity
> > was set 12 years ago and was exposed for tuning, which is no longer
> > the case. The intent was to allow some dynamic adjustment between
> > sysctl_sched_wakeup_granularity and sysctl_sched_latency to reduce
> > over-scheduling in the worst case without disabling preemption entirely
> > (which the first version did).

I don't think those knobs were ever _intended_ for general purpose
tuning, but they did get used that way by some folks.
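(For context, a minimal sketch -- not Mel's actual patch -- of the kind of
scaling described above: stretching the preemption threshold from
sysctl_sched_wakeup_granularity toward sysctl_sched_latency as
cfs_rq->nr_running grows.  The helper name and the exact scaling and
clamping policy here are illustrative assumptions only.)

	/*
	 * Illustrative sketch only: grow the wakeup preemption threshold
	 * with the number of runnable tasks, but never beyond
	 * sched_latency, so over-scheduling under load is damped without
	 * disabling wakeup preemption outright.
	 */
	static unsigned long scaled_wakeup_gran(struct cfs_rq *cfs_rq)
	{
		unsigned long gran = sysctl_sched_wakeup_granularity;

		if (cfs_rq->nr_running > sched_nr_latency)
			gran *= cfs_rq->nr_running - sched_nr_latency + 1;

		return min_t(unsigned long, gran, sysctl_sched_latency);
	}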

> >
> > Should we just ignore this problem and hope it goes away or just let
> > people keep poking silly values into debugfs via tuned?
>
> We should certainly not add a band-aid, because people will continue
> to poke silly values in the end. And increasing
> sysctl_sched_wakeup_granularity based on the number of running threads
> is not the right solution.

Watching my desktop box stack up large piles of very short running
threads, I agree, instantaneous load looks like a non-starter.

>  According to your problem description, the current task doesn't get
> enough time to move forward, so sysctl_sched_min_granularity should be
> part of the solution. Something like the below will ensure that current
> gets a chance to move forward

Nah, progress is guaranteed; the issue is a zillion very similar short
running threads preempting each other with no win to be had, thus
spending cycles in the scheduler that are utterly wasted.  It's a valid
issue, the trouble is teaching the scheduler to recognize that situation
without mucking up other situations where there IS a win even for very
short running threads, say, doing a synchronous handoff; preemption is
cheaper than scheduling off if the waker is going to be awakened again
in very short order.
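
(Purely illustrative, not from the thread: a tiny userspace sketch of
that synchronous-handoff pattern -- a 1:1 pipe ping-pong, where letting
the wakee preempt immediately is a win because the waker is going to
block again almost at once anyway.)

	/* pingpong.c - hypothetical demo of a synchronous wake/sleep handoff */
	#include <unistd.h>
	#include <sys/wait.h>

	int main(void)
	{
		int to_child[2], to_parent[2];
		char byte = 0;

		if (pipe(to_child) || pipe(to_parent))
			return 1;

		if (fork() == 0) {
			/* child: echo every byte straight back */
			for (int i = 0; i < 100000; i++) {
				read(to_child[0], &byte, 1);
				write(to_parent[1], &byte, 1);
			}
			_exit(0);
		}

		/* parent: wake the child, then immediately sleep on the reply */
		for (int i = 0; i < 100000; i++) {
			write(to_child[1], &byte, 1);
			read(to_parent[0], &byte, 1);
		}
		wait(NULL);
		return 0;
	}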

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9bf540f04c2d..39d4e4827d3d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7102,6 +7102,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>         int scale = cfs_rq->nr_running >= sched_nr_latency;
>         int next_buddy_marked = 0;
>         int cse_is_idle, pse_is_idle;
> +       unsigned long delta_exec;
>
>         if (unlikely(se == pse))
>                 return;
> @@ -7161,6 +7162,13 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>                 return;
>
>         update_curr(cfs_rq_of(se));
> +       delta_exec = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> +       /*
> +        * Ensure that current got a chance to move forward
> +        */
> +       if (delta_exec < sysctl_sched_min_granularity)
> +               return;
> +
>         if (wakeup_preempt_entity(se, pse) == 1) {
>                 /*
>                  * Bias pick_next to pick the sched entity that is

Yikes!  If you do that, you may as well go the extra nanometer and rip
wakeup preemption out entirely, same result, impressive diffstat.

	-Mike
