linux-kernel - Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <22e7133d674b82853a5ee64d3f5fc6b35a8e18d6.camel@gmx.de>
Date:   Tue, 21 Sep 2021 05:52:32 +0200
From:   Mike Galbraith <efault@....de>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to
 nr_running

On Mon, 2021-09-20 at 15:26 +0100, Mel Gorman wrote:
>
> This patch scales the wakeup granularity based on the number of running
> tasks on the CPU up to a max of 8ms by default.  The intent is to
> allow tasks to run for longer while overloaded so that some tasks may
> complete faster and reduce the degree a domain is overloaded. Note that
> the TuneD throughput-performance profile allows up to 15ms but there
> is no explanation why such a long period was necessary so this patch is
> conservative and uses the value that check_preempt_wakeup() also takes
> into account.  An internet search for instances where this parameter are
> tuned to high values offer either no explanation or a broken one.
>
> This improved hackbench on a range of machines when communicating via
> pipes (sockets show little to no difference). For a 2-socket CascadeLake
> machine, the results were

Twiddling wakeup preemption based upon the performance of a fugly fork
bomb seems like a pretty bad idea to me.

Preemption does rapidly run into diminishing return as load climbs for
a lot of loads, but as you know, it's a rather sticky wicket because
even when over-committed, preventing light control threads from slicing
through (what can be a load's own work crew of) hogs can seriously
injure performance.

<snip>

> @@ -7044,10 +7045,22 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  }
>  #endif /* CONFIG_SMP */
>  
> -static unsigned long wakeup_gran(struct sched_entity *se)
> +static unsigned long
> +wakeup_gran(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>         unsigned long gran = sysctl_sched_wakeup_granularity;
>  
> +       /*
> +        * If rq is specified, scale the granularity relative to the number
> +        * of running tasks but no more than 8ms with default
> +        * sysctl_sched_wakeup_granularity settings. The wakeup gran
> +        * reduces over-scheduling but if tasks are stacked then the
> +        * domain is likely overloaded and over-scheduling may
> +        * prolong the overloaded state.
> +        */
> +       if (cfs_rq && cfs_rq->nr_running > 1)
> +               gran *= min(cfs_rq->nr_running >> 1, sched_nr_latency);
> +

Maybe things have changed while I wasn't watching closely, but...

The scaled up tweakables on my little quad desktop box:
sched_nr_latency = 8
sched_wakeup_granularity = 4ms
sched_latency = 24ms

Due to the FAIR_SLEEPERS feature, a task can only receive a max of
sched_latency/2 sleep credit, ie the delta between waking sleeper and
current is clipped to a max of 12 virtual ms, so the instant our
preempt threshold reaches 12.000ms, by human booboo or now 3 runnable
tasks with this change, wakeup preemption is completely disabled, or?

	-Mike