lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 18 Dec 2017 11:26:10 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is
 idle and cache-affine with the waker

On Mon, Dec 18, 2017 at 09:43:25AM +0000, Mel Gorman wrote:
> With the commit "sched: Only migrate tasks due to interrupts if prev
> and target CPUs share cache", we no longer migrate a task from interrupt
> context if the waker does not share a CPU. However, for a normal wakeup
> from a cache-affine process, we can miss the fact that prev_cpu is idle
> and an appropriate sibling leading to unnecessary searches and migrations.
> 
> This patch reworks wake_affine to return a suitable CPU to wake on which
> may be the current or prev CPU. If wake_affine_idle returns prev due to it
> being idle then select_idle_sibling will immediately return the prev_cpu
> without searching. It's slightly mixed on dbench using ext4 with gains when the machine is lightly
> loaded and a small regression borderline on the noise when more than a node's
> worth of CPU is used.
> 
>                           4.15.0-rc3             4.15.0-rc3
>                                noirq               wakeprev
> Hmean     1        865.01 (   0.00%)      834.19 (  -3.56%)
> Hmean     2       1274.44 (   0.00%)     1353.09 (   6.17%)
> Hmean     4       1628.08 (   0.00%)     1714.82 (   5.33%)
> Hmean     8       1831.80 (   0.00%)     1855.84 (   1.31%)
> Hmean     16      2091.44 (   0.00%)     1975.40 (  -5.55%)
> Hmean     32      2430.29 (   0.00%)     2298.58 (  -5.42%)
> Hmean     64      2568.54 (   0.00%)     2536.56 (  -1.25%)
> Hmean     128     2499.28 (   0.00%)     2543.81 (   1.78%)
> Stddev    1          5.35 (   0.00%)       19.39 (-262.63%)
> Stddev    2         11.09 (   0.00%)        4.88 (  55.97%)
> Stddev    4          6.80 (   0.00%)        9.24 ( -35.93%)
> Stddev    8          9.41 (   0.00%)       28.39 (-201.82%)
> Stddev    16        20.01 (   0.00%)       44.92 (-124.56%)
> Stddev    32        44.74 (   0.00%)       50.14 ( -12.07%)
> Stddev    64        93.18 (   0.00%)       84.97 (   8.81%)
> Stddev    128      177.85 (   0.00%)      178.00 (  -0.09%)
> 
> However, system CPU usage is noticably reduced
> 
>           4.15.0-rc3  4.15.0-rc3
>                noirq    wakeprev
> User         1058.32     1077.42
> System       5729.22     5287.61
> Elapsed      1550.69     1553.09
> 
> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> ---
>  kernel/sched/fair.c | 70 ++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 51 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4a1f7d32ecf6..392e08b364bd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5689,17 +5689,21 @@ static int wake_wide(struct task_struct *p)
>   * soonest. For the purpose of speed we only consider the waking and previous
>   * CPU.
>   *
> - * wake_affine_idle() - only considers 'now', it check if the waking CPU is (or
> - *			will be) idle.
> + * wake_affine_idle() - only considers 'now', it checks if a CPU that is
> + *			cache-affine with the waker is idle
> + *

This bit belongs in the previous patch I'm thinking.

> + * wake_affine_sync() - only considers 'now', it checks if the waking CPU
> + *			will be idle. Migrations to a different NUMA node
> + *			are allowed on the basis that sync wakeups imply
> + *			shared data between waker and wakee.

And it would be nice if we can rework the return value thing in a
separate patch from adding that affine_sync thing, and then munge your
3rd patch along wiht the patch that introduces it.


Did you run these patches on more than just dbench? In specific I
suppose the schbench stuff from facebook would be interesting. Also that
NAS-lu benchmark.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ