linux-kernel - Re: [PATCH 07/10] sched/fair: Provide can_migrate_task

Message-ID: <f75ddcd5-f9a0-30c5-94e4-c4077e17ffb0@arm.com>
Date:   Fri, 26 Oct 2018 19:04:06 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Steve Sistare <steven.sistare@...cle.com>, mingo@...hat.com,
        peterz@...radead.org
Cc:     subhra.mazumdar@...cle.com, dhaval.giani@...cle.com,
        rohit.k.jain@...cle.com, daniel.m.jordan@...cle.com,
        pavel.tatashin@...rosoft.com, matt@...eblueprint.co.uk,
        umgwanakikbuti@...il.com, riel@...hat.com, jbacik@...com,
        juri.lelli@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 07/10] sched/fair: Provide can_migrate_task_llc

Hi Steve,

On 22/10/2018 15:59, Steve Sistare wrote:
> Define a simpler version of can_migrate_task called can_migrate_task_llc
> which does not require a struct lb_env argument, and judges whether a
> migration from one CPU to another within the same LLC should be allowed.
> 
> Signed-off-by: Steve Sistare <steven.sistare@...cle.com>
> ---
>  kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4acdd8d..6548bed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7168,6 +7168,34 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  }
>  
>  /*
> + * Return true if task @p can migrate from @rq to @dst_rq in the same LLC.
> + * No need to test for co-locality, and no need to test task_hot(), as sharing
> + * LLC provides cache warmth at that level.

I was thinking that we could have scenarios where some rqs keep stealing
tasks from each other, so that tasks end up circulating between CPUs. That
would only happen with a handful of tasks that have a very short period,
though, and I'm not aware of any real workloads as hyperactive as the ones
hackbench generates where that could happen.

In short, I wonder if we should have task_hot() in there. Drawing a
parallel with load_balance(): even when load-balancing happens between
rqs of the same LLC, we still check task_hot(). Have you already experimented
with adding a task_hot() check in here?
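
To make the question concrete, here is a toy Python model of the idea (not
kernel code). It assumes only the core of the task_hot() test, namely that a
task counts as cache-hot when it last started executing less than a
migration-cost threshold ago. The 500000 ns value, the function names and the
check_hot switch are illustrative assumptions, and the other special cases of
the real function are ignored.

```python
# Toy model only: names and structure are hypothetical, not the kernel's.
MIGRATION_COST_NS = 500_000  # assumed threshold, modelled on sysctl_sched_migration_cost


def toy_task_hot(now_ns, exec_start_ns, migration_cost_ns=MIGRATION_COST_NS):
    """Return True if the task started running too recently to be worth moving."""
    return (now_ns - exec_start_ns) < migration_cost_ns


def toy_can_steal(now_ns, exec_start_ns, check_hot):
    """Decide whether an idle CPU may steal a task from another rq in its LLC."""
    if check_hot and toy_task_hot(now_ns, exec_start_ns):
        return False  # recently run: refusing here damps back-and-forth stealing
    return True


# A task that last started running 100 us ago, then one that started 2 ms ago.
now = 10_000_000
print(toy_can_steal(now, now - 100_000, check_hot=False))   # True
print(toy_can_steal(now, now - 100_000, check_hot=True))    # False
print(toy_can_steal(now, now - 2_000_000, check_hot=True))  # True
```

With the check enabled, a task that has just been pulled and started running
cannot be pulled again until the threshold has elapsed, which is the behaviour
that would limit tasks circulating between CPUs.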

I've run some iterations of hackbench (hackbench 2 process 100000) to
investigate this task bouncing, but I didn't really see any of it. That was
only on a 4+4 big.LITTLE system though, so I'll try to get numbers on a system
with more CPUs.

----->8-----

activations: # of task activations (task starts running)
cpu_migrations: # of activations where cpu != prev_cpu
% stats are percentiles

- STEAL:

  | stat  | cpu_migrations | activations |
  |-------+----------------+-------------|
  | count |    2005.000000 | 2005.000000 |
  | mean  |      16.244888 |  290.608479 |
  | std   |      38.963138 |  253.003528 |
  | min   |       0.000000 |    3.000000 |
  | 50%   |       3.000000 |  239.000000 |
  | 75%   |       8.000000 |  436.000000 |
  | 90%   |      45.000000 |  626.000000 |
  | 99%   |     188.960000 | 1073.000000 |
  | max   |     369.000000 | 1417.000000 |

- NO_STEAL:

  | stat  | cpu_migrations | activations |
  |-------+----------------+-------------|
  | count |    2005.000000 | 2005.000000 |
  | mean  |      15.260848 |  297.860848 |
  | std   |      46.331890 |  253.210813 |
  | min   |       0.000000 |    3.000000 |
  | 50%   |       3.000000 |  252.000000 |
  | 75%   |       7.000000 |  444.000000 |
  | 90%   |      32.600000 |  643.600000 |
  | 99%   |     214.880000 | 1127.520000 |
  | max   |     467.000000 | 1547.000000 |

----->8-----
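
For reference, a summary with these rows could be derived from a raw
activation trace along the following lines. This is a minimal sketch using
only the Python standard library. The (pid, cpu) event format, the function
names and the linear ("inclusive") percentile interpolation are assumptions,
not the tooling actually used for the numbers above.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def per_task_counts(events):
    """events: iterable of (pid, cpu) activation records in time order.

    Returns {pid: (cpu_migrations, activations)}.
    """
    activations = defaultdict(int)
    migrations = defaultdict(int)
    prev_cpu = {}
    for pid, cpu in events:
        activations[pid] += 1
        # A task's first activation has no prev_cpu, so it is never a migration.
        if pid in prev_cpu and prev_cpu[pid] != cpu:
            migrations[pid] += 1
        prev_cpu[pid] = cpu
    return {pid: (migrations[pid], activations[pid]) for pid in activations}


def summarize(values, percentiles=(50, 75, 90, 99)):
    """Summary over per-task values; needs at least two values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    summary = {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
    }
    for p in percentiles:
        summary[f"{p}%"] = cuts[p - 1]
    summary["max"] = max(values)
    return summary


# Tiny made-up trace: three tasks, eight activations in total.
events = [(1, 0), (1, 1), (1, 1), (2, 3), (2, 3), (3, 0), (3, 1), (3, 0)]
counts = per_task_counts(events)
print(summarize([m for m, _ in counts.values()]))  # cpu_migrations column
print(summarize([a for _, a in counts.values()]))  # activations column
```

Running the same summary once with stealing enabled and once with it disabled
gives two tables that can be compared row by row, as done above.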

Otherwise, my only other concern at the moment is that since stealing
doesn't care about load, we could steal a task that would cause a big
imbalance, which wouldn't have happened with a call to load_balance().

I don't think this can be triggered with a symmetrical workload like
hackbench, so I'll go explore something else.