Date:   Fri, 26 Jul 2019 15:01:54 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Quentin Perret <quentin.perret@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH 3/5] sched/fair: rework load_balance

On 26/07/2019 13:30, Vincent Guittot wrote:
>> We can avoid this entirely by going straight for an active balance when
>> we are balancing misfit tasks (which we really should be doing TBH).
> 
> but your misfit task might not be the running one anymore by the time
> load_balance() actually happens
> 

We could add a check in the active balance bits to make sure the current
task is still a misfit task (albeit not necessarily the one we wanted to
migrate, since we can't really differentiate them).

Misfit migration shouldn't go through detach_tasks() - if the misfit task
is still the running task, we want to go for active balance anyway, and if
it's not the running task anymore then we should try to detect that and
give up - there's not much else we can do. From an rq's perspective, a task
can only ever be misfit if it's currently running.
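
For that check, something as dumb as the below would do, I think. This is
only a sketch (rq_has_misfit() is a made-up helper): it leans on
rq->misfit_task_load, which update_misfit_status() refreshes at each tick
and zeroes whenever the running task fits its CPU.

/*
 * Sketch: tell whether @rq still has a misfit task running on it.
 * A non-zero rq->misfit_task_load means *some* misfit task is still
 * current - though not necessarily the one the load balancer saw.
 */
static inline bool rq_has_misfit(struct rq *rq)
{
        return READ_ONCE(rq->misfit_task_load) != 0;
}

The active balance side (need_active_balance() and/or
active_load_balance_cpu_stop()) would then give up when this returns false.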

The current code can totally active balance the wrong task if the load
balancer saw a misfit task in update_sd_lb_stats() but it moved away in the
meantime, so making misfit balancing skip detach_tasks() would be a straight
improvement IMO: we can still get some collateral active balancing, but at
least we wouldn't wrongfully detach a non-running task that happened to have
the right load shape.
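
Concretely, the shape I have in mind for load_balance() is something like
the below. Again just a sketch: I'm assuming the migration_type /
migrate_misfit naming from this series, and "active_balance" is a made-up
label standing in for the existing active balance block.

        /*
         * Sketch: if this balance attempt is about a misfit task, don't
         * bother with detach_tasks(); either the misfit task is still
         * running on the src CPU and we active balance it, or it has
         * gone away and we give up.
         */
        if (env.migration_type == migrate_misfit) {
                if (env.src_rq->misfit_task_load)
                        goto active_balance;    /* made-up label */
                goto out_balanced;
        }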

>>
>> If we *really* want to be surgical about misfit migration, we could track
>> the task itself via a pointer to its task_struct, but IIRC Morten
> 
> I thought about this, but the task can have already died by then and
> the pointer would no longer be valid.
> Otherwise we would have to walk the list of tasks still attached to the
> CPU and compare them against the saved pointer, but that's not scalable
> and would consume a lot of time.
> 
>> purposely avoided this due to all the fun synchronization issues that
>> come with it.
>>
>> With that out of the way, I still believe we should maximize the migrated
>> load when dealing with several misfit tasks - there's not much else you can
>> look at anyway to make a decision.
> 
> But you can easily select a task that is not misfit, so which is
> better/worse: selecting a completely wrong task, or at least one of the
> real misfit tasks?
> 

Utilization can't help you select a "best" misfit task amongst several,
since the utilization of a misfit task is by definition meaningless (it
saturates towards the capacity of the CPU it is stuck on).

I do agree that looking at utilization when detaching the task prevents
picking a non-misfit task, but those are two different issues:

1) Among several rqs/groups with misfit tasks, pick the busiest one
   (this is where I'm arguing we should use load).
2) When detaching a task, make sure it's a misfit task (this is where
   you're arguing we should use utilization; see the sketch below).
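
For 2), the detach-side filter can be pretty much a one-liner. Sketch only
(misfit_detach_candidate() is a made-up name), built on the existing
task_fits_capacity() and capacity_of() helpers in fair.c:

/*
 * Sketch: when balancing because of misfit, only consider @p for
 * detachment if it genuinely doesn't fit the capacity of the CPU it
 * would be pulled from.
 */
static inline bool misfit_detach_candidate(struct task_struct *p, int src_cpu)
{
        return !task_fits_capacity(p, capacity_of(src_cpu));
}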

> I'm fine with going back to using load instead of util, but it's not
> robust IMO.
> 

[...]
>> What if there is spare capacity but no idle CPUs? In scenarios like this
>> we should balance utilization. We could wait for a newidle balance to
> 
> Why should we balance anything? All tasks already get enough running time.
> It's better to wait for a CPU to become idle instead of trying to
> predict which one will become idle first and migrating tasks uselessly,
> because other tasks can easily wake up in the meantime.
> 

I probably need to play with this and create some synthetic use cases.

What I had in mind is something like 2 CPUs: CPU0 running a 20% task and
CPU1 running six 10% tasks.

If CPU0 runs the load balancer, balancing utilization would mean pulling
2 tasks from CPU1 to reach the domain-average of 40%. The good side of this
is that we could save ourselves from running some newidle balances, but
I'll admit that's all quite "finger in the air".
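
Spelling the arithmetic out (userspace toy, hypothetical numbers, nothing
to do with actual kernel code):

#include <stdio.h>

int main(void)
{
        int cpu0_util = 20;             /* one 20% task */
        int cpu1_util = 6 * 10;         /* six 10% tasks */
        int avg = (cpu0_util + cpu1_util) / 2;  /* domain average: 40% */
        int nr_pull = (avg - cpu0_util) / 10;   /* 10% tasks to pull to CPU0 */

        printf("domain avg = %d%%, pull %d tasks from CPU1\n", avg, nr_pull);
        return 0;
}

That gives "domain avg = 40%, pull 2 tasks from CPU1", which matches the
hand-waving above.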

>> happen, but it'd be a shame to repeatedly do this when we could
>> preemptively balance utilization.
>>
