Message-ID: <20190709135054.GF3402@hirez.programming.kicks-ass.net>
Date:   Tue, 9 Jul 2019 15:50:54 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Chris Redpath <chris.redpath@....com>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        morten.rasmussen@....com, dietmar.eggemann@....com,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for
 asym cpu capacity

On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
> The ancient workaround to avoid the cost of updating rq clocks in the
> middle of a migration causes problems on asymmetric CPU capacity
> systems, where we use task utilization to decide which CPUs fit a task.
> On quiet systems the stale clock can inflate task util after a
> migration, which makes the misfit check fire and force-migrate the task.
> 
> This occurs when:
> 
> (a) the task has util close to the non-overutilized capacity limit of a
>     particular cpu (cpu0 here); and
> (b) the prev_cpu was otherwise quiet, so that its rq clock is
>     sufficiently out of date (cpu1 here).
> 
> e.g.
>                               _____
> cpu0: ________________________|   |______________
> 
>                                   |<- misfit happens
>           ______                  ___         ___
> cpu1: ____|    |______________|___| |_________|
> 
>              ->|              |<- wakeup migration time
> last rq clock update
> 
> When the task util is in just the right range for the system, we can
> end up migrating an unlucky task back and forth many times, until we
> get lucky and the source rq happens to have been updated close to the
> migration time.
> 
> In order to address this, let's update both the rq clock and the cfs_rq
> load average where this could be an issue.
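
For reference, the fit check the changelog alludes to is, roughly, the
v5.2-era task_fits_capacity() (task_util_est() and capacity_margin as
defined in fair.c of that era; a sketch, not necessarily the exact tree
this patch is against):

	static inline int task_fits_capacity(struct task_struct *p, long capacity)
	{
		/* ~20% headroom: capacity_margin is 1280 vs. a scale of 1024 */
		return capacity * 1024 > task_util_est(p) * capacity_margin;
	}

An inflated util estimate can flip this to false on a CPU the task in
fact fits, which is what makes misfit fire and force-migrate it.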

Can you quantify how much of a problem this really is? It is really sad,
but this is already the second place where we take rq->lock on
migration. We worked so hard to avoid having to acquire it :/
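
On the other side of the trade-off, the inflation itself is easy to put
a number on: PELT decays util with a 32ms half-life, so a clock that is
~100ms stale reports util almost 9x its properly decayed value. A
minimal user-space sketch (illustrative constants, not kernel code):

	#include <math.h>
	#include <stdio.h>

	/* PELT decay: half-life of 32ms, i.e. util *= 0.5^(ms/32) */
	static double pelt_decay(double util, double ms)
	{
		return util * pow(0.5, ms / 32.0);
	}

	int main(void)
	{
		double util = 300.0;  /* util_avg at the last rq clock update */

		printf("stale clock sees : %.0f\n", util);                   /* 300 */
		printf("after 100ms idle: %.0f\n", pelt_decay(util, 100.0)); /* ~34 */
		return 0;
	}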

> Signed-off-by: Chris Redpath <chris.redpath@....com>
> ---
>  kernel/sched/fair.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b798fe7ff7cd..51791db26a2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>  		 * wakee task is less decayed, but giving the wakee more load
>  		 * sounds not bad.
>  		 */
> +		if (static_branch_unlikely(&sched_asym_cpucapacity) &&
> +			p->state == TASK_WAKING) {

nit: indent fail.
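
That is, with the continuation aligned under the opening parenthesis,
as kernel coding style prefers:

		if (static_branch_unlikely(&sched_asym_cpucapacity) &&
		    p->state == TASK_WAKING) {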

> +			/*
> +			 * On asymmetric capacity systems, task util guides
> +			 * wake placement, so update rq_clock and cfs_rq.
> +			 */
> +			struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +			struct rq *rq = task_rq(p);
> +			struct rq_flags rf;
> +
> +			rq_lock_irqsave(rq, &rf);
> +			update_rq_clock(rq);
> +			update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq);
> +			rq_unlock_irqrestore(rq, &rf);
> +		}
>  		remove_entity_load_avg(&p->se);
>  	}
>  
> -- 
> 2.17.1
> 
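
For completeness, the path the patch is straightening out:
remove_entity_load_avg() syncs the entity against the cfs_rq's
last_update_time, roughly like the v5.2-era helper below (again a
sketch, not the exact tree):

	static void sync_entity_load_avg(struct sched_entity *se)
	{
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		u64 last_update_time;

		/* decay the blocked entity up to the cfs_rq's timestamp */
		last_update_time = cfs_rq_last_update_time(cfs_rq);
		__update_load_avg_blocked_se(last_update_time, se);
	}

On a quiet rq nothing advances last_update_time, so the entity is only
decayed to a stale point and carries its old, inflated util to the new
CPU; taking rq->lock and calling update_rq_clock() plus
update_cfs_rq_load_avg() first moves that timestamp forward.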
