Date:   Thu, 18 Feb 2021 06:43:06 +0530
From:   Pavan Kondeti <pkondeti@...eaurora.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Lingutla Chandrasekhar <clingutla@...eaurora.org>,
        linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, peterz@...radead.org, mingo@...nel.org
Subject: Re: [PATCH] sched/fair: Ignore percpu threads for imbalance pulls

On Wed, Feb 17, 2021 at 02:50:23PM +0000, Valentin Schneider wrote:
> On 17/02/21 17:38, Lingutla Chandrasekhar wrote:
> > During load balancing, when the balancing group is unable to pull tasks
> > from the busy group due to ->cpus_ptr constraints, it sets
> > LBF_SOME_PINNED in the lb env flags; as a consequence, sgc->imbalance
> > is set for the parent domain level, which classifies the group as
> > imbalanced so that it gets help from another balancing CPU.
> >
> > Consider a 4-CPU big.LITTLE system with CPUs 0-1 as LITTLEs and
> > CPUs 2-3 as Bigs, with the following scenario:
> > - CPU0 doing newly_idle balancing
> > - CPU1 running percpu kworker and RT task (small tasks)
> > - CPU2 running 2 big tasks
> > - CPU3 running 1 medium task
> >
> > While CPU0 is doing newly_idle load balance at MC level, it fails to
> > pull the percpu kworker from CPU1, sets LBF_SOME_PINNED in the lb env
> > flags, and sets sgc->imbalance at the DIE level domain. As
> > LBF_ALL_PINNED is not cleared, it tries to redo the balancing after
> > clearing CPU1 from the env cpus, but it does not find another
> > busiest_group, so CPU0 stops balancing at MC level without clearing
> > 'sgc->imbalance' and restarts load balancing at DIE level.
> >
> > At DIE level, CPU0 (the balancing CPU) finds the LITTLE group as the
> > busiest_group with group type 'imbalanced'; the Bigs, classified at a
> > level below the imbalanced type, are ignored when picking the busiest
> > group, and the balancing is aborted without pulling any tasks (by that
> > time, CPU1 might no longer have running tasks).
> >
> > Classifying the group as imbalanced because of percpu threads is a
> > suboptimal decision, so don't set LBF_SOME_PINNED for percpu threads.
> >
> 
> Sounds like you've stumbled on the same thing I'm trying to fix in
> 
>   http://lore.kernel.org/r/20210128183141.28097-8-valentin.schneider@arm.com
> 
> (I'm currently working on a v2)
> 
> Now, I'd tend to agree that if we could prevent pcpu kworkers from
> interfering with load-balance altogether, that would indeed be much
> better than trying to deal with the group_imbalanced faff further down the
> line (which is what I've been doing).
> 
> > Signed-off-by: Lingutla Chandrasekhar <clingutla@...eaurora.org>
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04a3ce20da67..44a05ad8c96b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7560,7 +7560,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> >  
> >  		schedstat_inc(p->se.statistics.nr_failed_migrations_affine);
> >  
> > -		env->flags |= LBF_SOME_PINNED;
> > +		/* Ignore percpu threads for imbalance pulls. */
> > +		if (p->nr_cpus_allowed > 1)
> > +			env->flags |= LBF_SOME_PINNED;
> >  
> >  		/*
> >  		 * Remember if this task can be migrated to any other CPU in
> 
> Unlike user tasks, pcpu kworkers have a stable affinity (with some hotplug
> quirks), so perhaps we could do this instead:
> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a8bd7b13634..84fca350b9ae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7539,6 +7539,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>  		return 0;
>  
> +	if (kthread_is_per_cpu(p))
> +		return 0;
> +
>  	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>  		int cpu;
>  

Looks good to me. In our testing too, the false imbalance manifests due to
pinned kworkers.
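
For reference, the path that turns LBF_SOME_PINNED into a group_imbalanced
classification is the tail end of load_balance(); the snippet below is a
paraphrased sketch of that hunk from kernel/sched/fair.c (the exact context
may differ between kernel versions), not part of either patch above:

	/*
	 * Some candidate task could not be detached because of CPU
	 * affinity: mark the parent sched_group as imbalanced so a
	 * sibling CPU can try to help out on a later balance pass.
	 */
	if (sd_parent) {
		int *group_imbalance = &sd_parent->groups->sgc->imbalance;

		if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
			*group_imbalance = 1;
	}

Either bailing out early for per-CPU kthreads via kthread_is_per_cpu() as
above, or not setting LBF_SOME_PINNED for them, keeps a pinned kworker from
reaching this path in the first place.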

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
