linux-kernel - Re: [RFC PATCH] sched/fair: Fix impossible migrate


Message-ID: <20230721220430.qv6eqo4dosfrsilo@airbuntu>
Date:   Fri, 21 Jul 2023 23:04:30 +0100
From:   Qais Yousef <qyousef@...alina.io>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Fix impossible migrate_util scenario in
 load balance

On 07/21/23 15:52, Vincent Guittot wrote:
> On Friday 21 Jul 2023 at 11:57:11 (+0100), Qais Yousef wrote:
> > On 07/20/23 14:31, Vincent Guittot wrote:
> > 
> > > I was trying to reproduce the behavior but I was failing until I
> > > realized that this code path is used when the 2 groups are not sharing
> > > their cache. Which topology do you use? I thought that DynamIQ, with
> > > cache shared between all 8 CPUs, was the norm for arm64 embedded devices
> > > now.
> > 
> > Hmm, good question. Phantom domains didn't die, which I think is what is
> > causing this. I can look into whether this is for a good reason or just a
> > historical artifact.
> > 
> > > 
> > > Also, when you say "the little cluster capacity is very small nowadays
> > > (around 200 or less)", is it the capacity of 1 core or of the cluster?
> > 
> > I meant one core. So in my case all the littles were busy except for one that
> > was mostly idle and never pulled a task from mid, where two tasks were stuck on
> > a CPU there. And the logs I added were showing me that env->imbalance
> > was in the 150+ range but the task we would pull was in the 350+ range.
> 
> I'm not able to reproduce your problem with v6.5-rc2 and without phantom domains,
> which is expected because we share cache and the weight is 1, so we use the path
> 
> 		if (busiest->group_weight == 1 || sds->prefer_sibling) {
> 			/*
> 			 * When prefer sibling, evenly spread running tasks on
> 			 * groups.
> 			 */
> 			env->migration_type = migrate_task;
> 			env->imbalance = sibling_imbalance(env, sds, busiest, local);
> 		} else {
> 
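
The topology dependency in the quoted snippet can be illustrated with a
standalone sketch. This is a simplified model, not kernel code: the enum,
struct and function names are hypothetical, and the real condition in
calculate_imbalance() has more terms than the single "groups share cache"
flag assumed here.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's real types. */
enum migration_kind {
	KIND_MIGRATE_UTIL,	/* imbalance expressed in utilization units */
	KIND_MIGRATE_TASK,	/* imbalance expressed as a number of tasks */
	KIND_NOT_MODELLED	/* the else branch elided in the quoted snippet */
};

struct balance_model {
	bool groups_share_cache;		/* do local and busiest share cache? */
	unsigned int busiest_group_weight;	/* CPUs in the busiest group */
	bool prefer_sibling;
};

/*
 * Assumption taken from the thread: the utilization-based path is only
 * reached when the two groups do not share their cache. Otherwise a busiest
 * group of weight 1 (or prefer_sibling) selects the task-count path.
 */
static enum migration_kind pick_migration_kind(const struct balance_model *m)
{
	if (!m->groups_share_cache)
		return KIND_MIGRATE_UTIL;
	if (m->busiest_group_weight == 1 || m->prefer_sibling)
		return KIND_MIGRATE_TASK;
	return KIND_NOT_MODELLED;
}
```

Under this model, a phantom-domain setup (groups_share_cache == false) lands on
the utilization path, while a flattened topology with single-CPU groups lands
on the task-count path, which matches the behaviour described above.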

I missed the dependency on topology. So yes, you're right, this needs to be
addressed first. I seem to remember Sudeep merged some stuff that will flatten
these topologies.

Let me chase this topology thing out first.
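
For reference, the symptom reported earlier in the thread (env->imbalance in
the 150+ range, candidate tasks in the 350+ range) can be modelled with a
minimal standalone sketch. It assumes a utilization-budget filter of the kind
a migrate_util pass would apply; the struct and function names are
hypothetical, and the numbers in the note below are only illustrative.

```c
#include <stdbool.h>

/* Hypothetical model of the balancer's remaining utilization budget. */
struct lb_model {
	long imbalance;	/* utilization the balancer is still willing to move */
};

/*
 * Assumed filter: a task is only detached if its utilization fits in the
 * remaining budget. A detached task consumes that much of the budget.
 */
static bool can_detach_util(struct lb_model *env, long task_util)
{
	if (task_util > env->imbalance)
		return false;		/* task is larger than the budget: skipped */
	env->imbalance -= task_util;
	return true;
}

/* Walk a list of candidate task utilizations and count how many would move. */
static int count_detached(long imbalance, const long *utils, int n)
{
	struct lb_model env = { .imbalance = imbalance };
	int detached = 0;

	for (int i = 0; i < n; i++) {
		if (can_detach_util(&env, utils[i]))
			detached++;
	}
	return detached;
}
```

With a budget of 150 and two candidates of utilization 350 each, no task
passes the filter, so the mostly idle CPU pulls nothing. That is the "impossible"
scenario from the subject line, expressed in this simplified model.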


Thanks!

--
Qais Yousef

> > 
> > I should have mentioned that I'm on 5.15. Sorry, with Android it's hard to run
> > mainline on products :( But this code, as far as I can tell, hasn't changed much.
> > 
> > I can try to find something that runs mainline and reproduce there if you think
> > my description of the problem is not clear or applicable.
> > 
> > 
> > Thanks
> > 
> > --
> > Qais Yousef