Message-ID: <76d1fe33-da20-47b3-9403-f3d6e664ad96@intel.com>
Date: Fri, 31 Oct 2025 23:17:08 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>, Tim Chen
<tim.c.chen@...ux.intel.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi Vineeth
Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, "Shrikanth
Hegde" <sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>,
"Yangyu Chen" <cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern
Hao <vernhao@...cent.com>, Len Brown <len.brown@...el.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>, Peter Zijlstra
<peterz@...radead.org>, "Gautham R . Shenoy" <gautham.shenoy@....com>, "Ingo
Molnar" <mingo@...hat.com>
Subject: Re: [PATCH 15/19] sched/fair: Respect LLC preference in task
migration and detach
Hi Prateek,
On 10/31/2025 11:32 AM, K Prateek Nayak wrote:
> Hello Tim,
>
> On 10/31/2025 1:37 AM, Tim Chen wrote:
>> On Thu, 2025-10-30 at 09:49 +0530, K Prateek Nayak wrote:
>>> Hello Tim,
>>>
>>> On 10/30/2025 2:39 AM, Tim Chen wrote:
>>>>>> I suppose you are suggesting that the threshold for stopping task detachment
>>>>>> should be higher. With the above can_migrate_llc() check, I suppose we have
>>>>>> raised the threshold for stopping "task detachment"?
>>>>>
>>>>> Say the LLC is under heavy load and we only have overloaded groups.
>>>>> can_migrate_llc() would return "mig_unrestricted" since
>>>>> fits_llc_capacity() would return false.
>>>>>
>>>>> Since we are under "migrate_load", sched_balance_find_src_rq() has
>>>>> returned the CPU with the highest load which could very well be the
>>>>> CPU with a large number of preferred LLC tasks.
>>>>>
>>>>> sched_cache_enabled() is still true, and when detach_tasks() reaches
>>>>> one of these preferred LLC tasks (which come at the very end of the
>>>>> task list), we break out even if env->imbalance > 0, leaving
>>>>
>>>> Yes, but at least one task has been removed to even out the load (making
>>>> forward progress), and the remaining tasks all wish to stay in the current
>>>> LLC and would prefer not to be moved. My thought was to not even out all
>>>> the load in one shot by pulling more tasks out of their preferred LLC.
>>>> If the imbalance still remains, we'll get to it in the next load balance.
>>>
>>> In that case, can we spoof an LBF_ALL_PINNED for the case where we start
>>
>> In the code chunk (with fix I mentioned in last reply):
>>
>> +#ifdef CONFIG_SCHED_CACHE
>> + /*
>> + * Don't detach more tasks if the remaining tasks want
>> + * to stay. We know the remaining tasks all prefer the
>> + * current LLC, because after order_tasks_by_llc(), the
>> + * tasks that prefer the current LLC are at the tail of
>> + * the list. The inhibition of detachment is to avoid too
>> + * many tasks being migrated out of the preferred LLC.
>> + */
>> + if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
>> + llc_id(env->src_cpu) == p->preferred_llc &&
>> +     llc_id(env->dst_cpu) != p->preferred_llc)
>> + break;
>>
>> We have already pulled at least one task when we stop detaching, because we
>> know that all the remaining tasks want to stay in their current LLC.
>> "detached" is non-zero when we break, so LBF_ALL_PINNED would already be
>> cleared. We will only exit the detach_tasks() loop with LBF_ALL_PINNED still
>> set when there are truly no tasks that can be moved, i.e. a genuine
>> all-pinned case.
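
To double check that I read the break condition the same way, here is a
small standalone userspace sketch of the behavior described above. It is
not kernel code: struct task/lb_env and detach_tasks_sketch() are
simplified stand-ins, and the LBF_ALL_PINNED handling is inlined instead of
living in can_migrate_task(). The point is only that once one task has been
detached, hitting the preferred-LLC tail of the ordered list stops the loop
while the flag stays cleared:

#include <stdio.h>

#define LBF_ALL_PINNED 0x01

struct task { int preferred_llc; int load; };
struct lb_env { int src_llc, dst_llc, imbalance, flags; };

static int detach_tasks_sketch(struct lb_env *env, struct task *tasks, int nr)
{
	int i, detached = 0;

	env->flags |= LBF_ALL_PINNED;	/* assume all pinned until one task moves */

	for (i = 0; i < nr && env->imbalance > 0; i++) {
		struct task *p = &tasks[i];

		/*
		 * Tasks preferring the src LLC sit at the tail of the
		 * ordered list.  Once we reach them with detached > 0,
		 * stop: the flag has already been cleared, so the caller
		 * does not see an all-pinned CPU.
		 */
		if (detached && p->preferred_llc == env->src_llc &&
		    env->dst_llc != p->preferred_llc)
			break;

		env->flags &= ~LBF_ALL_PINNED;	/* found a movable task */
		env->imbalance -= p->load;
		detached++;
	}
	return detached;
}

int main(void)
{
	/* one task with no preference, then two tasks preferring LLC 0 */
	struct task tasks[] = { { -1, 2 }, { 0, 2 }, { 0, 2 } };
	struct lb_env env = { .src_llc = 0, .dst_llc = 1, .imbalance = 5 };
	int detached = detach_tasks_sketch(&env, tasks, 3);

	printf("detached=%d imbalance_left=%d all_pinned=%d\n",
	       detached, env.imbalance, !!(env.flags & LBF_ALL_PINNED));
	/* prints: detached=1 imbalance_left=3 all_pinned=0 */
	return 0;
}
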
>
> So what I was suggesting is something like:
>
> @@ -10251,6 +10252,7 @@ static int detach_tasks(struct lb_env *env)
> unsigned long util, load;
> struct task_struct *p;
> int detached = 0;
> + bool preserve_preferred;
>
> lockdep_assert_rq_held(env->src_rq);
>
> @@ -10268,6 +10270,10 @@ static int detach_tasks(struct lb_env *env)
>
> tasks = order_tasks_by_llc(env, &env->src_rq->cfs_tasks);
>
> + preserve_preferred = sched_cache_enabled() &&
> + !(env->sd->flags & SD_SHARE_LLC) &&
Maybe also check (env->sd->child->flags & SD_SHARE_LLC), because we only
care about the domain that is the direct parent of an LLC domain (a rough
sketch of the combined check is at the end of this mail).
> + !env->sd->nr_balance_failed;
> +
> while (!list_empty(tasks)) {
> /*
> * We don't want to steal all, otherwise we may be treated likewise,
> @@ -10370,16 +10376,15 @@ static int detach_tasks(struct lb_env *env)
>
> #ifdef CONFIG_SCHED_CACHE
> /*
> - * Don't detach more tasks if the remaining tasks want
> - * to stay. We know the remaining tasks all prefer the
> - * current LLC, because after order_tasks_by_llc(), the
> - * tasks that prefer the current LLC are at the tail of
> - * the list. The inhibition of detachment is to avoid too
> - * many tasks being migrated out of the preferred LLC.
> + * We've hit tasks that prefer the src LLC while balancing between LLCs.
> + * If previous balances have been successful, pretend the rest of the
> + * tasks on this CPU are pinned and let the main load balancing loop
> + * find another busy CPU to pull from if an imbalance remains.
> */
> - if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
> - llc_id(env->src_cpu) == p->preferred_llc)
> + if (preserve_preferred && detached && llc_id(env->src_cpu) == p->preferred_llc) {
> + env->flags |= LBF_ALL_PINNED;
Let me try to understand this strategy: if all previous migrations on this
sched_domain have succeeded, it means that even if we stop migrating tasks
out of this busiest CPU from now on, it won't matter much, because the
imbalance has already been mitigated. If we do stop the migration, we should
look for other busy CPUs to pull some tasks from. One concern is that
setting LBF_ALL_PINNED only drops the busiest CPU from the candidate mask
and then triggers a full re-scan of the entire sched_domain, which might be
costly, especially on large LLCs. We can try this and see whether it has
any impact on the benchmarks.
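
To put a rough bound on that concern, here is a toy model in plain C (not
kernel code; find_busiest_sketch() is a made-up stand-in for the real
src group/rq selection) of the worst case: every chosen busiest CPU reports
all pinned, we drop one CPU from the candidate mask per redo, and we walk
the whole domain again each time, i.e. O(N) redos of O(N) scans:

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 64

/* walk the whole domain and pretend the last remaining candidate is busiest */
static int find_busiest_sketch(const bool *candidate, int nr, int *scans)
{
	int cpu, busiest = -1;

	for (cpu = 0; cpu < nr; cpu++) {
		(*scans)++;
		if (candidate[cpu])
			busiest = cpu;
	}
	return busiest;
}

int main(void)
{
	bool candidate[NR_CPUS];
	int scans = 0, redos = 0, busiest;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		candidate[cpu] = true;

	/* worst case: every busiest CPU we pick turns out to be "all pinned" */
	while ((busiest = find_busiest_sketch(candidate, NR_CPUS, &scans)) >= 0) {
		candidate[busiest] = false;	/* drop it and redo the scan */
		redos++;
	}

	printf("redos=%d cpus_scanned=%d\n", redos, scans);
	/* prints: redos=64 cpus_scanned=4160 */
	return 0;
}

Of course the real path gives up much earlier in practice; this only shows
the shape of the worst case.
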
thanks,
Chenyu
> break;
> + }
> #endif
>
>
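
As mentioned above, here is a rough sketch of the combined gating I have in
mind for preserve_preferred, for discussion only. sched_domain_sketch,
sched_cache_enabled_sketch and preserve_preferred_llc() below are simplified
stand-ins for the real sched_domain, the static key and the open-coded
condition in detach_tasks(); the idea is that the domain being balanced must
not share an LLC itself, its child must, and nr_balance_failed must still be
zero:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SD_SHARE_LLC 0x01

struct sched_domain_sketch {
	int flags;
	unsigned int nr_balance_failed;
	struct sched_domain_sketch *child;
};

static bool sched_cache_enabled_sketch = true;	/* stand-in for the static key */

static bool preserve_preferred_llc(const struct sched_domain_sketch *sd)
{
	if (!sched_cache_enabled_sketch)
		return false;

	/* the domain we balance in must span multiple LLCs ... */
	if (sd->flags & SD_SHARE_LLC)
		return false;

	/* ... and sit directly above an LLC domain */
	if (!sd->child || !(sd->child->flags & SD_SHARE_LLC))
		return false;

	/* only hold tasks back while balancing has not been failing */
	return !sd->nr_balance_failed;
}

int main(void)
{
	struct sched_domain_sketch llc = { .flags = SD_SHARE_LLC };
	struct sched_domain_sketch pkg = { .flags = 0, .child = &llc };
	struct sched_domain_sketch numa = { .flags = 0, .child = &pkg };

	printf("pkg (parent of LLC): %d\n", preserve_preferred_llc(&pkg));	/* 1 */
	printf("numa (grandparent):  %d\n", preserve_preferred_llc(&numa));	/* 0 */
	return 0;
}
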