Message-ID: <f218ef23-3350-488f-a0ea-eb902475021f@intel.com>
Date: Wed, 29 Oct 2025 20:48:11 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>, Tim Chen
	<tim.c.chen@...ux.intel.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
 Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi Vineeth
 Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, "Shrikanth
 Hegde" <sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>,
	"Yangyu Chen" <cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern
 Hao <vernhao@...cent.com>, Len Brown <len.brown@...el.com>, Aubrey Li
	<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
	<yu.chen.surf@...il.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "Gautham R . Shenoy"
	<gautham.shenoy@....com>
Subject: Re: [PATCH 07/19] sched/fair: Track LLC-preferred tasks per runqueue

On 10/29/2025 12:32 PM, K Prateek Nayak wrote:
> Hello Tim,
> 
> On 10/28/2025 9:16 PM, Tim Chen wrote:
>> On Tue, 2025-10-28 at 23:15 +0800, Chen, Yu C wrote:
>>> On 10/27/2025 2:04 PM, K Prateek Nayak wrote:
>>>> Hello Tim,
>>>>
>>>> On 10/11/2025 11:54 PM, Tim Chen wrote:
>>>>> @@ -3999,6 +4038,7 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>>>    		struct rq *rq = rq_of(cfs_rq);
>>>>>    
>>>>>    		account_numa_enqueue(rq, task_of(se));
>>>>> +		account_llc_enqueue(rq, task_of(se));
>>>>>    		list_add(&se->group_node, &rq->cfs_tasks);
>>>>>    	}
>>>>>    	cfs_rq->nr_queued++;
>>>>> @@ -4010,9 +4050,14 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>>>    	update_load_sub(&cfs_rq->load, se->load.weight);
>>>>>    	if (entity_is_task(se)) {
>>>>>    		account_numa_dequeue(rq_of(cfs_rq), task_of(se));
>>>>> +		account_llc_dequeue(rq_of(cfs_rq), task_of(se));
>>>>>    		list_del_init(&se->group_node);
>>>>>    	}
>>>>>    	cfs_rq->nr_queued--;
>>>>> +
>>>>> +	/* safeguard to clear the cache aware data */
>>>>> +	if (!parent_entity(se) && !cfs_rq->nr_queued)
>>>>> +		reset_llc_stats(rq_of(cfs_rq));
>>>>
>>>> Instead of relying on reset_llc_stats() hack, I think a better approach
>>>> would be to have a "p->se.llc_sched_active" flag similar to how uclamp
>>>> has "uc_se->active" and we set this in account_llc_enqueue() which will
>>>> still check for sched_cache_enabled() but account_llc_dequeue() would
>>>> only check for "p->se.llc_sched_active" to decrement the stats and then
>>>> unset the flag.
>>>>
>>>> That way, we cannot have an imbalanced accounting. Thoughts?
>>>>
>>>
>>> I suppose what you mean is to avoid the race condition between
>>> enabling sched_cache and EQ/DE_LLC, similar to uclamp:
>>>
>>>           enqueue(taskA)
>>>           // sched_cache gets enabled
>>>           enqueue(taskB)
>>>           dequeue(taskA)
>>>           // Must not decrement rq->llc_pref for taskA
>>
>> For this case, task A is already on the rq when sched cache gets
>> enabled. But task A's preferred_llc is still -1.
>>
>> If we dequeue it while its preferred_llc is still -1, it won't
>> affect rq->llc_pref.
>>
>> If we change its preferred_llc to llc_i before we dequeue it,
>> then rq->llc_pref[llc_i] will be incremented first.
>>
>> Then when we dequeue task A, we will decrement it. We are
>> still accounting rq->llc_pref[llc_i] correctly with the current
>> code.
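
In other words, something like the following sequence, which stays
balanced either way (llc_i is just a placeholder index):

     enqueue(taskA)     /* preferred_llc == -1, llc_pref untouched */
     /* sched_cache gets enabled */
     /* taskA's preferred_llc becomes llc_i: rq->llc_pref[llc_i]++ */
     dequeue(taskA)     /* rq->llc_pref[llc_i]--, balanced */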
> 
> So what I really disliked was having reset_llc_stats() reset the
> stat, but looking at it again, that too is guarded by
> sched_cache_enabled(), so I think the counters can still go out of
> balance if:
> 
>      /* Cache aware scheduling enabled */
>      enqueue(TaskA) /* nr_llc_running = 1 */
>      enqueue(TaskB) /* nr_llc_running = 2 */
>      enqueue(TaskC) /* nr_llc_running = 3 */
>      dequeue(TaskA) /* nr_llc_running = 2 */
> 
>      /* Cache aware scheduling disabled */
> 
>      dequeue(TaskB) /* nr_llc_running = 2 */
If we introduce the mechanism you suggested previously, i.e. set
p->llc_sched_active in account_llc_enqueue(), which still checks
sched_cache_enabled(), and have account_llc_dequeue() only check
p->llc_sched_active before decrementing the stats and clearing the
flag, then the above scenario would be covered: dequeue(TaskB) would
still decrease nr_llc_running even if cache aware scheduling has been
disabled.
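
A rough sketch of that flag-based accounting, reusing the names
already used in this thread (the exact condition for counting a task
is illustrative, not taken from the patch):

static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
{
	if (!sched_cache_enabled())
		return;

	/* count only tasks that have a valid preferred LLC */
	if (p->preferred_llc == -1)
		return;

	rq->nr_llc_running++;
	/* mark the task as counted so dequeue stays symmetric */
	p->se.llc_sched_active = 1;
}

static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
{
	/*
	 * Deliberately no sched_cache_enabled() check here: a task
	 * counted at enqueue time must be uncounted at dequeue time,
	 * even if the feature was toggled off in between.
	 */
	if (!p->se.llc_sched_active)
		return;

	rq->nr_llc_running--;
	p->se.llc_sched_active = 0;
}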
Another idea is to reset the statistics on all CPUs when cache aware
scheduling is disabled at runtime; that might also avoid several other
race conditions, for example CPU hotplug vs cache aware scheduling.
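
Something like the sketch below, assuming sched_cache_enabled() is
backed by a static key (sched_cache_present is a made-up name here)
and reusing reset_llc_stats() from this patch:

static void sched_cache_disable(void)
{
	int cpu;

	/* stop new accounting before wiping the counters */
	static_branch_disable(&sched_cache_present);

	for_each_possible_cpu(cpu) {
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		/* serialize against enqueue/dequeue on this CPU */
		rq_lock_irqsave(rq, &rf);
		reset_llc_stats(rq);
		rq_unlock_irqrestore(rq, &rf);
	}
}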
thanks,
Chenyu