linux-kernel - Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3deb3671-64df-4dd9-a539-3d41009f9875@os.amperecomputing.com>
Date: Fri, 29 Nov 2024 11:21:23 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: K Prateek Nayak <kprateek.nayak@....com>, peterz@...radead.org,
 mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
 patches@...erecomputing.com, cl@...ux.com, christian.loehle@....com,
 vineethr@...ux.ibm.com
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled

On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
> Hello Adam,
> 
Hi Prateek,
Thanks for comments.

> On 11/27/2024 11:26 AM, Adam Li wrote:
>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>
>> [  124.977300] cfs_rq->next->sched_delayed
> 
> I could reproduce this with a run of "perf bench sched messaging" but
> given that we hit this warning, it also means that either
> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
> clear_next_buddy() did not clear a delayed entity.
> 
Yes. The logic of this patch is a delayed entity should not be set as next buddy.

> I also see PSI splats like:
> 
>     psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
> 
> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
> the flags it is trying to clear
> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
> possible if you have picked a dequeued entity for running before its
> wakeup, which is also perhaps why the "nr_running" computation goes awry
> and pick_eevdf() returns NULL (which it should never since
> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
IIUC, one path for pick_eevdf() to return NULL is:
pick_eevdf():
<snip>
	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
		curr = NULL; <--- curr is set to NULL
<snip>
found:
	if (!best || (curr && entity_before(curr, best)))
		best = curr; <--- curr and best are both NULL

	return best;  <--- return NULL

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fbdca89c677f..cd1188b7f3df 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>>               return;
>>           if (se_is_idle(se))
>>               return;
>> +        if (se->sched_delayed)
>> +            return;
> 
> I tried to put a SCHED_WARN_ON() here to track where this comes from and
> seems like it is usually from attach_task() in the load balancing path
> pulling a delayed task which is set as the next buddy in
> check_preempt_wakeup_fair()
> 
> Can you please try the following diff instead of the first two patches
> and see if you still hit these warnings, stalls, and pick_eevdf()
> returning NULL?
> 
Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.

Regards,
-adam