[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3deb3671-64df-4dd9-a539-3d41009f9875@os.amperecomputing.com>
Date: Fri, 29 Nov 2024 11:21:23 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: K Prateek Nayak <kprateek.nayak@....com>, peterz@...radead.org,
mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
patches@...erecomputing.com, cl@...ux.com, christian.loehle@....com,
vineethr@...ux.ibm.com
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
> Hello Adam,
>
Hi Prateek,
Thanks for comments.
> On 11/27/2024 11:26 AM, Adam Li wrote:
>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>
>> [ 124.977300] cfs_rq->next->sched_delayed
>
> I could reproduce this with a run of "perf bench sched messaging" but
> given that we hit this warning, it also means that either
> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
> clear_next_buddy() did not clear a delayed entity.
>
Yes. The logic of this patch is a delayed entity should not be set as next buddy.
> I also see PSI splats like:
>
> psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>
> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
> the flags it is trying to clear
> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
> possible if you have picked a dequeued entity for running before its
> wakeup, which is also perhaps why the "nr_running" computation goes awry
> and pick_eevdf() returns NULL (which it should never since
> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
IIUC, one path for pick_eevdf() to return NULL is:
pick_eevdf():
<snip>
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL; <--- curr is set to NULL
<snip>
found:
if (!best || (curr && entity_before(curr, best)))
best = curr; <--- curr and best are both NULL
return best; <--- return NULL
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index fbdca89c677f..cd1188b7f3df 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>> return;
>> if (se_is_idle(se))
>> return;
>> + if (se->sched_delayed)
>> + return;
>
> I tried to put a SCHED_WARN_ON() here to track where this comes from and
> seems like it is usually from attach_task() in the load balancing path
> pulling a delayed task which is set as the next buddy in
> check_preempt_wakeup_fair()
>
> Can you please try the following diff instead of the first two patches
> and see if you still hit these warnings, stalls, and pick_eevdf()
> returning NULL?
>
Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.
Regards,
-adam
Powered by blists - more mailing lists