[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6683a7dc-b83b-489c-bdfc-ad225ab816c6@amd.com>
Date: Fri, 29 Nov 2024 23:16:47 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Adam Li <adamli@...amperecomputing.com>, <mingo@...hat.com>,
<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
<patches@...erecomputing.com>, <cl@...ux.com>, <christian.loehle@....com>,
<vineethr@...ux.ibm.com>
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
Hello Peter,
On 11/29/2024 3:25 PM, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:
>
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff7cae9274c5..61e74eb5af22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> bool sleep = flags & DEQUEUE_SLEEP;
>> update_curr(cfs_rq);
>> + clear_buddies(cfs_rq, se);
>> if (flags & DEQUEUE_DELAYED) {
>> SCHED_WARN_ON(!se->sched_delayed);
>> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> update_stats_dequeue_fair(cfs_rq, se, flags);
>> - clear_buddies(cfs_rq, se);
>> -
>> update_entity_lag(cfs_rq, se);
>> if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>> se->deadline -= se->vruntime;
>
> So this puts the clear_buddies() before the whole delayed thing, and
> should be sufficient afaict, no?
>
>> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>> return;
>> - if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
>> + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>> set_next_buddy(pse);
>> }
>
> But then this should never happen, which is after a wakeup, p and the
> whole hierarchy up should be runnable at this point.
>
> Or should I go find more wake-up juice and try again :-)
The motivation there was this splat from my testing:
Kernel panic - not syncing: Encountered delayed entity in check_preempt_wakeup_fair()
CPU: 190 UID: 1000 PID: 4215 Comm: sched-messaging Tainted: G W 6.12.0-rc4-test+ #156
Tainted: [W]=WARN
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Call Trace:
<TASK>
panic+0x399/0x3f0
check_preempt_wakeup_fair+0x21b/0x220
wakeup_preempt+0x64/0x70
sched_balance_rq+0x970/0x12e0
? update_load_avg+0x7e/0x7e0
sched_balance_newidle+0x1e2/0x490
pick_next_task_fair+0x32/0x3b0
__pick_next_task+0x3d/0x1a0
__schedule+0x1da/0x1540
? srso_alias_return_thunk+0x5/0xfbef5
? aa_file_perm+0x121/0x4d0
schedule+0x28/0x110
pipe_read+0x345/0x470
? __pfx_autoremove_wake_function+0x10/0x10
vfs_read+0x2f1/0x330
ksys_read+0xaf/0xe0
do_syscall_64+0x6f/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? current_time+0x31/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? atime_needs_update+0x9c/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? touch_atime+0x1e/0x100
? srso_alias_return_thunk+0x5/0xfbef5
? pipe_read+0x3ec/0x470
? srso_alias_return_thunk+0x5/0xfbef5
? vfs_read+0x2f1/0x330
? srso_alias_return_thunk+0x5/0xfbef5
? ksys_read+0xcc/0xe0
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? ksys_read+0xcc/0xe0
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? ksys_read+0xcc/0xe0
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x51/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x7b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa8b851481c
Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 c1 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f c2 f7 ff 48
RSP: 002b:00007fa8ae0d8cb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8b851481c
RDX: 0000000000000064 RSI: 00007fa8ae0d8cf0 RDI: 0000000000000027
RBP: 00007fa8ae0d8d90 R08: 0000000000000000 R09: 00007ffee0fdf97f
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000064
R13: 00007fa8ae0d8cf0 R14: 00000000000005aa R15: 000055c2e80accb0
</TASK>
--
newidle balance pulls a delayed entity which goes through the
check_preempt_wakeup_fair() path in attach_task() and is set as the next
buddy. On a second thought this is perhaps not required since even if
this delayed entity is picked, it'll go thorough a full dequeue and the
clear_buddies() change above should take care of it.
>
>
>
> Anyway.. I'm sure I started a patch series cleaning up the whole next
> buddy thing months ago (there's more problems here), but I can't seem to
> find it in a hurry :/
>
>
>
--
Thanks and Regards,
Prateek
Powered by blists - more mailing lists