lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6683a7dc-b83b-489c-bdfc-ad225ab816c6@amd.com>
Date: Fri, 29 Nov 2024 23:16:47 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Adam Li <adamli@...amperecomputing.com>, <mingo@...hat.com>,
	<juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<patches@...erecomputing.com>, <cl@...ux.com>, <christian.loehle@....com>,
	<vineethr@...ux.ibm.com>
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled

Hello Peter,

On 11/29/2024 3:25 PM, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 12:59:54PM +0530, K Prateek Nayak wrote:
> 
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff7cae9274c5..61e74eb5af22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5478,6 +5478,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	bool sleep = flags & DEQUEUE_SLEEP;
>>   	update_curr(cfs_rq);
>> +	clear_buddies(cfs_rq, se);
>>   	if (flags & DEQUEUE_DELAYED) {
>>   		SCHED_WARN_ON(!se->sched_delayed);
>> @@ -5520,8 +5521,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>   	update_stats_dequeue_fair(cfs_rq, se, flags);
>> -	clear_buddies(cfs_rq, se);
>> -
>>   	update_entity_lag(cfs_rq, se);
>>   	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
>>   		se->deadline -= se->vruntime;
> 
> So this puts the clear_buddies() before the whole delayed thing, and
> should be sufficient afaict, no?
> 
>> @@ -8767,7 +8766,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>   	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
>>   		return;
>> -	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK)) {
>> +	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
>>   		set_next_buddy(pse);
>>   	}
> 
> But then this should never happen, which is after a wakeup, p and the
> whole hierarchy up should be runnable at this point.
> 
> Or should I go find more wake-up juice and try again :-)

The motivation there was this splat from my testing:

     Kernel panic - not syncing: Encountered delayed entity in check_preempt_wakeup_fair()
     CPU: 190 UID: 1000 PID: 4215 Comm: sched-messaging Tainted: G        W          6.12.0-rc4-test+ #156
     Tainted: [W]=WARN
     Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
     Call Trace:
      <TASK>
      panic+0x399/0x3f0
      check_preempt_wakeup_fair+0x21b/0x220
      wakeup_preempt+0x64/0x70
      sched_balance_rq+0x970/0x12e0
      ? update_load_avg+0x7e/0x7e0
      sched_balance_newidle+0x1e2/0x490
      pick_next_task_fair+0x32/0x3b0
      __pick_next_task+0x3d/0x1a0
      __schedule+0x1da/0x1540
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? aa_file_perm+0x121/0x4d0
      schedule+0x28/0x110
      pipe_read+0x345/0x470
      ? __pfx_autoremove_wake_function+0x10/0x10
      vfs_read+0x2f1/0x330
      ksys_read+0xaf/0xe0
      do_syscall_64+0x6f/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? current_time+0x31/0xf0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? atime_needs_update+0x9c/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? touch_atime+0x1e/0x100
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? pipe_read+0x3ec/0x470
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? vfs_read+0x2f1/0x330
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? ksys_read+0xcc/0xe0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? ksys_read+0xcc/0xe0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? ksys_read+0xcc/0xe0
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? syscall_exit_to_user_mode+0x51/0x1a0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? do_syscall_64+0x7b/0x110
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7fa8b851481c
     Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 c1 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 2f c2 f7 ff 48
     RSP: 002b:00007fa8ae0d8cb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
     RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8b851481c
     RDX: 0000000000000064 RSI: 00007fa8ae0d8cf0 RDI: 0000000000000027
     RBP: 00007fa8ae0d8d90 R08: 0000000000000000 R09: 00007ffee0fdf97f
     R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000064
     R13: 00007fa8ae0d8cf0 R14: 00000000000005aa R15: 000055c2e80accb0
      </TASK>
--

newidle balance pulls a delayed entity which goes through the
check_preempt_wakeup_fair() path in attach_task() and is set as the next
buddy. On a second thought this is perhaps not required since even if
this delayed entity is picked, it'll go thorough a full dequeue and the
clear_buddies() change above should take care of it.

> 
> 
> 
> Anyway..  I'm sure I started a patch series cleaning up the whole next
> buddy thing months ago (there's more problems here), but I can't seem to
> find it in a hurry :/
> 
> 
> 

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ