lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba9f47fc-4465-4db8-93e3-fe86c619bdef@amd.com>
Date: Fri, 29 Nov 2024 13:30:12 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Adam Li <adamli@...amperecomputing.com>, <peterz@...radead.org>,
	<mingo@...hat.com>, <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>
CC: <dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
	<patches@...erecomputing.com>, <cl@...ux.com>, <christian.loehle@....com>,
	<vineethr@...ux.ibm.com>
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled

Hello Adam,

On 11/29/2024 1:10 PM, Adam Li wrote:
> On 11/29/2024 12:28 PM, K Prateek Nayak wrote:
> 
>>>> I also see PSI splats like:
>>>>
>>>>       psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>>>
>>>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>>>> the flags it is trying to clear
>>>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>>>> possible if you have picked a dequeued entity for running before its
>>>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>>>> and pick_eevdf() returns NULL (which it should never since
>>>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
>>> IIUC, one path for pick_eevdf() to return NULL is:
>>> pick_eevdf():
>>> <snip>
>>>      if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>>>          curr = NULL; <--- curr is set to NULL
>>
>> "on_rq" is only cleared when the entity is dequeued so "curr" is in fact
>> going to sleep (proper sleep) and we've reached at pick_eevdf(),
>> otherwise, if "curr" is not eligible, there is at least one more tasks
>> on the cfs_rq which implies best has be found and will be non-null.
>>
> if curr->sched_delayed == 1, the condition: '(curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))'
> can be true. Please correct me if wrong.

If curr is sched_delayed, that means it is going to sleep and it is
ineligible but it can only be ineligible if there is at least one more
task with a lower vruntime and hence best must be found. A delayed
entity will also not decrement the "nr_running" and it'll be queued back
from put_prev_entity() to be picked off later.

Essentially, I believe curr can never be ineligible in absence of other
eligible tasks.

> 
>>> <snip>
>>> found:
>>>      if (!best || (curr && entity_before(curr, best)))
>>>          best = curr; <--- curr and best are both NULL
>>
>> Say "curr" is going to sleep, and there is no "best", in which case
>> "curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
>> should have not reached pick_eevdf() in the first place since
>> pick_next_entity() is only called by pick_task_fair() if
>> "cfs_rq->nr_running" is non-zero.
>>
>> So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
>> return a valid runnable entity. Failure to do so perhaps points to
>> "entity_eligible()" check going sideways somewhere or a bug in
>> "nr_running" accounting.
>>
>> Chenyu had proposed a similar fix long back in
>> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
>> but the consensus was it was covering up a larger problem which
>> then boiled down to avg_vruntime being computed incorrectly
>> https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
>>
> Thanks for the information.
>  From the timeline, it seems the issue is before 152e11f6df29 ("sched/fair: Implement delayed dequeue").
> DELAY_DEQUEUE may introduce risk for pick_eevdf() return NULL.

Ideally it shouldn't since delayed entities are still captured in
"cfs_rq->nr_running" and they'll eventually become eligible and be
picked off but I may be wrong and I hope someone corrects me in which
case :)

> 
> After patch 1 ("Fix warning if NEXT_BUDDY enabled"), the NULL pointer panic disappears.
> Patch 2 ("Fix panic if pick_eevdf() returns NULL") is a safe guard.
> 
> Thanks,
> -adam
> 

-- 
Thanks and Regards,
Prateek


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ