lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf3dbcfe-3c26-48e3-b32f-6473c8dbeb06@os.amperecomputing.com>
Date: Fri, 29 Nov 2024 15:40:21 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: K Prateek Nayak <kprateek.nayak@....com>, peterz@...radead.org,
 mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
 patches@...erecomputing.com, cl@...ux.com, christian.loehle@....com,
 vineethr@...ux.ibm.com
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled

On 11/29/2024 12:28 PM, K Prateek Nayak wrote:

>>> I also see PSI splats like:
>>>
>>>      psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>>
>>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>>> the flags it is trying to clear
>>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>>> possible if you have picked a dequeued entity for running before its
>>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>>> and pick_eevdf() returns NULL (which it should never since
>>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
>> IIUC, one path for pick_eevdf() to return NULL is:
>> pick_eevdf():
>> <snip>
>>     if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
>>         curr = NULL; <--- curr is set to NULL
> 
> "on_rq" is only cleared when the entity is dequeued so "curr" is in fact
> going to sleep (proper sleep) and we've reached at pick_eevdf(),
> otherwise, if "curr" is not eligible, there is at least one more tasks
> on the cfs_rq which implies best has be found and will be non-null.
> 
if curr->sched_delayed == 1, the condition: '(curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))'
can be true. Please correct me if wrong.

>> <snip>
>> found:
>>     if (!best || (curr && entity_before(curr, best)))
>>         best = curr; <--- curr and best are both NULL
> 
> Say "curr" is going to sleep, and there is no "best", in which case
> "curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
> should have not reached pick_eevdf() in the first place since
> pick_next_entity() is only called by pick_task_fair() if
> "cfs_rq->nr_running" is non-zero.
> 
> So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
> return a valid runnable entity. Failure to do so perhaps points to
> "entity_eligible()" check going sideways somewhere or a bug in
> "nr_running" accounting.
> 
> Chenyu had proposed a similar fix long back in
> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> but the consensus was it was covering up a larger problem which
> then boiled down to avg_vruntime being computed incorrectly
> https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
> 
Thanks for the information.
>From the timeline, it seems the issue is before 152e11f6df29 ("sched/fair: Implement delayed dequeue").
DELAY_DEQUEUE may introduce risk for pick_eevdf() return NULL.

After patch 1 ("Fix warning if NEXT_BUDDY enabled"), the NULL pointer panic disappears.
Patch 2 ("Fix panic if pick_eevdf() returns NULL") is a safe guard.

Thanks,
-adam


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ