Message-ID: <db867b9e-9997-4869-86a8-78fe03696624@amd.com>
Date: Tue, 18 Nov 2025 10:38:52 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>
CC: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Johannes Weiner <hannes@...xchg.org>, Suren
Baghdasaryan <surenb@...gle.com>, <linux-kernel@...r.kernel.org>, Dietmar
Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin
Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy
execution

Hello John,

On 11/18/2025 9:56 AM, John Stultz wrote:
> On Mon, Nov 17, 2025 at 5:39 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>> On 11/18/2025 6:15 AM, John Stultz wrote:
>>> I'm still getting my head around the description above (it's been a
>>> while since I last looked at the PSI code), but early on I often hit
>>> PSI splats, and I thought I had addressed it with the patch here:
>>> https://github.com/johnstultz-work/linux-dev/commit/f60923a6176b3778a8fc9b9b0bbe4953153ce565
>>
>> Oooo! Let me go test that.

Seems like that solution works too on top of current tip:sched/core.
I think you can send it out as a standalone patch for inclusion while
we hash out the donor migration bits (and blocked owner, and rwsem!).
>>
>>>
>>> And with that I've not run across any warnings since.
>>>
>>> Now, I hadn't tripped over the issue recently with the subset of the
>>> full series I've been pushing upstream, and as I most easily ran into
>>> it with the sleeping owner enqueuing feature I was holding the fix
>>> back for those changes. But I realized that, unfortunately, CONFIG_PSI
>>> at some point got disabled in my test defconfig, so I've not had the
>>> opportunity to trip it - and sure enough, I can trivially trigger it
>>> when booting with the current upstream code.
>>
>> I hit this on tip:sched/core when looking at the recent sched_yield()
>> changes. Maybe the "blocked_on" serialization with the proxy migration
>> will make this all go away :)
>>
>>>
>>> Applying that fix does seem to avoid the warnings in my trivial
>>> testing, but again I've not dug through the logic in a while, so you
>>> may have a better sense of the inadequacies of that fix.
>>>
>>> If it looks reasonable to you, I'll rework the commit message so it
>>> isn't so focused on the sleeping-owner-enqueuing case and submit it.
>>
>> That would be great! And it seems to be a lot simpler than the stuff
>> I'm trying to do. I'll give it a spin and get back to you.
>> Thank you again for pointing to the fix.
>>
>>>
>>> I'll have to spend some time here looking more at your proposed
>>> solution. At first glance, I do fret a little about the
>>> task->sched_proxy bit overlapping somewhat in meaning with the
>>> task->blocked_on value.
>>
>> Ack! I'm pretty sure that with the blocked_on locking we won't have
>> these "interesting" situations, but I posted the RFC just in case we
>> needed something in the interim - turns out it's a solved problem :)
>>
>> One last thing: it would be good to get some clarification on how to
>> treat the blocked tasks retained on the runqueue for PSI - a quick look
>> at your fix suggests we still consider them runnable (TSK_RUNNING) from
>> a PSI standpoint - is this ideal, or should PSI consider these tasks
>> blocked?
>
> So my default way of thinking about mutex-blocked tasks with proxy is
> that they are equivalent to runnable. They can be selected by
> pick_next_task(), and they are charged for the time they donate to the
> lock-owner that runs as the proxy.
> To conceptualize things with ProxyExec, I often imagine the
> mutex-blocked task as being in "optimistic spin" mode waiting for the
> mutex, where we'd just run the task and let it spin, instead of
> blocking the task (when the lock owner isn't already running). The
> optimization then is that, instead of just wasting time spinning, we
> run the lock owner so it can release the lock.

I think I can see it now. I generally looked at them the other way
around - as blocked tasks retained on the runqueue just for the
vruntime context. I'll try changing my perspective to match yours when
looking at proxy :)
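
To make sure I have the model right, here is a tiny userspace toy I
put together (explicitly not kernel code; the CPU "some" rule is just
the simplified single-CPU reading of Documentation/accounting/psi.rst,
and the task states are made up for illustration). It only shows how
the classification of a mutex-blocked-but-enqueued donor changes what
PSI would see:

/*
 * Toy model (userspace, not kernel code): one CPU, a few tasks, and
 * the question of whether a mutex-blocked donor that proxy execution
 * keeps on the runqueue is classified as runnable (TSK_RUNNING) or as
 * blocked for PSI purposes.
 */
#include <stdio.h>
#include <stdbool.h>

enum task_state { RUNNING_ON_CPU, RUNNABLE_WAITING, MUTEX_BLOCKED_DONOR };

/*
 * CPU "some" pressure (simplified from the PSI docs): at least one
 * runnable task is delayed, i.e. more runnable tasks than tasks
 * actually on a CPU.
 */
static bool cpu_some_pressure(const enum task_state *t, int n,
			      bool donor_counts_as_runnable)
{
	int oncpu = 0, runnable = 0;

	for (int i = 0; i < n; i++) {
		switch (t[i]) {
		case RUNNING_ON_CPU:
			oncpu++;
			runnable++;
			break;
		case RUNNABLE_WAITING:
			runnable++;
			break;
		case MUTEX_BLOCKED_DONOR:
			/* Your model: the donor still wants the CPU */
			if (donor_counts_as_runnable)
				runnable++;
			break;
		}
	}
	return runnable > oncpu;
}

int main(void)
{
	/* The proxy/lock owner on the CPU plus one enqueued donor. */
	enum task_state rq[] = { RUNNING_ON_CPU, MUTEX_BLOCKED_DONOR };
	int n = sizeof(rq) / sizeof(rq[0]);

	printf("donor as TSK_RUNNING : cpu some pressure = %d\n",
	       cpu_some_pressure(rq, n, true));
	printf("donor as blocked     : cpu some pressure = %d\n",
	       cpu_some_pressure(rq, n, false));
	return 0;
}

With the donor counted as TSK_RUNNING, its queue delay shows up as CPU
pressure (which matches your "it wants to run to boost the owner"
view); counted as blocked, it disappears from the CPU signal entirely,
which is the trade-off behind my question above.
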
As for the fix in your tree, feel free to include:
Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
> So, I need to refresh myself on more of the subtleties of PSI, but to
> me, considering it TSK_RUNNING seems intuitive.
>
> There are maybe some transient cases, like where the blocked task is
> on one RQ and the lock holder is on another: until the blocked task is
> selected (and then proxy-migrated to boost the task on the other cpu),
> if it were very far back in the runqueue it could be contributing what
> could be seen as "false pressure" on that RQ. So maybe I need to think
> a bit more about that. But it is still a task that wants to run to
> boost the lock owner, so I'm not sure how different it is in the PSI
> view compared to transient runqueue imbalances.

I think Johannes has a better understanding of how these signals are
used in the field, so I'll defer to him.
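
(Purely as an aside for the archives: the consumer-visible side of all
this is just the pressure files. A minimal reader of /proc/pressure/cpu
could look like the sketch below - the "some"/"full" line layout and
the microsecond "total" field are as documented in
Documentation/accounting/psi.rst, and the parsing is deliberately
naive.)

/* Minimal /proc/pressure/cpu reader (userspace sketch). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/cpu", "r");
	char line[256];

	if (!f) {
		perror("open /proc/pressure/cpu");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char kind[8];
		double avg10, avg60, avg300;
		unsigned long long total;

		/* e.g. "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
		if (sscanf(line,
			   "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
			   kind, &avg10, &avg60, &avg300, &total) == 5)
			printf("%s: avg10=%.2f total=%llu us\n",
			       kind, avg10, total);
	}
	fclose(f);
	return 0;
}
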
--
Thanks and Regards,
Prateek