Message-ID: <db867b9e-9997-4869-86a8-78fe03696624@amd.com>
Date: Tue, 18 Nov 2025 10:38:52 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>
CC: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Johannes Weiner <hannes@...xchg.org>, Suren
Baghdasaryan <surenb@...gle.com>, <linux-kernel@...r.kernel.org>, Dietmar
Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin
Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy
execution

Hello John,

On 11/18/2025 9:56 AM, John Stultz wrote:
> On Mon, Nov 17, 2025 at 5:39 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>> On 11/18/2025 6:15 AM, John Stultz wrote:
>>> I'm still getting my head around the description above (it's been a
>>> while since I last looked at the PSI code), but early on I often hit
>>> PSI splats, and I thought I had addressed it with the patch here:
>>> https://github.com/johnstultz-work/linux-dev/commit/f60923a6176b3778a8fc9b9b0bbe4953153ce565
>>
>> Oooo! Let me go test that.

Seems like that solution works too on top of current tip:sched/core.
I think you can send it out as a standalone patch for inclusion while
we hash out the donor migration bits (and blocked owner, and rwsem!).
>>
>>>
>>> And with that I've not run across any warnings since.
>>>
>>> Now, I hadn't tripped over the issue recently with the subset of the
>>> full series I've been pushing upstream, and as I most easily ran into
>>> it with the sleeping owner enqueuing feature I was holding the fix
>>> back for those changes. But I realized that, unfortunately, CONFIG_PSI
>>> at some point got disabled in my test defconfig, so I've not had the
>>> opportunity to trip it - and sure enough, I can trivially trigger it
>>> when booting with the current upstream code.
>>
>> I hit this on tip:sched/core when looking at the recent sched_yield()
>> changes. Maybe the "blocked_on" serialization with the proxy migration
>> will make this all go away :)
>>
>>>
>>> Applying that fix does seem to avoid the warnings in my trivial
>>> testing, but again I've not dug through the logic in a while, so you
>>> may have a better sense of the inadequacies of that fix.
>>>
>>> If it looks reasonable to you, I'll rework the commit message so it
>>> isn't so focused on the sleeping-owner-enqueuing case and submit it.
>>
>> That would be great! And it seems to be a lot simpler than the stuff
>> I'm trying to do. I'll give it a spin and get back to you.
>> Thank you again for pointing to the fix.
>>
>>>
>>> I'll have to spend some time here looking more at your proposed
>>> solution. At first glance, I do fret a little about the
>>> task->sched_proxy bit overlapping somewhat in meaning with the
>>> task->blocked_on value.
>>
>> Ack! I'm pretty sure that with the blocked_on locking we won't have
>> these "interesting" situations, but I posted the RFC just in case we
>> needed something in the interim - turns out it's a solved problem :)
>>
>> One last thing: it would be good to get some clarification on how to
>> treat the blocked tasks retained on the runqueue for PSI - a quick look
>> at your fix suggests we still consider them runnable (TSK_RUNNING) from
>> a PSI standpoint - is this ideal, or should PSI consider these tasks
>> blocked?
>
> So my default way of thinking about mutex-blocked tasks with proxy is
> that they are equivalent to runnable. They can be selected by
> pick_next_task(), and they are charged for the time they donate to the
> lock-owner that runs as the proxy.
> To conceptualize things with ProxyExec, I often imagine the
> mutex-blocked task as being in "optimistic spin" mode waiting for the
> mutex, where we'd just run the task and let it spin, instead of
> blocking the task (when the lock owner isn't already running). The
> optimization then is that, instead of just wasting time spinning, we
> run the lock owner so it can release the lock.

I think I can see it now. I generally looked at them the other way
around - as blocked tasks retained on the runqueue just for the
vruntime context. I'll try changing my perspective to match yours when
looking at proxy :)
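
To make sure I have the model right, here is a tiny userspace toy I
put together (explicitly not kernel code; the CPU "some" rule is just
the simplified single-CPU reading of Documentation/accounting/psi.rst,
and the task states are made up for illustration). It only shows how
the classification of a mutex-blocked-but-enqueued donor changes what
PSI would see:

/*
 * Toy model (userspace, not kernel code): one CPU, a few tasks, and
 * the question of whether a mutex-blocked donor that proxy execution
 * keeps on the runqueue is classified as runnable (TSK_RUNNING) or as
 * blocked for PSI purposes.
 */
#include <stdio.h>
#include <stdbool.h>

enum task_state { RUNNING_ON_CPU, RUNNABLE_WAITING, MUTEX_BLOCKED_DONOR };

/*
 * CPU "some" pressure (simplified from the PSI docs): at least one
 * runnable task is delayed, i.e. more runnable tasks than tasks
 * actually on a CPU.
 */
static bool cpu_some_pressure(const enum task_state *t, int n,
			      bool donor_counts_as_runnable)
{
	int oncpu = 0, runnable = 0;

	for (int i = 0; i < n; i++) {
		switch (t[i]) {
		case RUNNING_ON_CPU:
			oncpu++;
			runnable++;
			break;
		case RUNNABLE_WAITING:
			runnable++;
			break;
		case MUTEX_BLOCKED_DONOR:
			/* Your model: the donor still wants the CPU */
			if (donor_counts_as_runnable)
				runnable++;
			break;
		}
	}
	return runnable > oncpu;
}

int main(void)
{
	/* The proxy/lock owner on the CPU plus one enqueued donor. */
	enum task_state rq[] = { RUNNING_ON_CPU, MUTEX_BLOCKED_DONOR };
	int n = sizeof(rq) / sizeof(rq[0]);

	printf("donor as TSK_RUNNING : cpu some pressure = %d\n",
	       cpu_some_pressure(rq, n, true));
	printf("donor as blocked     : cpu some pressure = %d\n",
	       cpu_some_pressure(rq, n, false));
	return 0;
}

With the donor counted as TSK_RUNNING, its queue delay shows up as CPU
pressure (which matches your "it wants to run to boost the owner"
view); counted as blocked, it disappears from the CPU signal entirely,
which is the trade-off behind my question above.
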
As for the fix in your tree, feel free to include:
Tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
> So, I need to refresh myself on more of the subtleties of PSI, but to
> me, considering it TSK_RUNNING seems intuitive.
>
> There are maybe some transient cases, like where the blocked task is
> on one RQ and the lock holder is on another: until the blocked task is
> selected (and then proxy-migrated to boost the task on the other cpu),
> if it were very far back in the runqueue it could be contributing what
> could be seen as "false pressure" on that RQ. So maybe I need to think
> a bit more about that. But it is still a task that wants to run to
> boost the lock owner, so I'm not sure how different it is in the PSI
> view compared to transient runqueue imbalances.

I think Johannes has a better understanding of how these signals are
used in the field, so I'll defer to him.
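
(Purely as an aside for the archives: the consumer-visible side of all
this is just the pressure files. A minimal reader of /proc/pressure/cpu
could look like the sketch below - the "some"/"full" line layout and
the microsecond "total" field are as documented in
Documentation/accounting/psi.rst, and the parsing is deliberately
naive.)

/* Minimal /proc/pressure/cpu reader (userspace sketch). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/cpu", "r");
	char line[256];

	if (!f) {
		perror("open /proc/pressure/cpu");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char kind[8];
		double avg10, avg60, avg300;
		unsigned long long total;

		/* e.g. "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
		if (sscanf(line,
			   "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
			   kind, &avg10, &avg60, &avg300, &total) == 5)
			printf("%s: avg10=%.2f total=%llu us\n",
			       kind, avg10, total);
	}
	fclose(f);
	return 0;
}
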
--
Thanks and Regards,
Prateek