Message-ID: <CANDhNCqRR1A9feJjy+Gz15OhWgZzpfVF6U9Nue9QX4M_4kB5rA@mail.gmail.com>
Date: Mon, 17 Nov 2025 20:26:00 -0800
From: John Stultz <jstultz@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Johannes Weiner <hannes@...xchg.org>, Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: Re: [RFC PATCH 0/5] sched/psi: Fix PSI accounting with proxy execution
On Mon, Nov 17, 2025 at 5:39 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
> On 11/18/2025 6:15 AM, John Stultz wrote:
> > I'm still getting my head around the description above (it's been
> > a while since I last looked at the PSI code), but early on I often
> > hit PSI splats, and I thought I had addressed it with the patch here:
> > https://github.com/johnstultz-work/linux-dev/commit/f60923a6176b3778a8fc9b9b0bbe4953153ce565
>
> Oooo! Let me go test that.
>
> >
> > And with that I've not run across any warnings since.
> >
> > Now, I hadn't tripped over the issue recently with the subset of the
> > full series I've been pushing upstream, and since I most easily ran
> > into it with the sleeping-owner enqueuing feature, I was holding the
> > fix back for those changes. But I realized that unfortunately
> > CONFIG_PSI at some point got disabled in my test defconfig, so I
> > haven't had the opportunity to trip it, and sure enough I can
> > trivially see it when booting with the current upstream code.
>
> I hit this on tip:sched/core when looking at the recent sched_yield()
> changes. Maybe the "blocked_on" serialization with the proxy migration
> will make this all go away :)
>
> >
> > Applying that fix does seem to avoid the warnings in my trivial
> > testing, but again I've not dug through the logic in a while, so you
> > may have a better sense of the inadequacies of that fix.
> >
> > If it looks reasonable to you, I'll rework the commit message so it
> > isn't so focused on the sleeping-owner-enqueuing case and submit it.
>
> That would be great! And it seems to be a lot simpler than the
> stuff I'm trying to do. I'll give it a spin and get back to you.
> Thank you again for pointing to the fix.
>
> >
> > I'll have to spend some time here looking more at your proposed
> > solution. At an initial glance, I do fret a little about the
> > task->sched_proxy bit overlapping somewhat in meaning with the
> > task->blocked_on value.
>
> Ack! I'm pretty sure that with the blocked_on locking we won't have
> these "interesting" situations, but I posted the RFC out just in case
> we needed something in the interim. Turns out it's a solved problem :)
>
> One last thing: it'll be good to get some clarification on how to
> treat the blocked tasks retained on the runqueue for PSI - a quick
> look at your fix suggests we still consider them runnable
> (TSK_RUNNING) from a PSI standpoint - is this ideal, or should PSI
> consider these tasks blocked?
So my default way of thinking about mutex-blocked tasks with proxy is
that they are equivalent to runnable. They can be selected by
pick_next_task(), and they are charged for the time they donate to the
lock-owner that runs as the proxy.
To conceptualize things with ProxyExec, I often imagine the
mutex-blocked task as being in "optimistic spin" mode waiting for the
mutex, where we'd just run the task and let it spin (instead of
blocking it when the lock owner isn't already running). Then the
optimization is just that, instead of wasting time spinning, we run
the lock owner so it can release the lock.
So, I need to refresh myself on more of the subtleties of PSI, but to
me treating them as TSK_RUNNING seems intuitive.
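To make that a bit more concrete, the direction I mean is roughly the
below (a purely illustrative, untested sketch, not the actual fix from
my branch above; psi_flags_for_donor() is just a made-up name here, and
task_is_blocked() is the helper assumed from the proxy-exec series):

static inline int psi_flags_for_donor(struct task_struct *p)
{
        /*
         * A mutex-blocked task left enqueued under proxy-exec still
         * "wants" the CPU: it can be picked by pick_next_task() and
         * its time is donated to the lock owner, so keep accounting
         * it as runnable rather than treating the block as a sleep.
         */
        if (task_is_blocked(p))
                return TSK_RUNNING;

        return 0;
}

i.e. the only signal the sketch keys off of is whether the task is
still sitting on the rq as a blocked donor.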
There are maybe some transient cases, like where the blocked task is
on one RQ and the lock holder is on another: until the blocked task is
selected (and then proxy-migrated to boost the task on the other CPU),
if it were very far back in the runqueue it could be contributing what
could be seen as "false pressure" on that RQ. So maybe I need to think
a bit more about that. But it is still a task that wants to run in
order to boost the lock owner, so I'm not sure how different it is in
the PSI view compared to transient runqueue imbalances.
thanks
-john