Message-ID: <aYIjpyQCkkxxJZ0s@cmpxchg.org>
Date: Tue, 3 Feb 2026 11:34:47 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Suren Baghdasaryan <surenb@...gle.com>, Ingo Molnar <mingo@...nel.org>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	John Stultz <jstultz@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH resend 2/2] sched: psi: use rq_clock() during task state
 changes

On Mon, Feb 02, 2026 at 09:41:37PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 14, 2026 at 10:43:17AM -0500, Johannes Weiner wrote:
> > In the hottest psi paths, the scheduler already caches the cpu_clock()
> > call for the event in rq->clock. Now that the clocks between state
> > changes and pressure aggregation don't need to be synchronized inside
> > the seqcount section anymore, use the cheaper rq_clock().
> > 
> > Add update_rq_clock() calls to the few places where psi is entered
> > without the rq already locked.
> 
> Just to be clear, rq->clock is not a cache of cpu_clock(). rq->clock
> discards all backwards motion (which obviously should never happen, but
> if it does, the clocks go out of sync).
> 
> So if you use rq->clock, you must use it for all and not mix with
> cpu_clock().
> 
> I *think* the patch does that, but I've not double checked.

Ah no, it does mix them :(

Yeah, I'm using rq_clock() consistently on the scheduler side to
accumulate the times of concluded states:

	state_start = rq_clock()
	...
	state_time = rq_clock() - state_start

However, the aggregator side still uses

	state_time += cpu_clock() - state_start

to incorporate the currently active state. If the two clocks don't
share the same base, this won't work.
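
To make the failure mode concrete, here's a standalone toy model
(plain userspace C, not kernel code; all the names are made up):
"raw" stands in for cpu_clock(), "clamped" for rq->clock with
backwards motion discarded, and the live sample mixes the two bases:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t raw = 1000;		/* cpu_clock() analogue */
		uint64_t clamped = raw;		/* rq->clock analogue, forward-only */

		uint64_t state_start = clamped;	/* scheduler side: rq_clock() */

		raw = 990;			/* raw clock glitches backwards */
		if (raw > clamped)		/* rq->clock ignores the step back */
			clamped = raw;

		/* Aggregator mixes bases: raw "now" against a clamped start */
		uint64_t mixed = raw - state_start;
		printf("mixed-base delta:      %llu\n", (unsigned long long)mixed);

		/* Same sample against the clamped clock stays sane */
		uint64_t same = clamped - state_start;
		printf("consistent-base delta: %llu\n", (unsigned long long)same);

		return 0;
	}

The mixed-base delta underflows to a huge u64 instead of ~0, which is
what the now > state_start check in the sketch below is meant to catch.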

Doing the full lock and update_rq_clock() from the aggregator sounds
quite heavy-handed. How about using sched_clock_cpu() directly and
doing the backwards-motion check by hand?

	local_irq_save()
	now = sched_clock_cpu(cpu)
	local_irq_restore()

	...

	if (state_mask & (1 << s) && now > state_start)
		times[s] += now - state_start
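
Spelled out a bit more (a rough sketch only, not a tested patch: the
helper name is invented, and the seqcount read loop around the copies
is omitted), the aggregator-side read could look like this, using the
psi_group_cpu fields state_mask, state_start and times[]:

	static void psi_sample_times_sketch(struct psi_group_cpu *groupc,
					    int cpu, u32 *times)
	{
		unsigned long flags;
		u64 now;
		int s;

		/* Raw per-cpu clock; no rq lock, just irqs off */
		local_irq_save(flags);
		now = sched_clock_cpu(cpu);
		local_irq_restore(flags);

		for (s = 0; s < NR_PSI_STATES; s++) {
			times[s] = groupc->times[s];

			/*
			 * Fold in the currently active state. state_start
			 * was taken from rq_clock(), which may be ahead of
			 * the raw clock if backwards motion was discarded,
			 * so skip the live sample rather than let the
			 * delta underflow.
			 */
			if ((groupc->state_mask & (1 << s)) &&
			    now > groupc->state_start)
				times[s] += now - groupc->state_start;
		}
	}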
