lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP_z_CgtWoJ8CAm=hAxJMcWrGMoi=OPj+Axsu+sMyZ_BuuHVxg@mail.gmail.com>
Date: Tue, 28 Oct 2025 15:26:39 -0700
From: Blake Jones <blakejones@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, 
	Josh Don <joshdon@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched: reorder some fields in struct rq

[I thought I'd turned off HTML on my reply just now, but apparently not.
Sorry about that. Re-sending so the list has context...]

Hi Peter,

Thanks for your questions. Here are my thoughts:

On Tue, Oct 28, 2025 at 4:29 AM Peter Zijlstra <peterz@...radead.org> wrote:
> queue_mask is modified right along with nr_running -- every
> enqueue/dequeue. It seems weird to move it away from nr_running.

My main goal with this change is to optimize the first line for read-mostly
usage - specifically, for the loop in update_sg_lb_stats(), which loads
data from many rq's without locking them. Data from some Google fleet
profilers showed about 38x as many loads as stores to these fields.

>From what I could tell from your patch, queue_mask is write-mostly -
enqueue_task() and dequeue_task() update it, and sched_balance_newidle()
stores to it and loads it later. That's why I didn't put it in the first
line.

That said, the place where I relocated it to is somewhat arbitrary. I
mostly put it there because it had some open space on that line, though
its new neighbors prev_mm and next_balance are also updated somewhat
frequently. If you have another suggestion I'm quite open to it.

> Does it not make more sense to put the lock in the same cacheline as the
> clock fields?

Great question. When I was first playing around with this reordering, I did
some more substantial movement within struct rq. But for the versions I've
sent out, I decided to keep it focused on just a few areas where I could
tell there was clustered usage. I didn't want to over-fit the placement of
a whole lot of fields in struct rq, since the code inevitably changes.

So I'd be open to moving the lock to the same line as the clock fields if
you think it would be a perf win, but I don't personally have any perf
numbers that could demonstrate the merits of doing that. WDYT?

> nr_iowait has cross CPU updates and would be subject to false sharing.

Mmm, good point. I moved it back before the clock-related cache line to
balance out moving "clock" and "clock_update_flags" to that line, since
otherwise it was going to add an extra line to the structure. But also,
it had been on the same line as the various clock fields, so that false
sharing was already not doing us any favors.

The line that I've moved it to has your queue_mask, so that's probably not
the best place. How about if I move it to the end of the structure, next to
cfsb_csd and cfsb_csd_list, since those don't seem to get used very often?

> Does the thing want a __no_randomize_layout ?

Well, this is by far one of the hottest kernel data structures we see at
Google, and it's not an obvious thing for a security vuln to target. So I'd
be quite happy to add __no_randomize_layout; if you agree, I'll add it.

Blake

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ