Message-ID: <20251029091437.GE988547@noisy.programming.kicks-ass.net>
Date: Wed, 29 Oct 2025 10:14:37 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Blake Jones <blakejones@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Josh Don <joshdon@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched: reorder some fields in struct rq
On Tue, Oct 28, 2025 at 03:15:27PM -0700, Blake Jones wrote:
> Hi Peter,
>
> Thanks for your questions. Here are my thoughts:
>
> On Tue, Oct 28, 2025 at 4:29 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> > queue_mask is modified right along with nr_running -- every
> > enqueue/dequeue. It seems weird to move it away from nr_running.
> >
>
> My main goal with this change is to optimize the first line for read-mostly
> usage - specifically, for the loop in update_sg_lb_stats(), which loads
> data from many rq's without locking them. Data from some Google fleet
> profilers showed about 38x as many loads as stores to these fields.
>
> From what I could tell from your patch, queue_mask is write-mostly -
> enqueue_task() and dequeue_task() update it, and sched_balance_newidle()
> stores to it and loads it later. That's why I didn't put it in the first
> line.
>
> That said, the place where I relocated it to is somewhat arbitrary. I
> mostly put it there because it had some open space on that line, though
> its new neighbors prev_mm and next_balance are also updated somewhat
> frequently. If you have another suggestion I'm quite open to it.
Right. I missed the read-heavy part. It still feels somewhat weird to
split variables you know are modified together, but alas.
Also, I might be reworking/removing queue_mask again soon, we'll see if
that pans out.
> > Does it not make more sense to put the lock in the same cacheline as the
> > clock fields?
> >
>
> Great question. When I was first playing around with this reordering, I did
> some more substantial movement within struct rq. But for the versions I've
> sent out, I decided to keep it focused on just a few areas where I could
> tell there was clustered usage. I didn't want to over-fit the placement of
> a whole lot of fields in struct rq, since the code inevitably changes.
>
> So I'd be open to moving the lock to the same line as the clock fields if
> you think it would be a perf win, but I don't personally have any perf
> numbers that could demonstrate the merits of doing that. WDYT?
Most places that take rq->lock also update rq->clock. So I was wondering
if it made sense to put them together. I don't have numbers either.
> > nr_iowait has cross CPU updates and would be subject to false sharing.
> >
>
> Mmm, good point. I moved it back before the clock-related cache line to
> balance out moving "clock" and "clock_update_flags" to that line, since
> otherwise it was going to add an extra line to the structure. But also,
> it had been on the same line as the various clock fields, so that false
> sharing was already not doing us any favors.
>
> The line that I've moved it to has your queue_mask, so that's probably not
> the best place. How about if I move it to the end of the structure, next to
> cfsb_csd and cfsb_csd_list, since those don't seem to get used very often?
Works for me.
> > Does the thing want a __no_randomize_layout ?
> >
>
> Well, this is by far one of the hottest kernel data structures we see at
> Google, and it's not an obvious thing for a security vuln to target. So I'd
> be quite happy to add __no_randomize_layout; if you agree, I'll add it.
Let's make it so.
Thanks!