Message-Id: <DG0BD0NGT1HH.1PVFYFO0CCT6N@garyguo.net>
Date: Wed, 28 Jan 2026 15:05:37 +0000
From: "Gary Guo" <gary@...yguo.net>
To: "Alice Ryhl" <aliceryhl@...gle.com>, "Gary Guo" <gary@...yguo.net>
Cc: "Alexandre Courbot" <acourbot@...dia.com>, "Andreas Hindborg"
<a.hindborg@...nel.org>, "Miguel Ojeda" <ojeda@...nel.org>, "Boqun Feng"
<boqun.feng@...il.com>, Björn Roy Baron
<bjorn3_gh@...tonmail.com>, "Benno Lossin" <lossin@...nel.org>, "Trevor
Gross" <tmgross@...ch.edu>, "Danilo Krummrich" <dakr@...nel.org>,
<linux-kernel@...r.kernel.org>, <rust-for-linux@...r.kernel.org>
Subject: Re: [PATCH] rust: add `CacheAligned` for easy cache line alignment
of values
On Wed Jan 28, 2026 at 2:46 PM GMT, Alice Ryhl wrote:
> On Wed, Jan 28, 2026 at 02:41:05PM +0000, Gary Guo wrote:
>> On Wed Jan 28, 2026 at 2:25 PM GMT, Alexandre Courbot wrote:
>> > On Wed Jan 28, 2026 at 11:05 PM JST, Andreas Hindborg wrote:
>> > While 64 bytes is the most common cache line size, AFAIK this is not
>> > a universal value? Can we expose and use `L1_CACHE_BYTES` here?
>>
>> On all archs that we support today, I think the value is always 64. However,
>> it'd be worth putting a FIXME or TODO (or assertion, maybe?) in case new
>> archs get added where this isn't true.
>
> Are you sure? From Tokio:
>
>> Starting from Intel's Sandy Bridge, spatial prefetcher is now pulling pairs of 64-byte cache
>> lines at a time, so we have to align to 128 bytes rather than 64.
A cache line is still 64B, even if a prefetcher might pull in multiple cache
lines.

The hardware prefetcher usually only engages when a sequential access pattern
is detected. So if you're doing array accesses with increasing indices, it
will engage and pull in the next cache line; however, if you're performing
random accesses (e.g. following a linked list), it will not engage, as
otherwise you would effectively halve the number of cache lines available in
your L1 cache.

If software needs to fight the hardware prefetcher in general (where there's
no regular sequential access pattern) by spreading things further apart in
memory, it means that the hardware prefetcher has failed at its task and is a
bad design :)
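To illustrate the assertion idea from upthread, here's a minimal userspace
sketch (not the actual kernel patch; the type shape, the constant name
ASSUMED_CACHE_LINE_BYTES, and the methods are all illustrative assumptions):
a wrapper aligned to an assumed 64-byte cache line, with a const assertion
that would fail the build if the assumption is ever changed without updating
the alignment.

```rust
// Hypothetical sketch of a cache-line-aligned wrapper. In the kernel this
// value would ideally come from the arch's L1_CACHE_BYTES instead of being
// hard-coded.
const ASSUMED_CACHE_LINE_BYTES: usize = 64;

/// Wraps a value and raises its alignment to the assumed cache line size.
#[repr(align(64))]
pub struct CacheAligned<T>(T);

// Compile-time check: if the assumed cache line size and the repr(align)
// value ever diverge (e.g. a new arch with 128-byte lines), the build fails
// here rather than silently under-aligning.
const _: () = assert!(
    core::mem::align_of::<CacheAligned<u8>>() == ASSUMED_CACHE_LINE_BYTES
);

impl<T> CacheAligned<T> {
    pub const fn new(value: T) -> Self {
        Self(value)
    }

    pub fn get(&self) -> &T {
        &self.0
    }
}

fn main() {
    let x = CacheAligned::new(0u8);
    // The wrapper's address is a multiple of the assumed cache line size.
    assert_eq!(&x as *const _ as usize % ASSUMED_CACHE_LINE_BYTES, 0);
    // Even a 1-byte payload occupies a full aligned slot.
    assert_eq!(core::mem::size_of::<CacheAligned<u8>>(), 64);
    println!("align = {}", core::mem::align_of::<CacheAligned<u8>>());
}
```

Note that `repr(align)` only takes a literal, so tying it to a per-arch
constant would need cfg-gated variants rather than a plain generic.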
>>
>> Sources:
>> - https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
>> - https://github.com/facebook/folly/blob/1b5288e6eea6df074758f877c849b6e73bbb9fbb/folly/lang/Align.h#L107
>>
>> ARM's big.LITTLE architecture has asymmetric cores and "big" cores have 128-byte cache line size.
>>
>> Sources:
>> - https://www.mono-project.com/news/2016/09/12/arm64-icache/
arch/arm64/include/asm/cache.h defines L1_CACHE_BYTES as 64.
>>
>> powerpc64 has 128-byte cache line size.
>>
>> Sources:
>> - https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_ppc64x.go#L9
There's no PPC support in kernel Rust today.
Best,
Gary
>
> https://github.com/tokio-rs/tokio/blob/master/tokio/src/util/cacheline.rs#L85