Message-ID: <CAH5Ym4iqvQuO6JxO-jypTp05Ug_2vDokCDoBgGB+cOzgmTQpkQ@mail.gmail.com>
Date: Sun, 24 Aug 2025 16:43:08 -0700
From: Sam Edwards <cfsworks@...il.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Ard Biesheuvel <ardb@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Anshuman Khandual <anshuman.khandual@....com>, Ryan Roberts <ryan.roberts@....com>,
Baruch Siach <baruch@...s.co.il>, Kevin Brodsky <kevin.brodsky@....com>,
Joey Gouly <joey.gouly@....com>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] arm64/boot: Zero-initialize idmap PGDs before use
Hi, Marc! It's been a while; hope you're well.
On Sun, Aug 24, 2025 at 1:55 AM Marc Zyngier <maz@...nel.org> wrote:
>
> Hi Sam,
>
> On Sun, 24 Aug 2025 04:05:05 +0100,
> Sam Edwards <cfsworks@...il.com> wrote:
> >
> > On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@...nel.org> wrote:
> > >
>
> [...]
>
> > > Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE?
> >
> > I might be doing my math wrong, but wouldn't 52-bit VA with 4K
> > granules and 5 levels result in this?
>
> No. 52bit VA at 4kB granule results in levels 0-3 each resolving 9
> bits, and level -1 resolving 4 bits. That's a total of 40 bits, plus
> the 12 bits coming directly from the VA making for the expected 52.
Thank you, that makes it clear: I made an off-by-one mistake in my
counting of the levels.
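For my own reference, here is a minimal Python sketch of the arithmetic Marc describes. The function name split_va_bits is hypothetical (not a kernel symbol), and it assumes the walk always ends at level 3 with 8-byte descriptors:

```python
def split_va_bits(va_bits, page_shift):
    """Return {level: bits resolved} for a stage 1 walk ending at level 3.

    Illustrative arithmetic only; this is not how the kernel derives
    its page table geometry.
    """
    bits_per_level = page_shift - 3   # a page holds 2^(PAGE_SHIFT-3) 8-byte entries
    remaining = va_bits - page_shift  # low PAGE_SHIFT bits are never translated
    split = {}
    level = 3
    while remaining > 0:
        resolved = min(bits_per_level, remaining)
        split[level] = resolved
        remaining -= resolved
        level -= 1
    return split


split = split_va_bits(va_bits=52, page_shift=12)
for level in sorted(split):
    print(f"level {level:2d} resolves {split[level]} bits")
print("translated bits:", sum(split.values()), "plus page offset bits:", 12)
```

With 52-bit VA and a 4K granule this gives five levels (-1 through 3), and only the topmost one is partial.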
> > Each PTE represents 4K of virtual memory, so covers VA bits [11:0]
> > (this is level 3)
>
> That's where you got it wrong. The architecture is pretty clear that
> each level resolves PAGE_SHIFT-3 bits, hence the computation
> above. The bottom PAGE_SHIFT bits are directly extracted from the VA,
> without any translation.
Bear with me a moment while I unpack which part of that I got wrong:
A PTE is the terminal entry of the MMU walk, so I believe I'm correct
(in this example, and assuming no hugepages) that each PTE represents
4K of virtual memory: that means the final step of computing a PA
takes a (valid) PTE and the low 12 bits of the VA, then just adds
those bits to the physical frame address.
It sounds like what you're saying is "That isn't a *level* though:
that's just concatenation. A 'level' always takes a bitslice of the VA
and uses it as an index into a table of word-sized entries. PTEs don't
point to a further table: they have all of the final information
encoded directly."
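To check my understanding of that distinction, here is a rough Python sketch for the 52-bit VA, 4K granule case. The names table_index and final_pa are made up for illustration, and frame_addr stands for the output address already extracted from a valid PTE (a real descriptor also carries attribute bits, which I am ignoring):

```python
PAGE_SHIFT = 12                  # 4K granule
BITS_PER_LEVEL = PAGE_SHIFT - 3  # 9 bits, so 512 entries in a full table
TOP_LEVEL_BITS = 4               # level -1 only resolves 4 bits at 52-bit VA


def table_index(va, level):
    """Bitslice of the VA used as the table index at `level` (-1..3)."""
    shift = PAGE_SHIFT + (3 - level) * BITS_PER_LEVEL
    width = TOP_LEVEL_BITS if level == -1 else BITS_PER_LEVEL
    return (va >> shift) & ((1 << width) - 1)


def final_pa(frame_addr, va):
    """Not a level: just concatenate the frame address with the low VA bits."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return frame_addr | offset


# Build an example VA from known indices so the slices are easy to verify.
va = (5 << 48) | (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
for level in range(-1, 4):
    print(f"level {level:2d} index: {table_index(va, level)}")
print("PA:", hex(final_pa(0x40000000, va)))
```

The five table_index calls correspond to the five levels; final_pa involves no table lookup at all, which is why it does not count as one.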
That makes a lot more sense to me, but contradicts how I read this
comment from pgtable-hwdef.h:
* Level 3 descriptor (PTE).
I took this as, "a PTE describes how to perform level 3 of the
translation." But because there are in fact no "levels" after a PTE,
it must actually be saying, "Level 3 of the translation is a lookup
into an array of PTEs." The problem with that latter reading is that
this comment...
* Level -1 descriptor (PGD).
...when read the same way, is saying "Level -1 of the translation is a
lookup into an array of PGDs." An "array of PGDs" is nonsense, so I
reverted to my earlier readings: "PGD describes how to do level
-1." and "PTE describes how to do level 3."
This smells like a classic "fencepost problem": The "PXX" Linuxisms
refer to the *nodes* along the MMU walk, while the "levels" in ARM
parlance are the actual steps of the walk taken by hardware -- edges,
not nodes, getting us from fencepost to fencepost. A fence with five
segments needs six posts, but we only have five currently.
So: where do the terms P4D, PUD, and PMD fit in here? And which one's
our missing fencepost?
PGD ----> ??? ----> ??? ----> ??? ----> ??? ----> PTE (|| low VA bits = final PA)
> > > Note that at stage 1, arm64 does not support page table concatenation,
> > > and so the root page table is never larger than a page.
> >
> > Doesn't PGD_SIZE refer to the size used for userspace PGDs after the
> > boot progresses beyond stage 1? (What do you mean by "never" here?
> > "Under no circumstances is it larger than a page at stage 1"? Or
> > "during the entire lifecycle of the system, there is no time at which
> > it's larger than a page"?)
>
> Never, ever, is a S1 table bigger than a page. This concept doesn't
> exist in the architecture. Only S2 tables can use concatenation at the
> top-most level, for up to 16 pages (in order to skip a level when
> possible).
>
> The top-level can be smaller than a page, with some alignment
> constraints, but that's about the only degree of freedom you have for
> S1 page tables.
Okay, that clicked for me: I was reading "stage" in the context of the
boot process. These explanations make a lot more sense when reading
"stage" in the context of the MMU.
So PGD_SIZE <= PAGE_SIZE, the PAGE_SIZE spacing in vmlinux.lds.S is
for alignment, and I should be looking at cases where PGDs are assumed
to be PAGE_SIZE to make those consistent instead. Thanks!
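As a sanity check on PGD_SIZE <= PAGE_SIZE, here is a small Python sketch of the root table size for a few granule/VA combinations. The helper root_table_bytes is hypothetical and only mirrors the "each level resolves PAGE_SHIFT-3 bits" rule with 8-byte entries; it does not reproduce the kernel's actual macros:

```python
def root_table_bytes(va_bits, page_shift):
    """Size in bytes of the top-level stage 1 table (illustrative only)."""
    bits_per_level = page_shift - 3
    remaining = va_bits - page_shift
    # The top level gets whatever is left after the full levels below it;
    # if the division is exact, the top level is itself a full table.
    top_bits = remaining % bits_per_level or bits_per_level
    return (1 << top_bits) * 8


configs = [(39, 12), (48, 12), (52, 12), (48, 14), (52, 16)]
for va_bits, page_shift in configs:
    size = root_table_bytes(va_bits, page_shift)
    page = 1 << page_shift
    print(f"VA {va_bits}, {page // 1024}K granule: root table {size} bytes "
          f"(page is {page} bytes)")
```

Because the top level never resolves more than PAGE_SHIFT-3 bits under this rule, the computed size cannot exceed one page, which matches what Marc and Ard said about stage 1.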
Cheers,
Sam
>
> Thanks,
>
> M.
>
> --
> Jazz isn't dead. It just smells funny.