Message-ID: <DD80BJMZM5EF.4V737FVJY4F3@google.com>
Date: Thu, 02 Oct 2025 17:19:54 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>, Brendan Jackman <jackmanb@...gle.com>,
Andy Lutomirski <luto@...nel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>
Cc: <peterz@...radead.org>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>,
<mingo@...hat.com>, <tglx@...utronix.de>, <akpm@...ux-foundation.org>,
<david@...hat.com>, <derkling@...gle.com>, <junaids@...gle.com>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>, <reijiw@...gle.com>,
<rientjes@...gle.com>, <rppt@...nel.org>, <vbabka@...e.cz>, <x86@...nel.org>,
<yosry.ahmed@...ux.dev>
Subject: Re: [PATCH 04/21] x86/mm/asi: set up asi_nonsensitive_pgd

On Thu Oct 2, 2025 at 4:14 PM UTC, Dave Hansen wrote:
> On 10/2/25 07:05, Brendan Jackman wrote:
>> On Wed Oct 1, 2025 at 8:28 PM UTC, Dave Hansen wrote:
> ...
>>> I also can't help but wonder if it would have been easier and more
>>> straightforward to just start this whole exercise at 4k: force all the
>>> ASI tables to be 4k. Then, later, add the 2MB support and tie to
>>> pageblocks on after.
>>
>> This would lead to a much smaller patchset, but I think it creates some
>> pretty yucky technical debt and complexity of its own. If you're
>> imagining a world where we just leave most of the allocator as-is, and
>> just inject "map into ASI" or "unmap from ASI" at the right moments...
> ...
>
> I'm trying to separate out the two problems:
>
> 1. Have a set of page tables that never require allocations in order to
> map or unmap sensitive data.
> 2. Manage each pageblock as either all sensitive or all not sensitive
>
> There is a nonzero set of dependencies to make sure that the pageblock
> size is compatible with the page table mapping size... unless you just
> make the mapping size 4k.
>
> If the mapping size is 4k, the pageblock size can be anything. There's
> no dependency to satisfy.
>
> So I'm not saying to make the sensitive/nonsensitive boundary 4k. Just
> to make the _mapping_ size 4k. Then, come back later, and move the
> mapping size over to 2MB as an optimization.

Ahh thanks, I get your point now. And yep I'm sold, I'll go to 4k for
v2.
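
To make the dependency concrete, here is a minimal userspace sketch (not
kernel code). The SKETCH_* constants and both helper functions are
hypothetical names with typical x86-64 values assumed; the comparison
mirrors the VM_BUG_ON() in the hunk quoted below. With a 4k mapping
size the condition holds for any pageblock order, which is why the
dependency disappears.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel constants (typical x86-64 values). */
#define SKETCH_PAGE_SHIFT	12	/* 4k base pages */
#define SKETCH_4K_SHIFT		12	/* shift of a 4k mapping */
#define SKETCH_2M_SHIFT		21	/* shift of a 2MB mapping */

/*
 * A pageblock can only be flipped as a unit if no single mapping
 * spans more than one pageblock, i.e. the pageblock is at least as
 * large as the mapping.
 */
static inline bool mapping_fits_pageblock(unsigned int mapping_shift,
					  unsigned int pageblock_order)
{
	return SKETCH_PAGE_SHIFT + pageblock_order >= mapping_shift;
}

/* Userspace analogue of the VM_BUG_ON() check for 2MB mappings. */
static inline void sketch_check_2m(unsigned int pageblock_order)
{
	assert(mapping_fits_pageblock(SKETCH_2M_SHIFT, pageblock_order));
}
```

For example, mapping_fits_pageblock(SKETCH_2M_SHIFT, 5) is false (a
128k pageblock is smaller than a 2MB mapping), whereas any call with
SKETCH_4K_SHIFT returns true.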
>>>> +	if (asi_nonsensitive_pgd) {
>>>> +		/*
>>>> +		 * Since most memory is expected to end up sensitive, start with
>>>> +		 * everything unmapped in this pagetable.
>>>> +		 */
>>>> +		pgprot_t prot_np = __pgprot(pgprot_val(prot) & ~_PAGE_PRESENT);
>>>> +
>>>> +		VM_BUG_ON((PAGE_SHIFT + pageblock_order) < page_level_shift(PG_LEVEL_2M));
>>>> +		phys_pgd_init(asi_nonsensitive_pgd, paddr_start, paddr_end, 1 << PG_LEVEL_2M,
>>>> +			      prot_np, init, NULL);
>>>> +	}
>>>
>>> I'm also kinda wondering what the purpose is of having a whole page
>>> table full of !_PAGE_PRESENT entries. It would be nice to know how this
>>> eventually gets turned into something useful.
>>
>> If you are thinking of the fact that just clearing P doesn't really do
>> anything for Meltdown/L1TF.. yeah that's true! We'll actually need to
>> munge the PFN or something too, but here I wanted to just focus on the
>> broad strokes of integration without worrying too much about individual
>> CPU mitigations. Flipping _PAGE_PRESENT is already supported by
>> set_memory.c and IIRC it's good enough for everything newer than
>> Skylake.
>>
>> Other than that, these pages being unmapped is the whole point.. later
>> on, the subset of memory that we don't need to protect will get flipped
>> to being present. Everything else will trigger a pagefault if touched
>> and we'll switch address spaces, do the flushing etc.
>>
>> Sorry if I'm missing your point here...
>
> What is the point of having a pgd if you can't put it in CR3? If you:
>
> write_cr3(asi_nonsensitive_pgd);
>
> you'll just triple fault because all kernel text is !_PAGE_PRESENT.
>
> The critical point is when 'asi_nonsensitive_pgd' is functional enough
> that it can be loaded into CR3 and handle a switch to the normal
> init_mm->pgd.

Hm, are you saying that I should expand the scope of the patchset from
"set up the direct map" to "set up an ASI address space"? If so, yeah I
can do that, I don't think the patchset would get that much bigger. I
only left the other bits out because it feels weird to set up a whole
address space but never actually switch into it. Setting up the logic to
switch into it would make the patchset really big though.

Like I said in the cover letter, I could also always change tack:
we could instead start with all the address-space switching logic, but
just have the two address spaces be clones of each other. Then we could
come back and start poking holes in the ASI one for the second series. I
don't have a really strong opinion about the best place to start, but
I'll stick to my current course unless someone else does have a strong
opinion.
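
To illustrate why the starting point is mostly a question of patch
ordering, here is a toy userspace sketch (not kernel code, and not how
real page tables are laid out). Every toy_* name is hypothetical. It
models a page table as a flat array with a present bit and shows that
"start unmapped, then map the nonsensitive parts" and "clone, then poke
holes" end in the same state. It deliberately ignores the Meltdown/L1TF
point above that clearing the present bit alone is not always enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_ENTRIES	8
#define TOY_PRESENT	0x1u

/* Toy page table: a flat array of entries, bit 0 is "present". */
struct toy_pgtable {
	unsigned int entry[TOY_NR_ENTRIES];
};

/* Current course: copy the layout but start with everything unmapped. */
static inline void toy_init_unmapped(struct toy_pgtable *asi,
				     const struct toy_pgtable *normal)
{
	for (size_t i = 0; i < TOY_NR_ENTRIES; i++)
		asi->entry[i] = normal->entry[i] & ~TOY_PRESENT;
}

/* Alternative course: start with an exact clone of the normal table. */
static inline void toy_clone(struct toy_pgtable *asi,
			     const struct toy_pgtable *normal)
{
	*asi = *normal;
}

/* Flip one entry to present (memory that needs no protection). */
static inline void toy_map(struct toy_pgtable *asi, size_t idx)
{
	assert(idx < TOY_NR_ENTRIES);
	asi->entry[idx] |= TOY_PRESENT;
}

/* Poke a hole: flip one entry to non-present (sensitive memory). */
static inline void toy_unmap(struct toy_pgtable *asi, size_t idx)
{
	assert(idx < TOY_NR_ENTRIES);
	asi->entry[idx] &= ~TOY_PRESENT;
}

static inline bool toy_equal(const struct toy_pgtable *a,
			     const struct toy_pgtable *b)
{
	return memcmp(a->entry, b->entry, sizeof(a->entry)) == 0;
}
```

Given the same sensitive/nonsensitive split, calling toy_init_unmapped()
followed by toy_map() on each nonsensitive entry yields the same table
as toy_clone() followed by toy_unmap() on each sensitive entry.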