lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DD7WQ97R8OG6.1CA5E2FU5ISMZ@google.com>
Date: Thu, 02 Oct 2025 14:31:02 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>, Brendan Jackman <jackmanb@...gle.com>, 
	Andy Lutomirski <luto@...nel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Suren Baghdasaryan <surenb@...gle.com>, 
	Michal Hocko <mhocko@...e.com>, Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>, 
	Roman Gushchin <roman.gushchin@...ux.dev>
Cc: <peterz@...radead.org>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>, 
	<mingo@...hat.com>, <tglx@...utronix.de>, <akpm@...ux-foundation.org>, 
	<david@...hat.com>, <derkling@...gle.com>, <junaids@...gle.com>, 
	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>, <reijiw@...gle.com>, 
	<rientjes@...gle.com>, <rppt@...nel.org>, <vbabka@...e.cz>, <x86@...nel.org>, 
	Yosry Ahmed <yosry.ahmed@...ux.dev>
Subject: Re: [PATCH 05/21] x86/mm/pat: mirror direct map changes to ASI

On Wed Oct 1, 2025 at 8:50 PM UTC, Dave Hansen wrote:
> On 9/24/25 07:59, Brendan Jackman wrote:
>> ASI has a separate PGD for the physmap, which needs to be kept in sync
>> with the unrestricted physmap with respect to permissions.
>
> So that leads to another thing... What about vmalloc()? Why doesn't it
> need to be in the ASI pgd?

Oh yeah it does. For the "actually entering the restricted addres space"
patchset, I'll include logic that just shares that region between the
unrestricted and restricted address space, something like this:

https://github.com/torvalds/linux/commit/04fd7a0b0098af48f2f8d9c0343b1edd12987681#diff-ecb3536ec179c07d4b4b387e58e62a9a6e553069cfed22a73448eb2ce5b82aa6R637-R669

Later, we'll want to be able to protect subsets of the vmalloc area
(i.e. unmap them from the restricted address space) too, but that's
something we can think about later I think. Unless I'm mistaken it's
much simpler than for the direct map. Junaid had a minumal solution for
that in his 2022 RFC [0]:

[0] https://lore.kernel.org/all/20220223052223.1202152-12-junaids@google.com/
	
>> +static inline bool is_direct_map(unsigned long vaddr)
>> +{
>> +	return within(vaddr, PAGE_OFFSET,
>> +		      PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT));
>> +}
>>  
>>  static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
>>  			       int primary)
>> @@ -1808,8 +1814,7 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
>>  	 * one virtual address page and its pfn. TBD: numpages can be set based
>>  	 * on the initial value and the level returned by lookup_address().
>>  	 */
>> -	if (within(vaddr, PAGE_OFFSET,
>> -		   PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
>> +	if (is_direct_map(vaddr)) {
>>  		cpa->numpages = 1;
>>  		cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
>>  		return 0;
>> @@ -1981,6 +1986,27 @@ static int cpa_process_alias(struct cpa_data *cpa)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Having updated the unrestricted PGD, reflect this change in the ASI
>> + * restricted address space too.
>> + */
>> +static inline int mirror_asi_direct_map(struct cpa_data *cpa, int primary)
>> +{
>> +	struct cpa_data asi_cpa = *cpa;
>> +
>> +	if (!asi_enabled_static())
>> +		return 0;
>> +
>> +	/* Only need to do this for the real unrestricted direct map. */
>> +	if ((cpa->pgd && cpa->pgd != init_mm.pgd) || !is_direct_map(*cpa->vaddr))
>> +		return 0;
>> +	VM_WARN_ON_ONCE(!is_direct_map(*cpa->vaddr + (cpa->numpages * PAGE_SIZE)));
>> +
>> +	asi_cpa.pgd = asi_nonsensitive_pgd;
>> +	asi_cpa.curpage = 0;
>
> Please document what functionality this curpage=0 has. It's not clear.

Ack, I'll add some commentary.

>> +	return __change_page_attr(cpa, primary);
>> +}
>
> But let's say someone is doing something silly like:
>
> 	set_memory_np(addr, size);
> 	set_memory_p(addr, size);
>
> Won't that end up in here and make the "unrestricted PGD" have
> _PAGE_PRESENT==1 entries?

Er, yes, that's a bug, thanks for pointing this out. I guess this is
actually broken under debug_pagealloc or something? I should check that.

This code should only mirror the bits that are irrelevant to ASI. 

> Also, could we try and make the nomenclature consistent? We've got
> "unrestricted direct map" and "asi_nonsensitive_pgd" being used (at
> least). Could the terminology be made more consistent?

Hm. It is actually consistent: "unrestricted" is a property of the
address space / execution context. "nonsensitive" is a property of the
memory. Nonsensitive memory is mapped into the unrestricted address
space. asi_nonsensitive_pgd isn't an address space we enter it's just a
holding area (like if we never actually pointed CR3 at init_mm.pgd but
just useed it as a source to clone from).

However.. just because it's consistent doesn't mean it's not confusing.
Do you think we should just squash these two words and call the whole
thing "nonsensitive"? I don't know if "nonsensitive address space" makes
much sense... Is it possible I can fix this by just adding more
comments?

> One subtle thing here is that it's OK to allocate memory here when
> mirroring changes into 'asi_nonsensitive_pgd'. It's just not OK when
> flipping sensitivity. That seems worth a comment.

Ack, will add that.

>>  static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>  {
>>  	unsigned long numpages = cpa->numpages;
>> @@ -2007,6 +2033,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
>>  		if (!debug_pagealloc_enabled())
>>  			spin_lock(&cpa_lock);
>>  		ret = __change_page_attr(cpa, primary);
>> +		if (!ret)
>> +			ret = mirror_asi_direct_map(cpa, primary);
>>  		if (!debug_pagealloc_enabled())
>>  			spin_unlock(&cpa_lock);
>>  		if (ret)
>> 
>
> Is cpa->pgd ever have any values other than NULL or init_mm->pgd? I
> didn't see anything in a quick grep.

It can also be efi_mm.pgd via sev_es_efi_map_ghcbs_cas().

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ