lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a063f6c5-2785-4a9f-8079-25edb3e54cef@arm.com>
Date: Thu, 11 Dec 2025 13:59:53 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Samuel Holland <samuel.holland@...ive.com>,
 "David Hildenbrand (Red Hat)" <david@...nel.org>,
 Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <pjw@...nel.org>,
 linux-riscv@...ts.infradead.org, Andrew Morton <akpm@...ux-foundation.org>,
 linux-mm@...ck.org
Cc: devicetree@...r.kernel.org, Suren Baghdasaryan <surenb@...gle.com>,
 linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...nel.org>,
 Michal Hocko <mhocko@...e.com>, Conor Dooley <conor@...nel.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Krzysztof Kozlowski <krzk+dt@...nel.org>, Alexandre Ghiti <alex@...ti.fr>,
 Emil Renner Berthing <kernel@...il.dk>, Rob Herring <robh+dt@...nel.org>,
 Vlastimil Babka <vbabka@...e.cz>, "Liam R . Howlett"
 <Liam.Howlett@...cle.com>
Subject: Re: [PATCH v3 08/22] mm: Allow page table accessors to be
 non-idempotent

On 11/12/2025 00:33, Samuel Holland wrote:
> On 2025-11-28 2:47 AM, David Hildenbrand (Red Hat) wrote:
>> On 11/27/25 17:57, Ryan Roberts wrote:
>>> On 13/11/2025 01:45, Samuel Holland wrote:
>>>> Currently, some functions such as pte_offset_map() are passed both
>>>> pointers to hardware page tables, and pointers to previously-read PMD
>>>> entries on the stack. To ensure correctness in the first case, these
>>>> functions must use the page table accessor function (pmdp_get()) to
>>>> dereference the supplied pointer. However, this means pmdp_get() is
>>>> called twice in the second case. This double call must be avoided if
>>>> pmdp_get() applies some non-idempotent transformation to the value.
>>>>
>>>> Avoid the double transformation by calling set_pmd() on the stack
>>>> variables where necessary to keep set_pmd()/pmdp_get() calls balanced.
>>>
>>> I don't think this is a good solution.
>>
>> Agreed,
>>
>>     set_pmd(&pmd, pmd);
>>
>> is rather horrible.
> I agree that this patch is ugly. The only way I see to avoid code like this is
> to refactor (or duplicate) the functions so no function takes pointers to both
> hardware page tables and on-stack page table entries. Is that sort of
> refactoring the right direction to go for v4?

>From a quick look at the code, I think that some cases are solvable by
refactoring to pass the value instead of the pointer, and leave it to the higher
level decide how to read the value from the pointer - it knows if it is pointing
to HW pgtable or if it's a (e.g) stack value.

But the more I look at the code, the more instances I find where pointers to
stack variables are being passed to arch pgtable helpers as if they are HW
pgtable entry pointers. (Mainly pmd level).

I wonder if we need to bite the bullet and explicitly separate the types? At
each level, we have:

 1. page table entry value
 2. pointer to page table entry _value_ (e.g. pointer to pXX_t on stack)
 3. pointer to page table entry in HW pgtable

Today, 1 is represented by pte_t, pmd_t, etc. 2 and 3 are represented by the
same type; pte_t*, pmd_t*, etc.

If we create a new type for 3, it will both document and enforce when type 2 or
type 3 is required.

e.g:

// pte_t: defined by arch.
typedef unsigned long pte_t;

// ptep_t: new opaque type that can't be dereferenced.
struct __ptep_t;
typedef struct __ptep_t *ptep_t;

// getter/setter responsible for cast & deref as appropriate.
pte_t ptep_get(ptep_t ptep)
{
	return READ_ONCE(*(pte_t *)ptep);
}

int do_stuff(void)
{
	// value on stack: ok
	pte_t mypte;

	// pointer to value on stack: ok
	pte_t *pmypte = &mypte;

	// handle to entry on stack: not allowed by compiler!
	ptep_t myptep = &mypte;

	// handle to entry in pgtable: ok
	ptep_t myptep = pte_offset_kernel(...);

	// read value of pgtable entry: ok
	pte_t val = ptep_get(myptep);

	// attempt to pass pointer to stack variable: not allowed by compiler!
	pte_t val = ptep_get(&mypte);

	// attempt to directly dereference ptep: not allowed by compiler!
	pte_t val = *myptep;
}


We could do this incrementally by initially typedefing ptep_t to be:
typedef pte_t *ptep_t;

Then we could flip the switch arch-by-arch to enable the stronger checking. We
likely wouldn't need to convert arches that don't care.

I think by doing this, it will expose all the current issues and force us to fix
them properly.

On the related subject of conversion to pXXp_get(); I've been looking into this
and personally, I think we should have 2 helper flavours at each level:

 - pXXd_get()      optimizable by compiler; defaults to C dereference
 - pXXd_get_once() single-copy-atomic and unmovable by compiler

It simplifies the converstion process, and reduces the risk of bugs
significantly (go read about the arm32 issues discussed in Anshuman's series if
you haven't done already).

I appreciate this is all probably a lot more work than you would prefer to sign
up for, I'd be happy to collaborate if we get concensus that this approach makes
sense. What do you think?

Thanks,
Ryan


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ