linux-kernel - Re: [PATCH v3 06/22] mm: Always use page table accessor functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1932547-0fc6-4dbd-868c-0b0b34429a33@arm.com>
Date: Thu, 27 Nov 2025 15:32:34 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Wei Yang <richard.weiyang@...il.com>,
 Samuel Holland <samuel.holland@...ive.com>,
 Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <pjw@...nel.org>,
 linux-riscv@...ts.infradead.org, Andrew Morton <akpm@...ux-foundation.org>,
 linux-mm@...ck.org, devicetree@...r.kernel.org,
 Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
 Mike Rapoport <rppt@...nel.org>, Michal Hocko <mhocko@...e.com>,
 Conor Dooley <conor@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
 Alexandre Ghiti <alex@...ti.fr>, Emil Renner Berthing <kernel@...il.dk>,
 Rob Herring <robh+dt@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Julia Lawall <Julia.Lawall@...ia.fr>, Nicolas Palix <nicolas.palix@...g.fr>,
 Anshuman Khandual <anshuman.khandual@....com>
Subject: Re: [PATCH v3 06/22] mm: Always use page table accessor functions

On 27/11/2025 07:31, David Hildenbrand (Red Hat) wrote:
> On 11/27/25 08:14, David Hildenbrand (Red Hat) wrote:
>> On 11/26/25 21:31, David Hildenbrand (Red Hat) wrote:
>>> On 11/26/25 17:34, Ryan Roberts wrote:
>>>> On 26/11/2025 16:07, Ryan Roberts wrote:
>>>>> On 26/11/2025 15:12, David Hildenbrand (Red Hat) wrote:
>>>>>> On 11/26/25 16:08, Lorenzo Stoakes wrote:
>>>>>>> On Wed, Nov 26, 2025 at 03:56:13PM +0100, David Hildenbrand (Red Hat) wrote:
>>>>>>>> On 11/26/25 15:52, Lorenzo Stoakes wrote:
>>>>>>>>>
>>>>>>>>> Would the pmdp_get() never get invoked then? Or otherwise wouldn't that
>>>>>>>>> end up
>>>>>>>>> requiring a READ_ONCE() further up the stack?
>>>>>>>>
>>>>>>>> See my other reply, I think the pmdp_get() is required because all pud_*
>>>>>>>> functions are just simple stubs.
>>>>>>>
>>>>>>> OK, thought you were saying we should push further down the stack? Or up
>>>>>>> depending on how you view these things :P as in READ_ONCE at leaf?
>>>>>>
>>>>>> I think at leaf because I think the previous ones should essentially be only
>>>>>> used by stubs.
>>>>>>
>>>>>> But I haven't fully digested how this is all working. Or supposed to work.
>>>>>>
>>>>>> I'm trying to chew through the arch/arm/include/asm/pgtable-2level.h
>>>>>> example to
>>>>>> see if I can make sense of it,
>>>>>
>>>>> I wonder if we can think about this slightly differently;
>>>>>
>>>>> READ_ONCE() has two important properties:
>>>>>
>>>>>     - It guarrantees that a load will be issued, *even if output is unused*
>>>>>     - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>>>
>>>>> I think for the existing places where READ_ONCE() is used for pagetable
>>>>> reads we
>>>>> only care about:
>>>>>
>>>>>     - It guarrantees that a load will be issued, *if output is used*
>>>>>     - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>>>
>>>>> I think if we can weaken to the "if output is used" property, then the
>>>>> compiler
>>>>> will optimize out all the unneccessary reads.
>>>>>
>>>>> AIUI, a C dereference provides neither of the guarrantees so that's no good.
>>>>>
>>>>> What about non-volatile asm? I'm told (thought need to verify) that for
>>>>> non-volatile asm, the compiler will emit it if the output is used and
>>>>> remove it
>>>>> otherwise. So if the asm contains the required single-copy-atomic, perhaps we
>>>>> are in business?
>>>>>
>>>>> So we would need a new READ_SCA() macro that could default to READ_ONCE()
>>>>> (which
>>>>> is stronger) and arches could opt in to providing a weaker asm version.
>>>>> Then the
>>>>> default pXdp_get() could be READ_SCA(). And this should work for all cases.
>>>>>
>>>>> I think.
>>>>
>>>> I'm not sure this works. It looks like the compiler is free to move non-
>>>> volatile
>>>> asm sections which might be problematic for places where we are currently using
>>>> READ_ONCE() in lockless algorithms, (e.g. GUP?). We wouldn't want to end up
>>>> with
>>>> a stale value.
>>>>
>>>> Another idea:
>>>>
>>>> Given the main pattern where we are aiming to optimize out the read is
>>>> something
>>>> like:
>>>>
>>>> if (!pud_present(*pud))
>>>>
>>>> where for a folded pmd:
>>>>
>>>> static inline int pud_present(pud_t pud)    { return 1; }
>>>>
>>>> And we will change it to this:
>>>>
>>>> if (!pud_present(pudp_get(pud)))
>>>>
>>>> ...
>>>>
>>>> perhaps we can just define the folded pXd_present(), pXd_none(), pXd_bad(),
>>>> pXd_user() and pXd_leaf() as macros:
>>>>
>>>> #define pud_present(pud)    1
>>>>
>>>
>>> Let's take a step back and realize that with __PAGETABLE_PMD_FOLDED
>>>
>>> (a) *pudp does not make any sense

I'm struggling with this statement. The PMD is folded, not the PUD. So the PMD
table consists of a single entry, which overlays the entry in the PUD, right? So
the PUD exists and *pudp must be valid?

Now, pgtable-nopmd.h includes pgtable-nopud.h so actually the PUD is also folded
into the P4D and the P4D is also folded into the PGD. But the PGD exists and is
not folded.

So what you are saying is true in practice, I guess. But it's not true if you
replace PMD with P4D and PUD with PGD.

Does what I'm saying matter or have any impact on your assessment below? I have
no idea. My head hurts.

>>>
>>> For a folded PMD, *pudp == *pmdp and consequently we would actually
>>> get a PMD, not a PUD.

Well based on what I said above, the PMD is folded into the PUD, the PUD is
folded into the P4D and the P4D is folded into the PGD. So all of those things
are actually pointing to the PGD entry in the PGD table, right? And pmd_t,
pud_t, p4d_t are all defined as containers holding the next thing up:

typedef struct { pud_t pud; } pmd_t;

So it all works out, and is safe, I think?

>>>
>>> For this reason all these pud_* helpers ignore the passed value
>>> completely. It would be wrong.
>>>
>>> (b) pmd_offset() does *not* consume a pud but instead a pudp.
>>>
>>> That makes sense, just imagine what would happen if someone would pass
>>> *pudp to that helper (we'd dereference twice ...).
>>>
>>>
>>> So I wonder if we can just teach get_pudp() and friends to ... return
>>> true garbage instead of dereferencing something that does not make sense?

Sorry, I don't really understand why dereferencing does not make sense. the
pud_t is a struct containing a single p4d_t and that is a struct containing a
single pgd_t and it is pointing to the pgd entry. So why isn't it safe?

>>>
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index 32e8457ad5352..c95d0d89ab3f1 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -351,7 +351,13 @@ static inline pmd_t pmdp_get(pmd_t *pmdp)
>>>     #ifndef pudp_get
>>>     static inline pud_t pudp_get(pud_t *pudp)
>>>     {
>>> +#ifdef __PAGETABLE_PMD_FOLDED
>>> +       pud_t dummy = { 0 };
>>> +
>>> +       return dummy;

If you're confident that this is correct, then I believe you. But I can't follow
your logic at the moment. But we have a fairly long history of you being right
and me not understanding... :)

I don't think in general it is invalid for something to call the getter and do
something with the result just because it's folded. e.g. log it's value. So I
don't think returning a dummy value is ok.

See __print_bad_page_map_pgtable() for example?

>>> +#else
>>>            return READ_ONCE(*pudp);
>>> +#endif
>>>     }
>>>     #endif
>>>     set_pud/pud_page/pud_pgtable helper are confusing, I would
>>> assume they are essentially unused (like documented for set_put)
>>> and only required to keep compilers happy.
>>
>> Staring at GUP-fast and perf_get_pgtable_size()---which should better be
>> converted to pudp_get() etc--I guess we might have to rework
>> p4d_offset_lockless() to do something that doesn't rely on
>> passing variables of local variables.
>>
>> We might have to enlighten these walkers (and only these) about folded
>> page tables such that they don't depend on the result of pudp_get() and
>> friends.
> 
> Talking to myself (I know), handling this might be as simple as having
> 
> diff --git a/include/asm-generic/pgtable-nopmd.h b/include/asm-generic/pgtable-
> nopmd.h
> index 8ffd64e7a24cb..60e5ba02bcf06 100644
> --- a/include/asm-generic/pgtable-nopmd.h
> +++ b/include/asm-generic/pgtable-nopmd.h
> @@ -49,6 +49,13 @@ static inline pmd_t * pmd_offset(pud_t * pud, unsigned long
> address)
>  }
>  #define pmd_offset pmd_offset
>  
> +static inline pmd_t * pmd_offset_lockless(pud_t *pud, puf_t pud, unsigned long
> address)
> +{
> +       return (pmd_t *)pud;
> +}
> +#define pmd_offset_lcokless pmd_offset_lockless
> +
> +
>  #define pmd_val(x)                             (pud_val((x).pud))
>  #define __pmd(x)                               ((pmd_t) { __pud(x) } )
>  
> 
> IOW, just like for pmd_offset() we cast the pointer and don't ever touch the pud.
> 
> 
> As a reminder, the default is
> 
> #ifndef pmd_offset_lockless
> #define pmd_offset_lockless(pudp, pud, address) pmd_offset(&(pud), address)
> #endif
> 
> (isn't that nasty? :) )

Just as a general comment; arm64 expects and requires today that any pte_t
pointer passed to ptep_get(), set_ptes(), and all the other ptep helpers point
to entries in the pgtable. We cannot tolerate being passed pointers to stack
variables. We assume we can roam around in the pgtable for contpte purposes.

Extending this to the other levels, we would have the same requirements.

Thanks,
Ryan

>