[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bcd2a49d-a42f-41e8-9f64-4fd24fc862c7@kernel.org>
Date: Thu, 27 Nov 2025 08:14:45 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Wei Yang <richard.weiyang@...il.com>,
Samuel Holland <samuel.holland@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <pjw@...nel.org>,
linux-riscv@...ts.infradead.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-mm@...ck.org, devicetree@...r.kernel.org,
Suren Baghdasaryan <surenb@...gle.com>, linux-kernel@...r.kernel.org,
Mike Rapoport <rppt@...nel.org>, Michal Hocko <mhocko@...e.com>,
Conor Dooley <conor@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>,
Alexandre Ghiti <alex@...ti.fr>, Emil Renner Berthing <kernel@...il.dk>,
Rob Herring <robh+dt@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Julia Lawall <Julia.Lawall@...ia.fr>, Nicolas Palix <nicolas.palix@...g.fr>,
Anshuman Khandual <anshuman.khandual@....com>
Subject: Re: [PATCH v3 06/22] mm: Always use page table accessor functions
On 11/26/25 21:31, David Hildenbrand (Red Hat) wrote:
> On 11/26/25 17:34, Ryan Roberts wrote:
>> On 26/11/2025 16:07, Ryan Roberts wrote:
>>> On 26/11/2025 15:12, David Hildenbrand (Red Hat) wrote:
>>>> On 11/26/25 16:08, Lorenzo Stoakes wrote:
>>>>> On Wed, Nov 26, 2025 at 03:56:13PM +0100, David Hildenbrand (Red Hat) wrote:
>>>>>> On 11/26/25 15:52, Lorenzo Stoakes wrote:
>>>>>>>
>>>>>>> Would the pmdp_get() never get invoked then? Or otherwise wouldn't that end up
>>>>>>> requiring a READ_ONCE() further up the stack?
>>>>>>
>>>>>> See my other reply, I think the pmdp_get() is required because all pud_*
>>>>>> functions are just simple stubs.
>>>>>
>>>>> OK, thought you were saying we should push further down the stack? Or up
>>>>> depending on how you view these things :P as in READ_ONCE at leaf?
>>>>
>>>> I think at leaf because I think the previous ones should essentially be only
>>>> used by stubs.
>>>>
>>>> But I haven't fully digested how this is all working. Or supposed to work.
>>>>
>>>> I'm trying to chew through the arch/arm/include/asm/pgtable-2level.h example to
>>>> see if I can make sense of it,
>>>
>>> I wonder if we can think about this slightly differently;
>>>
>>> READ_ONCE() has two important properties:
>>>
>>> - It guarrantees that a load will be issued, *even if output is unused*
>>> - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>
>>> I think for the existing places where READ_ONCE() is used for pagetable reads we
>>> only care about:
>>>
>>> - It guarrantees that a load will be issued, *if output is used*
>>> - It guarrantees that the read will be single-copy-atomic (no tearing)
>>>
>>> I think if we can weaken to the "if output is used" property, then the compiler
>>> will optimize out all the unneccessary reads.
>>>
>>> AIUI, a C dereference provides neither of the guarrantees so that's no good.
>>>
>>> What about non-volatile asm? I'm told (thought need to verify) that for
>>> non-volatile asm, the compiler will emit it if the output is used and remove it
>>> otherwise. So if the asm contains the required single-copy-atomic, perhaps we
>>> are in business?
>>>
>>> So we would need a new READ_SCA() macro that could default to READ_ONCE() (which
>>> is stronger) and arches could opt in to providing a weaker asm version. Then the
>>> default pXdp_get() could be READ_SCA(). And this should work for all cases.
>>>
>>> I think.
>>
>> I'm not sure this works. It looks like the compiler is free to move non-volatile
>> asm sections which might be problematic for places where we are currently using
>> READ_ONCE() in lockless algorithms, (e.g. GUP?). We wouldn't want to end up with
>> a stale value.
>>
>> Another idea:
>>
>> Given the main pattern where we are aiming to optimize out the read is something
>> like:
>>
>> if (!pud_present(*pud))
>>
>> where for a folded pmd:
>>
>> static inline int pud_present(pud_t pud) { return 1; }
>>
>> And we will change it to this:
>>
>> if (!pud_present(pudp_get(pud)))
>>
>> ...
>>
>> perhaps we can just define the folded pXd_present(), pXd_none(), pXd_bad(),
>> pXd_user() and pXd_leaf() as macros:
>>
>> #define pud_present(pud) 1
>>
>
> Let's take a step back and realize that with __PAGETABLE_PMD_FOLDED
>
> (a) *pudp does not make any sense
>
> For a folded PMD, *pudp == *pmdp and consequently we would actually
> get a PMD, not a PUD.
>
> For this reason all these pud_* helpers ignore the passed value
> completely. It would be wrong.
>
> (b) pmd_offset() does *not* consume a pud but instead a pudp.
>
> That makes sense, just imagine what would happen if someone would pass
> *pudp to that helper (we'd dereference twice ...).
>
>
> So I wonder if we can just teach get_pudp() and friends to ... return
> true garbage instead of dereferencing something that does not make sense?
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 32e8457ad5352..c95d0d89ab3f1 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -351,7 +351,13 @@ static inline pmd_t pmdp_get(pmd_t *pmdp)
> #ifndef pudp_get
> static inline pud_t pudp_get(pud_t *pudp)
> {
> +#ifdef __PAGETABLE_PMD_FOLDED
> + pud_t dummy = { 0 };
> +
> + return dummy;
> +#else
> return READ_ONCE(*pudp);
> +#endif
> }
> #endif
>
> set_pud/pud_page/pud_pgtable helper are confusing, I would
> assume they are essentially unused (like documented for set_put)
> and only required to keep compilers happy.
Staring at GUP-fast and perf_get_pgtable_size()---which should better be
converted to pudp_get() etc--I guess we might have to rework
p4d_offset_lockless() to do something that doesn't rely on
passing variables of local variables.
We might have to enlighten these walkers (and only these) about folded
page tables such that they don't depend on the result of pudp_get() and
friends.
--
Cheers
David
Powered by blists - more mailing lists