[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cbea0364-b8e3-4e3b-852f-b16436b60de1@kernel.org>
Date: Thu, 18 Dec 2025 10:49:26 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>,
Samuel Holland <samuel.holland@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <pjw@...nel.org>,
linux-riscv@...ts.infradead.org, Andrew Morton <akpm@...ux-foundation.org>,
linux-mm@...ck.org
Cc: devicetree@...r.kernel.org, Suren Baghdasaryan <surenb@...gle.com>,
linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...nel.org>,
Michal Hocko <mhocko@...e.com>, Conor Dooley <conor@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Krzysztof Kozlowski <krzk+dt@...nel.org>, Alexandre Ghiti <alex@...ti.fr>,
Emil Renner Berthing <kernel@...il.dk>, Rob Herring <robh+dt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, "Liam R . Howlett"
<Liam.Howlett@...cle.com>
Subject: Re: [PATCH v3 08/22] mm: Allow page table accessors to be
non-idempotent
On 12/11/25 14:59, Ryan Roberts wrote:
> On 11/12/2025 00:33, Samuel Holland wrote:
>> On 2025-11-28 2:47 AM, David Hildenbrand (Red Hat) wrote:
>>> On 11/27/25 17:57, Ryan Roberts wrote:
>>>> On 13/11/2025 01:45, Samuel Holland wrote:
>>>>> Currently, some functions such as pte_offset_map() are passed both
>>>>> pointers to hardware page tables, and pointers to previously-read PMD
>>>>> entries on the stack. To ensure correctness in the first case, these
>>>>> functions must use the page table accessor function (pmdp_get()) to
>>>>> dereference the supplied pointer. However, this means pmdp_get() is
>>>>> called twice in the second case. This double call must be avoided if
>>>>> pmdp_get() applies some non-idempotent transformation to the value.
>>>>>
>>>>> Avoid the double transformation by calling set_pmd() on the stack
>>>>> variables where necessary to keep set_pmd()/pmdp_get() calls balanced.
>>>>
>>>> I don't think this is a good solution.
>>>
>>> Agreed,
>>>
>>> set_pmd(&pmd, pmd);
>>>
>>> is rather horrible.
>> I agree that this patch is ugly. The only way I see to avoid code like this is
>> to refactor (or duplicate) the functions so no function takes pointers to both
>> hardware page tables and on-stack page table entries. Is that sort of
>> refactoring the right direction to go for v4?
>
> From a quick look at the code, I think that some cases are solvable by
> refactoring to pass the value instead of the pointer, and leave it to the higher
> level decide how to read the value from the pointer - it knows if it is pointing
> to HW pgtable or if it's a (e.g) stack value.
>
> But the more I look at the code, the more instances I find where pointers to
> stack variables are being passed to arch pgtable helpers as if they are HW
> pgtable entry pointers. (Mainly pmd level).
>
> I wonder if we need to bite the bullet and explicitly separate the types? At
> each level, we have:
>
> 1. page table entry value
> 2. pointer to page table entry _value_ (e.g. pointer to pXX_t on stack)
> 3. pointer to page table entry in HW pgtable
>
> Today, 1 is represented by pte_t, pmd_t, etc. 2 and 3 are represented by the
> same type; pte_t*, pmd_t*, etc.
>
> If we create a new type for 3, it will both document and enforce when type 2 or
> type 3 is required.
>
> e.g:
>
> // pte_t: defined by arch.
> typedef unsigned long pte_t;
>
> // ptep_t: new opaque type that can't be dereferenced.
> struct __ptep_t;
> typedef struct __ptep_t *ptep_t;
This is what I had in mind when we last discussed this topic and I
suggested a way forward to not play whack-a-mole with new users that do
*ptep showing up.
Agreed that we ideally indicate that this is a HW PTE pointer that must
be de-referenced through ptep_get() or similar. (maybe we'd find a new
name for this set of functions).
I talked to Samuel at LPC, pointing him at the way XEN-pv implemented
support for changing PFNs in ptes. We might not need all of that rework
to move forward with the risc-v change.
Having that said, I also agree that this would be a cleanup worth having
(which will result in quite a bit of churn :) ).
So if someone wants to bump up the patch count, speak now.
--
Cheers
David
Powered by blists - more mailing lists