[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <db00bb89-a7f4-4749-be2f-7a98c7636070@lucifer.local>
Date: Wed, 4 Jun 2025 10:20:03 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Barry Song <baohua@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>,
Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Jonas Bonn <jonas@...thpole.se>,
Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>,
Stafford Horne <shorne@...il.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alex@...ti.fr>, Jann Horn <jannh@...gle.com>,
loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-openrisc@...r.kernel.org, linux-riscv@...ts.infradead.org,
linux-mm@...ck.org
Subject: Re: [PATCH RESEND] mm/pagewalk: split walk_page_range_novma() into
kernel/user parts
On Wed, Jun 04, 2025 at 09:39:30AM +0200, David Hildenbrand wrote:
> On 03.06.25 21:22, Lorenzo Stoakes wrote:
> > The walk_page_range_novma() function is rather confusing - it supports two
> > modes, one used often, the other used only for debugging.
> >
> > The first mode is the common case of traversal of kernel page tables, which
> > is what nearly all callers use this for.
>
> ... and what people should be using it for 🙂
:)
Yeah the whole intent of this patch is to detach the 'crazy debug' bit from
the 'used by arches all over the place' stuff.
Being super clear as to what you're doing matters.
>
> >
> > Secondly it provides an unusual debugging interface that allows for the
> > traversal of page tables in a userland range of memory even for that memory
> > which is not described by a VMA.
> >
> > This is highly unusual and it is far from certain that such page tables
> > should even exist, but perhaps this is precisely why it is useful as a
> > debugging mechanism.
> >
> > As a result, this is utilised by ptdump only. Historically, things were
> > reversed - ptdump was the only user, and other parts of the kernel evolved
> > to use the kernel page table walking here.
> >
> > Since we have some complicated and confusing locking rules for the novma
> > case, it makes sense to separate the two usages into their own functions.
> >
> > Doing this also provide self-documentation as to the intent of the caller -
> > are they doing something rather unusual or are they simply doing a standard
> > kernel page table walk?
> >
> > We therefore maintain walk_page_range_novma() for this single usage, and
> > document the function as such.
>
> If we have to keep this dangerous interface, it should probably be
>
> walk_page_range_debug() or walk_page_range_dump()
Ugh it's too early, I thought Mike suggested this :P but he suggested the
mm/internal.h bit.
But anyway I agree with both, will fix in v2.
>
> >
> > Note that ptdump uses the precise same function for kernel walking as a
> > convenience, so we permit this but make it very explicit by having
> > walk_page_range_novma() invoke walk_page_range_kernel() in this case.
> >
> > We introduce walk_page_range_kernel() for the far more common case of
> > kernel page table traversal.
>
> I wonder if we should give it a completely different name scheme to
> highlight that this is something completely different.
>
> walk_kernel_page_table_range()
Yeah, I think this might be a good idea actually. This is doing something
'unusual' unlike all the other walk_kernel_xxx() handlers, so this should
highlight it even more clearly.
Will fixup in v2.
>
> etc.
>
>
> --
> Cheers,
>
> David / dhildenb
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists