lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <db00bb89-a7f4-4749-be2f-7a98c7636070@lucifer.local>
Date: Wed, 4 Jun 2025 10:20:03 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Barry Song <baohua@...nel.org>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>,
        Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
        Jonas Bonn <jonas@...thpole.se>,
        Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>,
        Stafford Horne <shorne@...il.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
        Alexandre Ghiti <alex@...ti.fr>, Jann Horn <jannh@...gle.com>,
        loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
        linux-openrisc@...r.kernel.org, linux-riscv@...ts.infradead.org,
        linux-mm@...ck.org
Subject: Re: [PATCH RESEND] mm/pagewalk: split walk_page_range_novma() into
 kernel/user parts

On Wed, Jun 04, 2025 at 09:39:30AM +0200, David Hildenbrand wrote:
> On 03.06.25 21:22, Lorenzo Stoakes wrote:
> > The walk_page_range_novma() function is rather confusing - it supports two
> > modes, one used often, the other used only for debugging.
> >
> > The first mode is the common case of traversal of kernel page tables, which
> > is what nearly all callers use this for.
>
> ... and what people should be using it for 🙂

:)

Yeah the whole intent of this patch is to detach the 'crazy debug' bit from
the 'used by arches all over the place' stuff.

Being super clear as to what you're doing matters.

>
> >
> > Secondly it provides an unusual debugging interface that allows for the
> > traversal of page tables in a userland range of memory even for that memory
> > which is not described by a VMA.
> >
> > This is highly unusual and it is far from certain that such page tables
> > should even exist, but perhaps this is precisely why it is useful as a
> > debugging mechanism.
> >
> > As a result, this is utilised by ptdump only. Historically, things were
> > reversed - ptdump was the only user, and other parts of the kernel evolved
> > to use the kernel page table walking here.
> >
> > Since we have some complicated and confusing locking rules for the novma
> > case, it makes sense to separate the two usages into their own functions.
> >
> > Doing this also provide self-documentation as to the intent of the caller -
> > are they doing something rather unusual or are they simply doing a standard
> > kernel page table walk?
> >
> > We therefore maintain walk_page_range_novma() for this single usage, and
> > document the function as such.
>
> If we have to keep this dangerous interface, it should probably be
>
> walk_page_range_debug() or walk_page_range_dump()

Ugh it's too early, I thought Mike suggested this :P but he suggested the
mm/internal.h bit.

But anyway I agree with both, will fix in v2.

>
> >
> > Note that ptdump uses the precise same function for kernel walking as a
> > convenience, so we permit this but make it very explicit by having
> > walk_page_range_novma() invoke walk_page_range_kernel() in this case.
> >
> > We introduce walk_page_range_kernel() for the far more common case of
> > kernel page table traversal.
>
> I wonder if we should give it a completely different name scheme to
> highlight that this is something completely different.
>
> walk_kernel_page_table_range()

Yeah, I think this might be a good idea actually. This is doing something
'unusual' unlike all the other walk_kernel_xxx() handlers, so this should
highlight it even more clearly.

Will fixup in v2.

>
> etc.
>
>
> --
> Cheers,
>
> David / dhildenb
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ