Message-ID: <af6d28d0-d646-45d5-832c-66add20ea388@redhat.com>
Date: Thu, 5 Jun 2025 22:23:52 +0200
From: David Hildenbrand <david@...hat.com>
To: Jann Horn <jannh@...gle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Barry Song
<baohua@...nel.org>, "Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Muchun Song <muchun.song@...ux.dev>, Oscar Salvador <osalvador@...e.de>,
Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Jonas Bonn <jonas@...thpole.se>,
Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>,
Stafford Horne <shorne@...il.com>, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alex@...ti.fr>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-openrisc@...r.kernel.org,
linux-riscv@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm/pagewalk: split walk_page_range_novma() into
kernel/user parts
On 05.06.25 21:19, Jann Horn wrote:
> On Wed, Jun 4, 2025 at 4:21 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
>> The walk_page_range_novma() function is rather confusing - it supports two
>> modes, one used often, the other used only for debugging.
>>
>> The first mode is the common case of traversal of kernel page tables, which
>> is what nearly all callers use this for.
>>
>> Secondly it provides an unusual debugging interface that allows for the
>> traversal of page tables in a userland range of memory even for that memory
>> which is not described by a VMA.
>>
>> It is far from certain that such page tables should even exist, but perhaps
>> this is precisely why it is useful as a debugging mechanism.
>>
>> As a result, this is utilised by ptdump only. Historically, things were
>> reversed - ptdump was the only user, and other parts of the kernel evolved
>> to use the kernel page table walking here.
>
> Just for the record, copy-pasting my comment on v1 that was
> accidentally sent off-list:
> ```
> Sort of a tangential comment: I wonder if it would make sense to give
> ptdump a different page table walker that uses roughly the same safety
> contract as gup_fast() - turn off IRQs and then walk the page tables
> locklessly. We'd need basically no locking and no special cases
> (regarding userspace mappings at least), at the cost of having to
> write the walker code such that we periodically restart the walk from
> scratch and not being able to inspect referenced pages. (That might
> also be nicer for debugging, since it wouldn't block on locks...)
> ```
I assume we don't have to dump more than pte values etc.? If so,
pte_special() and friends are not relevant to getting it right.
GUP-fast depends on CONFIG_HAVE_GUP_FAST; not sure whether that would be
a concern for now.
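To make the lockless-walk idea above a bit more concrete, here is a minimal user-space sketch in C. It is not kernel code and not a proposed implementation. The two-level table layout, the batch size, and the helpers fake_irq_disable()/fake_irq_enable() (stand-ins for local_irq_save()/local_irq_restore()) are all hypothetical. It assumes one reading of "periodically restart the walk from scratch": after each short IRQs-off window, nothing cached from that window is trusted, and the walk re-descends from the root at the next address. As discussed above, it only copies entry values and never inspects what they map.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-level table: TOP_ENTRIES upper slots, each pointing
 * at a leaf table of LEAF_ENTRIES values; a value of 0 means "not present". */
#define TOP_ENTRIES  4
#define LEAF_ENTRIES 8
#define BATCH        5 /* steps per IRQs-off window (hypothetical) */

struct leaf_table {
	_Atomic uint64_t pte[LEAF_ENTRIES];
};

struct top_table {
	_Atomic(struct leaf_table *) pmd[TOP_ENTRIES];
};

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore(). Under
 * the GUP-fast contract, keeping IRQs off is what stops a concurrently
 * unmapped table from being freed while the walker reads it. */
static void fake_irq_disable(void) { }
static void fake_irq_enable(void) { }

/* Copy the values of present leaf entries for indices [start, end) into
 * out, in index order. Returns how many values were stored (<= max_out). */
static size_t lockless_dump(struct top_table *root, size_t start, size_t end,
			    uint64_t *out, size_t max_out)
{
	size_t idx = start, n = 0;

	assert(start <= end && end <= (size_t)TOP_ENTRIES * LEAF_ENTRIES);
	while (idx < end && n < max_out) {
		/* The cached leaf pointer is trusted only inside one IRQs-off
		 * window, so every window re-descends from the root. */
		struct leaf_table *leaf = NULL;
		size_t leaf_slot = SIZE_MAX;

		fake_irq_disable();
		for (size_t step = 0; step < BATCH && idx < end && n < max_out; step++) {
			size_t slot = idx / LEAF_ENTRIES;

			if (slot != leaf_slot) {
				leaf = atomic_load_explicit(&root->pmd[slot],
							    memory_order_acquire);
				leaf_slot = slot;
			}
			if (!leaf) {
				/* No leaf table: skip its whole range in one step. */
				idx = (slot + 1) * LEAF_ENTRIES;
				continue;
			}
			/* Snapshot the entry value once; nothing it maps is touched. */
			uint64_t val = atomic_load_explicit(&leaf->pte[idx % LEAF_ENTRIES],
							    memory_order_relaxed);
			if (val)
				out[n++] = val;
			idx++;
		}
		fake_irq_enable(); /* window closed; the next one starts from the root */
	}
	return n;
}
```

For example, with entries populated at indices 1 and 6 of the first leaf table and index 0 of the third, a walk over the full range returns those three values in address order, skipping each missing leaf table in a single step. Bounding the work per window trades some repeated top-level lookups for short IRQs-off periods, and no locks are taken at any point.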
--
Cheers,
David / dhildenb