linux-kernel - Re: [PATCH v2 6/9] mm/memory: convert print_bad_pte() to print_bad_page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <432662e3-e043-41cc-b7a2-acf14387eb95@lucifer.local>
Date: Fri, 18 Jul 2025 13:55:46 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        xen-devel@...ts.xenproject.org, linux-fsdevel@...r.kernel.org,
        nvdimm@...ts.linux.dev, Andrew Morton <akpm@...ux-foundation.org>,
        Juergen Gross <jgross@...e.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Zi Yan <ziy@...dia.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
        Hugh Dickins <hughd@...gle.com>, Oscar Salvador <osalvador@...e.de>,
        Lance Yang <lance.yang@...ux.dev>
Subject: Re: [PATCH v2 6/9] mm/memory: convert print_bad_pte() to
 print_bad_page_map()

On Fri, Jul 18, 2025 at 01:04:30PM +0200, David Hildenbrand wrote:
> >
> > Yeah sorry I was in 'what locks do we need' mode and hadn't shifted back here,
> > but I guess the intent is that the caller _must_ hold this lock.
> >
> > I know it's nitty and annoying (sorry!) but as asserting seems to not be a
> > possibility here, could we spell these out as a series of points like:
> >
> > /*
> >   * The caller MUST hold the following locks:
> >   *
> >   * - Leaf page table lock
> >   * - Appropriate VMA lock to keep VMA stable
> >   */
> >
> > I don't _actually_ think you need the rmap lock then, as none of the page tables
> > you access would be impacted by any rmap action afaict, with these locks held.
>
> I don't enjoy wrong comments ;)
>
> This can be called from rmap code when doing a vm_normal_page() while
> holding the PTL.

OK so this is just underlining the confusion here (not your fault! I mean in
general with page table code, really).

Would this possibly work better then?:

/*
 * The caller MUST prevent page table teardown (by holding mmap, vma or rmap lock)
 * and MUST hold the leaf page table lock.
 */

>
> Really, I think we are over-thinking a helper that is triggered in specific
> context when the world is about to collide.

Indeed, but I think it's important to be clear as to what is required for this
to work (even though, as I understand it, we're already in trouble and if page
tables are corrupted or something like this we may even NULL ptr deref anyway so
it's best effort).

>
> This is not your general-purpose API.
>
> Maybe I should have never added a comment. Maybe I should just not have done
> this patch, because I really don't want to do more than the bare minimum to
> print_bad_page_map().

No no David no :P come back to the light sir...

This is a good patch I don't mean to dissuade you, I just want to make things
clear in a confusing bit of the kernel as we in mm as usual make life hard for
ourselves...

I think the locking comment _is_ important, as for far too long in mm we've had
_implicit_ understanding of where the locks should be at any given time, which
constantly blows up underneath us.

I'd rather keep things as clear as possible so even the intellectually
challenged such as myself are subject to less confusion.

Anyway TL;DR if you do something similar to suggestion above it's all good. Just
needs clarification.

>
> Because I deeply detest it, and no comments we will add will change that.

So do I! *clinks glass* but yes, it's horrid. But there's value in improving
quality of horrid code and refactoring to the least-worst version.

And we can attack it in a more serious way later.

[snip]

> > > > > +static void print_bad_page_map(struct vm_area_struct *vma,
> > > > > +		unsigned long addr, unsigned long long entry, struct page *page)
> > > > > +{
> > > > > +	struct address_space *mapping;
> > > > > +	pgoff_t index;
> > > > > +
> > > > > +	if (is_bad_page_map_ratelimited())
> > > > > +		return;
> > > > >
> > > > >    	mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
> > > > >    	index = linear_page_index(vma, addr);
> > > > >
> > > > > -	pr_alert("BUG: Bad page map in process %s  pte:%08llx pmd:%08llx\n",
> > > > > -		 current->comm,
> > > > > -		 (long long)pte_val(pte), (long long)pmd_val(*pmd));
> > > > > +	pr_alert("BUG: Bad page map in process %s  entry:%08llx", current->comm, entry);
> > > >
> > > > Sort of wonder if this is even useful if you don't know what the 'entry'
> > > > is? But I guess the dump below will tell you.
> > >
> > > You probably missed in the patch description:
> > >
> > > "Whether it is a PTE or something else will usually become obvious from the
> > > page table dump or from the dumped stack. If ever required in the future, we
> > > could pass the entry level type similar to "enum rmap_level". For now, let's
> > > keep it simple."
> >
> > Yeah sorry I glossed over the commit msg, and now I pay for it ;) OK this
> > is fine then.
>
> Let me play with indicating the page table level, but it's the kind of stuff
> I wouldn't want to do in this series here.

Sure understood. I don't mean to hold this up with nits. Am happy to be
flexible, just thinking out loud generally.

>
> > >
> > > >
> > > > Then we have VM_IO, which strictly must not have an associated page right?
> > >
> > > VM_IO just means read/write side-effects, I think you could have ones with
> > > an memmap easily ... e.g., memory section (128MiB) spanning both memory and
> > > MMIO regions.
> >
> > Hmm, but why not have two separate VMAs? I guess I need to look into more
> > what this flag actually effects.
>
> Oh, I meant, that we might have a "struct page" for MMIO memory (pfn_valid()
> == true).
>
> In a MIXEDMAP that will get refcounted. Not sure if there are users that use
> VM_IO in a MIXEDMAP, I would assume so but didn't check.
>
> So VM_IO doesn't really interact with vm_normal_page(), really. It's all
> about PFNMAP and MIXEDMAP.

Thanks, yeah am thinking more broadly about VM_SPECIAL here. May go off and do
some work on that... VM_UNMERGEABLE might be better.

>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo