Message-ID: <c022ddc3-1cbd-8291-68a3-f90ffb93af84@google.com>
Date: Tue, 3 Jan 2023 15:07:12 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Matthew Wilcox <willy@...radead.org>
cc: Vlastimil Babka <vbabka@...e.cz>,
kernel test robot <oliver.sang@...el.com>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, Mike Rapoport <rppt@...ux.ibm.com>,
Christoph Lameter <cl@...ux.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: A better dump_page()
On Tue, 3 Jan 2023, Matthew Wilcox wrote:
> On Tue, Jan 03, 2023 at 11:42:11AM +0100, Vlastimil Babka wrote:
> > Separately we should also make the __dump_page() more resilient.
>
> Right. It's not ideal when one of our best debugging tools obfuscates
> the problem we're trying to debug. I've seen problems like this before,
> and the problem is that somebody calls dump_page() on a page that they
> don't own a refcount on. That lets the page mutate under us in some
> fairly awkward ways (as you've seen here, it seems to be part of several
> different compound allocations at various points during the dump
> process).
>
> One possibility I thought about was taking our own refcount on the
> page at the start of dump_page(). That would kill off the possibility
> of ever passing in a const struct page, and it would confuse people.
> Also, what if somebody passes in a pointer to something that's not a
> struct page? Then we've (tried to) modify memory that's not a refcount.
>
> I think the best we can do is to snapshot the struct page and the folio
> it appears to belong to at the start of dump_page(). It'll take a
> little care (for example, folio_pfn() must be passed the original
> folio, and not the snapshot), but I think it's doable.
>
By snapshot, do you mean a memcpy() of the metadata to the stack? I assume
that still leaves an opportunity for the page to mutate underneath us, but
narrows the window.
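
To make sure we mean the same thing, here is a minimal user-space sketch of
how I read the proposal. It is not kernel code: struct fake_page, fake_memmap,
fake_page_to_pfn() and fake_dump_page() are all hypothetical stand-ins that I
made up for illustration. The idea is to copy the metadata once up front,
read every field from the copy, and derive anything address-based (such as
the pfn) from the original pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; the fields are illustrative only. */
struct fake_page {
	unsigned long flags;
	int refcount;
	int mapcount;
};

/* Hypothetical array standing in for the memmap. */
static struct fake_page fake_memmap[4];

/*
 * Address-derived values only make sense for the original pointer.
 * The snapshot lives elsewhere, so its address says nothing about the pfn.
 */
static size_t fake_page_to_pfn(const struct fake_page *page)
{
	return (size_t)(page - fake_memmap);
}

/* What the dump would report: the pfn plus a private copy of the metadata. */
struct fake_dump {
	size_t pfn;
	struct fake_page snap;
};

static void fake_dump_page(const struct fake_page *page, struct fake_dump *out)
{
	/*
	 * Copy once, then read only from the copy, so every field reported
	 * comes from the same copy. The memcpy() itself is not atomic, so a
	 * concurrent writer could still produce a torn snapshot; this only
	 * narrows the window, it does not close it.
	 */
	memcpy(&out->snap, page, sizeof(out->snap));

	/* The pfn is computed from the original page, not from the snapshot. */
	out->pfn = fake_page_to_pfn(page);
}
```

With this shape the argument can stay const, which the refcount approach
would rule out, and later changes to the original page do not affect what
gets printed from the snapshot.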