[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ydd6IRTxI5RU/Sp1@google.com>
Date: Thu, 6 Jan 2022 15:24:17 -0800
From: Minchan Kim <minchan@...nel.org>
To: John Hubbard <jhubbard@...dia.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>,
David Hildenbrand <david@...hat.com>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Suren Baghdasaryan <surenb@...gle.com>,
John Dias <joaodias@...gle.com>, huww98@...look.com
Subject: Re: [RFC v2] mm: introduce page pin owner
On Thu, Jan 06, 2022 at 02:27:48PM -0800, John Hubbard wrote:
> On 12/28/21 09:59, Minchan Kim wrote:
> > A Contiguous Memory Allocator(CMA) allocation can fail if any page
> > within the requested range has an elevated refcount(a pinned page).
> >
> > Debugging such failures is difficult, because the struct pages only
> > show a combined refcount, and do not show the callstacks or
> > backtraces of the code that acquired each refcount. So the source
> > of the page pins remains a mystery, at the time of CMA failure.
> >
> > In order to solve this without adding too much overhead, just do
> > nothing most of the time, which is pretty low overhead. However,
> > once a CMA failure occurs, then mark the page (this requires a
> > pointer's worth of space in struct page, but it uses page extensions
> > to get that), and start tracing the subsequent put_page() calls.
> > As the program finishes up, each page pin will be undone, and
> > traced with a backtrace. The programmer reads the trace output and
> > sees the list of all page pinning code paths.
> >
> > This will consume an additional 8 bytes per 4KB page, or an
> > additional 0.2% of RAM. In addition to the storage space, it will
> > have some performance cost, due to increasing the size of struct
> > page so that it is greater than the cacheline size (or multiples
> > thereof) of popular (x86, ...) CPUs.
> >
> > The idea can apply every user of migrate_pages as well as CMA to
> > know the reason why the page migration failed. To support it,
> > the implementation takes "enum migrate_reason" string as filter
> > of the tracepoint(see below).
> >
>
> Hi Minchan,
>
> If this is ready to propose, then maybe it's time to remove the "RFC"
> qualification from the subject line, and re-post for final review.
>
> And also when you do that, could you please specify which tree or commit
> this applies to? I wasn't able to figure that out this time.
Sorry for that. It was based on next-20211224.
>
> > Usage)
>
> This extensive "usage" section is probably helpful, but the commit
> log is certainly not the place for the "how to" documentation. Let's
> find an .rst file to stash it in, I think.
I wanted to get some review for implementation/interface/usage before
respin removing the RFC. Otherwise, the the documentation need to keep
update heavily. Based on your comment, I think you are almost agree
with as-is. Then, yeah, let me cook up the doc and repost it with
removing the RFC tag.
Thanks.
Powered by blists - more mailing lists