[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YEefLYiX6rF3Uk4E@google.com>
Date: Tue, 9 Mar 2021 08:15:41 -0800
From: Minchan Kim <minchan@...nel.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
John Dias <joaodias@...gle.com>,
David Hildenbrand <david@...hat.com>,
Jason Baron <jbaron@...mai.com>
Subject: Re: [PATCH v2] mm: page_alloc: dump migrate-failed pages
On Tue, Mar 09, 2021 at 10:32:51AM +0100, Michal Hocko wrote:
> On Mon 08-03-21 12:20:47, Minchan Kim wrote:
> > alloc_contig_range is usually used on cma area or movable zone.
> > It's critical if the page migration fails on those areas so
> > dump more debugging message.
>
> I disagree with this statement. alloc_contig_range is not a reliable
> allocator. Any user, be it CMA or direct users of alloc_contig_range
> have to deal with allocation failures. Debugging information can be
> still useful but considering migration failures critical is
> overstatement to say the least.
Fair enough. Let's change it.
"Currently, debugging CMA allocation failure is too hard
due to lacking of page information. alloc_contig_range is
proper place to dump them since it has migrate-failed page
list."
>
> > page refcount, mapcount with page flags on dump_page are
> > helpful information to deduce the culprit. Furthermore,
> > dump_page_owner was super helpful to find long term pinner
> > who initiated the page allocation.
> >
> > Admin could enable the dump like this(by default, disabled)
> >
> > echo "func dump_migrate_failure_pages +p" > control
> >
> > Admin could disable it.
> >
> > echo "func dump_migrate_failure_pages =_" > control
>
> My original idea was to add few pr_debug and -DDYNAMIC_DEBUG_MODULE for
> page_alloc.c. It makes sense to enable a whole bunch at once though.
> The naming should better reflect this is alloc_contig_rage related
> because the above sounds like a generic migration failure thing.
alloc_contig_dump_pages?
>
> Somebody more familiar with the dynamic debugging infrastructure needs
> to have a look but from from a quick look it seems ok.
>
> Do we really need all the ugly ifdefery, though? Don't we want to have
> this compiled in all the time and just rely on the static branch managed
> by the dynamic debugging framework?
I have no further idea to make it simple while we keep the flexibility
for arguments and print format.
#if defined(CONFIG_DYNAMIC_DEBUG) || \
(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
static void alloc_contig_dump_pages(struct list_head *page_list)
{
static DEFINE_RATELIMIT_STATE(_rs,
DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
"migrate failure");
if (DYNAMIC_DEBUG_BRANCH(descriptor) && __ratelimit(&_rs)) {
struct page *page;
WARN(1, "failed callstack");
list_for_each_entry(page, page_list, lru)
dump_page(page, "migration failure");
}
}
#else
static inline void alloc_contig_dump_pages(struct list_head *page_list)
{
}
#endif
>
> [...]
> > +void dump_migrate_failure_pages(struct list_head *page_list)
> > +{
> > + DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> > + "migrate failure");
> > + if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> > + alloc_contig_ratelimit()) {
> > + struct page *page;
> > +
> > + WARN(1, "failed callstack");
> > + list_for_each_entry(page, page_list, lru)
> > + dump_page(page, "migration failure");
> > + }
>
> Apart from the above, do we have to warn for something that is a
> debugging aid? A similar concern wrt dump_page which uses pr_warn and
Make sense.
> page owner is using even pr_alert.
> Would it make sense to add a loglevel parameter both into __dump_page
> and dump_page_owner?
Let me try it.
Powered by blists - more mailing lists