Message-ID: <20120925073654.GN13234@bbox>
Date: Tue, 25 Sep 2012 16:36:54 +0900
From: Minchan Kim <minchan@...nel.org>
To: Mel Gorman <mgorman@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Richard Davies <richard@...chsys.com>,
Shaohua Li <shli@...nel.org>, Rik van Riel <riel@...hat.com>,
Avi Kivity <avi@...hat.com>,
QEMU-devel <qemu-devel@...gnu.org>, KVM <kvm@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 6/9] mm: compaction: Acquire the zone->lock as late as
possible
On Mon, Sep 24, 2012 at 09:52:38AM +0100, Mel Gorman wrote:
> On Fri, Sep 21, 2012 at 02:35:57PM -0700, Andrew Morton wrote:
> > On Fri, 21 Sep 2012 11:46:20 +0100
> > Mel Gorman <mgorman@...e.de> wrote:
> >
> > > Compaction's free scanner acquires the zone->lock when checking for PageBuddy
> > > pages and isolating them. It does this even if there are no PageBuddy pages
> > > in the range.
> > >
> > > This patch defers acquiring the zone lock for as long as possible. In the
> > > event there are no free pages in the pageblock then the lock will not be
> > > acquired at all which reduces contention on zone->lock.
> > >
> > > ...
> > >
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -93,6 +93,28 @@ static inline bool compact_trylock_irqsave(spinlock_t *lock,
> > > return compact_checklock_irqsave(lock, flags, false, cc);
> > > }
> > >
> > > +/* Returns true if the page is within a block suitable for migration to */
> > > +static bool suitable_migration_target(struct page *page)
> > > +{
> > > +
> >
> > stray newline
> >
>
> Fixed.
>
> > > + int migratetype = get_pageblock_migratetype(page);
> > > +
> > > + /* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
> > > + if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> > > + return false;
> > > +
> > > + /* If the page is a large free page, then allow migration */
> > > + if (PageBuddy(page) && page_order(page) >= pageblock_order)
> > > + return true;
> > > +
> > > + /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> > > + if (migrate_async_suitable(migratetype))
> > > + return true;
> > > +
> > > + /* Otherwise skip the block */
> > > + return false;
> > > +}
> > > +
> > >
> > > ...
> > >
> > > @@ -168,23 +193,38 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > > int isolated, i;
> > > struct page *page = cursor;
> > >
> > > - if (!pfn_valid_within(blockpfn)) {
> > > - if (strict)
> > > - return 0;
> > > - continue;
> > > - }
> > > + if (!pfn_valid_within(blockpfn))
> > > + goto strict_check;
> > > nr_scanned++;
> > >
> > > - if (!PageBuddy(page)) {
> > > - if (strict)
> > > - return 0;
> > > - continue;
> > > - }
> > > + if (!PageBuddy(page))
> > > + goto strict_check;
> > > +
> > > + /*
> > > + * The zone lock must be held to isolate freepages. This
> > > + * unfortunately this is a very coarse lock and can be
> >
> > this this
> >
>
> Fixed.
>
> > > + * heavily contended if there are parallel allocations
> > > + * or parallel compactions. For async compaction do not
> > > + * spin on the lock and we acquire the lock as late as
> > > + * possible.
> > > + */
> > > + locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
> > > + locked, cc);
> > > + if (!locked)
> > > + break;
> > > +
> > > + /* Recheck this is a suitable migration target under lock */
> > > + if (!strict && !suitable_migration_target(page))
> > > + break;
> > > +
> > > + /* Recheck this is a buddy page under lock */
> > > + if (!PageBuddy(page))
> > > + goto strict_check;
> > >
> > > /* Found a free page, break it into order-0 pages */
> > > isolated = split_free_page(page);
> > > if (!isolated && strict)
> > > - return 0;
> > > + goto strict_check;
> > > total_isolated += isolated;
> > > for (i = 0; i < isolated; i++) {
> > > list_add(&page->lru, freelist);
> > > @@ -196,9 +236,23 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > > blockpfn += isolated - 1;
> > > cursor += isolated - 1;
> > > }
> > > +
> > > + continue;
> > > +
> > > +strict_check:
> > > + /* Abort isolation if the caller requested strict isolation */
> > > + if (strict) {
> > > + total_isolated = 0;
> > > + goto out;
> > > + }
> > > }
> > >
> > > trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
> > > +
> > > +out:
> > > + if (locked)
> > > + spin_unlock_irqrestore(&cc->zone->lock, flags);
> > > +
> > > return total_isolated;
> > > }
> >
> > A few things about this function.
> >
> > Would it be cleaner if we did
> >
> > if (!strict) {
> > if (!suitable_migration_target(page))
> > break;
> > } else {
> > if (!PageBuddy(page))
> > goto strict_check;
> > }
> >
> > and then remove the test of `strict' from strict_check (which then
> > should be renamed)?
> >
>
> I was not able to implement what you suggested without breakage.
> However, I can do something very similar if "strict" mode does not bail
> out ASAP and instead double-checks at the end that everything was
> isolated correctly. This does mean that CMA's failure case is slightly
> more expensive but that really is a corner case and I think it's
> acceptable if the code is easier to follow as a result.
>
> > Which perhaps means that the whole `strict_check' block can go away:
> >
> > if (!strict) {
> > if (!suitable_migration_target(page))
> > break;
> > } else {
> > if (!PageBuddy(page)) {
> > total_isolated = 0;
> > goto out;
> > }
> > }
> >
> > Have a think about it? The function is a little straggly.
> >
>
> Proposal below.
>
> > Secondly, is it correct/desirable to skip the (now misnamed
> > `trace_mm_compaction_isolate_freepages') tracepoint generation if we
> > bailed out of the loop? The fact that we entered
> > isolate_freepages_block() but failed to isolate anything is data which
> > people might be interested in?
> >
>
> You're right, it may be interesting for someone debugging CMA to know that
> nr_scanned != nr_isolated at the time of allocation failure.
>
> > Thirdly, that (existing) comment "This assumes the block is valid" is
> > either too vague or wrong. If it's valid, why do we check
> > pfn_valid_within()?
> >
>
> Comment is stale and should be removed.
>
> > > @@ -218,13 +272,18 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
> > > unsigned long
> > > isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> > > {
> > > - unsigned long isolated, pfn, block_end_pfn, flags;
> > > + unsigned long isolated, pfn, block_end_pfn;
> > > struct zone *zone = NULL;
> > > LIST_HEAD(freelist);
> > > + struct compact_control cc;
> > >
> > > if (pfn_valid(start_pfn))
> > > zone = page_zone(pfn_to_page(start_pfn));
> > >
> > > + /* cc needed for isolate_freepages_block to acquire zone->lock */
> > > + cc.zone = zone;
> > > + cc.sync = true;
> >
> > We initialise two of cc's fields, leave the other 12 fields containing
> > random garbage, then start using it. I see no bug here, but...
> >
>
> I get your point. At the very least we should initialise it with zeros.
>
> How about this?
>
> ---8<---
> mm: compaction: Iron out isolate_freepages_block() and isolate_freepages_range()
>
> Andrew pointed out that isolate_freepages_block() is "straggly" and
> isolate_freepages_range() is making assumptions on how compact_control is
> used which is delicate. This patch straightens isolate_freepages_block()
> and makes it fly straight, and initialises compact_control to zeros in
> isolate_freepages_range(). The code should be easier to follow and
> is functionally equivalent. The CMA failure path is now a little more
> expensive but that is a marginal corner case.
>
> Signed-off-by: Mel Gorman <mgorman@...e.de>
Acked-by: Minchan Kim <minchan@...nel.org>
--
Kind regards,
Minchan Kim