[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 05 May 2014 16:48:29 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: David Rientjes <rientjes@...gle.com>, Mel Gorman <mgorman@...e.de>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Greg Thelen <gthelen@...gle.com>,
Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [patch v2 4/4] mm, thp: do not perform sync compaction on pagefault
On 05/02/2014 10:29 PM, David Rientjes wrote:
> On Fri, 2 May 2014, Mel Gorman wrote:
>
>>> The page locks I'm referring to is the lock_page() in __unmap_and_move()
>>> that gets called for sync compaction after the migrate_pages() iteration
>>> makes a few passes and unsuccessfully grabs it. This becomes a forced
>>> migration since __unmap_and_move() returns -EAGAIN when the trylock fails.
>>>
>>
>> Can that be fixed then instead of disabling it entirely?
>>
>
> We could return -EAGAIN when the trylock_page() fails for
> MIGRATE_SYNC_LIGHT. It would become a forced migration but we ignore that
> currently for MIGRATE_ASYNC, and I could extend it to be ignored for
> MIGRATE_SYNC_LIGHT as well.
>
>>> We have perf profiles from one workload in particular that shows
>>> contention on i_mmap_mutex (anon isn't interesting since the vast majority
>>> of memory on this workload [120GB on a 128GB machine] is has a gup pin and
>>> doesn't get isolated because of 119d6d59dcc0 ("mm, compaction: avoid
>>> isolating pinned pages")) between cpus all doing memory compaction trying
>>> to fault thp memory.
>>>
>>
>> Abort SYNC_LIGHT compaction if the mutex is contended.
>>
>
> Yeah, I have patches for that as well but we're waiting to see if they are
> actually needed when sync compaction is disabled for thp. If we aren't
> actually going to disable it entirely, then I can revive those patches if
> the contention becomes such an issue.
>
>>> That's one example that we've seen, but the fact remains that at times
>>> sync compaction will iterate the entire 128GB machine and not allow an
>>> order-9 page to be allocated and there's nothing to preempt it like the
>>> need_resched() or lock contention checks that async compaction has.
>>
>> Make compact_control->sync the same enum field and check for contention
>> on the async/sync_light case but leave it for sync if compacting via the
>> proc interface?
>>
>
> Ok, that certainly can be done, I wasn't sure you would be happy with such
> a change. I'm not sure there's so much of a difference between the new
> compact_control->sync == MIGRATE_ASYNC and == MIGRATE_SYNC_LIGHT now,
> though. Would it make sense to remove MIGRATE_SYNC_LIGHT entirely from
> the page allocator, i.e. remove sync_migration entirely, and just retry
> with a second call to compaction before failing instead?
Maybe we should step back and rethink the conditions for all sources of
compaction to better balance effort vs desired outcome, so distinguish 4
modes (just quick thoughts):
1) kswapd - we don't want to slow it down too much, thus async, uses
cached pfns, honors skip bits, but does not give up on order=X blocks of
pages just because some page was unable to be isolated/migrated (so
gradually it might clean up such block over time).
2) hugepaged - it can afford to wait, so async + sync, use cached pfns,
ignore skip bits on sync, also do not give up on order=THP blocks (?)
3) direct compaction on allocation of order X- we want to avoid
allocation latencies, so it should skip over order=X blocks of pages as
soon as it cannot isolate some page in such block (and put back
already-isolated pages instead of migrating them). As for other
parameters I'm not sure. Sync should still be possible thanks to the
deferred_compaction logic? Maybe somehow transform the yes/no answer of
deferred compaction to a number of how many pageblocks it should try
before giving up, so there are no latency spikes limited only by the
size of the zone?
4) compaction from proc interface - sync, reset cached pfn's, ignore
deferring, ignore skip bits...
Vlastimil
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists