linux-kernel - Re: [patch v2 4/4] mm, thp: do not perform sync compaction on pagefault

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 05 May 2014 16:48:29 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	David Rientjes <rientjes@...gle.com>, Mel Gorman <mgorman@...e.de>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Greg Thelen <gthelen@...gle.com>,
	Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [patch v2 4/4] mm, thp: do not perform sync compaction on pagefault

On 05/02/2014 10:29 PM, David Rientjes wrote:
> On Fri, 2 May 2014, Mel Gorman wrote:
>
>>> The page locks I'm referring to is the lock_page() in __unmap_and_move()
>>> that gets called for sync compaction after the migrate_pages() iteration
>>> makes a few passes and unsuccessfully grabs it.  This becomes a forced
>>> migration since __unmap_and_move() returns -EAGAIN when the trylock fails.
>>>
>>
>> Can that be fixed then instead of disabling it entirely?
>>
>
> We could return -EAGAIN when the trylock_page() fails for
> MIGRATE_SYNC_LIGHT.  It would become a forced migration but we ignore that
> currently for MIGRATE_ASYNC, and I could extend it to be ignored for
> MIGRATE_SYNC_LIGHT as well.
>
>>> We have perf profiles from one workload in particular that shows
>>> contention on i_mmap_mutex (anon isn't interesting since the vast majority
>>> of memory on this workload [120GB on a 128GB machine] is has a gup pin and
>>> doesn't get isolated because of 119d6d59dcc0 ("mm, compaction: avoid
>>> isolating pinned pages")) between cpus all doing memory compaction trying
>>> to fault thp memory.
>>>
>>
>> Abort SYNC_LIGHT compaction if the mutex is contended.
>>
>
> Yeah, I have patches for that as well but we're waiting to see if they are
> actually needed when sync compaction is disabled for thp.  If we aren't
> actually going to disable it entirely, then I can revive those patches if
> the contention becomes such an issue.
>
>>> That's one example that we've seen, but the fact remains that at times
>>> sync compaction will iterate the entire 128GB machine and not allow an
>>> order-9 page to be allocated and there's nothing to preempt it like the
>>> need_resched() or lock contention checks that async compaction has.
>>
>> Make compact_control->sync the same enum field and check for contention
>> on the async/sync_light case but leave it for sync if compacting via the
>> proc interface?
>>
>
> Ok, that certainly can be done, I wasn't sure you would be happy with such
> a change.  I'm not sure there's so much of a difference between the new
> compact_control->sync == MIGRATE_ASYNC and == MIGRATE_SYNC_LIGHT now,
> though.  Would it make sense to remove MIGRATE_SYNC_LIGHT entirely from
> the page allocator, i.e. remove sync_migration entirely, and just retry
> with a second call to compaction before failing instead?

Maybe we should step back and rethink the conditions for all sources of 
compaction to better balance effort vs desired outcome, so distinguish 4 
modes (just quick thoughts):

1) kswapd - we don't want to slow it down too much, thus async, uses 
cached pfns, honors skip bits, but does not give up on order=X blocks of 
pages just because some page was unable to be isolated/migrated (so 
gradually it might clean up such block over time).

2) hugepaged - it can afford to wait, so async + sync, use cached pfns, 
ignore skip bits on sync, also do not give up on order=THP blocks (?)

3) direct compaction on allocation of order X- we want to avoid 
allocation latencies, so it should skip over order=X blocks of pages as 
soon as it cannot isolate some page in such block (and put back 
already-isolated pages instead of migrating them). As for other 
parameters I'm not sure. Sync should still be possible thanks to the 
deferred_compaction logic? Maybe somehow transform the yes/no answer of 
deferred compaction to a number of how many pageblocks it should try 
before giving up, so there are no latency spikes limited only by the 
size of the zone?

4) compaction from proc interface - sync, reset cached pfn's, ignore 
deferring, ignore skip bits...

Vlastimil


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/