linux-kernel - Re: [RFC 08/13] mm, compaction: simplify contended compaction handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160513130950.GN20141@dhcp22.suse.cz>
Date:	Fri, 13 May 2016 15:09:52 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Johannes Weiner <hannes@...xchg.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC 08/13] mm, compaction: simplify contended compaction
 handling

On Tue 10-05-16 09:35:58, Vlastimil Babka wrote:
> Async compaction detects contention either due to failing trylock on zone->lock
> or lru_lock, or by need_resched(). Since 1f9efdef4f3f ("mm, compaction:
> khugepaged should not give up due to need_resched()") the code got quite
> complicated to distinguish these two up to the __alloc_pages_slowpath() level,
> so different decisions could be taken for khugepaged allocations.
> 
> After the recent changes, khugepaged allocations don't check for contended
> compaction anymore, so we again don't need to distinguish lock and sched
> contention, and simplify the current convoluted code a lot.
> 
> However, I believe it's also possible to simplify even more and completely
> remove the check for contended compaction after the initial async compaction
> for costly orders, which was originally aimed at THP page fault allocations.
> There are several reasons why this can be done now:
> 
> - with the new defaults, THP page faults no longer do reclaim/compaction at
>   all, unless the system admin has overriden the default, or application has
>   indicated via madvise that it can benefit from THP's. In both cases, it
>   means that the potential extra latency is expected and worth the benefits.

Yes this sounds reasonable to me. Especially when we consider the code bloat
size this is causing.

> - even if reclaim/compaction proceeds after this patch where it previously
>   wouldn't, the second compaction attempt is still async and will detect the
>   contention and back off, if the contention persists

MIGRATE_ASYNC still backs off after this patch so I would be surprise to
see more latency issues from this change.

> - there are still heuristics like deferred compaction and pageblock skip bits
>   in place that prevent excessive THP page fault latencies
> 
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>

I hope I haven't missed anything because the compaction is full of
subtle traps but this seems the changes seem ok to me.

Acked-by: Michal Hocko <mhocko@...e.com>

> ---
>  include/linux/compaction.h | 10 +------
>  mm/compaction.c            | 72 +++++++++-------------------------------------
>  mm/internal.h              |  5 +---
>  mm/page_alloc.c            | 28 +-----------------
>  4 files changed, 16 insertions(+), 99 deletions(-)

This is really nice cleanup considering it doesn't introduce big
behavior changes which is my understanding from the code.

[...]
> @@ -1564,14 +1564,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  	trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
>  				cc->free_pfn, end_pfn, sync, ret);
>  
> -	if (ret == COMPACT_CONTENDED)
> -		ret = COMPACT_PARTIAL;
> -
>  	return ret;
>  }

This took me a while to grasp but then I realized this is correct
because we shouldn't pretend progress when there was none in fact,
especially when __alloc_pages_direct_compact basically replaced this
"fake" COMPACT_PARTIAL by COMPACT_CONTENDED anyway.

[...]
-- 
Michal Hocko
SUSE Labs