linux-kernel - Re: [PATCH 02/10] mm, compaction: report compaction as contended only due to lock contention

Message-ID: <53984A06.6020607@suse.cz>
Date:	Wed, 11 Jun 2014 14:22:30 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Minchan Kim <minchan@...nel.org>
CC:	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Greg Thelen <gthelen@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Michal Nazarewicz <mina86@...a86.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Christoph Lameter <cl@...ux.com>,
	Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH 02/10] mm, compaction: report compaction as contended
 only due to lock contention

On 06/11/2014 03:10 AM, Minchan Kim wrote:
> On Mon, Jun 09, 2014 at 11:26:14AM +0200, Vlastimil Babka wrote:
>> Async compaction aborts when it detects zone lock contention or need_resched()
>> is true. David Rientjes has reported that in practice, most direct async
>> compactions for THP allocation abort due to need_resched(). This means that a
>> second direct compaction is never attempted, which might be OK for a page
>> fault, but khugepaged is intended to attempt a sync compaction in such cases,
>> and currently it won't.
>>
>> This patch replaces "bool contended" in compact_control with an enum that
>> distinguishes between aborting due to need_resched() and aborting due to lock
>> contention. This allows propagating the abort through all compaction functions
>> as before, but declaring the direct compaction as contended only when lock
>> contention has been detected.
>>
>> As a result, khugepaged will proceed with the second sync compaction as intended,
>> when the preceding async compaction aborted due to need_resched().
>
> You said "second direct compaction is never attempted, which might be OK
> for a page fault" and said "khugepaged is intended to attempt a sync compaction"
> so I feel you want to treat khugepaged specially, unlike other direct compaction
> (e.g. page fault).

Well khugepaged is my primary concern, but I imagine there are other 
direct compaction users besides THP page fault and khugepaged.

> With this patch, direct compaction takes only lock contention into account, not
> rescheduling, which raises some questions.
>
> Is it really okay not to consider need_resched() in direct compaction?

It still considers need_resched() to back off from async compaction. The
patch is only about what gets signaled as contended_compaction back to
__alloc_pages_slowpath(). This code is executed after the first,
async compaction fails:

/*
  * It can become very expensive to allocate transparent hugepages at
  * fault, so use asynchronous memory compaction for THP unless it is
  * khugepaged trying to collapse.
  */
if (!(gfp_mask & __GFP_NO_KSWAPD) || (current->flags & PF_KTHREAD))
         migration_mode = MIGRATE_SYNC_LIGHT;

/*
  * If compaction is deferred for high-order allocations, it is because
  * sync compaction recently failed. If this is the case and the caller
  * requested a movable allocation that does not heavily disrupt the
  * system then fail the allocation instead of entering direct reclaim.
  */
if ((deferred_compaction || contended_compaction) &&
                                         (gfp_mask & __GFP_NO_KSWAPD))
         goto nopage;

Both THP page fault and khugepaged use __GFP_NO_KSWAPD. The first if()
decides whether the second attempt will be sync (for khugepaged) or
async (for page fault). The second if() decides that if compaction was
contended, there won't be any second attempt (or reclaim) at all.
Counting need_resched() as contended is therefore bad for khugepaged,
which never gets its sync attempt. Even for page fault it means that
neither direct reclaim nor a second async compaction is attempted.
David says need_resched() occurs so often that it is a poor heuristic
for deciding this.
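
To make the two checks concrete, here is a minimal userspace sketch of the
decision, not the kernel code itself. The MODEL_* names and the flag values
are hypothetical stand-ins made up for illustration; only the shape of the
two if() statements follows the snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_NO_KSWAPD and PF_KTHREAD (values are arbitrary). */
#define MODEL_GFP_NO_KSWAPD 0x1u
#define MODEL_PF_KTHREAD    0x2u

enum model_mode { MODEL_ASYNC, MODEL_SYNC_LIGHT };

enum model_outcome {
	MODEL_FAIL_NOPAGE,      /* "goto nopage": no reclaim, no second compaction */
	MODEL_RETRY_ASYNC,      /* second attempt stays async */
	MODEL_RETRY_SYNC_LIGHT, /* second attempt is sync-light */
};

/* Models what happens after the first, async compaction attempt has failed. */
static enum model_outcome model_after_first_async(unsigned int gfp_mask,
						  unsigned int task_flags,
						  bool deferred, bool contended)
{
	enum model_mode mode = MODEL_ASYNC;

	/* First if(): only non-THP allocations and kernel threads go sync. */
	if (!(gfp_mask & MODEL_GFP_NO_KSWAPD) || (task_flags & MODEL_PF_KTHREAD))
		mode = MODEL_SYNC_LIGHT;

	/* Second if(): deferred or contended THP allocations give up here. */
	if ((deferred || contended) && (gfp_mask & MODEL_GFP_NO_KSWAPD))
		return MODEL_FAIL_NOPAGE;

	return mode == MODEL_SYNC_LIGHT ? MODEL_RETRY_SYNC_LIGHT : MODEL_RETRY_ASYNC;
}

/* Example: a khugepaged-like caller, with and without "contended" set. */
void model_slowpath_examples(void)
{
	unsigned int gfp = MODEL_GFP_NO_KSWAPD;

	assert(model_after_first_async(gfp, MODEL_PF_KTHREAD, false, true)
	       == MODEL_FAIL_NOPAGE);
	assert(model_after_first_async(gfp, MODEL_PF_KTHREAD, false, false)
	       == MODEL_RETRY_SYNC_LIGHT);
}
```

With contended set, the khugepaged-like caller ends in MODEL_FAIL_NOPAGE;
with it clear, the same caller gets MODEL_RETRY_SYNC_LIGHT, which is the
outcome wanted when only need_resched() fired.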

> We have taken care of it in the direct reclaim path, so why is direct
> compaction so special?

I admit I'm not that familiar with reclaim, but a quick look didn't turn
up any need_resched() checks there. There are plenty of cond_resched()
calls, but those don't make reclaim abort, do they? Could you explain?

> Why does khugepaged give up easily if lock contention/need_resched happens?
> khugepaged is important for the success ratio, as I read your description, so IMO
> khugepaged should compact synchronously without considering an early bail-out
> due to lock contention or rescheduling.

Well, a stupid answer is that's how __alloc_pages_slowpath() works :) I
don't think it's bad to try a more lightweight approach first, before
trying the heavyweight one, as long as the heavyweight one is not
skipped for khugepaged.
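
As a rough illustration of what the patch changes in the reporting, the
following userspace model reuses the enum from the patch and compares the
old and new value passed back to the allocator. The model_* helpers and
their boolean parameters are assumptions standing in for need_resched()
and spin_is_contended(); this is a sketch, not the patched kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerators as in the patch to mm/internal.h. */
enum compact_contended {
	COMPACT_CONTENDED_NONE = 0, /* no contention detected */
	COMPACT_CONTENDED_SCHED,    /* rescheduling was needed */
	COMPACT_CONTENDED_LOCK,     /* a lock was contended */
};

/*
 * Hypothetical model of should_release_lock(): the two booleans replace
 * the real need_resched() and spin_is_contended(lock) checks.
 */
static enum compact_contended model_should_release_lock(bool resched_needed,
							bool lock_contended)
{
	if (resched_needed)
		return COMPACT_CONTENDED_SCHED;
	if (lock_contended)
		return COMPACT_CONTENDED_LOCK;
	return COMPACT_CONTENDED_NONE;
}

/* Before the patch: any abort reason was reported as "contended". */
static bool model_report_old(enum compact_contended c)
{
	return c != COMPACT_CONTENDED_NONE;
}

/* After the patch: only lock contention is reported to the allocator. */
static bool model_report_new(enum compact_contended c)
{
	return c == COMPACT_CONTENDED_LOCK;
}

/* Example: an abort caused only by a pending reschedule. */
void model_contended_examples(void)
{
	enum compact_contended c = model_should_release_lock(true, false);

	assert(model_report_old(c));  /* old: allocator sees "contended" */
	assert(!model_report_new(c)); /* new: allocator may try again */
}
```

Because the reschedule check comes first, a case where both conditions hold
is classified as COMPACT_CONTENDED_SCHED and is not reported as contended
under the new rule.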

> If it causes problems, the user should increase scan_sleep_millisecs/alloc_sleep_millisecs,
> which are exactly the knobs for such cases.
>
> So, my point is: how about making khugepaged always do dumb synchronous
> compaction through PG_KHUGEPAGED or GFP_SYNC_TRANSHUGE?
>
>>
>> Reported-by: David Rientjes <rientjes@...gle.com>
>> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
>> Cc: Minchan Kim <minchan@...nel.org>
>> Cc: Mel Gorman <mgorman@...e.de>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@....com>
>> Cc: Michal Nazarewicz <mina86@...a86.com>
>> Cc: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
>> Cc: Christoph Lameter <cl@...ux.com>
>> Cc: Rik van Riel <riel@...hat.com>
>> ---
>>   mm/compaction.c | 20 ++++++++++++++------
>>   mm/internal.h   | 15 +++++++++++----
>>   2 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index b73b182..d37f4a8 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -185,9 +185,14 @@ static void update_pageblock_skip(struct compact_control *cc,
>>   }
>>   #endif /* CONFIG_COMPACTION */
>>
>> -static inline bool should_release_lock(spinlock_t *lock)
>> +enum compact_contended should_release_lock(spinlock_t *lock)
>>   {
>> -	return need_resched() || spin_is_contended(lock);
>> +	if (need_resched())
>> +		return COMPACT_CONTENDED_SCHED;
>> +	else if (spin_is_contended(lock))
>> +		return COMPACT_CONTENDED_LOCK;
>> +	else
>> +		return COMPACT_CONTENDED_NONE;
>>   }
>>
>>   /*
>> @@ -202,7 +207,9 @@ static inline bool should_release_lock(spinlock_t *lock)
>>   static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
>>   				      bool locked, struct compact_control *cc)
>>   {
>> -	if (should_release_lock(lock)) {
>> +	enum compact_contended contended = should_release_lock(lock);
>> +
>> +	if (contended) {
>>   		if (locked) {
>>   			spin_unlock_irqrestore(lock, *flags);
>>   			locked = false;
>> @@ -210,7 +217,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
>>
>>   		/* async aborts if taking too long or contended */
>>   		if (cc->mode == MIGRATE_ASYNC) {
>> -			cc->contended = true;
>> +			cc->contended = contended;
>>   			return false;
>>   		}
>>
>> @@ -236,7 +243,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
>>   	/* async compaction aborts if contended */
>>   	if (need_resched()) {
>>   		if (cc->mode == MIGRATE_ASYNC) {
>> -			cc->contended = true;
>> +			cc->contended = COMPACT_CONTENDED_SCHED;
>>   			return true;
>>   		}
>>
>> @@ -1095,7 +1102,8 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
>>   	VM_BUG_ON(!list_empty(&cc.freepages));
>>   	VM_BUG_ON(!list_empty(&cc.migratepages));
>>
>> -	*contended = cc.contended;
>> +	/* We only signal lock contention back to the allocator */
>> +	*contended = cc.contended == COMPACT_CONTENDED_LOCK;
>>   	return ret;
>>   }
>>
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 7f22a11f..4659e8e 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -117,6 +117,13 @@ extern int user_min_free_kbytes;
>>
>>   #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>
>> +/* Used to signal whether compaction detected need_resched() or lock contention */
>> +enum compact_contended {
>> +	COMPACT_CONTENDED_NONE = 0, /* no contention detected */
>> +	COMPACT_CONTENDED_SCHED,    /* need_resched() was true */
>> +	COMPACT_CONTENDED_LOCK,     /* zone lock or lru_lock was contended */
>> +};
>> +
>>   /*
>>    * in mm/compaction.c
>>    */
>> @@ -144,10 +151,10 @@ struct compact_control {
>>   	int order;			/* order a direct compactor needs */
>>   	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
>>   	struct zone *zone;
>> -	bool contended;			/* True if a lock was contended, or
>> -					 * need_resched() true during async
>> -					 * compaction
>> -					 */
>> +	enum compact_contended contended; /* Signal need_resched() or lock
>> +					   * contention detected during
>> +					   * compaction
>> +					   */
>>   };
>>
>>   unsigned long
>> --
>> 1.8.4.5