linux-kernel - Re: [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <bddb8cd0-398b-bd57-5d19-7b046db6c0ea@suse.cz>
Date:   Mon, 13 Feb 2023 22:50:57 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Jiri Slaby <jirislaby@...nel.org>,
        Maxim Levitsky <mlevitsk@...hat.com>,
        Michal Hocko <mhocko@...nel.org>,
        Pedro Falcato <pedro.falcato@...il.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Chuyi Zhou <zhouchuyi@...edance.com>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/4] mm, compaction: Finish pageblocks on complete
 migration failure

On 2/7/23 18:42, Vlastimil Babka wrote:
> On 1/25/23 14:44, Mel Gorman wrote:
>> Commit 7efc3b726103 ("mm/compaction: fix set skip in
>> fast_find_migrateblock") address an issue where a pageblock selected
>> by fast_find_migrateblock() was ignored. Unfortunately, the same fix
>> resulted in numerous reports of khugepaged or kcompactd stalling for
>> long periods of time or consuming 100% of CPU.
>>
>> Tracing showed that there was a lot of rescanning between a small subset
>> of pageblocks because the conditions for marking the block skip are not
>> met. The scan is not reaching the end of the pageblock because enough
>> pages were isolated but none were migrated successfully. Eventually it
>> circles back to the same block.
>>
>> Pageblock skip tracking tries to minimise both latency and excessive
>> scanning but tracking exactly when a block is fully scanned requires an
>> excessive amount of state. This patch forcibly rescans a pageblock when
>> all isolated pages fail to migrate even though it could be for transient
>> reasons such as page writeback or page dirty. This will sometimes migrate
>> too many pages but pageblocks will be marked skip and forward progress
>> will be made.
>>
>> "Usemen" from the mmtests configuration
>> workload-usemem-stress-numa-compact was used to stress compaction.
>> The compaction trace events were recorded using a 6.2-rc5 kernel that
>> includes commit 7efc3b726103 and count of unique ranges were measured. The
>> top 5 ranges were
>>
>>    3076 range=(0x10ca00-0x10cc00)
>>    3076 range=(0x110a00-0x110c00)
>>    3098 range=(0x13b600-0x13b800)
>>    3104 range=(0x141c00-0x141e00)
>>   11424 range=(0x11b600-0x11b800)
>>
>> While this workload is very different than what the bugs reported,
>> the pattern of the same subset of blocks being repeatedly scanned is
>> observed. At one point, *only* the range range=(0x11b600 ~ 0x11b800)
>> was scanned for 2 seconds. 14 seconds passed between the first
>> migration-related event and the last.
>>
>> With the series applied including this patch, the top 5 ranges were
>>
>>       1 range=(0x11607e-0x116200)
>>       1 range=(0x116200-0x116278)
>>       1 range=(0x116278-0x116400)
>>       1 range=(0x116400-0x116424)
>>       1 range=(0x116424-0x116600)
>>
>> Only unique ranges were scanned and the time between the first
>> migration-related event was 0.11 milliseconds.
>>
>> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
>> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> 
> While this seems like it will mostly prevent the issue at hand (I think
> kcompactd is still a hazard, see below), I'm not super happy about some of
> the implementation details.

For the record, after some discussion with Mel, my concerns are not a blocker
and the series can proceed from mm-stable to 6.3.

> 1. it modifies code that was meant to quickly skip an order-aligned block
> where a page migration failed during MIGRATE_ASYNC direct compaction, as
> it's very unlikely to sucessfully form a free page of that order in that
> block. Now instead it will finish the whole pageblock in that case, which
> could be lots of useless work and thus exactly the opposite of what we
> wanted for MIGRATE_ASYNC direct compaction.

There are both advantages and disadvantages to this so not clear win or lose.

> 2. The conditions "cc->direct_compaction" and "(cc->mode < MIGRATE_SYNC)"
> seem a bit hazardous. I think we can have a compaction where these are not
> true, and yet it uses fast_find_migrateblock() and thus can exhibit the bug
> but won't be forced to rescan?
> AFAICS kcompactd_do_work()
> - is MIGRATE_SYNC_LIGHT
> - has ignore_skip_hint = false
> - has direct_compaction = false
> 
> so AFAICS it will use fast_find_migrateblock() and not bail out in one of
> the preconditions. But the cc->direct_compaction condition here won't apply.

This is only a concern once we attempt to un-revert 7efc3b726103 again so
only necessary to be addressed as part of future series leading to the un-revert.

> So it might be better to leave the current "skip the rest of block" check
> alone, and add a separate check for the finish_pageblock rescan that will
> not miss some cases where it should apply - maybe it could check for a
> complete migration failure specifically as well?
>> ---
>>  mm/compaction.c | 30 ++++++++++++++++++++++--------
>>  1 file changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4b3a0238879c..937ec2f05f2c 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  			cc->finish_pageblock = true;
>>  		}
>>  
>> +rescan:
>>  		switch (isolate_migratepages(cc)) {
>>  		case ISOLATE_ABORT:
>>  			ret = COMPACT_CONTENDED;
>> @@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  				goto out;
>>  			}
>>  			/*
>> -			 * We failed to migrate at least one page in the current
>> -			 * order-aligned block, so skip the rest of it.
>> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
>> +			 * within the current order-aligned block, scan the
>> +			 * remainder of the pageblock. This will mark the
>> +			 * pageblock "skip" to avoid rescanning in the near
>> +			 * future. This will isolate more pages than necessary
>> +			 * for the request but avoid loops due to
>> +			 * fast_find_migrateblock revisiting blocks that were
>> +			 * recently partially scanned.
>>  			 */
>> -			if (cc->direct_compaction &&
>> -						(cc->mode == MIGRATE_ASYNC)) {
>> -				cc->migrate_pfn = block_end_pfn(
>> -						cc->migrate_pfn - 1, cc->order);
>> -				/* Draining pcplists is useless in this case */
>> -				last_migrated_pfn = 0;
>> +			if (cc->direct_compaction && !cc->finish_pageblock &&
>> +						(cc->mode < MIGRATE_SYNC)) {
>> +				cc->finish_pageblock = true;
>> +
>> +				/*
>> +				 * Draining pcplists does not help THP if
>> +				 * any page failed to migrate. Even after
>> +				 * drain, the pageblock will not be free.
>> +				 */
>> +				if (cc->order == COMPACTION_HPAGE_ORDER)
>> +					last_migrated_pfn = 0;
>> +
>> +				goto rescan;
>>  			}
>>  		}
>>  
> 
>