Message-ID: <d7c5959c-fd3d-4406-b3eb-a4cbf04121d4@oppo.com>
Date: Tue, 12 Mar 2024 17:22:59 +0800
From: 李培锋 <lipeifeng@...o.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: 21cnbao@...il.com, akpm@...ux-foundation.org, david@...hat.com,
osalvador@...e.de, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH v2 0/2] reclaim contended folios asynchronously instead of
promoting them
On 2024/3/8 14:41, 李培锋 wrote:
>
>
> On 2024/3/8 12:56, Matthew Wilcox wrote:
>> On Fri, Mar 08, 2024 at 11:11:24AM +0800, lipeifeng@...o.com wrote:
>>> Commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path")
>>> prevents the reclaim path from becoming stuck on the rmap lock. However,
>>> it reinserts those folios at the head of the LRU during shrink_folio_list,
>>> even if those folios are very cold.
>> This seems like a lot of new code. Did you consider something simpler
>> like this?
>>
>> Also, this is Minchan's patch you're complaining about. Add him to the
>> cc.
>>
>> +++ b/mm/vmscan.c
>> @@ -817,6 +817,7 @@ enum folio_references {
>>  	FOLIOREF_RECLAIM,
>>  	FOLIOREF_RECLAIM_CLEAN,
>>  	FOLIOREF_KEEP,
>> +	FOLIOREF_RESCAN,
>>  	FOLIOREF_ACTIVATE,
>>  };
>>
>> @@ -837,9 +838,9 @@ static enum folio_references folio_check_references(struct folio *folio,
>>  	if (vm_flags & VM_LOCKED)
>>  		return FOLIOREF_ACTIVATE;
>>
>> -	/* rmap lock contention: rotate */
>> +	/* rmap lock contention: keep at the tail */
>>  	if (referenced_ptes == -1)
>> -		return FOLIOREF_KEEP;
>> +		return FOLIOREF_RESCAN;
>>
>>  	if (referenced_ptes) {
>>  		/*
>> @@ -1164,6 +1165,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>  		case FOLIOREF_ACTIVATE:
>>  			goto activate_locked;
>>  		case FOLIOREF_KEEP:
>> +		case FOLIOREF_RESCAN:
>>  			stat->nr_ref_keep += nr_pages;
>>  			goto keep_locked;
>>  		case FOLIOREF_RECLAIM:
>> @@ -1446,7 +1448,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>  keep_locked:
>>  		folio_unlock(folio);
>>  keep:
>> -		list_add(&folio->lru, &ret_folios);
>> +		if (references == FOLIOREF_RESCAN)
>> +			list_add(&folio->lru, &rescan_folios);
>> +		else
>> +			list_add(&folio->lru, &ret_folios);
>>  		VM_BUG_ON_FOLIO(folio_test_lru(folio) ||
>>  				folio_test_unevictable(folio), folio);
>>  	}
>
> Actually, we have tested the approach you mentioned: putting the
> contended folios back at the tail of the LRU during shrink_folio_list
> and rescanning them in the next shrink_folio_list.
>
> In some cases we found another serious problem: more and more
> contended folios piled up at the tail of the LRU, which led to a
> serious low-memory situation, because none of the isolated folios
> could be reclaimed owing to lock contention during shrink_folio_list.
>
Let me provide more detail.
We did test the approach you mentioned: if a folio hits rmap lock
contention during shrink_folio_list, it is put back at the tail of the
LRU and rescanned in the next shrink_folio_list.
During testing we found a serious problem:
In some shrink_folio_list calls, none of the isolated pages could be
reclaimed because of rmap lock contention, which made memory reclaim
very inefficient and left too little free memory.
The specific reason is as follows:
When memory is low, folios put back at the tail of the LRU because of
rmap lock contention during shrink_folio_list are soon isolated again by
shrink_inactive_list, and the next shrink_folio_list tries to reclaim
them. But in the short term these folios are still likely to hit rmap
lock contention, fail to be reclaimed, and be put back at the tail of
the LRU again.
As testing progressed, more and more folios with a high probability of
rmap lock contention were put back at the tail of the LRU during
shrink_inactive_list, until none of the isolated folios could be
reclaimed in shrink_folio_list.
The shrink_inactive_list procedure is as follows:
shrink_inactive_list()
  -> isolate_lru_folios():
       isolate 32 folios from the tail of the LRU (some of which may
       have been put back by the previous shrink_folio_list because of
       rmap lock contention)
  -> shrink_folio_list():
       reclaim folios and put rmap lock-contended folios back at the
       tail of the LRU
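To illustrate why tail putback leads to immediate re-isolation, here is
a toy user-space model in C. It is only a sketch and not kernel code: the
array-based struct toy_lru and all toy_* helpers are hypothetical names
made up for this illustration, and the LRU is reduced to a plain array
whose last element is the tail.

```c
#include <assert.h>
#include <string.h>

#define LRU_MAX 256	/* capacity of the toy list (arbitrary) */
#define BATCH   32	/* folios isolated per pass, as in the description above */

/* Hypothetical toy LRU: ids[0] is the head, ids[len - 1] is the tail. */
struct toy_lru {
	int ids[LRU_MAX];
	int len;
};

/* Take up to n entries from the tail, the way reclaim isolates folios. */
static int toy_isolate(struct toy_lru *lru, int *out, int n)
{
	int taken = 0;

	while (taken < n && lru->len > 0)
		out[taken++] = lru->ids[--lru->len];
	return taken;
}

/* Put an entry back at the tail: it is the first one isolated next time. */
static void toy_putback_tail(struct toy_lru *lru, int id)
{
	assert(lru->len < LRU_MAX);
	lru->ids[lru->len++] = id;
}

/* Put an entry back at the head: it is the last one isolated again. */
static void toy_putback_head(struct toy_lru *lru, int id)
{
	assert(lru->len < LRU_MAX);
	memmove(&lru->ids[1], &lru->ids[0],
		(size_t)lru->len * sizeof(lru->ids[0]));
	lru->ids[0] = id;
	lru->len++;
}

/* Count how many entries of a[] also appear in b[]. */
static int toy_count_common(const int *a, int na, const int *b, int nb)
{
	int common = 0;

	for (int i = 0; i < na; i++) {
		for (int j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				common++;
				break;
			}
		}
	}
	return common;
}
```

In this model, if 8 folios from one batch are returned with
toy_putback_tail(), all 8 show up again in the very next toy_isolate()
batch; with toy_putback_head() none of them do until the rest of the
list has been scanned.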
For example, assume that every folio put back on the LRU because of rmap
lock contention in the previous shrink_folio_list again fails to be
reclaimed because of rmap lock contention:
1st shrink_inactive_list():
  -> isolate_lru_folios(): isolate 32 folios
  -> shrink_folio_list(): reclaim 24 folios, put back 8 rmap
     lock-contended folios
2nd shrink_inactive_list():
  -> isolate_lru_folios(): isolate 32 folios, including 8 rmap
     lock-contended folios
  -> shrink_folio_list(): reclaim 16 folios, put back 16 rmap
     lock-contended folios
3rd shrink_inactive_list():
  -> isolate_lru_folios(): isolate 32 folios, including 16 rmap
     lock-contended folios
  -> shrink_folio_list(): reclaim 8 folios, put back 24 rmap
     lock-contended folios
4th shrink_inactive_list():
  -> isolate_lru_folios(): isolate 32 folios, including 24 rmap
     lock-contended folios
  -> shrink_folio_list(): reclaim 0 folios, put back 32 rmap
     lock-contended folios
5th shrink_inactive_list():
  -> isolate_lru_folios(): isolate 32 folios, including 32 rmap
     lock-contended folios
  -> shrink_folio_list(): reclaim 0 folios, put back 32 rmap
     lock-contended folios
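The numbers in this example follow a simple recurrence, sketched below
in C. This is only arithmetic on the example and not a measurement. It
assumes, as the example does, that each pass isolates 32 folios, that
every previously contended folio is contended again, and that 8 of the
fresh folios in each pass turn out to be newly contended. The names
model_round, struct round_stats, BATCH and NEW_CONTENDED are
hypothetical and used only for this illustration.

```c
#include <assert.h>

#define BATCH         32	/* folios isolated per shrink_inactive_list() */
#define NEW_CONTENDED 8		/* assumed newly contended folios per pass */

/* Per-pass figures of the example (hypothetical helper type). */
struct round_stats {
	int carried;	/* contended folios re-isolated from the last pass */
	int reclaimed;	/* folios reclaimed in this pass */
	int putback;	/* contended folios put back at the tail */
};

/* Compute the figures for pass n, where n starts at 1. */
static struct round_stats model_round(int n)
{
	struct round_stats s;

	assert(n >= 1);

	/* Everything put back so far comes straight back, capped at BATCH. */
	s.carried = NEW_CONTENDED * (n - 1);
	if (s.carried > BATCH)
		s.carried = BATCH;

	/* Carried folios stay contended; a few fresh ones join them. */
	s.putback = s.carried + NEW_CONTENDED;
	if (s.putback > BATCH)
		s.putback = BATCH;

	s.reclaimed = BATCH - s.putback;
	return s;
}
```

Calling model_round() for passes 1 to 5 gives 24, 16, 8, 0 and 0
reclaimed folios, matching the example: under these assumptions reclaim
stops making progress from the 4th pass onward.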