linux-kernel - Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking

Message-ID: <bf98a80a-2be0-413f-8a7a-34bb17f053cc@huawei.com>
Date: Fri, 29 Nov 2024 10:25:02 +0800
From: chenridong <chenridong@...wei.com>
To: Barry Song <21cnbao@...il.com>, Yu Zhao <yuzhao@...gle.com>
CC: Matthew Wilcox <willy@...radead.org>, Chris Li <chrisl@...nel.org>, Chen
 Ridong <chenridong@...weicloud.com>, <akpm@...ux-foundation.org>,
	<mhocko@...e.com>, <hannes@...xchg.org>, <yosryahmed@...gle.com>,
	<yuzhao@...gle.com>, <david@...hat.com>, <ryan.roberts@....com>,
	<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
	<wangweiyang2@...wei.com>, <xieym_ict@...mail.com>
Subject: Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the
 tail of LRU after shrinking



On 2024/11/29 7:08, Barry Song wrote:
> On Mon, Nov 25, 2024 at 2:19 PM chenridong <chenridong@...wei.com> wrote:
>>
>>
>>
>> On 2024/11/18 12:21, Matthew Wilcox wrote:
>>> On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
>>>> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@...radead.org> wrote:
>>>>>
>>>>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
>>>>>> 2. In shrink_page_list function, if folioN is THP(2M), it may be split
>>>>>>    and added to swap cache folio by folio. After adding to swap cache,
>>>>>>    it will submit io to writeback folio to swap, which is asynchronous.
>>>>>>    When shrink_page_list is finished, the isolated folios list will be
>>>>>>    moved back to the head of inactive lru. The inactive lru may just look
>>>>>>    like this, with 512 folios having been moved to the head of inactive lru.
>>>>>
>>>>> I was hoping that we'd be able to stop splitting the folio when adding
>>>>> to the swap cache.  Ideally, we'd add the whole 2MB and write it back
>>>>> as a single unit.
>>>>
>>>> This is already the case: adding to the swapcache doesn’t require splitting
>>>> THPs, but failing to allocate 2MB of contiguous swap slots will.
>>>
>>> Agreed we need to understand why this is happening.  As I've said a few
>>> times now, we need to stop requiring contiguity.  Real filesystems don't
>>> need the contiguity (they become less efficient, but they can scatter a
>>> single 2MB folio to multiple places).
>>>
>>> Maybe Chris has a solution to this in the works?
>>>
>>
>> Hi, Chris, do you have a better idea to solve this issue?
> 
> Not Chris. As I read the code again, we already have the code below to fix up
> the issue "missed folio_rotate_reclaimable()" in evict_folios():
> 
>                 /* retry folios that may have missed folio_rotate_reclaimable() */
>                 list_move(&folio->lru, &clean);
> 
> It doesn't work for you?
> 
> commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
> Author: Yu Zhao <yuzhao@...gle.com>
> Date:   Tue Nov 15 18:38:07 2022 -0700
>     mm: multi-gen LRU: retry folios written back while isolated
> 
>     The page reclaim isolates a batch of folios from the tail of one of the
>     LRU lists and works on those folios one by one.  For a suitable
>     swap-backed folio, if the swap device is async, it queues that folio for
>     writeback.  After the page reclaim finishes an entire batch, it puts back
>     the folios it queued for writeback to the head of the original LRU list.
> 
>     In the meantime, the page writeback flushes the queued folios also by
>     batches.  Its batching logic is independent from that of the page reclaim.
>     For each of the folios it writes back, the page writeback calls
>     folio_rotate_reclaimable() which tries to rotate a folio to the tail.
> 
> 
>     folio_rotate_reclaimable() only works for a folio after the page reclaim
>     has put it back.  If an async swap device is fast enough, the page
>     writeback can finish with that folio while the page reclaim is still
>     working on the rest of the batch containing it.  In this case, that folio
>     will remain at the head and the page reclaim will not retry it before
>     reaching there.
> 
>     This patch adds a retry to evict_folios().  After evict_folios() has
>     finished an entire batch and before it puts back folios it cannot free
>     immediately, it retries those that may have missed the rotation.
>     Before this patch, ~60% of folios swapped to an Intel Optane missed
>     folio_rotate_reclaimable().  After this patch, ~99% of missed folios were
>     reclaimed upon retry.
> 
>     This problem affects relatively slow async swap devices like Samsung 980
>     Pro much less and does not affect sync swap devices like zram or zswap at
>     all.
> 
>>
>> Best regards,
>> Ridong
> 
> Thanks
> Barry

Thank you for your reply, Barry.
I found this issue on version 5.10. I then reproduced it with the
next version, but with CONFIG_LRU_GEN_ENABLED disabled. When I
tested again with CONFIG_LRU_GEN_ENABLED enabled, the issue no longer
occurred.

IIUC, commit 359a5e1416caaf9ce28396a65ed3e386cc5de663 only takes effect
when CONFIG_LRU_GEN_ENABLED is enabled, but the issue still exists when
CONFIG_LRU_GEN_ENABLED is disabled, and it should be fixed there as well.

I read the code of commit 359a5e1416caaf9ce28396a65ed3e386cc5de663. It
finds the folios that missed rotation in a more complicated way, but it
makes it much clearer what is being done. Should I implement the fix in
Yu Zhao's way?
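
To check my understanding of that approach, here is a minimal userspace
sketch of the retry idea. It is not the kernel code: it assumes a toy
folio with only three state bits, and all names (toy_folio,
toy_split_batch) are hypothetical. After a batch has been processed,
folios that are clean, no longer under writeback and not referenced are
collected for a retry, and the rest are put back on the LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio; every name here is hypothetical. */
struct toy_folio {
	int id;
	bool writeback;  /* writeback I/O is still in flight */
	bool dirty;      /* contents not yet written out */
	bool referenced; /* touched again since it was isolated */
};

/*
 * Split the leftover folios of one reclaim batch into two groups:
 *   retry[]   - clean folios whose writeback finished while they were
 *               isolated, so the rotation to the tail was missed
 *   putback[] - everything else, to be put back on the LRU
 * Both output arrays must have room for n entries.
 * Returns the number of entries stored in retry[].
 */
static size_t toy_split_batch(const struct toy_folio *batch, size_t n,
			      const struct toy_folio **retry,
			      const struct toy_folio **putback,
			      size_t *nr_putback)
{
	size_t nr_retry = 0;

	assert(batch && retry && putback && nr_putback);
	*nr_putback = 0;

	for (size_t i = 0; i < n; i++) {
		const struct toy_folio *f = &batch[i];

		/* Still being written, or in use again: do not retry now. */
		if (f->writeback || f->dirty || f->referenced)
			putback[(*nr_putback)++] = f;
		else
			retry[nr_retry++] = f;
	}

	return nr_retry;
}
```

For a batch where folio 0 is still under writeback, folio 1 has already
been written back and folio 2 was referenced again, only folio 1 would
be retried. The real evict_folios() checks more state than this before
retrying.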

Best regards,
Ridong