Message-ID: <b1d05ce2-1625-490f-ac5a-c88d3468385f@vivo.com>
Date: Fri, 20 Oct 2023 12:09:47 +0800
From: zhiguojiang <justinjiang@...o.com>
To: David Hildenbrand <david@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: opensource.kernel@...o.com
Subject: Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap
On 2023/10/20 11:59, zhiguojiang wrote:
>
>
> On 2023/10/19 22:15, David Hildenbrand wrote:
>>
>> On 19.10.23 15:14, Zhiguo Jiang wrote:
>>> In shrink_folio_list(), a dirty file folio can come from two sources:
>>> 1. The folio arrives dirty via the incoming folio_list parameter,
>>> i.e. from the inactive file lru.
>>> 2. The folio becomes dirty via the PTE dirty bit transferred by
>>> try_to_unmap().
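
For context on source 2, here is a simplified excerpt in the spirit of
the unmap path in mm/rmap.c (the exact code in a given tree may differ).
Clearing the PTE returns its old value, and a hardware dirty bit is
propagated to the folio, so a folio that looked clean on folio_list can
become dirty during unmap:

    /* In try_to_unmap_one(): clear the PTE and keep its old value. */
    pteval = ptep_clear_flush(vma, address, pvmw.pte);

    /* Set the dirty flag on the folio now that the PTE is gone. */
    if (pte_dirty(pteval))
            folio_mark_dirty(folio);
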
>>>
>>> For the first source, if the dirty folio does not support pageout,
>>> it can skip unmap in advance to reduce recycling time.
>>>
>>> Signed-off-by: Zhiguo Jiang <justinjiang@...o.com>
>>> ---
>>>
>>> Changelog:
>>> v1->v2:
>>> 1. Keep the original judgment flow.
>>> 2. Add the interface of folio_check_pageout().
>>> 3. The dirty folio which does not support pageout in the inactive
>>> file lru skips unmap in advance.
>>>
>>> mm/vmscan.c | 103 +++++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 66 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a68d01fcc307..e067269275a5 100755
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -925,6 +925,44 @@ static void folio_check_dirty_writeback(struct folio *folio,
>>>                  mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
>>>  }
>>>
>>> +/* Check if a dirty folio can support pageout in the recycling process */
>>> +static bool folio_check_pageout(struct folio *folio,
>>> +                                struct pglist_data *pgdat)
>>> +{
>>> +        bool ret = true;
>>> +
>>> +        /*
>>> +         * Anonymous folios are not handled by flushers and must be written
>>> +         * from reclaim context. Do not stall reclaim based on them.
>>> +         * MADV_FREE anonymous folios are put into inactive file list too.
>>> +         * They could be mistakenly treated as file lru. So further anon
>>> +         * test is needed.
>>> +         */
>>> +        if (!folio_is_file_lru(folio) ||
>>> +            (folio_test_anon(folio) && !folio_test_swapbacked(folio)))
>>> +                goto out;
>>> +
>>> +        if (folio_test_dirty(folio) &&
>>> +            (!current_is_kswapd() ||
>>> +             !folio_test_reclaim(folio) ||
>>> +             !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
>>> +                /*
>>> +                 * Immediately reclaim when written back.
>>> +                 * Similar in principle to folio_deactivate()
>>> +                 * except we already have the folio isolated
>>> +                 * and know it's dirty
>>> +                 */
>>> +                node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
>>> +                                folio_nr_pages(folio));
>>> +                folio_set_reclaim(folio);
>>> +
>>> +                ret = false;
>>> +        }
>>> +
>>> +out:
>>> +        return ret;
>>> +}
>>> +
>>> +
>>> static struct folio *alloc_demote_folio(struct folio *src,
>>> unsigned long private)
>>> {
>>> @@ -1078,6 +1116,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>> if (dirty && !writeback)
>>> stat->nr_unqueued_dirty += nr_pages;
>>>
>>> + /* If the dirty folio does not support pageout,
>>> + * the dirty folio can skip this recycling.
>>> + */
>>> + if (!folio_check_pageout(folio, pgdat))
>>> + goto activate_locked;
>>> +
>>> /*
>>> * Treat this folio as congested if folios are cycling
>>> * through the LRU so quickly that the folios marked
>>> @@ -1261,43 +1305,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>                          enum ttu_flags flags = TTU_BATCH_FLUSH;
>>>                          bool was_swapbacked = folio_test_swapbacked(folio);
>>>
>>> -                        if (folio_test_dirty(folio)) {
>>> -                                /*
>>> -                                 * Only kswapd can writeback filesystem folios
>>> -                                 * to avoid risk of stack overflow. But avoid
>>> -                                 * injecting inefficient single-folio I/O into
>>> -                                 * flusher writeback as much as possible: only
>>> -                                 * write folios when we've encountered many
>>> -                                 * dirty folios, and when we've already scanned
>>> -                                 * the rest of the LRU for clean folios and see
>>> -                                 * the same dirty folios again (with the reclaim
>>> -                                 * flag set).
>>> -                                 */
>>> -                                if (folio_is_file_lru(folio) &&
>>> -                                    (!current_is_kswapd() ||
>>> -                                     !folio_test_reclaim(folio) ||
>>> -                                     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
>>> -                                        /*
>>> -                                         * Immediately reclaim when written back.
>>> -                                         * Similar in principle to folio_deactivate()
>>> -                                         * except we already have the folio isolated
>>> -                                         * and know it's dirty
>>> -                                         */
>>> -                                        node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
>>> -                                                        nr_pages);
>>> -                                        folio_set_reclaim(folio);
>>> -
>>> -                                        goto activate_locked;
>>> -                                }
>>> -
>>> -                                if (references == FOLIOREF_RECLAIM_CLEAN)
>>> -                                        goto keep_locked;
>>> -                                if (!may_enter_fs(folio, sc->gfp_mask))
>>> -                                        goto keep_locked;
>>> -                                if (!sc->may_writepage)
>>> -                                        goto keep_locked;
>>> -                        }
>>> -
>>> if (folio_test_pmd_mappable(folio))
>>> flags |= TTU_SPLIT_HUGE_PMD;
>>>
>>> @@ -1323,6 +1330,28 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>
>>> mapping = folio_mapping(folio);
>>> if (folio_test_dirty(folio)) {
>>> + /*
>>> + * Only kswapd can writeback filesystem folios
>>> + * to avoid risk of stack overflow. But avoid
>>> + * injecting inefficient single-folio I/O into
>>> + * flusher writeback as much as possible: only
>>> + * write folios when we've encountered many
>>> + * dirty folios, and when we've already scanned
>>> + * the rest of the LRU for clean folios and see
>>> + * the same dirty folios again (with the reclaim
>>> + * flag set).
>>> + */
>>> + if (folio_is_file_lru(folio) &&
>>> + !folio_check_pageout(folio, pgdat))
>>> + goto activate_locked;
>>> +
>>> + if (references == FOLIOREF_RECLAIM_CLEAN)
>>> + goto keep_locked;
>>> + if (!may_enter_fs(folio, sc->gfp_mask))
>>> + goto keep_locked;
>>> + if (!sc->may_writepage)
>>> + goto keep_locked;
>>> +
>>>                          /*
>>>                           * Folio is dirty. Flush the TLB if a writable entry
>>>                           * potentially exists to avoid CPU writes after I/O
>>
>> I'm confused. Did you apply this on top of v1 by accident?
> Hi,
> According to the trace log from my modified mm_vmscan_lru_shrink_inactive
> tracepoint, of the 32 scanned inactive file pages, 20 were dirty, and
> those 20 dirty pages were not reclaimed, yet they still took 20us to
> perform try_to_unmap().
>
> I think an unreclaimable dirty folio on the inactive file lru can skip
> try_to_unmap(). Please help to continue the review. Thanks.
>
> kswapd0-99 ( 99) [005] ..... 687.793724:
> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 scan=32 isolate=32
> reclamed=12 nr_dirty=20 nr_unqueued_dirty=20 nr_writeback=0
> nr_congested=0 nr_immediate=0 nr_activate[0]=0 nr_activate[1]=20
> nr_ref_keep=0 nr_unmap_fail=0 priority=2
> file=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC total=39 exe=0 reference_cost=5
> reference_exe=0 unmap_cost=21 unmap_exe=0 dirty_unmap_cost=20
> dirty_unmap_exe=0 pageout_cost=0 pageout_exe=0
>
To supplement: I think an unreclaimable dirty folio on the inactive file
lru can exit the recycling flow in shrink_folio_list() early, avoiding
time-consuming interfaces such as folio_check_references() and
try_to_unmap().
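
A minimal sketch of the intended ordering in shrink_folio_list()
(simplified pseudocode of this patch's flow, not the literal kernel
source):

    /* Per isolated folio in shrink_folio_list(): */
    folio_check_dirty_writeback(folio, &dirty, &writeback);
    if (dirty && !writeback)
            stat->nr_unqueued_dirty += nr_pages;

    /* Early exit added by this patch: a dirty file folio that cannot
     * be paged out is re-activated here, before paying for the
     * expensive rmap walks below. */
    if (!folio_check_pageout(folio, pgdat))
            goto activate_locked;

    references = folio_check_references(folio, sc);  /* rmap walk */
    ...
    if (folio_mapped(folio))
            try_to_unmap(folio, flags);              /* rmap walk */
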
>> --
>> Cheers,
>>
>> David / dhildenb
>>
>