Message-ID: <6ebb5cd0-0897-4de7-9303-422d0caa18cb@linux.alibaba.com>
Date: Tue, 16 Sep 2025 15:18:28 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, akpm@...ux-foundation.org,
 hannes@...xchg.org, david@...hat.com, mhocko@...nel.org,
 zhengqi.arch@...edance.com, lorenzo.stoakes@...cle.com, willy@...radead.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in
 pageout()



On 2025/9/16 12:00, Hugh Dickins wrote:
> On Sat, 13 Sep 2025, Baolin Wang wrote:
>> On 2025/9/13 00:13, Shakeel Butt wrote:
>>> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>>>> Currently, we no longer attempt to write back filesystem folios in
>>>> pageout(), and only tmpfs/shmem folios and anonymous swapcache folios
>>>> can be written back. Moreover, tmpfs/shmem and swapcache folios do not
>>>> use the PG_private flag, which means no fs-private data is used.
>>>> Therefore, we can remove the redundant folio_test_private() checks and
>>>> related buffer_head release logic.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>> ---
>>>>    mm/vmscan.c | 16 +---------------
>>>>    1 file changed, 1 insertion(+), 15 deletions(-)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index f1fc36729ddd..8056fccb9cc4 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>>>>      * swap_backing_dev_info is bust: it doesn't reflect the
>>>>      * congestion state of the swapdevs.  Easy to fix, if needed.
>>>>      */
>>>> -	if (!is_page_cache_freeable(folio))
>>>> +	if (!is_page_cache_freeable(folio) || !mapping)
>>>>    		return PAGE_KEEP;
>>>> -	if (!mapping) {
>>>> -		/*
>>>> -		 * Some data journaling orphaned folios can have
>>>> -		 * folio->mapping == NULL while being dirty with clean buffers.
>>>> -		 */
>>>
>>> Can this case not happen anymore and try_to_free_buffers is not needed?
>>
>> For dirty file folios, pageout() will return PAGE_KEEP and put them back on
>> the LRU list. So even if mapping == NULL, the background writeback workers
>> will continue to handle them, rather than handling them in shrink_folio_list().
> 
> You've persuaded everyone else, but I'm still not convinced:
> what are those "background workers for writeback",
> that manage to work on orphaned folios with NULL mapping?

Sorry for not being clear. The ‘background workers for writeback’ here 
refer to the workers responsible for handling the writeback of dirty 
data (see wb_workfn()).

> I think *this* is the place which deals with that case, and you're
> now proposing to remove it (and returning PAGE_KEEP not PAGE_CLEAN,
> so it misses the filemap_release_folio() below the switch(pageout())).
> 
> There's even a comment over in migrate_folio_unmap():
> "Everywhere else except page reclaim, the page is invisible to the vm".
> 
> And your argument that the code *afterwards* rejects everything but
> shmem or anon, and neither of those would have folio_test_private(),
> certainly did not convince me.
> 
> Please persuade me.  But I've no evidence that this case does or does
> not still arise; and agree that there must be cleaner ways of doing it.

I did some further analysis, and it seems you are right. The flush worker 
does check the folio mapping during writeback, but it does not go on to 
release the private data; for example, in mpage_prepare_extent_to_map():

/*
 * If the page is no longer dirty, or its mapping no
 * longer corresponds to inode we are writing (which
 * means it has been truncated or invalidated), or the
 * page is already under writeback and we are not doing
 * a data integrity writeback, skip the page
 */
if (!folio_test_dirty(folio) ||
    (folio_test_writeback(folio) &&
     (mpd->wbc->sync_mode == WB_SYNC_NONE)) ||
    unlikely(folio->mapping != mapping)) {
	folio_unlock(folio);
	continue;
}

This is somewhat beyond my expectations. I expected the flush worker to 
handle such cases, allowing page reclaim to skip dirty file folios and 
improve reclaim efficiency. Obviously, I overlooked this corner case.
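
For completeness, the reclaim-side path you pointed at (below the
switch(pageout()) in shrink_folio_list(), where a PAGE_CLEAN return lets us
drop the buffers and possibly free an orphaned folio) is roughly the
following. I'm paraphrasing from memory, so take it as a sketch, not the
verbatim code:

	switch (pageout(folio, mapping, &plug, folio_list)) {
	case PAGE_KEEP:
		goto keep_locked;
	...
	case PAGE_CLEAN:
		;	/* try to free the folio below */
	}

	/* Drop any fs-private data (e.g. buffer heads) before freeing. */
	if (folio_needs_release(folio)) {
		if (!filemap_release_folio(folio, sc->gfp_mask))
			goto activate_locked;
		/* Orphaned folios (NULL mapping) can only be freed here. */
		if (!mapping && folio_ref_count(folio) == 1) {
			folio_unlock(folio);
			if (folio_put_testzero(folio))
				goto free_it;
			...
		}
	}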

Additionally, I'm still struggling to understand the case where a folio 
is dirty but has a NULL mapping, though from the comments in 
truncate_cleanup_folio() I gather that ext3 data journaling might do this.
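
(From memory, the comment I mean reads roughly: "Some filesystems seem to
re-dirty the page even after the VM has cancelled the dirty bit (eg ext3
journaling). Hence dirty accounting check is placed after invalidation."
That is an approximate quote, not copied verbatim.)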

But I still doubt whether this case exists, because the refcount check in 
is_page_cache_freeable() accounts for the page cache reference. This means 
that if this dirty folio's mapping is NULL, the check below should return 
false, so pageout() would return PAGE_KEEP before ever reaching the 
!mapping branch. And if it did return true, that would mean the orphaned 
folio holds extra references, so even if we released the private data here, 
its refcount still would not meet the requirements for being reclaimed. 
Please correct me if I missed anything.

static inline int is_page_cache_freeable(struct folio *folio)
{
         /*
          * A freeable page cache folio is referenced only by the caller
          * that isolated the folio, the page cache and optional filesystem
          * private data at folio->private.
          */
         return folio_ref_count(folio) - folio_test_private(folio) ==
                 1 + folio_nr_pages(folio);
}
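
As a concrete example, assuming an order-0 orphaned folio that still carries 
buffer heads and has been isolated by reclaim:

	folio_ref_count(folio)    = 1 (isolation) + 1 (folio->private) = 2
	folio_test_private(folio) = 1
	1 + folio_nr_pages(folio) = 1 + 1 = 2

	2 - 1 = 1 != 2, so is_page_cache_freeable() returns false and
	pageout() returns PAGE_KEEP before the !mapping branch is reached.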

