linux-kernel - Re: [PATCH v5 4/9] mm: filemap: use xa_get

lists.openwall.net
Open Source and information security mailing list archives



Message-ID: <e88b1850-ca36-aec5-ad27-0b2753c836f5@google.com>
Date: Fri, 30 Aug 2024 03:18:11 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
    willy@...radead.org, david@...hat.com, wangkefeng.wang@...wei.com, 
    chrisl@...nel.org, ying.huang@...el.com, 21cnbao@...il.com, 
    ryan.roberts@....com, shy828301@...il.com, ziy@...dia.com, 
    ioworker0@...il.com, da.gomez@...sung.com, p.raghav@...sung.com, 
    linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 4/9] mm: filemap: use xa_get_order() to get the swap
 entry order

On Thu, 29 Aug 2024, Baolin Wang wrote:
> On 2024/8/29 16:07, Hugh Dickins wrote:
...
> > 
> > Fix below.  Successful testing on mm-everything-2024-08-24-07-21 (well,
> > that minus the commit which spewed warnings from bootup) confirmed it.
> > But testing on mm-everything-2024-08-28-21-38 very quickly failed,
> > unrelated to this series and presumably caused by a patch or patches added
> > since 08-24: one kind of crash on one machine (some memcg thing called
> > from isolate_migratepages_block), another kind of crash on another (some
> > memcg thing called from __read_swap_cache_async). I'm exhausted by now,
> > but will investigate later in the day (or hope someone else has).
> 
> I saw the isolate_migratepages_block crash issue on
> mm-everything-2024-08-28-09-32. After I reverted Kefeng's series "[PATCH 0/4]
> mm: convert to folio_isolate_movable()", the isolate_migratepages_block issue
> seems to be resolved (at least I cannot reproduce it).
> 
> And I have already pointed out some potential issues in Kefeng’s series[1].
> Andrew has dropped this series from mm-everything-2024-08-28-21-38. However,
> you can still encounter the isolate_migratepages_block issue on
> mm-everything-2024-08-28-21-38, while I cannot. Weird.

It was not that issue: isolate_migratepages_block() turned out to be an
innocent bystander in my case. I didn't see it crash there again, but I
did see crashes in a variety of other memcg places, many of them stat updates.

The error came from a different series; a fix is now posted:
https://lore.kernel.org/linux-mm/56d42242-37fe-b94f-d3cb-00673f1e5efb@google.com/T/#u

> 
> > [PATCH] mm: filemap: use xa_get_order() to get the swap entry order: fix
> > 
> > find_lock_entries(), used in the first pass of shmem_undo_range() and
> > truncate_inode_pages_range() before partial folios are dealt with, has
> > to be careful to avoid those partial folios: as its doc helpfully says,
> > "Folios which are partially outside the range are not returned".  Of
> > course, the same must be true of any value entries returned; otherwise
> > truncation and hole-punch risk erasing swapped areas, as has been seen.
> > 
> > Rewrite find_lock_entries() to emphasize that, following the same pattern
> > for folios and for value entries.
> > 
> > Adjust find_get_entries() slightly, to get order while still holding
> > rcu_read_lock(), and to round down the updated start: good changes, like
> > find_lock_entries() now does, but it's unclear if either is ever important.
> > 
> > Signed-off-by: Hugh Dickins <hughd@...gle.com>
> 
> Thanks Hugh. The changes make sense to me.

Thanks!
Hugh