linux-kernel - Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio migration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ec3a5d94-1985-f66d-1aa8-3783fe498f5a@google.com>
Date: Tue, 2 Jul 2024 09:15:54 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
    Nhat Pham <nphamcs@...il.com>, Yang Shi <shy828301@...il.com>, 
    Zi Yan <ziy@...dia.com>, Barry Song <baohua@...nel.org>, 
    Kefeng Wang <wangkefeng.wang@...wei.com>, 
    David Hildenbrand <david@...hat.com>, Matthew Wilcox <willy@...radead.org>, 
    linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio
 migration

On Tue, 2 Jul 2024, Baolin Wang wrote:
> On 2024/7/2 15:40, Hugh Dickins wrote:
> > Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on
> > flags when freeing, yet the flags shown are not bad: PG_locked had been
> > set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from
> > deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN
> > symptoms implying double free by deferred split and large folio migration.
> > 
> > 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large
> > folio migration") was right to fix the memcg-dependent locking broken in
> > 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"),
> > but missed a subtlety of deferred_split_scan(): it moves folios to its own
> > local list to work on them without split_queue_lock, during which time
> > folio->_deferred_list is not empty, but even the "right" lock does nothing
> > to secure the folio and the list it is on.
> > 
> > Fortunately, deferred_split_scan() is careful to use folio_try_get(): so
> > folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable()
> > while the old folio's reference count is temporarily frozen to 0 - adding
> > such a freeze in the !mapping case too (originally, folio lock and

(I should have added "isolation and" into that list of conditions.)

> > unmapping and no swap cache left an anon folio unreachable, so no freezing
> > was needed there: but the deferred split queue offers a way to reach it).
> 
> Thanks Hugh.
> 
> But after reading your analysis, I am concerned that the
> folio_undo_large_rmappable() and deferred_split_scan() may still encounter a
> race condition with the local list, even with your patch.
> 
> Suppose folio A has already been queued into the local list in
> deferred_split_scan() by thread A, but fails to 'folio_trylock' and then
> releases the reference count. At the same time, folio A can be frozen by
> another thread B in folio_migrate_mapping(). In such a case,
> folio_undo_large_rmappable() would remove folio A from the local list without
> *any* lock protection, creating a race condition with the local list iteration
> in deferred_split_scan().

It's a good doubt to raise, but I think we're okay: because Kirill
designed the deferred split list (and its temporary local list) to
be safe in that way.

You're right that if thread B's freeze to 0 wins the race, thread B
will be messing with a list on thread A's stack while thread A is
quite possibly running; but thread A will not leave that stack frame
without taking again the split_queue_lock which thread B holds while
removing from the list.

We would certainly not want to introduce such a subtlety right now!
But never mind page migration, this is how it has always worked when
racing with the folio being freed at the same time - maybe
deferred_split_scan() wins the freeing race and is the one to remove
folio from deferred split list, or maybe the other freer does that.

I forget whether there was an initial flurry of races to be fixed when
it came in, but it has been stable against concurrent freeing for years.

Please think this over again: do not trust my honeyed words!

> 
> Anyway, I think this patch can still fix some possible races. Feel free to
> add:
> Reviewed-by: Baolin Wang <baolin.wang@...ux.alibaba.com>

Thanks, but I certainly don't want this to go into the tree if it's
still flawed as you suggest.

Hugh