linux-kernel - Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of THP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a683e199-ce8a-4534-a21e-65f2528415a6@redhat.com>
Date: Tue, 13 Feb 2024 10:25:22 +0100
From: David Hildenbrand <david@...hat.com>
To: Charan Teja Kalla <quic_charante@...cinc.com>,
 gregkh@...uxfoundation.org, akpm@...ux-foundation.org, willy@...radead.org,
 vbabka@...e.cz, dhowells@...hat.com, surenb@...gle.com
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 # see patch description <stable@...r.kernel.org>
Subject: Re: [PATCH] mm/huge_memory: fix swap entry values of tail pages of
 THP

On 13.02.24 09:48, Charan Teja Kalla wrote:
> An anon THP page is first added to swap cache before reclaiming it.
> Initially, each tail page contains the proper swap entry value(stored in
> ->private field) which is filled from add_to_swap_cache(). After

This is a stable-only fix? In that case, it make sense to indicate that 
in the patch subject [PATCH STABLE vX.Y].

But it's always odd to have stable-only fixes, the docs [1] don't cover 
that (maybe they should? or I missed it).

[1] https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst


So we are migrating a THP that was added to the swapcache. Do you have a 
reproducer?

> migrating the THP page sitting on the swap cache, only the swap entry of
> the head page is filled(see folio_migrate_mapping()).
> 
> Now when this page is tried to split(one case is when this page is again
> migrated, see migrate_pages()->try_split_thp()), the tail pages
> ->private is not stored with proper swap entry values.  When this tail
> page is now try to be freed, as part of it delete_from_swap_cache() is
> called which operates on the wrong swap cache index and eventually
> replaces the wrong swap cache index with shadow/NULL value, frees the
> page.

But what if we don't split the THP after migration. Is there anything 
else that can go wrong?

It's sufficient to look where upstream calls page_swap_entry():

For example, do_swap_page() will never be able to make progress, because 
the swap entry of the page does not correspond to the swap entry in the 
PTE? It can easily fault on a swap PTE that refers a THP subpage in the 
swapcache.

So unless I am missing something, only fixing this up during the split 
is insufficient. You have to fix it up during migration.

> 
> This leads to the state with a swap cache containing the freed page.
> This issue can manifest in many forms and the most common thing observed
> is the rcu stall during the swapin (see mapping_get_entry()).
> 
> On the recent kernels, this issues is indirectly getting fixed with the
> series[1], to be specific[2].
> 
> When tried to back port this series, it is observed many merge
> conflicts and also seems dependent on many other changes. As backporting
> to LTS branches is not a trivial one, the similar change from [2] is
> picked as a fix.

The fix is in

commit cfeed8ffe55b37fa10286aaaa1369da00cb88440
Author: David Hildenbrand <david@...hat.com>
Date:   Mon Aug 21 18:08:46 2023 +0200

     mm/swap: stop using page->private on tail pages for THP_SWAP

> 
> [1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
> [2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/
> 
> Closes: https://lore.kernel.org/linux-mm/69cb784f-578d-ded1-cd9f-c6db04696336@quicinc.com/
> Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
> Cc: <stable@...r.kernel.org> # see patch description, applicable to <=6.1
> Signed-off-by: Charan Teja Kalla <quic_charante@...cinc.com>

3417013e0d18 went into 5.16.

cfeed8ffe55b3 went into 6.6.

So only 6.1 is affected.


Isn't there a way to bite the bullet and backport that series to 6.1 
instead?

-- 
Cheers,

David / dhildenb