linux-kernel - Re: Whether is the race for SWP_SYNCHRONOUS_IO possible? (was Re: [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMgjq7DhjeeCehzj5hiO=v+X0Jg5mEpKim3k8abJA20TN63SHA@mail.gmail.com>
Date: Wed, 31 Jan 2024 11:58:07 +0800
From: Kairui Song <ryncsn@...il.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Minchan Kim <minchan@...nel.org>, linux-mm@...ck.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Chris Li <chrisl@...nel.org>, 
	Hugh Dickins <hughd@...gle.com>, Johannes Weiner <hannes@...xchg.org>, 
	Matthew Wilcox <willy@...radead.org>, Michal Hocko <mhocko@...e.com>, 
	Yosry Ahmed <yosryahmed@...gle.com>, David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org, 
	Yu Zhao <yuzhao@...gle.com>
Subject: Re: Whether is the race for SWP_SYNCHRONOUS_IO possible? (was Re:
 [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem)

Hi Ying,

On Wed, Jan 31, 2024 at 10:53 AM Huang, Ying <ying.huang@...el.com> wrote:
>
> Hi, Minchan,
>
> When I review the patchset from Kairui, I checked the code to skip swap
> cache in do_swap_page() for swap device with SWP_SYNCHRONOUS_IO.  Is the
> following race possible?  Where a page is swapped out to a swap device
> with SWP_SYNCHRONOUS_IO and the swap count is 1.  Then 2 threads of the
> process runs on CPU0 and CPU1 as below.  CPU0 is running do_swap_page().

Chris raised a similar issue about the shmem path, and I was worrying
about the same issue in previous discussions about do_swap_page:
https://lore.kernel.org/linux-mm/CAMgjq7AwFiDb7cAMkWMWb3vkccie1-tocmZfT7m4WRb_UKPghg@mail.gmail.com/

"""
In do_swap_page path, multiple process could swapin the page at the
same time (a mapped once page can still be shared by sub threads),
they could get different folios. The later pte lock and pte_same check
is not enough, because while one process is not holding the pte lock,
another process could read-in, swap_free the entry, then swap-out the
page again, using same entry, an ABA problem. The race is not likely
to happen in reality but in theory possible.
"""

>
> CPU0                            CPU1
> ----                            ----
> swap_cache_get_folio()
> check sync io and swap count
> alloc folio
> swap_readpage()
> folio_lock_or_retry()
>                                 swap in the swap entry
>                                 write page
>                                 swap out to same swap entry
> pte_offset_map_lock()
> check pte_same()
> swap_free()   <-- new content lost!
> set_pte_at()  <-- stale page!
> folio_unlock()
> pte_unmap_unlock()

Thank you very much for highlighting this!

My concern previously is the same as yours (swapping out using the
same entry is like an ABA issue, where pte_same failed to detect the
page table change), later when working on V3, I mistakenly thought
that's impossible as entry should be pinned until swap_free on CPU0,
and I'm wrong. CPU1 can also just call swap_free, then swap count is
dropped to 0 and it can just swap out using the same entry. Now I
think my patch 6/7 is also affected by this potential race. Seems
nothing can stop it from doing this.

Actually I was trying to make a reproducer locally, due to swap slot
cache, swap allocation algorithm, and the short race window, this is
very unlikely to happen though.

How about we just increase the swap count temporarily in the direct
swap in path (after alloc folio), then drop the count after pte_same
(or shmem_add_to_page_cache in shmem path)? That seems enough to
prevent the entry reuse issue.