linux-kernel - Re: Whether is the race for SWP_SYNCHRONOUS_IO possible? (was Re: [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <87plxhf1k1.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 01 Feb 2024 08:52:46 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Chris Li <chrisl@...nel.org>
Cc: Kairui Song <ryncsn@...il.com>,  Minchan Kim <minchan@...nel.org>,
  linux-mm@...ck.org,  Andrew Morton <akpm@...ux-foundation.org>,  Hugh
 Dickins <hughd@...gle.com>,  Johannes Weiner <hannes@...xchg.org>,
  Matthew Wilcox <willy@...radead.org>,  Michal Hocko <mhocko@...e.com>,
  Yosry Ahmed <yosryahmed@...gle.com>,  David Hildenbrand
 <david@...hat.com>,  linux-kernel@...r.kernel.org,  Yu Zhao
 <yuzhao@...gle.com>
Subject: Re: Whether is the race for SWP_SYNCHRONOUS_IO possible? (was Re:
 [PATCH v3 6/7] mm/swap, shmem: use unified swapin helper for shmem)

Chris Li <chrisl@...nel.org> writes:

> On Tue, Jan 30, 2024 at 7:58 PM Kairui Song <ryncsn@...il.com> wrote:
>>
>> Hi Ying,
>>
>> On Wed, Jan 31, 2024 at 10:53 AM Huang, Ying <ying.huang@...el.com> wrote:
>> >
>> > Hi, Minchan,
>> >
>> > When I review the patchset from Kairui, I checked the code to skip swap
>> > cache in do_swap_page() for swap device with SWP_SYNCHRONOUS_IO.  Is the
>> > following race possible?  Where a page is swapped out to a swap device
>> > with SWP_SYNCHRONOUS_IO and the swap count is 1.  Then 2 threads of the
>> > process runs on CPU0 and CPU1 as below.  CPU0 is running do_swap_page().
>>
>> Chris raised a similar issue about the shmem path, and I was worrying
>> about the same issue in previous discussions about do_swap_page:
>> https://lore.kernel.org/linux-mm/CAMgjq7AwFiDb7cAMkWMWb3vkccie1-tocmZfT7m4WRb_UKPghg@mail.gmail.com/
>
> Ha thanks for remembering that.
>
>>
>> """
>> In do_swap_page path, multiple process could swapin the page at the
>> same time (a mapped once page can still be shared by sub threads),
>> they could get different folios. The later pte lock and pte_same check
>> is not enough, because while one process is not holding the pte lock,
>> another process could read-in, swap_free the entry, then swap-out the
>> page again, using same entry, an ABA problem. The race is not likely
>> to happen in reality but in theory possible.
>> """
>>
>> >
>> > CPU0                            CPU1
>> > ----                            ----
>> > swap_cache_get_folio()
>> > check sync io and swap count
>> > alloc folio
>> > swap_readpage()
>> > folio_lock_or_retry()
>> >                                 swap in the swap entry
>> >                                 write page
>> >                                 swap out to same swap entry
>> > pte_offset_map_lock()
>> > check pte_same()
>> > swap_free()   <-- new content lost!
>> > set_pte_at()  <-- stale page!
>> > folio_unlock()
>> > pte_unmap_unlock()
>>
>> Thank you very much for highlighting this!
>>
>> My concern previously is the same as yours (swapping out using the
>> same entry is like an ABA issue, where pte_same failed to detect the
>> page table change), later when working on V3, I mistakenly thought
>> that's impossible as entry should be pinned until swap_free on CPU0,
>> and I'm wrong. CPU1 can also just call swap_free, then swap count is
>> dropped to 0 and it can just swap out using the same entry. Now I
>> think my patch 6/7 is also affected by this potential race. Seems
>> nothing can stop it from doing this.
>>
>> Actually I was trying to make a reproducer locally, due to swap slot
>> cache, swap allocation algorithm, and the short race window, this is
>> very unlikely to happen though.
>
> You can put some sleep in some of the CPU0 where expect the other race
> to happen to manual help triggering it. Yes, it sounds hard to trigger
> in real life due to reclaim swap out.
>
>>
>> How about we just increase the swap count temporarily in the direct
>> swap in path (after alloc folio), then drop the count after pte_same
>> (or shmem_add_to_page_cache in shmem path)? That seems enough to
>> prevent the entry reuse issue.
>
> Sounds like a good solution.

Yes.  It seems that this can solve the race.

--
Best Regards,
Huang, Ying