Message-ID: <5abe8b0c-2354-4107-9004-ccf86cf90d25@redhat.com>
Date: Mon, 26 May 2025 14:39:31 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>, Peter Xu <peterx@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>, Lokesh Gidra
<lokeshgidra@...gle.com>, Andrea Arcangeli <aarcange@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Linux-MM <linux-mm@...ck.org>,
Kairui Song <ryncsn@...il.com>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs
concurrently with swap-out
On 23.05.25 01:23, Barry Song wrote:
> Hi All,
Hi!
>
> I'm encountering another bug that can be easily reproduced using the small
> program below[1], which performs swap-out and swap-in in parallel.
>
> The issue occurs when a folio is being swapped out while it is accessed
> concurrently. In this case, do_swap_page() handles the access. However,
> because the folio is under writeback, do_swap_page() completely removes
> its exclusive attribute.
>
> do_swap_page:
> } else if (exclusive && folio_test_writeback(folio) &&
> data_race(si->flags & SWP_STABLE_WRITES)) {
> ...
> exclusive = false;
>
> As a result, userfaultfd_move() will return -EBUSY, even though the
> folio is not shared and is in fact exclusively owned.
>
> folio = vm_normal_folio(src_vma, src_addr,
> orig_src_pte);
> if (!folio || !PageAnonExclusive(&folio->page)) {
> spin_unlock(src_ptl);
> + pr_err("%s %d folio:%lx exclusive:%d swapcache:%d\n",
> + __func__, __LINE__, folio,
> + PageAnonExclusive(&folio->page),
> + folio_test_swapcache(folio));
> err = -EBUSY;
> goto out;
> }
>
> I understand that shared folios should not be moved. However, in this
> case, the folio is not shared, yet its exclusive flag is not set.
>
> Therefore, I believe PageAnonExclusive is not a reliable indicator of
> whether a folio is truly exclusive to a process.
It is. The flag *not* being set is not a reliable indicator of whether
the folio is really shared. ;)
The reason why we have this PAE workaround (dropping the flag) in place
is because the page must not be written to (SWP_STABLE_WRITES). CoW
reuse is not possible.
uffd moving that page -- and in that same process setting it writable,
see move_present_pte()->pte_mkwrite() -- would be very bad.
>
> The kernel log output is shown below:
> [ 23.009516] move_pages_pte 1285 folio:fffffdffc01bba40 exclusive:0 swapcache:1
>
> I'm still struggling to find a real fix; it seems quite challenging.
PAE tells you that you can immediately write to that page without going
through CoW. However, here, CoW is required.
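To make the asymmetry concrete, here is a toy userspace model of the two checks quoted above. It is only a sketch: struct toy_folio, toy_swap_in() and toy_uffd_move() are made-up names for illustration, not kernel code, and the model ignores everything except the flags discussed in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not kernel code. */
struct toy_folio {
	bool under_writeback;	/* stands in for folio_test_writeback() */
	bool anon_exclusive;	/* stands in for PageAnonExclusive() */
};

/*
 * Models the quoted do_swap_page() branch: even if the faulting process is
 * the sole owner, the exclusive marker is dropped while the folio is under
 * writeback to a device that needs stable writes.
 */
static void toy_swap_in(struct toy_folio *f, bool sole_owner, bool stable_writes)
{
	bool exclusive = sole_owner;

	if (exclusive && f->under_writeback && stable_writes)
		exclusive = false;
	f->anon_exclusive = exclusive;
}

/* Models the quoted move_pages_pte() check: no exclusive marker, no move. */
static int toy_uffd_move(const struct toy_folio *f)
{
	return f->anon_exclusive ? 0 : -EBUSY;
}
```

In this model a set flag always implies a sole owner, but a sole owner does not always get the flag, which is why the move is refused with -EBUSY during writeback.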
> Please let me know if you have any ideas. In any case, it seems
> userspace should fall back to userfaultfd_copy.
We could try detecting whether the page is now exclusive, to reset PAE.
That will only be possible after writeback has completed, so it adds
complexity without being able to move the page in all cases (during
writeback).
Letting userspace deal with that in these rare scenarios is
significantly easier.
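A minimal sketch of what that userspace fallback could look like, assuming a uffd on which UFFDIO_MOVE and UFFDIO_COPY are enabled and a range of a single page (so partial progress does not need handling). move_or_copy() is a made-up helper name; the #ifdef only keeps the sketch buildable against older uapi headers that lack UFFDIO_MOVE.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: try UFFDIO_MOVE first and fall back to UFFDIO_COPY
 * when the kernel refuses the move with EBUSY.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int move_or_copy(int uffd, void *dst, void *src, size_t len)
{
	struct uffdio_copy cp;

#ifdef UFFDIO_MOVE
	struct uffdio_move mv;

	memset(&mv, 0, sizeof(mv));
	mv.dst = (uintptr_t)dst;
	mv.src = (uintptr_t)src;
	mv.len = len;
	if (ioctl(uffd, UFFDIO_MOVE, &mv) == 0)
		return 0;
	if (errno != EBUSY)
		return -1;	/* some other failure: report it as-is */
#endif

	/* Move refused (or not available): copy the contents instead. */
	memset(&cp, 0, sizeof(cp));
	cp.dst = (uintptr_t)dst;
	cp.src = (uintptr_t)src;
	cp.len = len;
	if (ioctl(uffd, UFFDIO_COPY, &cp) == 0)
		return 0;
	return -1;
}
```

Unlike a move, the copy leaves the source page mapped, so a caller relying on the source being gone afterwards would have to discard it separately.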
--
Cheers,
David / dhildenb