Message-ID: <5abe8b0c-2354-4107-9004-ccf86cf90d25@redhat.com>
Date: Mon, 26 May 2025 14:39:31 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>, Peter Xu <peterx@...hat.com>,
 Suren Baghdasaryan <surenb@...gle.com>, Lokesh Gidra
 <lokeshgidra@...gle.com>, Andrea Arcangeli <aarcange@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Linux-MM <linux-mm@...ck.org>,
 Kairui Song <ryncsn@...il.com>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs
 concurrently with swap-out

On 23.05.25 01:23, Barry Song wrote:
> Hi All,

Hi!

> 
> I'm encountering another bug that can be easily reproduced using the small
> program below[1], which performs swap-out and swap-in in parallel.
> 
> The issue occurs when a folio is being swapped out while it is accessed
> concurrently. In this case, do_swap_page() handles the access. However,
> because the folio is under writeback, do_swap_page() completely removes
> its exclusive attribute.
> 
> do_swap_page:
>                 } else if (exclusive && folio_test_writeback(folio) &&
>                            data_race(si->flags & SWP_STABLE_WRITES)) {
>                          ...
>                          exclusive = false;
> 
> As a result, userfaultfd_move() will return -EBUSY, even though the
> folio is not shared and is in fact exclusively owned.
> 
>                          folio = vm_normal_folio(src_vma, src_addr, orig_src_pte);
>                          if (!folio || !PageAnonExclusive(&folio->page)) {
>                                  spin_unlock(src_ptl);
> +                               pr_err("%s %d folio:%lx exclusive:%d swapcache:%d\n",
> +                                       __func__, __LINE__, folio, PageAnonExclusive(&folio->page),
> +                                       folio_test_swapcache(folio));
>                                  err = -EBUSY;
>                                  goto out;
>                          }
> 
> I understand that shared folios should not be moved. However, in this
> case, the folio is not shared, yet its exclusive flag is not set.
> 
> Therefore, I believe PageAnonExclusive is not a reliable indicator of
> whether a folio is truly exclusive to a process.

It is. The flag *not* being set is not a reliable indicator of whether 
the page is really shared. ;)

The reason we have this PAE workaround (dropping the flag) in place 
is that the page must not be written to (SWP_STABLE_WRITES). CoW 
reuse is not possible.

uffd moving that page -- and in that same process setting it writable, 
see move_present_pte()->pte_mkwrite() -- would be very bad.

> 
> The kernel log output is shown below:
> [   23.009516] move_pages_pte 1285 folio:fffffdffc01bba40 exclusive:0 swapcache:1
> 
> I'm still struggling to find a real fix; it seems quite challenging.

PAE tells you that you can immediately write to that page without going 
through CoW. However, here, CoW is required.
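
To make the asymmetry concrete, here is a toy userspace model of the two 
checks quoted above. It is only a sketch: struct toy_page and the toy_* 
helpers are hypothetical names, not kernel APIs, and the real code paths 
do much more. It only shows that a page with a single owner can still 
have PAE cleared, and that the move is then refused.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model; these names are hypothetical, not kernel APIs. */
struct toy_page {
	bool anon_exclusive;	/* stands in for PageAnonExclusive() */
	bool writable;		/* stands in for a writable PTE */
};

/*
 * Mirrors the quoted do_swap_page() condition: an exclusive page that is
 * under writeback on a stable-writes swap device loses the flag.
 */
static bool toy_swapin_exclusive(bool exclusive, bool writeback,
				 bool stable_writes)
{
	if (exclusive && writeback && stable_writes)
		return false;
	return exclusive;
}

/*
 * Mirrors the quoted move_pages_pte() check: without the flag the move is
 * refused; with it, the page is moved and mapped writable.
 */
static int toy_uffd_move(struct toy_page *page)
{
	if (!page->anon_exclusive)
		return -EBUSY;
	page->writable = true;
	return 0;
}

/* The reported case: a single owner, but swap-in raced with writeback. */
static int toy_reported_case(void)
{
	struct toy_page page = {
		.anon_exclusive = toy_swapin_exclusive(true, true, true),
		.writable = false,
	};

	return toy_uffd_move(&page);
}
```

In this model toy_reported_case() yields -EBUSY, and the page never 
becomes writable while writeback is in flight, which is the point of 
dropping the flag.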

> Please let me know if you have any ideas. In any case It seems
> userspace should fall back to userfaultfd_copy.

We could try detecting whether the page is now exclusive, to reset PAE. 
That will only be possible after writeback has completed, so it adds 
complexity without being able to move the page in all cases (during 
writeback).

Letting userspace deal with that in these rare scenarios is 
significantly easier.
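
A rough sketch of what such a userspace fallback could look like, 
assuming a single-page request on an already set up userfaultfd. 
move_or_copy() is a hypothetical helper name; UFFDIO_MOVE, UFFDIO_COPY 
and the two structs come from <linux/userfaultfd.h>. The #ifdef only 
keeps the sketch building against older headers that lack UFFDIO_MOVE. 
Partial progress (-EAGAIN) on larger ranges is deliberately not handled 
here.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: try UFFDIO_MOVE first and fall back to UFFDIO_COPY
 * if the kernel refuses the move with EBUSY.
 *
 * Returns 0 if the page was moved, 1 if it was copied instead, and -1
 * with errno set on any other failure.
 */
static int move_or_copy(int uffd, unsigned long long dst,
			unsigned long long src, unsigned long long len)
{
	struct uffdio_copy copy = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = 0,
	};

#ifdef UFFDIO_MOVE
	struct uffdio_move move = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = 0,
	};

	if (ioctl(uffd, UFFDIO_MOVE, &move) == 0)
		return 0;
	if (errno != EBUSY)
		return -1;
	/* EBUSY: the source page cannot be moved right now; copy it. */
#endif

	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
		return -1;
	return 1;
}
```

Unlike a move, the copy leaves the source page in place, so a caller 
that relies on the source range being emptied would have to release it 
separately (for example with madvise(MADV_DONTNEED)) when this returns 1.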

-- 
Cheers,

David / dhildenb

