linux-kernel - Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs concurrently with swap-out

Message-ID: <42ce8dcc-0139-4dd7-9bef-bf3efa93849a@redhat.com>
Date: Tue, 27 May 2025 13:06:42 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>
Cc: aarcange@...hat.com, akpm@...ux-foundation.org,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org, lokeshgidra@...gle.com,
 peterx@...hat.com, ryncsn@...il.com, surenb@...gle.com
Subject: Re: [BUG]userfaultfd_move fails to move a folio when swap-in occurs
 concurrently with swap-out

>>
>>          EBUSY
>>                 The pages in the source virtual memory range are either
>>                 pinned or not exclusive to the process. The kernel might
>>                 only perform lightweight checks for detecting whether the
>>                 pages are exclusive. To make the operation more likely to
>>                 succeed, KSM should be disabled, fork() should be avoided
>>                 or MADV_DONTFORK should be configured for the source
>>                 virtual memory area before fork().
>>
>> Note the "lightweight" and "more likely to succeed".
>>
> 
> Initially, my point was that an exclusive folio (single-process case)
> should be movable.

Yeah, I wish we didn't need that PAE (PageAnonExclusive) hack in the swap-in code.
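To make the trade-off concrete, here is a toy Python model of the behavior discussed in this thread. A folio that is swapped back in while its swap-out writeback is still running is not marked exclusive (the "PAE hack"), presumably so that it cannot be mapped writable and modified while still under I/O. A lightweight exclusivity check like the one described for EBUSY then rejects the move, even though only one process maps the folio. Everything here (`Folio`, `swap_in()`, `move_check()`) is hypothetical and heavily simplified. It is not kernel code and ignores any further conditions the real swap-in path may apply.

```python
import errno
from dataclasses import dataclass


@dataclass
class Folio:
    """Hypothetical, simplified stand-in for an anonymous folio."""
    under_writeback: bool = False  # swap-out I/O still in flight
    anon_exclusive: bool = False   # models the PAE (PageAnonExclusive) bit


def swap_in(folio: Folio, swp_exclusive: bool) -> None:
    """Map a swapcache folio back in (toy model of a read fault)."""
    if swp_exclusive and folio.under_writeback:
        # The "PAE hack": drop the exclusive marker while the folio is
        # still being written to swap, so it cannot simply be made
        # writable and modified under I/O.
        folio.anon_exclusive = False
    else:
        folio.anon_exclusive = swp_exclusive


def move_check(folio: Folio) -> int:
    """Lightweight check before a move: only exclusive folios may move."""
    return 0 if folio.anon_exclusive else -errno.EBUSY


# Single-process case: the swap entry itself is exclusive.
racing = Folio(under_writeback=True)    # swap-in raced with swap-out
swap_in(racing, swp_exclusive=True)

settled = Folio(under_writeback=False)  # writeback already finished
swap_in(settled, swp_exclusive=True)

print(move_check(racing) == -errno.EBUSY, move_check(settled))  # prints: True 0
```

The only difference between the two folios is timing, which is why the failure looks like a bug from userspace but is really a consequence of the conservative exclusivity marking.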

I was asking myself if we could just ... wait for writeback to end in 
that case?

I mean, if we had to swap in the folio from disk, we would have to wait 
for disk I/O anyway ... so waiting for writeback here would just be 
another wait for disk I/O.

We could either wait for writeback before mapping the folio, or set the 
PAE bit and map the page R/O, and then wait for writeback during write 
faults.

The latter has the downside of adding complexity to write-fault 
handling (check whether the page is under writeback, then check whether 
we require this sync I/O during the write fault even though PAE is 
set ...).
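The two alternatives can be sketched in the same toy style. Again, `Folio`, `wait_for_writeback()`, `swap_in_wait_first()`, `swap_in_map_ro()` and `write_fault()` are hypothetical names in a simplified Python model, not kernel functions. The model assumes the single-process case, where the swap entry is exclusive. Option 1 pays the wait at swap-in time. Option 2 keeps swap-in cheap but needs the extra writeback check in the write-fault path, which is the added complexity mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    """Hypothetical, simplified stand-in for an anonymous folio."""
    under_writeback: bool = False
    anon_exclusive: bool = False  # models the PAE bit
    writable: bool = False        # is the page currently mapped writable?
    writeback_waits: int = 0      # how often a fault blocked on I/O


def wait_for_writeback(folio: Folio) -> None:
    """Stand-in for blocking until the swap-out I/O completes."""
    if folio.under_writeback:
        folio.writeback_waits += 1
        folio.under_writeback = False


def swap_in_wait_first(folio: Folio, write: bool) -> None:
    """Option 1: wait for writeback, then map with PAE set."""
    wait_for_writeback(folio)
    folio.anon_exclusive = True
    folio.writable = write  # writeback is done, so writing is safe


def swap_in_map_ro(folio: Folio) -> None:
    """Option 2: set PAE but map R/O; defer any wait to a write fault."""
    folio.anon_exclusive = True
    folio.writable = False


def write_fault(folio: Folio) -> None:
    """Write fault on an R/O mapping (the extra logic option 2 needs)."""
    if folio.anon_exclusive and folio.under_writeback:
        # PAE is set, yet we must still wait before allowing writes.
        wait_for_writeback(folio)
    # Non-exclusive folios would need copy-on-write instead (not modeled).
    folio.writable = folio.anon_exclusive


a = Folio(under_writeback=True)
swap_in_wait_first(a, write=False)  # blocks at swap-in time

b = Folio(under_writeback=True)
swap_in_map_ro(b)                   # no wait yet, mapped R/O
write_fault(b)                      # blocks here instead

print(a.writeback_waits, b.writeback_waits, b.writable)  # prints: 1 1 True
```

In both variants the folio ends up with PAE set, so the move's exclusivity check would pass; the variants differ only in where the writeback wait happens.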

> Now I understand this isn’t a bug, but rather a compromise made due
> to implementation constraints.

That is a good summary!

> Perhaps the remaining value of this report is that it helped better
> understand scenarios beyond fork where a move might also fail.
> 
> I truly appreciate your time and your clear analysis.

YW :)

-- 
Cheers,

David / dhildenb