linux-kernel - Re: [PATCH] mm/userfaultfd: Fix release hang over concurrent GUP

Message-Id: <20250312150411.f4a5b406600742a49b46d04e@linux-foundation.org>
Date: Wed, 12 Mar 2025 15:04:11 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Peter Xu <peterx@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, Kefeng Wang
 <wangkefeng.wang@...wei.com>, Mike Rapoport <rppt@...nel.org>, Al Viro
 <viro@...iv.linux.org.uk>, Axel Rasmussen <axelrasmussen@...gle.com>, Pavel
 Emelyanov <xemul@...tuozzo.com>, Jinjiang Tu <tujinjiang@...wei.com>,
 Dimitris Siakavaras <jimsiak@...ab.ece.ntua.gr>, Andrea Arcangeli
 <aarcange@...hat.com>
Subject: Re: [PATCH] mm/userfaultfd: Fix release hang over concurrent GUP

On Wed, 12 Mar 2025 10:51:31 -0400 Peter Xu <peterx@...hat.com> wrote:

> This patch should fix a possible userfaultfd release() hang during
> concurrent GUP.
> 
> This problem was initially reported by Dimitris Siakavaras in July 2023 [1]
> in a Firecracker use case.  Firecracker has a separate process handling
> page faults remotely, and when the process releases the userfaultfd it can
> race with a concurrent GUP from KVM trying to fault in a guest page during
> the secondary MMU page fault process.
> 
> A similar problem was reported again recently by Jinjiang Tu in March 2025
> [2], although this time the race happened with an mlockall() operation,
> which does GUP in a similar fashion.
> 
> In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
> uffd without triggering SIGBUS") tried to fix this issue.  AFAIU, that
> works well for the fault paths but may not yet work for GUP.  In GUP, the
> issue is that NOPAGE is treated almost the same as "page fault resolved"
> in faultin_page(); GUP then follows the page again, sees it still missing,
> and keeps retrying, ending up in the live lock situation as reported.
> 
> This change makes core mm return RETRY instead of NOPAGE for both the GUP
> and fault paths, proactively releasing the mmap read lock.  This should
> guarantee that the release thread makes progress on taking the write
> lock, avoiding the live lock even for GUP.
> 
> While at it, rearrange the comments to make sure they are up to date.

It would be good to have a Fixes: target but this bug seems to be so
old that a bare cc:stable should be OK?
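
For readers following the quoted description, here is a minimal userspace
sketch of why NOPAGE can live lock in a GUP-style loop while RETRY cannot.
It is a toy model only: every name in it (toy_mm, toy_fault, toy_gup,
toy_release_step, the TOY_* values) is hypothetical and none of it is the
kernel's code. It assumes a single release thread that needs the write lock,
and a GUP loop that keeps the read lock unless the fault handler drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fault results discussed in the thread. */
enum toy_result { TOY_NOPAGE, TOY_RETRY, TOY_MAPPED };

struct toy_mm {
	bool read_locked;   /* the GUP side holds the mmap read lock */
	bool released;      /* release() got the write lock and finished */
	bool page_present;  /* the page GUP is looking for is mapped */
};

/* Release thread: it can only make progress while no reader holds the lock. */
static void toy_release_step(struct toy_mm *mm)
{
	if (!mm->read_locked)
		mm->released = true;
}

/*
 * Fault handler.  Once release has finished, the range is no longer
 * userfaultfd-managed, so the fault is resolved normally.  While release is
 * still pending, return whatever the caller selected: NOPAGE keeps the read
 * lock held, RETRY drops it first.
 */
static enum toy_result toy_fault(struct toy_mm *mm, enum toy_result on_release)
{
	if (mm->released) {
		mm->page_present = true;
		return TOY_MAPPED;
	}
	if (on_release == TOY_RETRY)
		mm->read_locked = false;
	return on_release;
}

/*
 * GUP-style loop.  Returns the round in which the page was found, or -1 if
 * it is still spinning after max_rounds (the modelled live lock).
 */
int toy_gup(enum toy_result on_release, int max_rounds)
{
	struct toy_mm mm = { false, false, false };

	assert(max_rounds > 0);
	for (int round = 1; round <= max_rounds; round++) {
		/* Take the read lock, or keep holding it (NOPAGE never drops it). */
		mm.read_locked = true;
		enum toy_result r = toy_fault(&mm, on_release);

		/* Give the release thread a chance to run. */
		toy_release_step(&mm);

		/* NOPAGE is treated as "resolved": look for the page again. */
		if (r != TOY_RETRY && mm.page_present)
			return round;
	}
	return -1;
}
```

In this model, toy_gup(TOY_NOPAGE, 1000) never sees the page and returns -1,
because the read lock is never dropped and the release step cannot run.
toy_gup(TOY_RETRY, 1000) returns 2: the first round drops the lock and lets
release finish, and the second round maps the page.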