linux-kernel - Re: [PATCH] fs/nfs/read: fix double-unlock bug in nfs_return_empty

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CALF+zOmH0fCwsLgWxOLh+Zpak9dZO5Z0D1WozTWjMMOjsyhbBQ@mail.gmail.com>
Date: Wed, 23 Apr 2025 10:39:36 -0400
From: David Wysochanski <dwysocha@...hat.com>
To: Max Kellermann <max.kellermann@...os.com>
Cc: linux-nfs@...r.kernel.org, trondmy@...nel.org, anna@...nel.org, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] fs/nfs/read: fix double-unlock bug in nfs_return_empty_folio()

On Wed, Apr 23, 2025 at 9:23 AM Max Kellermann <max.kellermann@...os.com> wrote:
>
> Sometimes, when a file was read while it was being truncated by
> another NFS client, the kernel could deadlock because folio_unlock()
> was called twice, and the second call would XOR back the `PG_locked`
> flag.
>
> Most of the time (depending on the timing of the truncation), nobody
> notices the problem because folio_unlock() gets called three times,
> which flips `PG_locked` back off:
>
>  1. vfs_read, nfs_read_folio, ... nfs_read_add_folio,
>     nfs_return_empty_folio
>  2. vfs_read, nfs_read_folio, ... netfs_read_collection,
>     netfs_unlock_abandoned_read_pages
>  3. vfs_read, ... nfs_do_read_folio, nfs_read_add_folio,
>     nfs_return_empty_folio
>
> The problem is that nfs_read_add_folio() is not supposed to unlock the
> folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is
> missing in nfs_return_empty_folio().
>
> Rarely this leads to a warning in netfs_read_collection():
>
>  ------------[ cut here ]------------
>  R=0000031c: folio 10 is not locked
>  WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00
>  [...]
>  Workqueue: events_unbound netfs_read_collection_worker
>  RIP: 0010:netfs_read_collection+0x7c0/0xf00
>  [...]
>  Call Trace:
>   <TASK>
>   netfs_read_collection_worker+0x67/0x80
>   process_one_work+0x12e/0x2c0
>   worker_thread+0x295/0x3a0
>
> Most of the time, however, processes just get stuck forever in
> folio_wait_bit_common(), waiting for `PG_locked` to disappear, which
> never happens because nobody is really holding the folio lock.
>
> Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
> Cc: stable@...r.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@...os.com>
> ---
>  fs/nfs/read.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 81bd1b9aba17..3c1fa320b3f1 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -56,7 +56,8 @@ static int nfs_return_empty_folio(struct folio *folio)
>  {
>         folio_zero_segment(folio, 0, folio_size(folio));
>         folio_mark_uptodate(folio);
> -       folio_unlock(folio);
> +       if (nfs_netfs_folio_unlock(folio))
> +               folio_unlock(folio);
>         return 0;
>  }
>
> --
> 2.47.2
>

LGTM
Thanks for tracking this down, Max.

Reviewed-by: Dave Wysochanski <dwysocha@...hat.com>