linux-kernel - [PATCH 10/10] netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241213135013.2964079-11-dhowells@redhat.com>
Date: Fri, 13 Dec 2024 13:50:10 +0000
From: David Howells <dhowells@...hat.com>
To: Christian Brauner <christian@...uner.io>
Cc: David Howells <dhowells@...hat.com>,
	Max Kellermann <max.kellermann@...os.com>,
	Ilya Dryomov <idryomov@...il.com>,
	Xiubo Li <xiubli@...hat.com>,
	Trond Myklebust <trondmy@...nel.org>,
	Jeff Layton <jlayton@...nel.org>,
	Matthew Wilcox <willy@...radead.org>,
	netfs@...ts.linux.dev,
	linux-afs@...ts.infradead.org,
	linux-cifs@...r.kernel.org,
	linux-nfs@...r.kernel.org,
	ceph-devel@...r.kernel.org,
	v9fs@...ts.linux.dev,
	linux-erofs@...ts.ozlabs.org,
	linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH 10/10] netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled

When the caching for a cookie is temporarily disabled (e.g. due to a DIO
write on that file), future copying to the cache for that file is disabled
until all fds open on that file are closed.  However, if netfslib is using
the deprecated PG_private_2 method (such as is currently used by ceph), and
decides it wants to copy to the cache, netfs_advance_write() will just bail
at the first check seeing that the cache stream is unavailable, and
indicate that it dealt with all the content.

This means that we have no subrequests to provide notifications to drive
the state machine or even to pin the request and the request just gets
discarded, leaving the folios with PG_private_2 set.

Fix this by jumping directly to cancel the request if the cache is not
available.  That way, we don't remove mark3 from the folio_queue list and
netfs_pgpriv2_cancel() will clean up the folios.

This was found by running the generic/013 xfstest against ceph with an
active cache and the "-o fsc" option passed to ceph.  That would usually
hang

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Max Kellermann <max.kellermann@...os.com>
Closes: https://lore.kernel.org/r/CAKPOu+_4m80thNy5_fvROoxBm689YtA0dZ-=gcmkzwYSY4syqw@mail.gmail.com/
Signed-off-by: David Howells <dhowells@...hat.com>
cc: Jeff Layton <jlayton@...nel.org>
cc: Ilya Dryomov <idryomov@...il.com>
cc: Xiubo Li <xiubli@...hat.com>
cc: netfs@...ts.linux.dev
cc: ceph-devel@...r.kernel.org
cc: linux-fsdevel@...r.kernel.org
---
 fs/netfs/read_pgpriv2.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index ba5af89d37fa..54d5004fec18 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -170,6 +170,10 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)

 	trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
 	netfs_stat(&netfs_n_wh_copy_to_cache);
+	if (!wreq->io_streams[1].avail) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+		goto couldnt_start;
+	}

 	for (;;) {
 		error = netfs_pgpriv2_copy_folio(wreq, folio);