lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241213135013.2964079-11-dhowells@redhat.com>
Date: Fri, 13 Dec 2024 13:50:10 +0000
From: David Howells <dhowells@...hat.com>
To: Christian Brauner <christian@...uner.io>
Cc: David Howells <dhowells@...hat.com>,
	Max Kellermann <max.kellermann@...os.com>,
	Ilya Dryomov <idryomov@...il.com>,
	Xiubo Li <xiubli@...hat.com>,
	Trond Myklebust <trondmy@...nel.org>,
	Jeff Layton <jlayton@...nel.org>,
	Matthew Wilcox <willy@...radead.org>,
	netfs@...ts.linux.dev,
	linux-afs@...ts.infradead.org,
	linux-cifs@...r.kernel.org,
	linux-nfs@...r.kernel.org,
	ceph-devel@...r.kernel.org,
	v9fs@...ts.linux.dev,
	linux-erofs@...ts.ozlabs.org,
	linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH 10/10] netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled

When the caching for a cookie is temporarily disabled (e.g. due to a DIO
write on that file), future copying to the cache for that file is disabled
until all fds open on that file are closed.  However, if netfslib is using
the deprecated PG_private_2 method (such as is currently used by ceph), and
decides it wants to copy to the cache, netfs_advance_write() will just bail
at the first check seeing that the cache stream is unavailable, and
indicate that it dealt with all the content.

This means that we have no subrequests to provide notifications to drive
the state machine or even to pin the request and the request just gets
discarded, leaving the folios with PG_private_2 set.

Fix this by jumping directly to cancel the request if the cache is not
available.  That way, we don't remove mark3 from the folio_queue list and
netfs_pgpriv2_cancel() will clean up the folios.

This was found by running the generic/013 xfstest against ceph with an
active cache and the "-o fsc" option passed to ceph.  That would usually
hang

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Max Kellermann <max.kellermann@...os.com>
Closes: https://lore.kernel.org/r/CAKPOu+_4m80thNy5_fvROoxBm689YtA0dZ-=gcmkzwYSY4syqw@mail.gmail.com/
Signed-off-by: David Howells <dhowells@...hat.com>
cc: Jeff Layton <jlayton@...nel.org>
cc: Ilya Dryomov <idryomov@...il.com>
cc: Xiubo Li <xiubli@...hat.com>
cc: netfs@...ts.linux.dev
cc: ceph-devel@...r.kernel.org
cc: linux-fsdevel@...r.kernel.org
---
 fs/netfs/read_pgpriv2.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index ba5af89d37fa..54d5004fec18 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -170,6 +170,10 @@ void netfs_pgpriv2_write_to_the_cache(struct netfs_io_request *rreq)
 
 	trace_netfs_write(wreq, netfs_write_trace_copy_to_cache);
 	netfs_stat(&netfs_n_wh_copy_to_cache);
+	if (!wreq->io_streams[1].avail) {
+		netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+		goto couldnt_start;
+	}
 
 	for (;;) {
 		error = netfs_pgpriv2_copy_folio(wreq, folio);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ