linux-kernel - [PATCH v2 1/2] mpage: terminate read-ahead on read error


Message-ID: <20250829023659.688649-1-chizhiling@163.com>
Date: Fri, 29 Aug 2025 10:36:58 +0800
From: Chi Zhiling <chizhiling@....com>
To: linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>,
	Jan Kara <jack@...e.cz>,
	Matthew Wilcox <willy@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Namjae Jeon <linkinjeon@...nel.org>,
	Sungjong Seo <sj1557.seo@...sung.com>,
	Yuezhang Mo <yuezhang.mo@...y.com>,
	Chi Zhiling <chizhiling@...inos.cn>
Subject: [PATCH v2 1/2] mpage: terminate read-ahead on read error

From: Chi Zhiling <chizhiling@...inos.cn>

For exFAT filesystems with 4MB read_ahead_size, removing the storage device
during read operations can delay EIO error reporting by several minutes.
This occurs because the read-ahead implementation in mpage doesn't handle
errors.

Another reason for the delay is that the filesystem needs metadata to
issue a file read request. When the storage device is removed, the
metadata buffers are invalidated, so mpage tries to fetch the metadata
again on every get_block() call.

The original purpose of this patch was to terminate read-ahead when we
fail to get metadata. To make the patch more generic, it is implemented
by checking the folio status instead of the return value of get_block().

So, if a folio comes back synchronously unlocked and not uptodate, should
we quit the read-ahead?

I think it depends on whether the error is permanent or temporary, and
whether further read-ahead might succeed. An unplugged device is one
reason for returning such a folio, but it can be returned for many other
reasons (e.g., metadata errors). I think most such errors will not clear
up within a short time, so we should quit read-ahead when they occur.

Signed-off-by: Chi Zhiling <chizhiling@...inos.cn>
---

Changes since v1:
No functional changes. Improved code style as suggested.

[v1]: https://lore.kernel.org/all/20250812072225.181798-1-chizhiling@163.com/T/#u

Just submitting the final version; it doesn't matter to me if it isn't merged :)
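
For illustration only, the effect of the folio-status check can be
modeled in userspace. The sketch below is not kernel code: struct
model_folio, struct model_device, model_read_folio() and
model_readahead() are hypothetical stand-ins for struct folio, the
storage device, do_mpage_readpage() and mpage_readahead(). It assumes
that a read against a removed device always hands the folio back
unlocked and not uptodate, and it simply counts how many read attempts
a read-ahead window causes with and without the early break.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two folio flags the patch inspects. */
struct model_folio {
	bool locked;	/* still locked: I/O was queued and completes later */
	bool uptodate;	/* contents are valid */
};

/* Hypothetical device model that counts read attempts. */
struct model_device {
	bool present;
	size_t read_attempts;
};

/*
 * Stand-in for do_mpage_readpage(). With the device present, the folio
 * stays locked because I/O is in flight. (The real function can also
 * finish synchronously with an uptodate folio, e.g. for a hole; that
 * case is not modeled.) With the device gone, the folio comes back
 * unlocked and not uptodate.
 */
static void model_read_folio(struct model_device *dev,
			     struct model_folio *folio)
{
	dev->read_attempts++;
	folio->uptodate = false;
	folio->locked = dev->present;
}

/* The condition added by the patch, expressed on the model folio. */
static bool failed_synchronously(const struct model_folio *folio)
{
	return !folio->locked && !folio->uptodate;
}

/*
 * Walk a read-ahead window of nr_folios folios and return the total
 * number of read attempts made against the device.
 */
static size_t model_readahead(struct model_device *dev, size_t nr_folios,
			      bool stop_on_error)
{
	size_t i;

	for (i = 0; i < nr_folios; i++) {
		struct model_folio folio = { .locked = true, .uptodate = false };

		model_read_folio(dev, &folio);
		if (stop_on_error && failed_synchronously(&folio))
			break;
	}
	return dev->read_attempts;
}
```

Assuming 4 KiB folios, a 4MB window is 1024 folios. With the device
removed, this model makes 1024 failing attempts without the break and
a single attempt with it. With the device present, the break never
triggers and all 1024 folios are still submitted.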

 fs/mpage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/mpage.c b/fs/mpage.c
index c5fd821fd30e..e4c11831f234 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -369,6 +369,12 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 		args.folio = folio;
 		args.nr_pages = readahead_count(rac);
 		args.bio = do_mpage_readpage(&args);
+		/*
+		 * If read ahead failed synchronously, it may be caused by a
+		 * removed device or by a filesystem metadata error.
+		 */
+		if (!folio_test_locked(folio) && !folio_test_uptodate(folio))
+			break;
 	}
 	if (args.bio)
 		mpage_bio_submit_read(args.bio);
-- 
2.43.0