Message-Id: <20250129181751.5527308F@davehans-spike.ostc.intel.com>
Date: Wed, 29 Jan 2025 10:17:51 -0800
From: Dave Hansen <dave.hansen@...ux.intel.com>
To: linux-kernel@...r.kernel.org
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,Ted Ts'o <tytso@....edu>,Christian Brauner <brauner@...nel.org>,Darrick J. Wong <djwong@...nel.org>,Matthew Wilcox (Oracle) <willy@...radead.org>,Al Viro <viro@...iv.linux.org.uk>,linux-fsdevel@...r.kernel.org,Dave Hansen <dave.hansen@...ux.intel.com>
Subject: [PATCH 1/7] filemap: Move prefaulting out of hot write path


From: Dave Hansen <dave.hansen@...ux.intel.com>

There is a bit of a sordid history here. I originally wrote
998ef75ddb57 ("fs: do not prefault sys_write() user buffer pages")
to fix a performance issue that showed up on early SMAP hardware.
But that was reverted with 00a3d660cbac because it exposed an
underlying filesystem bug.

This is a reimplementation of the original commit along with some
simplification and comment improvements.

The basic problem is that the generic write path has two userspace
accesses: one to prefault the write source buffer and then another to
perform the actual write. On x86, this means an extra STAC/CLAC pair.
These are relatively expensive instructions because they function as
barriers.

Keep the prefaulting behavior but move it into the slow path that gets
run when the write did not make any progress. Keeping the fault-in
there avoids livelocks that can happen when the write's source and
destination target the same folio. Contrary to the existing comments,
the fault-in does not prevent deadlocks. Those are prevented by using
an "atomic" usercopy that disables page faults.

The end result is that the generic write fast path now touches
userspace once instead of twice. On x86, that removes one STAC/CLAC
pair from each iteration of the write loop, which should speed
things up.
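
For readers who want to see the control flow in isolation, here is a
rough userspace model of the new ordering. It is only a sketch and not
kernel code: every model_* name, the MODEL_CHUNK size and the
"resident"/"mappable" flags are made up for illustration. The two
helpers stand in for copy_folio_from_iter_atomic() and
fault_in_iov_iter_readable(), and -1 stands in for -EFAULT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CHUNK 4	/* hypothetical per-iteration copy size */

/* Hypothetical stand-in for a user buffer that may not be resident. */
struct model_user_buf {
	const char *data;
	size_t len;
	bool resident;	/* false: an atomic copy makes no progress */
	bool mappable;	/* false: fault-in fails, like a bad pointer */
};

struct model_stats {
	unsigned int user_accesses;	/* times "userspace" was touched */
	unsigned int slow_path_hits;	/* times the fault-in path ran */
};

/* Models the atomic usercopy: never handles faults, returns bytes copied. */
static size_t model_copy_atomic(char *dst, struct model_user_buf *src,
				size_t off, size_t bytes,
				struct model_stats *st)
{
	st->user_accesses++;
	if (!src->resident)
		return 0;
	memcpy(dst, src->data + off, bytes);
	return bytes;
}

/* Models the fault-in helper: returns the number of bytes NOT faulted in. */
static size_t model_fault_in(struct model_user_buf *src, size_t bytes,
			     struct model_stats *st)
{
	st->user_accesses++;
	if (!src->mappable)
		return bytes;
	src->resident = true;
	return 0;
}

/* Returns bytes written, or -1 if nothing could be written. */
static long model_perform_write(char *dst, struct model_user_buf *src,
				struct model_stats *st)
{
	size_t written = 0;

	while (written < src->len) {
		size_t bytes = src->len - written;
		size_t copied;

		if (bytes > MODEL_CHUNK)
			bytes = MODEL_CHUNK;

		/* Fast path: a single access to the source buffer. */
		copied = model_copy_atomic(dst + written, src, written,
					   bytes, st);
		assert(copied <= bytes);

		if (copied == 0) {
			/* Slow path: no progress, so fault in and retry. */
			st->slow_path_hits++;
			if (model_fault_in(src, bytes, st) == bytes)
				return written ? (long)written : -1;
			continue;
		}
		written += copied;
	}
	return (long)written;
}
```

In this model a resident 10-byte buffer is written with three source
accesses, one per chunk. A non-resident buffer costs two extra
accesses, and only once: the failed copy and the fault-in that follows
it.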

Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
Link: https://lore.kernel.org/all/yxyuijjfd6yknryji2q64j3keq2ygw6ca6fs5jwyolklzvo45s@4u63qqqyosy2/
Cc: Ted Ts'o <tytso@....edu>

---

 b/mm/filemap.c |   26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff -puN mm/filemap.c~generic_perform_write-1 mm/filemap.c
--- a/mm/filemap.c~generic_perform_write-1	2025-01-29 09:03:30.963260106 -0800
+++ b/mm/filemap.c	2025-01-29 09:03:30.971260772 -0800
@@ -4027,17 +4027,6 @@ retry:
 		bytes = min(chunk - offset, bytes);
 		balance_dirty_pages_ratelimited(mapping);
 
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
-			status = -EFAULT;
-			break;
-		}
-
 		if (fatal_signal_pending(current)) {
 			status = -EINTR;
 			break;
@@ -4055,6 +4044,11 @@ retry:
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_folio(folio);
 
+		/*
+		 * This needs to be atomic because actually handling page
+		 * faults on 'i' can deadlock if the copy targets a
+		 * userspace mapping of 'folio'.
+		 */
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
 		flush_dcache_folio(folio);
 
@@ -4080,6 +4074,16 @@ retry:
 				bytes = copied;
 				goto retry;
 			}
+
+			/*
+			 * 'folio' is now unlocked and faults on it can be
+			 * handled. Ensure forward progress by trying to
+			 * fault it in now.
+			 */
+			if (fault_in_iov_iter_readable(i, bytes) == bytes) {
+				status = -EFAULT;
+				break;
+			}
 		} else {
 			pos += status;
 			written += status;
_
