Message-ID: <20251028175639.2567832-1-kuniyu@google.com>
Date: Tue, 28 Oct 2025 17:56:28 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Christian Brauner <brauner@...nel.org>, Jens Axboe <axboe@...nel.dk>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, 
	David Laight <david.laight.linux@...il.com>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Eric Dumazet <edumazet@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, Kuniyuki Iwashima <kuni1840@...il.com>, linux-kernel@...r.kernel.org, 
	Dave Hansen <dave.hansen@...el.com>
Subject: [PATCH v2] epoll: Use user_write_access_begin() and unsafe_put_user()
 in epoll_put_uevent().

epoll_put_uevent() calls __put_user() twice, and the two calls are
compiled into calls to the out-of-line functions
__put_user_nocheck_4() and __put_user_nocheck_8().

Each function wraps a single mov in a stac/clac pair, which is
expensive on the AMD EPYC 7B12 (Zen 2) 64-Core Processor.

  __put_user_nocheck_4  /proc/kcore [Percent: local period]
  Percent │
    89.91 │      stac
     0.19 │      mov  %eax,(%rcx)
     0.15 │      xor  %ecx,%ecx
     9.69 │      clac
     0.06 │    ← retq

This was noticeable when testing neper/udp_rr with 1000 flows per
thread.

  Overhead  Shared O  Symbol
    10.08%  [kernel]  [k] _copy_to_iter
     7.12%  [kernel]  [k] ip6_output
     6.40%  [kernel]  [k] sock_poll
     5.71%  [kernel]  [k] move_addr_to_user
     4.39%  [kernel]  [k] __put_user_nocheck_4
     ...
     1.06%  [kernel]  [k] ep_try_send_events
     ...                  ^- epoll_put_uevent() was inlined
     0.78%  [kernel]  [k] __put_user_nocheck_8

Let's use user_write_access_begin() and unsafe_put_user() in
epoll_put_uevent() so that both stores share a single stac/clac pair.

We saw 2% more pps with udp_rr by saving a stac/clac pair.
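
To make "saving a stac/clac pair" concrete, here is a toy userspace
model of the two shapes. It is only a sketch: every model_* name is
hypothetical, the counter stands in for a stac/clac pair, and faults
(the efault path in the real code) are not modelled at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All model_* names are made up. */
struct model_event {
	uint32_t events;
	uint64_t data;
};

/* Counts how many simulated stac/clac pairs were executed. */
static unsigned int model_pairs;

/* Old shape: each store opens and closes its own access window. */
static bool model_put_each(struct model_event *dst, uint32_t ev, uint64_t data)
{
	if (!dst)
		return false;
	model_pairs++;		/* window around the 4-byte store */
	dst->events = ev;
	model_pairs++;		/* window around the 8-byte store */
	dst->data = data;
	return true;
}

/* New shape: one access window covers both stores. */
static bool model_put_batched(struct model_event *dst, uint32_t ev, uint64_t data)
{
	if (!dst)
		return false;
	model_pairs++;		/* single window for the whole event */
	dst->events = ev;
	dst->data = data;
	return true;
}
```

In this model, model_put_each() bumps the counter twice per event and
model_put_batched() bumps it once, which is the saving the patch is
after.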

Before:

  # nstat > /dev/null; sleep 10; nstat | grep -i udp
  Udp6InDatagrams                 2184011            0.0

  @ep_try_send_events_ns:
  [256, 512)       2796601 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)         627863 |@@@@@@@@@@@                                         |
  [1K, 2K)          166403 |@@@                                                 |
  [2K, 4K)           10437 |                                                    |
  [4K, 8K)            1396 |                                                    |
  [8K, 16K)            116 |                                                    |

After:

  # nstat > /dev/null; sleep 10; nstat | grep -i udp
  Udp6InDatagrams                 2232730            0.0

  @ep_try_send_events_ns:
  [256, 512)       2900655 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)         622045 |@@@@@@@@@@@                                         |
  [1K, 2K)          172831 |@@@                                                 |
  [2K, 4K)           17687 |                                                    |
  [4K, 8K)            1103 |                                                    |
  [8K, 16K)            174 |                                                    |
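
The "2% more pps" figure can be rechecked from the two Udp6InDatagrams
counters above. A minimal sketch of the arithmetic (the helper name
pct_change is an assumption, not something from the patch):

```c
#include <stdint.h>

/* Relative change from `before` to `after`, expressed in percent. */
static double pct_change(uint64_t before, uint64_t after)
{
	return ((double)after - (double)before) * 100.0 / (double)before;
}
```

Calling pct_change(2184011, 2232730) gives a little over 2.2, in line
with the roughly 2% gain quoted above. Both counters cover the same
10-second window, so the ratio of datagram counts equals the ratio of
packet rates.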

Another option would be to use can_do_masked_user_access()
and masked_user_access_begin(), but we saw a 3% regression. (See Link)

Link: https://lore.kernel.org/lkml/20251028053330.2391078-1-kuniyu@google.com/
Suggested-by: Eric Dumazet <edumazet@...gle.com>
Suggested-by: Dave Hansen <dave.hansen@...el.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@...gle.com>
---
v2:
  * Drop patch 1
  * Use user_write_access_begin() instead of a bare stac (Dave Hansen)

v1: https://lore.kernel.org/lkml/20251023000535.2897002-1-kuniyu@google.com/
---
 include/linux/eventpoll.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index ccb478eb174b..31a1b11e4ddf 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -82,11 +82,15 @@ static inline struct epoll_event __user *
 epoll_put_uevent(__poll_t revents, __u64 data,
 		 struct epoll_event __user *uevent)
 {
-	if (__put_user(revents, &uevent->events) ||
-	    __put_user(data, &uevent->data))
+	if (!user_write_access_begin(uevent, sizeof(*uevent)))
 		return NULL;
-
-	return uevent+1;
+	unsafe_put_user(revents, &uevent->events, efault);
+	unsafe_put_user(data, &uevent->data, efault);
+	user_access_end();
+	return uevent + 1;
+efault:
+	user_access_end();
+	return NULL;
 }
 #endif
 
-- 
2.51.1.851.g4ebd6896fd-goog

