Message-ID: <20251028192734.2591319-1-kuniyu@google.com>
Date: Tue, 28 Oct 2025 19:25:31 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: torvalds@...ux-foundation.org
Cc: axboe@...nel.dk, brauner@...nel.org, dave.hansen@...el.com, 
	dave.hansen@...ux.intel.com, david.laight.linux@...il.com, 
	edumazet@...gle.com, kuni1840@...il.com, kuniyu@...gle.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] epoll: Use user_write_access_begin() and
 unsafe_put_user() in epoll_put_uevent().

From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Tue, 28 Oct 2025 12:02:33 -0700
> On Tue, 28 Oct 2025 at 10:56, Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
> >
> > Let's use user_write_access_begin() and unsafe_put_user() in
> > epoll_put_uevent().
> >
> > We saw 2% more pps with udp_rr by saving a stac/clac pair.
> 
> This patch looks fine to me. Simple and targeted.
> 
> > Another option would be to use can_do_masked_user_access()
> > and masked_user_access_begin(), but we saw 3% regression. (See Link)
> 
> So I find this intriguing, because generally,
> masked_user_access_begin() should _never_ be slower than
> user_write_access_begin().
> 
> user_write_access_begin() ends up doing a __uaccess_begin_nospec() on
> x86, which is not just the STAC instruction, but also a barrier.
> 
> In contrast, masked_user_access_begin() obviously also has the STAC,
> but it avoids the barrier and only uses a simple conditional mask.
> 
> So I get the feeling that you did something wrong. In particular,
> following your link, I see you describe that case (2) as
> 
>   2) masked_user_access_begin() + masked_user_access_begin()
>   97% pps compared to 1).
>   96% calls of ep_try_send_events().
> 
> and you mention masked_user_access_begin() *twice*.

Oh sorry, that was a copy-and-paste mistake on my part; it should have
been can_do_masked_user_access() + masked_user_access_begin().


> 
> Which would certainly explain why it's slower.
> 
> Can you show what the patch you used is?

I used the diff below as the masked version.

I noticed user_access_begin() is used in the "else if ()" branch, but I
think it should not matter as long as the test is run on an x86_64
platform, where can_do_masked_user_access() is true.

So, I think the diff is basically the same as the attached one.

---8<---
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index ccb478eb174b..c530d83f89d0 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -82,11 +82,23 @@ static inline struct epoll_event __user *
 epoll_put_uevent(__poll_t revents, __u64 data,
                 struct epoll_event __user *uevent)
 {
-       if (__put_user(revents, &uevent->events) ||
-           __put_user(data, &uevent->data))
+       if (can_do_masked_user_access()) {
+               uevent = masked_user_access_begin(uevent);
+       } else if (!user_access_begin(uevent, sizeof(*uevent))) {
+               /*
+                * dead branch for __must_check.
+                * ep_check_params() already checked access_ok().
+                */
                return NULL;
-
-       return uevent+1;
+       }
+
+       unsafe_put_user(revents, &uevent->events, efault);
+       unsafe_put_user(data, &uevent->data, efault);
+       user_access_end();
+       return uevent + 1;
+efault:
+       user_access_end();
+       return NULL;
 }
 #endif
 
---8<---
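
Just to spell out what you explained above, here is my rough mental
model of the two begin() flavours.  This is my own simplification, not
the actual x86 uaccess code, and masked_begin_sketch() is a made-up
name for illustration:

---8<---
/*
 * Simplified sketch only, not the real implementation.
 *
 * user_write_access_begin(ptr, size) is roughly:
 *	access_ok(ptr, size) check
 *	barrier_nospec()	<- the speculation barrier
 *	stac()
 *
 * masked_user_access_begin(ptr) avoids the barrier and instead clamps
 * the pointer so that a kernel address can never be dereferenced, not
 * even speculatively:
 */
static inline void __user *masked_begin_sketch(void __user *ptr)
{
	/*
	 * USER_PTR_MAX is a runtime constant patched at boot; it
	 * shows up as 0x7ffffffff000 in the perf output below.
	 */
	if ((unsigned long)ptr > USER_PTR_MAX)		/* cmp   */
		ptr = (void __user *)USER_PTR_MAX;	/* cmova */
	stac();						/* open user access */
	return ptr;
}
---8<---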


> 
> Because I think the proper patch should look something like the
> attached.. For me, that generates
> 
> 
>         movabs $0x123456789abcdef,%rcx
>         cmp    %rcx,%r15
>         cmova  %rcx,%r15
>         stac
>         mov    %r12d,(%r15)
>         mov    %rax,0x4(%r15)
>         clac
> 
> which honestly should be pretty much optimal.

Here is what I got with the masked version, taken from perf + 'a' (annotate).

---8<---
        │       movabs $0x7ffffffff000,%rcx
   0.05 │       cmp    %rcx,%r15
   0.18 │       cmova  %rcx,%r15
  72.69 │       stac
   0.08 │       lea    0x28(%rsp),%r12
   0.09 │       mov    %r14d,(%r15)
   0.06 │       mov    %rax,0x4(%r15)
   6.42 │       clac
---8<---

One possibility, which Eric mentioned two weeks ago, is that cmov might
be expensive on the Zen 2 platform.


> 
> (That 0x123456789abcdef is just a placeholder for the USER_PTR_MAX
> value - it gets rewritten at boot to the right value).
> 
> NOTE! The attached patch has absolutely not been tested. I only used
> it to verify the code generation visually, so you should definitely
> check it yourself.
> 
