Message-ID: <20251028192734.2591319-1-kuniyu@google.com>
Date: Tue, 28 Oct 2025 19:25:31 +0000
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: torvalds@...ux-foundation.org
Cc: axboe@...nel.dk, brauner@...nel.org, dave.hansen@...el.com,
dave.hansen@...ux.intel.com, david.laight.linux@...il.com,
edumazet@...gle.com, kuni1840@...il.com, kuniyu@...gle.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] epoll: Use user_write_access_begin() and
unsafe_put_user() in epoll_put_uevent().
From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Tue, 28 Oct 2025 12:02:33 -0700
> On Tue, 28 Oct 2025 at 10:56, Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
> >
> > Let's use user_write_access_begin() and unsafe_put_user() in
> > epoll_put_uevent().
> >
> > We saw 2% more pps with udp_rr by saving a stac/clac pair.
>
> This patch looks fine to me. Simple and targeted.
>
> > Another option would be to use can_do_masked_user_access()
> > and masked_user_access_begin(), but we saw 3% regression. (See Link)
>
> So I find this intriguing, because generally,
> masked_user_access_begin() should _never_ be slower than
> user_write_access_begin().
>
> user_write_access_begin() ends up doing a __uaccess_begin_nospec() on
> x86, which is not just the STAC instruction, but also a barrier.
>
> In contrast, masked_user_access_begin() obviously also has the STAC,
> but it avoids the barrier and only uses a simple conditional mask.
>
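[ For anyone following along: my understanding of the two x86 entry
  points, paraphrased from arch/x86/include/asm/uaccess*.h and
  simplified, not quoted verbatim, is roughly the following. ]
---8<---
/* user_write_access_begin() path: STAC plus a speculation barrier
 * (barrier_nospec() stalls speculation, e.g. an LFENCE on x86).
 */
#define __uaccess_begin_nospec()	\
({					\
	stac();				\
	barrier_nospec();		\
})

/* Masked path: no barrier.  The pointer itself is clamped to
 * USER_PTR_MAX with cmp+cmova, so even a mispredicted access
 * cannot reach a kernel address.
 */
#define masked_user_access_begin(x) ({				\
	__auto_type __masked_ptr = (x);				\
	__masked_ptr = mask_user_address(__masked_ptr);		\
	__uaccess_begin(); __masked_ptr; })
---8<---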
> So I get the feeling that you did something wrong. In particular,
> following your link, I see you describe that case (2) as
>
> 2) masked_user_access_begin() + masked_user_access_begin()
> 97% pps compared to 1).
> 96% calls of ep_try_send_events().
>
> and you mention masked_user_access_begin() *twice*.
Oh sorry, that was a copy-and-paste mistake on my part; it should
have read can_do_masked_user_access() + masked_user_access_begin().
>
> Which would certainly explain why it's slower.
>
> Can you show what the patch you used is?
I used the diff below as the masked version.
I noticed user_access_begin() is used in the "else if ()" branch, but
I think that should not matter as long as the test runs on an x86_64
platform, where can_do_masked_user_access() is true.
So I think the diff is basically the same as the attached one.
---8<---
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index ccb478eb174b..c530d83f89d0 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -82,11 +82,23 @@ static inline struct epoll_event __user *
 epoll_put_uevent(__poll_t revents, __u64 data,
 		 struct epoll_event __user *uevent)
 {
-	if (__put_user(revents, &uevent->events) ||
-	    __put_user(data, &uevent->data))
+	if (can_do_masked_user_access()) {
+		uevent = masked_user_access_begin(uevent);
+	} else if (!user_access_begin(uevent, sizeof(*uevent))) {
+		/*
+		 * Dead branch to satisfy __must_check;
+		 * ep_check_params() already checked access_ok().
+		 */
 		return NULL;
-
-	return uevent+1;
+	}
+
+	unsafe_put_user(revents, &uevent->events, efault);
+	unsafe_put_user(data, &uevent->data, efault);
+	user_access_end();
+	return uevent + 1;
+efault:
+	user_access_end();
+	return NULL;
 }
 #endif
 
---8<---
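For context, the caller that relies on the NULL return looks roughly
like this (paraphrased from ep_send_events() in fs/eventpoll.c and
simplified, not a verbatim quote):
---8<---
	events = epoll_put_uevent(revents, epi->event.data, events);
	if (!events) {
		/* Put the item back so the event is not lost. */
		list_add(&epi->rdllink, &txlist);
		ep_pm_stay_awake(epi);
		if (!res)
			res = -EFAULT;
		break;
	}
---8<---
and ep_check_params() has already done the access_ok() check by the
time we get here, which is why the else-if branch above is
effectively dead.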
>
> Because I think the proper patch should look something like the
> attached.. For me, that generates
>
>
> movabs $0x123456789abcdef,%rcx
> cmp %rcx,%r15
> cmova %rcx,%r15
> stac
> mov %r12d,(%r15)
> mov %rax,0x4(%r15)
> clac
>
> which honestly should be pretty much optimal.
This is what I got with the masked version, taken from perf report's
annotate view ('a').
---8<---
│ movabs $0x7ffffffff000,%rcx
0.05 │ cmp %rcx,%r15
0.18 │ cmova %rcx,%r15
72.69 │ stac
0.08 │ lea 0x28(%rsp),%r12
0.09 │ mov %r14d,(%r15)
0.06 │ mov %rax,0x4(%r15)
6.42 │ clac
---8<---
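Mapping that back to the source (my reading; matching the two movs to
the two unsafe_put_user() calls is inferred from the offsets):
---8<---
movabs $0x7ffffffff000,%rcx   # USER_PTR_MAX, patched in at boot
cmp    %rcx,%r15              # mask_user_address():
cmova  %rcx,%r15              #   clamp the pointer, no branch/barrier
stac                          # open the user access window
lea    0x28(%rsp),%r12        # unrelated compiler scheduling
mov    %r14d,(%r15)           # unsafe_put_user(revents, &uevent->events)
mov    %rax,0x4(%r15)         # unsafe_put_user(data, &uevent->data)
clac                          # close the window
---8<---
Note the 72.69% on stac may also include sampling skid from the
neighbouring instructions.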
One possibility that Eric mentioned 2 weeks ago is that cmov
might be expensive on the Zen 2 platform.
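If it helps, a toy userspace probe like the one below could
sanity-check that hypothesis; this is entirely my own sketch
(untested, and LIMIT only mimics USER_PTR_MAX by value), not
something from the kernel tree:
---8<---
/* Build: gcc -O2 -o clamp_bench clamp_bench.c */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS	(1ULL << 28)
#define LIMIT	0x7ffffffff000ULL

/* Same instruction pattern as the kernel's clamp: cmp + cmova. */
static inline uint64_t clamp_cmov(uint64_t ptr, uint64_t max)
{
	asm("cmp %1,%0\n\t"
	    "cmova %1,%0"
	    : "+r" (ptr)
	    : "r" (max));
	return ptr;
}

static uint64_t ns_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	uint64_t p = 0x1000, t0, t1;

	t0 = ns_now();
	/* Chain p through the clamp so we measure latency, not throughput. */
	for (uint64_t i = 0; i < ITERS; i++)
		p = clamp_cmov(p + i, LIMIT);
	t1 = ns_now();

	printf("cmov clamp: %.2f ns/iter (p=%#llx)\n",
	       (double)(t1 - t0) / ITERS, (unsigned long long)p);
	return 0;
}
---8<---
Comparing the ns/iter between the Zen 2 box and another
microarchitecture would at least tell us whether the clamp itself is
the outlier.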
>
> (That 0x123456789abcdef is just a placeholder for the USER_PTR_MAX
> value - it gets rewritten at boot to the right value).
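[ Right. For anyone curious where the clamp comes from, my
  understanding, paraphrased from arch/x86/include/asm/uaccess_64.h
  and simplified: ]
---8<---
static inline void __user *mask_user_address(const void __user *ptr)
{
	void __user *ret;

	asm("cmp %1,%0\n\t"
	    "cmova %1,%0"	/* clamp out-of-range pointers */
	    : "=r" (ret)
	    : "r" (runtime_const_ptr(USER_PTR_MAX)),	/* patched at boot */
	      "0" (ptr));
	return ret;
}
---8<---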
>
> NOTE! The attached patch has absolutely not been tested. I only used
> it to verify the code generation visually, so you should definitely
> check it yourself.
>