Message-ID: <CAAVpQUARk-XeMdTeGy_s65sdwuLY2RzocGyJ=2_WkhsrFN-bUw@mail.gmail.com>
Date: Tue, 28 Oct 2025 09:42:25 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: David Laight <david.laight.linux@...il.com>
Cc: dave.hansen@...el.com, alex@...ti.fr, aou@...s.berkeley.edu,
axboe@...nel.dk, bp@...en8.de, brauner@...nel.org, catalin.marinas@....com,
christophe.leroy@...roup.eu, dave.hansen@...ux.intel.com, edumazet@...gle.com,
hpa@...or.com, kuni1840@...il.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, maddy@...ux.ibm.com, mingo@...hat.com,
mpe@...erman.id.au, npiggin@...il.com, palmer@...belt.com, pjw@...nel.org,
tglx@...utronix.de, torvalds@...ux-foundation.org, will@...nel.org,
x86@...nel.org
Subject: Re: [PATCH v1 2/2] epoll: Use __user_write_access_begin() and
unsafe_put_user() in epoll_put_uevent().
On Tue, Oct 28, 2025 at 2:54 AM David Laight
<david.laight.linux@...il.com> wrote:
>
> On Tue, 28 Oct 2025 05:32:13 +0000
> Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>
> ....
> > I rebased on 19ab0a22efbd and tested 4 versions on
> > AMD EPYC 7B12 machine:
>
> That is zen5 which I believe has much faster clac/stac than anything else.
> (It might also have a faster lfence - not sure.)
The EPYC 7B12 is a Zen 2 part, not Zen 5, so stac/clac is probably
more expensive here than what you would expect on Zen 5.
>
> Getting a 3% change for that diff also seems unlikely.
> Even if you halved the execution time of that code the system would have
> to be spending 6% of the time in that loop.
> Even your original post only shows 1% in ep_try_send_events().
We saw a similar improvement on the same platform with commit
1fb0e471611d ("net: remove one stac/clac pair from
move_addr_to_user()").
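
For reference, the 6% figure quoted above follows from simple
proportions. A rough sketch, where f (the fraction of total time spent
in the loop) and s (the factor by which the loop gets faster) are
symbols introduced here only for illustration:

```latex
\[
  \text{time saved} = f\left(1 - \frac{1}{s}\right),
  \qquad
  s = 2,\ \text{time saved} = 0.03
  \;\Rightarrow\;
  f = \frac{0.03}{1 - \tfrac{1}{2}} = 0.06 .
\]
```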
>
> An 'interesting' test is to replicate the code you are optimising
> to see how much slower it goes - you can't gain more than the slowdown.
>
> What is more likely is that breathing on the code changes the cache
> line layout and that causes a larger performance change.
>
> A better test for epoll_put_uevent() would be to create 1000 fds (pipes or events).
> Then time calls to epoll_wait() that return lots of events.
>
> David
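
The test suggested above could be sketched roughly as follows. This is
only an illustration, not something that was run for this thread:
bench_epoll_wait() is a hypothetical helper name, eventfds are assumed
instead of pipes, and the fds are registered level-triggered and never
drained so that every epoll_wait() call reports all of them.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper (not from the thread): register nfds eventfds that
 * are already readable, then time iters calls to epoll_wait().
 * Returns the number of events reported by the last call, or -1 on error.
 * The average cost per call in nanoseconds is stored in *ns_per_call.
 */
static int bench_epoll_wait(int nfds, int iters, double *ns_per_call)
{
	struct epoll_event *events = NULL;
	struct timespec t0, t1;
	int *fds = NULL;
	int epfd = -1, created = 0, n = -1, i;

	if (nfds <= 0 || iters <= 0)
		return -1;

	events = calloc(nfds, sizeof(*events));
	fds = malloc(nfds * sizeof(*fds));
	epfd = epoll_create1(0);
	if (!events || !fds || epfd < 0)
		goto out;

	for (i = 0; i < nfds; i++) {
		struct epoll_event ev = { .events = EPOLLIN };

		/* An initial count of 1 makes the eventfd readable at once. */
		fds[i] = eventfd(1, EFD_NONBLOCK);
		if (fds[i] < 0)
			goto out;
		created++;

		ev.data.fd = fds[i];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0)
			goto out;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* Timeout 0: never block, just copy out the ready events. */
		n = epoll_wait(epfd, events, nfds, 0);
		if (n < 0)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*ns_per_call = ((t1.tv_sec - t0.tv_sec) * 1e9 +
			(t1.tv_nsec - t0.tv_nsec)) / iters;
out:
	for (i = 0; i < created; i++)
		close(fds[i]);
	if (epfd >= 0)
		close(epfd);
	free(fds);
	free(events);
	return n;
}
```

Calling bench_epoll_wait(1000, 100000, &ns) from a small main() on the
kernels with and without the patch, and comparing the resulting ns
values, should put most of the measured time in the event copy-out
path. With 1000 fds the default RLIMIT_NOFILE soft limit of 1024 is
close, so it may need to be raised.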