Message-ID: <CAL+tcoBoD9v5a+LoftwEGCXM4y7kMr5kGbYRGQK0S0RWt3k16Q@mail.gmail.com>
Date: Wed, 24 Jul 2024 08:55:01 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [RFC PATCH net-next] net: add an entry for CONFIG_NET_RX_BUSY_POLL
On Wed, Jul 24, 2024 at 8:38 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Wed, Jul 24, 2024 at 12:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Jul 23, 2024 at 6:01 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > On Tue, Jul 23, 2024 at 11:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 5:13 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 11:09 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 23, 2024 at 10:57 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 23, 2024 at 3:57 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > > >
> > > > > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > > > > >
> > > > > > > > When I was doing performance test on unix_poll(), I found out that
> > > > > > > > accessing sk->sk_ll_usec when calling sock_poll()->sk_can_busy_loop()
> > > > > > > > occupies too much time, which causes around 16% degradation. So I
> > > > > > > > decided to turn off this config, which cannot be done apparently
> > > > > > > > before this patch.
> > > > > > >
> > > > > > > Too many CONFIG_ options, distros will enable it anyway.
> > > > > > >
> > > > > > > In my builds, offset of sk_ll_usec is 0xe8.
> > > > > > >
> > > > > > > Are you using some debug options or an old tree ?
> > > > >
> > > > > I forgot to say: I'm running the latest kernel which I pulled around
> > > > > two hours ago. Whatever kind of configs with/without debug options I
> > > > > use, I can still reproduce it.
> > > >
> > > > Ok, please post :
> > > >
> > > > pahole --hex -C sock vmlinux
> > >
> > > 1) Enable the config:
> > > $ pahole --hex -C sock vmlinux
> > > struct sock {
> > > struct sock_common __sk_common; /* 0 0x88 */
> > > /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
> > > __u8 __cacheline_group_begin__sock_write_rx[0]; /* 0x88 0 */
> > > atomic_t sk_drops; /* 0x88 0x4 */
> > > __s32 sk_peek_off; /* 0x8c 0x4 */
> > > struct sk_buff_head sk_error_queue; /* 0x90 0x18 */
> > > struct sk_buff_head sk_receive_queue; /* 0xa8 0x18 */
> > > /* --- cacheline 3 boundary (192 bytes) --- */
> > > struct {
> > > atomic_t rmem_alloc; /* 0xc0 0x4 */
> > > int len; /* 0xc4 0x4 */
> > > struct sk_buff * head; /* 0xc8 0x8 */
> > > struct sk_buff * tail; /* 0xd0 0x8 */
> > > } sk_backlog; /* 0xc0 0x18 */
> > > __u8 __cacheline_group_end__sock_write_rx[0]; /* 0xd8 0 */
> > > __u8 __cacheline_group_begin__sock_read_rx[0]; /* 0xd8 0 */
> > > struct dst_entry * sk_rx_dst; /* 0xd8 0x8 */
> > > int sk_rx_dst_ifindex; /* 0xe0 0x4 */
> > > u32 sk_rx_dst_cookie; /* 0xe4 0x4 */
> > > unsigned int sk_ll_usec; /* 0xe8 0x4 */
> >
> > See here ? offset of sk_ll_usec is 0xe8, not 0x104 as you posted.
>
> Oh, so sorry. My fault. I remembered only that perf record was
> executed in an old tree before you optimise the layout of struct sock.
> Then I found out if I disable the config applying to the latest tree
> running in my virtual machine, the result is better. So let me find a
> physical server to run the latest kernel and will get back more
> accurate information of 'perf record' here.
>
> >
> > Do not blindly trust perf here.
> >
> > Please run a benchmark with 1,000,000 af_unix messages being sent and received.
> >
> > I am guessing your patch makes no difference at all (certainly not 16
> > % as claimed in your changelog)
>
> The fact is the performance would improve when I disable the config if
> I only test unix_poll related paths. The time spent can decrease from
> 2.45 to 2.05 which is 16%. As I said, it can be easily reproduced.
To show that accessing the sk_ll_usec field can cause a performance
issue, I removed only the lines below, with CONFIG_NET_RX_BUSY_POLL
still enabled:
diff --git a/net/socket.c b/net/socket.c
index fcbdd5bc47ac..74a730330a01 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1392,20 +1392,11 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 {
 	struct socket *sock = file->private_data;
 	const struct proto_ops *ops = READ_ONCE(sock->ops);
-	__poll_t events = poll_requested_events(wait), flag = 0;
+	__poll_t flag = 0;
 
 	if (!ops->poll)
 		return 0;
 
-	if (sk_can_busy_loop(sock->sk)) {
-		/* poll once if requested by the syscall */
-		if (events & POLL_BUSY_LOOP)
-			sk_busy_loop(sock->sk, 1);
-
-		/* if this socket can poll_ll, tell the system call */
-		flag = POLL_BUSY_LOOP;
-	}
-
 	return ops->poll(file, sock, wait) | flag;
 }

With the busy-poll check removed, the measured time drops to ~2.1.
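
For anyone who wants to try a comparable measurement, a userspace
loop along the following lines exercises the sock_poll() -> unix_poll()
path. This is only a minimal sketch, not the test that produced the
numbers in this thread: the function name time_unix_poll() and the
iteration count passed to it are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time `iters` poll() calls on one end of an
 * AF_UNIX socketpair. No data is ever queued, so every call goes
 * through the socket poll path and returns immediately.
 * Returns elapsed seconds, or -1.0 on error.
 */
double time_unix_poll(long iters)
{
	struct timespec t0, t1;
	struct pollfd pfd;
	double ret = -1.0;
	int sv[2];
	long i;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1.0;

	pfd.fd = sv[0];
	pfd.events = POLLIN;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		pfd.revents = 0;
		/* timeout 0: do not sleep, just query readiness */
		if (poll(&pfd, 1, 0) < 0)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ret = (double)(t1.tv_sec - t0.tv_sec) +
	      (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling time_unix_poll() with the same iteration count on a kernel with
and without the change gives two wall-clock figures to compare; running
it several times and pinning it to one CPU should reduce noise,
especially inside a virtual machine.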
>
> Thanks,
> Jason
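
The benchmark Eric suggests above (1,000,000 af_unix messages sent and
received) could take roughly the following shape. Again this is a
sketch under assumptions, not code from this thread: the name
time_unix_msgs(), the 64-byte payload, and the single-threaded
send/poll/recv pattern are all illustrative choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `count` datagrams over an AF_UNIX
 * socketpair, calling poll() before each recv() so the socket poll
 * path is included in the measured time.
 * Returns elapsed seconds, or -1.0 on error.
 */
double time_unix_msgs(long count)
{
	struct timespec t0, t1;
	struct pollfd pfd;
	char buf[64] = "ping";	/* payload size is an arbitrary choice */
	double ret = -1.0;
	int sv[2];
	long i;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1.0;

	pfd.fd = sv[1];
	pfd.events = POLLIN;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < count; i++) {
		if (send(sv[0], buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
			goto out;
		pfd.revents = 0;
		/* block until the datagram is readable on the other end */
		if (poll(&pfd, 1, -1) != 1)
			goto out;
		if (recv(sv[1], buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ret = (double)(t1.tv_sec - t0.tv_sec) +
	      (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Passing 1000000 as count matches the message count requested in the
thread. Because the send and receive costs are included, any difference
from the sk_ll_usec access is expected to be a smaller share of the
total than in a poll-only loop.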