Message-ID: <CAL+tcoB=c2wbUQV67-qSAZ1R34DOrQasqsudBi9dz_TOt1MutQ@mail.gmail.com>
Date: Wed, 24 Jul 2024 17:01:56 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [RFC PATCH net-next] net: add an entry for CONFIG_NET_RX_BUSY_POLL
On Wed, Jul 24, 2024 at 4:54 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Jul 24, 2024 at 9:33 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Wed, Jul 24, 2024 at 8:38 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > On Wed, Jul 24, 2024 at 12:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 6:01 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 11:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 23, 2024 at 5:13 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 23, 2024 at 11:09 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 23, 2024 at 10:57 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jul 23, 2024 at 3:57 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > > > > >
> > > > > > > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > > > > > > >
> > > > > > > > > > When I was running a performance test on unix_poll(), I found that
> > > > > > > > > > accessing sk->sk_ll_usec in sock_poll()->sk_can_busy_loop()
> > > > > > > > > > takes too much time, causing around a 16% degradation. So I
> > > > > > > > > > decided to turn off this config, which apparently cannot be done
> > > > > > > > > > before this patch.
> > > > > > > > >
> > > > > > > > > Too many CONFIG_ options, distros will enable it anyway.
> > > > > > > > >
> > > > > > > > > In my builds, offset of sk_ll_usec is 0xe8.
> > > > > > > > >
> > > > > > > > > Are you using some debug options or an old tree ?
> > > > > > >
> > > > > > > I forgot to say: I'm running the latest kernel, which I pulled about
> > > > > > > two hours ago. Whatever config I use, with or without debug options,
> > > > > > > I can still reproduce it.
> > > > > >
> > > > > > Ok, please post :
> > > > > >
> > > > > > pahole --hex -C sock vmlinux
> > > > >
> > > > > 1) Enable the config:
> > > > > $ pahole --hex -C sock vmlinux
> > > > > struct sock {
> > > > >         struct sock_common __sk_common; /* 0 0x88 */
> > > > >         /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
> > > > >         __u8 __cacheline_group_begin__sock_write_rx[0]; /* 0x88 0 */
> > > > >         atomic_t sk_drops; /* 0x88 0x4 */
> > > > >         __s32 sk_peek_off; /* 0x8c 0x4 */
> > > > >         struct sk_buff_head sk_error_queue; /* 0x90 0x18 */
> > > > >         struct sk_buff_head sk_receive_queue; /* 0xa8 0x18 */
> > > > >         /* --- cacheline 3 boundary (192 bytes) --- */
> > > > >         struct {
> > > > >                 atomic_t rmem_alloc; /* 0xc0 0x4 */
> > > > >                 int len; /* 0xc4 0x4 */
> > > > >                 struct sk_buff * head; /* 0xc8 0x8 */
> > > > >                 struct sk_buff * tail; /* 0xd0 0x8 */
> > > > >         } sk_backlog; /* 0xc0 0x18 */
> > > > >         __u8 __cacheline_group_end__sock_write_rx[0]; /* 0xd8 0 */
> > > > >         __u8 __cacheline_group_begin__sock_read_rx[0]; /* 0xd8 0 */
> > > > >         struct dst_entry * sk_rx_dst; /* 0xd8 0x8 */
> > > > >         int sk_rx_dst_ifindex; /* 0xe0 0x4 */
> > > > >         u32 sk_rx_dst_cookie; /* 0xe4 0x4 */
> > > > >         unsigned int sk_ll_usec; /* 0xe8 0x4 */
> > > >
> > > > See here? The offset of sk_ll_usec is 0xe8, not 0x104 as you posted.
> > >
> > > Oh, sorry, my fault. I now remember that the perf record run was
> > > done on an old tree, before you optimized the layout of struct sock.
> > > Then I found that if I disable the config on the latest tree
> > > running in my virtual machine, the result is better. Let me find a
> > > physical server to run the latest kernel, and I will get back with
> > > more accurate 'perf record' data here.
> >
> > Now I'm back. The same perf output, with the latest kernel running
> > on the virtual server, looks like this:
> > │
> > │ static inline bool sk_can_busy_loop(const struct sock *sk)
> > │ {
> > │ return READ_ONCE(sk->sk_ll_usec) && !signal_pending(current);
> > │ mov 0xe8(%rdx),%ebp
> > 55.71 │ test %ebp,%ebp
> > │ ↓ jne 62
> > │ sock_poll():
> > Command I used:
> >   perf record -g -e cycles:k -F 999 -o tk5_select10.data -- ./bin-x86_64/select -E -C 200 -L -S -W -M -N "select_10" -n 100 -B 500
> >
> > On the physical server, the perf output looks like this:
> > │ ↓ je e1
> > │ mov 0x18(%r13),%rdx
> > 0.03 │ mov %rsi,%rbx
> > 0.00 │ mov %rdi,%r12
> > │ mov 0xe8(%rdx),%r14d
> > 26.48 │ test %r14d,%r14d
> >
> > Interestingly, the delta is smaller on the physical server than on
> > the virtual server:
> >
> >             original kernel   sk_ll_usec access removed   delta
> >   physical  2.26              2.08                        8.4%
> >   virtual   2.45              2.05                        ~16%
> >
> > I'm still puzzled that reading sk_ll_usec can cause such a
> > performance degradation.
> >
> > Eric, may I ask if you have more ideas/suggestions about this one?
> >
>
> We do not micro-optimize based on 'perf' reports, because of artifacts.
Sure, I know this. I used perf because I found a performance
degradation between 5.x and the latest kernel. Then I started to look
into sock_poll() and unix_poll(). It turns out that some member
accesses can take more time than expected.
>
> Please run a full workload, sending/receiving 1,000,000 messages and report
> the time difference, not on a precise function but the whole workload.
Okay.
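A minimal sketch of such a whole-workload measurement, for illustration only: it is not the select benchmark used earlier in the thread, and run_unix_poll_workload() is a hypothetical helper. It ping-pongs one-byte messages over an AF_UNIX stream socketpair in a single process and calls poll() before every read, so each iteration goes through sock_poll() and unix_poll() in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: send and receive `count` one-byte messages over an
 * AF_UNIX stream socketpair, polling the receiving end before each read.
 * Returns elapsed seconds (CLOCK_MONOTONIC) for the whole loop, or -1.0 on
 * any error. */
static double run_unix_poll_workload(long count)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    char out = 'x', in;
    struct pollfd pfd = { .fd = sv[1], .events = POLLIN };
    long i;
    for (i = 0; i < count; i++) {
        if (write(sv[0], &out, 1) != 1)
            break;
        /* Wait for the receiving end to become readable (sock_poll path). */
        if (poll(&pfd, 1, -1) != 1 || !(pfd.revents & POLLIN))
            break;
        if (read(sv[1], &in, 1) != 1)
            break;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(sv[0]);
    close(sv[1]);

    if (i != count)
        return -1.0; /* loop stopped early because a syscall failed */
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling run_unix_poll_workload(1000000) once per kernel (stock, and with the sk_ll_usec access removed) and comparing the elapsed times gives a whole-workload number rather than a per-function perf sample; averaging several runs helps with noise.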
>
> Again, I am guessing there will be no difference, because the cache
> line is needed anyway.
Judging from the layout alone, I agree; I cannot see a better way to
improve it.
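The cache-line argument can be checked with simple arithmetic on the pahole offsets posted earlier. A minimal sketch, assuming 64-byte cache lines and a struct sock that starts on a cache-line boundary (as pahole's annotations do); the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache line size for the x86-64 machines in this thread. */
#define CACHELINE_BYTES 64u

/* Index of the cache line holding a field at `offset` bytes into a
 * cache-line-aligned struct. */
static size_t cacheline_of(size_t offset)
{
    return offset / CACHELINE_BYTES;
}

/* True if two field offsets fall in the same cache line. */
static bool same_cacheline(size_t a, size_t b)
{
    return cacheline_of(a) == cacheline_of(b);
}
```

With the offsets from the dump, sk_ll_usec (0xe8) lands in line 3 (bytes 0xc0 to 0xff), together with sk_backlog (0xc0) and sk_rx_dst (0xd8), while sk_receive_queue (0xa8) sits in line 2. If anything else on that line is touched on the same path, the sk_ll_usec load adds no extra cache miss.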
>
> Please make sure to run the latest kernels, this will avoid you
> discovering issues that have been already fixed.
Sure, I tested on the latest kernel, as my previous emails said.
Without accessing sk_ll_usec, the performance is better.
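What the RFC toggles can be modeled in userspace. This is a hypothetical sketch, not kernel code: struct mock_sock, model_sk_can_busy_loop() and MODEL_NET_RX_BUSY_POLL are invented names, and READ_ONCE() and signal_pending(current) are replaced by a volatile read and a plain flag. It mirrors the shape of sk_can_busy_loop() in include/net/busy_poll.h, where the CONFIG_NET_RX_BUSY_POLL=n variant just returns false, so the sk_ll_usec load disappears.

```c
#include <stdbool.h>

/* Comment this out to model a CONFIG_NET_RX_BUSY_POLL=n build. */
#define MODEL_NET_RX_BUSY_POLL 1

/* Hypothetical stand-in for the one struct sock field that matters here. */
struct mock_sock {
    unsigned int sk_ll_usec;
};

#ifdef MODEL_NET_RX_BUSY_POLL
/* Config enabled: every poll loads sk_ll_usec (volatile read stands in
 * for READ_ONCE()); `signal_pending` stands in for signal_pending(current). */
static inline bool model_sk_can_busy_loop(const struct mock_sock *sk,
                                          bool signal_pending)
{
    return *(const volatile unsigned int *)&sk->sk_ll_usec && !signal_pending;
}
#else
/* Config disabled: constant false, no load of sk_ll_usec at all. */
static inline bool model_sk_can_busy_loop(const struct mock_sock *sk,
                                          bool signal_pending)
{
    (void)sk;
    (void)signal_pending;
    return false;
}
#endif
```

In the enabled model, the result depends on the field value and the pending-signal flag; in the disabled model the compiler can drop the call entirely, which corresponds to the "without accessing sk_ll_usec" case measured above.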
Anyway, thanks so much for your help!
Thanks,
Jason