netdev - Re: [RFC PATCH net-next] net: add an entry for CONFIG_NET_RX_BUSY

Message-ID: <CAL+tcoBGRz1ukKe=z2qjPUgjSZ=a-WdXLpTcLj5BxTVNAhnUZg@mail.gmail.com>
Date: Wed, 24 Jul 2024 08:38:16 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [RFC PATCH net-next] net: add an entry for CONFIG_NET_RX_BUSY_POLL

On Wed, Jul 24, 2024 at 12:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Jul 23, 2024 at 6:01 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Tue, Jul 23, 2024 at 11:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Tue, Jul 23, 2024 at 5:13 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 11:09 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 10:57 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > >
> > > > > > On Tue, Jul 23, 2024 at 3:57 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > > > >
> > > > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > > > >
> > > > > > > > While performance-testing unix_poll(), I found that accessing
> > > > > > > > sk->sk_ll_usec in sock_poll()->sk_can_busy_loop() takes too much
> > > > > > > > time, which causes around 16% degradation. So I decided to turn
> > > > > > > > off this config, which apparently cannot be done before this
> > > > > > > > patch.
> > > > > >
> > > > > > Too many CONFIG_ options, distros will enable it anyway.
> > > > > >
> > > > > > In my builds, offset of sk_ll_usec is 0xe8.
> > > > > >
> > > > > > Are you using some debug options or an old tree ?
> > > >
> > > > I forgot to say: I'm running the latest kernel which I pulled around
> > > > two hours ago. Whatever kind of configs with/without debug options I
> > > > use, I can still reproduce it.
> > >
> > > Ok, please post :
> > >
> > > pahole --hex -C sock vmlinux
> >
> > 1) Enable the config:
> > $ pahole --hex -C sock vmlinux
> > struct sock {
> >         struct sock_common         __sk_common;          /*     0  0x88 */
> >         /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
> >         __u8                       __cacheline_group_begin__sock_write_rx[0]; /*  0x88     0 */
> >         atomic_t                   sk_drops;             /*  0x88   0x4 */
> >         __s32                      sk_peek_off;          /*  0x8c   0x4 */
> >         struct sk_buff_head        sk_error_queue;       /*  0x90  0x18 */
> >         struct sk_buff_head        sk_receive_queue;     /*  0xa8  0x18 */
> >         /* --- cacheline 3 boundary (192 bytes) --- */
> >         struct {
> >                 atomic_t           rmem_alloc;           /*  0xc0   0x4 */
> >                 int                len;                  /*  0xc4   0x4 */
> >                 struct sk_buff *   head;                 /*  0xc8   0x8 */
> >                 struct sk_buff *   tail;                 /*  0xd0   0x8 */
> >         } sk_backlog;                                    /*  0xc0  0x18 */
> >         __u8                       __cacheline_group_end__sock_write_rx[0]; /*  0xd8     0 */
> >         __u8                       __cacheline_group_begin__sock_read_rx[0]; /*  0xd8     0 */
> >         struct dst_entry *         sk_rx_dst;            /*  0xd8   0x8 */
> >         int                        sk_rx_dst_ifindex;    /*  0xe0   0x4 */
> >         u32                        sk_rx_dst_cookie;     /*  0xe4   0x4 */
> >         unsigned int               sk_ll_usec;           /*  0xe8   0x4 */
>
> See here ? offset of sk_ll_usec is 0xe8, not 0x104 as you posted.

Oh, sorry, my fault. I only just remembered that the perf record run
was done on an old tree, from before you optimised the layout of
struct sock. Separately, when I disable the config on the latest tree
running in my virtual machine, the result is better. Let me find a
physical server to run the latest kernel, and I will get back with
more accurate 'perf record' data.
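
To make the offset discussion concrete, here is a minimal sketch of how a
member offset maps to a cache line. It assumes 64-byte cache lines, which is
what the "cacheline 2 boundary (128 bytes)" annotation in the pahole output
above implies. The helper names are hypothetical and exist only for this
illustration.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines, matching the "cacheline 2 boundary
 * (128 bytes)" annotation in the pahole output quoted above. */
#define CACHELINE_BYTES 64u

/* Hypothetical helper: index of the cache line that holds a byte offset. */
static unsigned int cacheline_index(size_t offset)
{
	return (unsigned int)(offset / CACHELINE_BYTES);
}

/* Hypothetical helper: position of the offset inside its cache line. */
static unsigned int cacheline_offset(size_t offset)
{
	return (unsigned int)(offset % CACHELINE_BYTES);
}
```

With these helpers, offset 0xe8 falls in cache line 3 (the one that starts at
192 bytes), while offset 0x104 falls in cache line 4, so the two layouts place
sk_ll_usec in different cache lines.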

>
> Do not blindly trust perf here.
>
> Please run a benchmark with 1,000,000 af_unix messages being sent and received.
>
> I am guessing your patch makes no difference at all (certainly not 16
> % as claimed in your changelog)

The fact is that performance improves when I disable the config, if I
test only the unix_poll-related paths. The time spent decreases from
2.45 to 2.05, which is about 16%. As I said, it is easy to reproduce.

Thanks,
Jason