netdev - Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <CANn89iK7bvQtGD=p+fHaWiiaNn=u8vWrt0YQ26pGQY=kZTdfJw@mail.gmail.com>
Date: Sun, 8 Oct 2023 10:53:05 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Yajun Deng <yajun.deng@...ux.dev>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Alexander Lobakin <aleksander.lobakin@...el.com>
Subject: Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

On Sun, Oct 8, 2023 at 10:44 AM Yajun Deng <yajun.deng@...ux.dev> wrote:
>
>
> On 2023/10/8 15:18, Eric Dumazet wrote:
> > On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@...ux.dev> wrote:
> >>
> >> On 2023/10/8 14:45, Eric Dumazet wrote:
> >>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@...ux.dev> wrote:
> >>>> On 2023/10/7 13:29, Eric Dumazet wrote:
> >>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@...ux.dev> wrote:
> >>>>>> Although there is a kfree_skb_reason() helper function that can be used to
> >>>>>> find the reason why this skb is dropped, most callers didn't increase
> >>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
> >>>>>>
> >>>>> ...
> >>>>>
> >>>>>> +
> >>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>>>>> +{
> >>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>>>>> +       unsigned long *field;
> >>>>>> +
> >>>>>> +       if (unlikely(!p))
> >>>>>> +               p = netdev_core_stats_alloc(dev);
> >>>>>> +
> >>>>>> +       if (p) {
> >>>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> >>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> >>>>> This is broken...
> >>>>>
> >>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> >>>>> many different contexts:
> >>>>>
> >>>>> 1) process contexts, where preemption and migration are allowed.
> >>>>> 2) interrupt contexts.
> >>>>>
> >>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
> >>>>>
> >>>>> I _think_ I already told you how to deal with this?
> >>>> Yes, I replied in v6.
> >>>>
> >>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
> >>>>
> >>>>> Please try instead:
> >>>>>
> >>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>>>> +{
> >>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>>>> +       unsigned long __percpu *field;
> >>>>> +
> >>>>> +       if (unlikely(!p)) {
> >>>>> +               p = netdev_core_stats_alloc(dev);
> >>>>> +               if (!p)
> >>>>> +                       return;
> >>>>> +       }
> >>>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
> >>>>> +       this_cpu_inc(*field);
> >>>>> +}
> >>>> This doesn't trace anything, even while rx_dropped is increasing. It
> >>>> needs an extra operation, such as:
> >>> I honestly do not know what you are talking about.
> >>>
> >>> Have you even tried to change your patch to use
> >>>
> >>> field = (__force unsigned long __percpu *)((__force void *)p + offset);
> >>> this_cpu_inc(*field);
> >>
> >> Yes, I tested this code. But the following command showed nothing, even
> >> while rx_dropped was increasing.
> >>
> >> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
> > Well, I am not sure about this; "bpftrace" worked for me.
> >
> > Make sure your toolchain generates something that looks like what I got:
> >
> > 000000000000ef20 <netdev_core_stats_inc>:
> >      ef20: f3 0f 1e fa          endbr64
> >      ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
> >                        ef25: R_X86_64_PLT32 __fentry__-0x4
> >      ef29: 55                    push   %rbp
> >      ef2a: 48 89 e5              mov    %rsp,%rbp
> >      ef2d: 53                    push   %rbx
> >      ef2e: 89 f3                mov    %esi,%ebx
> >      ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
> >      ef37: 48 85 c0              test   %rax,%rax
> >      ef3a: 74 0b                je     ef47 <netdev_core_stats_inc+0x27>
> >      ef3c: 89 d9                mov    %ebx,%ecx
> >      ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
> >      ef43: 5b                    pop    %rbx
> >      ef44: 5d                    pop    %rbp
> >      ef45: c3                    ret
> >      ef46: cc                    int3
> >      ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
> >                        ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
> >      ef4c: 48 85 c0              test   %rax,%rax
> >      ef4f: 75 eb                jne    ef3c <netdev_core_stats_inc+0x1c>
> >      ef51: eb f0                jmp    ef43 <netdev_core_stats_inc+0x23>
> >      ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
> >      ef5a: 84 00 00 00 00 00
>
>
> I'll share what I can see.
>
> 1.
>
> objdump -D vmlinux
>
> ffffffff81b2f170 <netdev_core_stats_inc>:
> ffffffff81b2f170:    e8 8b ea 55 ff           callq  ffffffff8108dc00 <__fentry__>
> ffffffff81b2f175:    55                       push   %rbp
> ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
> ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
> ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov    0x1e8(%rdi),%rax
> ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
> ffffffff81b2f187:    74 0d                    je     ffffffff81b2f196 <netdev_core_stats_inc+0x26>
> ffffffff81b2f189:    89 f6                    mov    %esi,%esi
> ffffffff81b2f18b:    65 48 ff 04 30           incq   %gs:(%rax,%rsi,1)
> ffffffff81b2f190:    c9                       leaveq
> ffffffff81b2f191:    e9 aa 31 6d 00           jmpq   ffffffff82202340 <__x86_return_thunk>
> ffffffff81b2f196:    89 75 fc                 mov    %esi,-0x4(%rbp)
> ffffffff81b2f199:    e8 82 ff ff ff           callq  ffffffff81b2f120 <netdev_core_stats_alloc>
> ffffffff81b2f19e:    8b 75 fc                 mov    -0x4(%rbp),%esi
> ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
> ffffffff81b2f1a4:    75 e3                    jne    ffffffff81b2f189 <netdev_core_stats_inc+0x19>
> ffffffff81b2f1a6:    c9                       leaveq
> ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq   ffffffff82202340 <__x86_return_thunk>
> ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)
>
>
> 2.
>
> sudo cat /proc/kallsyms | grep netdev_core_stats_inc
>
> ffffffff9c72f120 T netdev_core_stats_inc
> ffffffff9ca2676c t netdev_core_stats_inc.cold
> ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc
>
>
> 3.
>
> ➜  ~ ifconfig enp34s0f0
> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>          inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>          inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>          ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>          RX packets 29024  bytes 3118278 (3.1 MB)
>          RX errors 0  dropped 794  overruns 0  frame 0
>          TX packets 16961  bytes 2662290 (2.6 MB)
>          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>          device interrupt 29  memory 0x39fff4000000-39fff47fffff
>
> ➜  ~ ifconfig enp34s0f0
> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>          inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>          inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>          ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>          RX packets 29272  bytes 3148997 (3.1 MB)
>          RX errors 0  dropped 798  overruns 0  frame 0
>          TX packets 17098  bytes 2683547 (2.6 MB)
>          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>          device interrupt 29  memory 0x39fff4000000-39fff47fffff
>
>
> The rx_dropped counter is increasing (794 -> 798).
>
>
> 4.
>
> sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc
>
> TIME     PID     TID     COMM            FUNC
>
> (Empty, I didn't see anything.)
>
>
> 5.
>
> sudo trace-cmd record -p function -l netdev_core_stats_inc
>
> sudo trace-cmd report
>
> (Empty, I didn't see anything.)
>
>
> If I add a 'pr_info("\n");' like this:
>
> +      pr_info("\n");
>          field = (__force unsigned long __percpu *)((__force void *)p + offset);
>          this_cpu_inc(*field);
>
>
> Then tracing works. The 'pr_info("\n");' can be replaced with any other
> statement, but without one nothing is traced.

This seems to be a bug that has nothing to do with the patch.

Try getting help from Steven maybe.