Message-ID: <917708b5-cb86-f233-e878-9233c4e6c707@linux.dev>
Date: Sat, 7 Oct 2023 14:34:11 +0800
From: Yajun Deng <yajun.deng@...ux.dev>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Alexander Lobakin <aleksander.lobakin@...el.com>
Subject: Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

On 2023/10/7 13:29, Eric Dumazet wrote:
> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@...ux.dev> wrote:
>> Although there is a kfree_skb_reason() helper function that can be used to
>> find the reason why this skb is dropped, but most callers didn't increase
>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>
> ...
>
>> +
>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>> +{
>> + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>> + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>> + unsigned long *field;
>> +
>> + if (unlikely(!p))
>> + p = netdev_core_stats_alloc(dev);
>> +
>> + if (p) {
>> + field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>> + WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> This is broken...
>
> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> many different contexts:
>
> 1) process contexts, where preemption and migration are allowed.
> 2) interrupt contexts.
>
> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>
> I _think_ I already gave you how to deal with this ?
Yes, I replied in v6.
https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
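
For anyone following the thread, the lost update described above can be
shown with a small deterministic userspace sketch. This is not kernel code:
counter, fake_irq() and both *_inc_with_irq() helpers are hypothetical names
made up for illustration, and the "interrupt" is just a function call placed
at the worst possible moment.

```c
#include <assert.h>

/* Hypothetical stand-in for one per-CPU counter such as rx_dropped. */
static unsigned long counter;

/* Simulated interrupt handler that bumps the same counter. */
static void fake_irq(void)
{
	counter++;
}

/*
 * Models WRITE_ONCE(*field, READ_ONCE(*field) + 1) with an interrupt
 * arriving between the load and the store.
 */
static unsigned long racy_inc_with_irq(void)
{
	unsigned long tmp = counter;	/* load */

	fake_irq();			/* interrupt lands here */
	counter = tmp + 1;		/* store overwrites the interrupt's update */
	return counter;
}

/*
 * Models an increment that the interrupt cannot split, so the
 * interrupt runs either before or after it (here: after).
 */
static unsigned long unsplit_inc_with_irq(void)
{
	counter++;			/* increment completes as one step */
	fake_irq();			/* interrupt runs afterwards */
	return counter;
}

/* Two increments are attempted in each case; only the second keeps both. */
static void lost_update_demo(void)
{
	counter = 0;
	assert(racy_inc_with_irq() == 1);	/* one increment was lost */

	counter = 0;
	assert(unsplit_inc_with_irq() == 2);	/* both increments survive */
}
```

The second helper only mimics what this_cpu_inc() is meant to provide, as I
understand it. The READ_ONCE()/WRITE_ONCE() pair keeps each access whole but
does not make the load and the store a single step.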
> Please try instead:
>
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> + unsigned long __percpu *field;
> +
> + if (unlikely(!p)) {
> + p = netdev_core_stats_alloc(dev);
> + if (!p)
> + return;
> + }
> + field = (__force unsigned long __percpu *)((__force void *)p + offset);
> + this_cpu_inc(*field);
> +}
With this version, tracing netdev_core_stats_inc() shows nothing even while
rx_dropped is increasing. Tracing only works once an extra operation is added
to the function, such as pr_info, ++, trace_xxx... I don't know what's going on.

If this is adopted, I need to send two patches: one to introduce
netdev_core_stats_inc(), and another to add a tracepoint, like:

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+	unsigned long __percpu *field;
+
+	if (unlikely(!p)) {
+		p = netdev_core_stats_alloc(dev);
+		if (!p)
+			return;
+	}
+	trace_netdev_core_stats_inc(dev, offset);
+	field = (__force unsigned long __percpu *)((__force void *)p + offset);
+	this_cpu_inc(*field);
+}
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
+TRACE_EVENT(netdev_core_stats_inc,
+
+	TP_PROTO(struct net_device *dev,
+		 u32 offset),
+
+	TP_ARGS(dev, offset),
+
+	TP_STRUCT__entry(
+		__string( name, dev->name )
+		__string( driver, netdev_drivername(dev))
+		__field( u32, offset )
+	),
+
+	TP_fast_assign(
+		__assign_str(name, dev->name);
+		__assign_str(driver, netdev_drivername(dev));
+		__entry->offset = offset;
+	),
+
+	TP_printk("dev=%s driver=%s offset=%u",
+		  __get_str(name), __get_str(driver), __entry->offset)
+);
We can trace netdev_core_stats_inc by tracepoint or kprobe.
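
Since the tracepoint only reports a raw offset, here is a rough userspace
sketch of how a byte offset selects one counter, in case it helps when
reading the trace output. The struct core_stats layout, core_stats_inc() and
the CORE_STATS_INC() macro are hypothetical stand-ins, not the kernel
definitions. Only the four counter names come from the patch description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device_core_stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/* Increment the counter located 'offset' bytes into the structure. */
static void core_stats_inc(struct core_stats *s, size_t offset)
{
	unsigned long *field;

	/* The offset must point at a whole counter inside the struct. */
	assert(offset % sizeof(unsigned long) == 0);
	assert(offset + sizeof(unsigned long) <= sizeof(*s));

	field = (unsigned long *)((char *)s + offset);
	(*field)++;
}

/* Convenience wrapper: turn a field name into its byte offset. */
#define CORE_STATS_INC(s, FIELD) \
	core_stats_inc((s), offsetof(struct core_stats, FIELD))
```

With this layout, CORE_STATS_INC(&stats, rx_nohandler) touches only
rx_nohandler, and the offset seen by a tracer would be
offsetof(struct core_stats, rx_nohandler). The real offsets depend on the
kernel's own structure definition.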