lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 9 May 2024 00:40:01 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Joe Damato <jdamato@...tly.com>, Jakub Kicinski <kuba@...nel.org>
Cc: Zhu Yanjun <zyjzyj2000@...il.com>, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, saeedm@...dia.com, gal@...dia.com,
 nalramli@...tly.com, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Leon Romanovsky <leon@...nel.org>,
 "open list:MELLANOX MLX5 core VPI driver" <linux-rdma@...r.kernel.org>,
 Paolo Abeni <pabeni@...hat.com>, Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH net-next 0/1] mlx5: Add netdev-genl queue stats



On 06/05/2024 21:04, Joe Damato wrote:
> On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
>> On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
>>>> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
>>>> index c7ac4539eafc..f5d9f3ad5b66 100644
>>>> --- a/include/net/netdev_queues.h
>>>> +++ b/include/net/netdev_queues.h
>>>> @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
>>>>    * statistics will not generally add up to the total number of events for
>>>>    * the device. The @get_base_stats callback allows filling in the delta
>>>>    * between events for currently live queues and overall device history.
>>>> + * @get_base_stats can also be used to report any miscellaneous packets
>>>> + * transferred outside of the main set of queues used by the networking stack.
>>>>    * When the statistics for the entire device are queried, first @get_base_stats
>>>>    * is issued to collect the delta, and then a series of per-queue callbacks.
>>>>    * Only statistics which are set in @get_base_stats will be reported
>>>>
>>>>
>>>> SG?
>>>
>>> I think that sounds good and makes sense, yea. By that definition, then I
>>> should leave the PTP stats as shown above. If you agree, I'll add that
>>> to the v2.
>>
>> Yup, agreed.
>>
>>> I feel like I should probably wait before sending a v2 with PTP included in
>>> get_base_stats to see if the Mellanox folks have any hints about why rtnl
>>> != queue stats on mlx5?
>>>
>>> What do you think?
>>
>> Very odd, the code doesn't appear to be doing any magic :S Did you try
>> to print what the delta in values is? Does bringing the interface up and
>> down affect the size of it?
> 
> I booted the kernel which includes PTP stats in the base stats as you've
> suggested (as shown in the diff in this thread) and I've brought the
> interface down and back up:
> 
> $ sudo ip link set dev eth0 down
> $ sudo ip link set dev eth0 up
> 
> Re ran the test script, which includes some mild debugging print out I
> added to show the delta for rx-packets (but I think all stats are off):
> 
>    # Exception| Exception: Qstats are lower, fetched later
> 
> key: rx-packets rstat: 1192281902 qstat: 1186755777
> key: rx-packets rstat: 1192281902 qstat: 1186755781
> 
> So qstat is lower by (1192281902 - 1186755781) = 5,526,121
> 
> Not really sure why, but I'll take another look at the code this morning to
> see if I can figure out what's going on.
> 
> I'm clearly doing something wrong or misunderstanding something about the
> accounting that will seem extremely obvious in retrospect.

Hi Joe,

Thanks for your patch.
Apologies for the late response. I was on PTO for some time.

 From first look the patch looks okay. The overall approach seems correct.

The off-channels queues (like PTP) do not exist in default. So they are 
out of the game unless you explicitly enables them.

A possible reason for this difference is the queues included in the sum.
Our stats are persistent across configuration changes, so they doesn't 
reset when number of channels changes for example.

We keep stats entries for al ring indices that ever existed. Our driver 
loops and sums up the stats for all of them, while the stack loops only 
up to the current netdev->real_num_rx_queues.

Can this explain the diff here?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ