Message-ID: <c3f4f1a4-303d-4d57-ae83-ed52e5a08f69@linux.dev>
Date: Fri, 3 May 2024 12:55:41 +0200
From: Zhu Yanjun <zyjzyj2000@...il.com>
To: Joe Damato <jdamato@...tly.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, tariqt@...dia.com, saeedm@...dia.com
Cc: gal@...dia.com, nalramli@...tly.com, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Leon Romanovsky <leon@...nel.org>,
"open list:MELLANOX MLX5 core VPI driver" <linux-rdma@...r.kernel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next 0/1] mlx5: Add netdev-genl queue stats
On 03.05.24 04:25, Joe Damato wrote:
> Hi:
>
> This is only 1 patch, so I know a cover letter isn't necessary, but it
> seems there are a few things to mention.
>
> This change adds support for the per queue netdev-genl API to mlx5,
> which seems to output stats:
>
> ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"scope": "queue"}'
>
> ...snip
> {'ifindex': 7,
> 'queue-id': 28,
> 'queue-type': 'tx',
> 'tx-bytes': 399462,
> 'tx-packets': 3311},
> ...snip
ethtool -S ethX already reports this per-queue information, for example:
"
...
tx-0.packets: 2094
tx-0.bytes: 294141
rx-0.packets: 2200
rx-0.bytes: 267673
...
"
>
> I've tried to use the tooling suggested to verify that the per queue
> stats match the rtnl stats by doing this:
>
> NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
>
> And the tool outputs that there is a failure:
>
> # Exception| Exception: Qstats are lower, fetched later
> not ok 3 stats.pkt_byte_sum
With ethtool, does the above problem still occur?
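For that comparison the per-queue counters from ethtool -S would have to be summed first. Below is a rough sketch in Python. It assumes the counter names follow the tx-N.packets / rx-N.bytes pattern shown above; names differ between drivers, so the regular expression is only an assumption, and sum_queue_counters is a made-up helper name. The sample text is the counters quoted above. On a real system it would be the output of ethtool -S for the interface.

```python
import re
from collections import defaultdict

# Sample text copied from the counters quoted earlier in this mail.
SAMPLE = """\
tx-0.packets: 2094
tx-0.bytes: 294141
rx-0.packets: 2200
rx-0.bytes: 267673
"""

# Matches lines such as "tx-0.packets: 2094".
# The naming pattern is assumed from the sample; other drivers name
# their per-queue counters differently.
LINE_RE = re.compile(r"^\s*(rx|tx)-(\d+)\.(packets|bytes):\s*(\d+)\s*$")


def sum_queue_counters(text):
    """Sum per-queue counters into one total per direction and kind."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip counters that are not per-queue packets/bytes
        direction, _queue, kind, value = match.groups()
        totals[f"{direction}-{kind}"] += int(value)
    return dict(totals)


totals = sum_queue_counters(SAMPLE)
print(totals)
```

The resulting totals could then be checked against the rtnl counters in the same way the selftest checks the qstats.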
Zhu Yanjun
>
> The other tests all pass (including stats.qstat_by_ifindex).
>
> This appears to mean that the netdev-genl queue stats have lower numbers
> than the rtnl stats even though the rtnl stats are fetched first. I
> added some debugging and found that both rx and tx bytes and packets are
> slightly lower.
>
> The only explanations I can think of for this are:
>
> 1. tx_ptp_opened and rx_ptp_opened are both true, in which case
> mlx5e_fold_sw_stats64 adds bytes and packets to the rtnl struct and
> might account for the difference. I skip this case in my
> implementation, so that could certainly explain it.
> 2. Maybe I'm just misunderstanding how stats aggregation works in mlx5,
> and that's why the numbers are slightly off?
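If explanation 1 is the cause, the gap between the two views should equal whatever the PTP queues counted. A toy model of that in Python follows. The numbers are made up, and fold() is a hypothetical helper for illustration, not the driver's code.

```python
def fold(counters):
    """Sum a list of {"packets", "bytes"} dicts, starting from zero."""
    total = {"packets": 0, "bytes": 0}
    for c in counters:
        total["packets"] += c["packets"]
        total["bytes"] += c["bytes"]
    return total


# Hypothetical counters, not taken from any real device.
regular_queues = [
    {"packets": 100, "bytes": 10000},
    {"packets": 200, "bytes": 20000},
]
ptp_queues = [{"packets": 5, "bytes": 500}]

# The rtnl-style total folds in the PTP queues; the per-queue view skips them.
rtnl_like = fold(regular_queues + ptp_queues)
qstats_like = fold(regular_queues)

gap = {key: rtnl_like[key] - qstats_like[key] for key in rtnl_like}
print(gap)
```

In this model the per-queue sum is always lower by exactly the PTP share, which would match a "Qstats are lower" failure when PTP traffic is present.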
>
> It appears that the driver uses a workqueue to queue stats updates which
> happen periodically.
>
> 0. the driver occasionally calls queue_work on the update_stats_work
> workqueue.
> 1. This eventually calls MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw),
> in drivers/net/ethernet/mellanox/mlx5/core/en_stats.c, which appears
> to begin by first memsetting the internal stats struct where stats are
> aggregated to zero. This would mean, I think, the get_base_stats
> netdev-genl API implementation that I have is correct: simply set
> everything to 0.... otherwise we'd end up double counting in the
> netdev-genl RX and TX handlers.
> 2. Next, each of the stats helpers is called to collect stats into the
> freshly 0'd internal struct (for example:
> mlx5e_stats_grp_sw_update_stats_rq_stats).
>
> That seems to be how stats are aggregated, which would suggest that if I
> simply .... do what I'm doing in this change the numbers should line up.
>
> But they don't, and it's either because of PTP or because I am
> misunderstanding/doing something wrong.
>
> Maybe the MLNX folks can suggest a hint?
>
> Thanks,
> Joe
>
> Joe Damato (1):
> net/mlx5e: Add per queue netdev-genl stats
>
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 68 +++++++++++++++++++
> 1 file changed, 68 insertions(+)
>