Message-ID: <20240815124247.65183cbf@kernel.org>
Date: Thu, 15 Aug 2024 12:42:47 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: Network Development <netdev@...r.kernel.org>
Subject: Re: Per-queue stats question

On Thu, 15 Aug 2024 18:11:42 +0100 Edward Cree wrote:
> I'm working on adding netdev_stat_ops support to sfc, and finding that
>  the expectations of the selftest around the relation between qstats
>  and rtnl stats are difficult for us to meet.  I'm not sure whether it
>  is our existing rtnl stats or the qstats I'm adding that have the
>  wrong semantics.
> 
> sfc fills in rtnl_link_stats64 with MAC stats from the firmware (or
>  'vadaptor stats' if using SR-IOV).  These count packets (or bytes)
>  since last FW boot/reset (for instance, "ethtool --reset $dev all"
>  clears them).  (Also, for reasons I'm still investigating, while the
>  interface is administratively down they read as zero, then jump back
>  to what they were on "ip link set up".)  Moreover, the counts are
>  updated by periodic DMA, so can be up to 1 second stale.
> The queue stats, meanwhile, are maintained in software, and count
>  since ifup (efx_start_channels()), so that they can be reset on
>  reconfiguration; the base_stats count since driver probe
>  (efx_alloc_channels()).
> 
> Thus, as it stands, it is possible for qstats and rtstats to disagree,
>  in both directions.  For example:

[reordering for grouped answer]

> * Driver is unloaded and then loaded again.  base_stats will reset,
>   but MAC stats won't.
> * ethtool reset.  MAC stats will reset, but base_stats won't.
> * RX filter drops (e.g. unwanted destination MAC address).  These are
>   counted in MAC stats but since they never reach the driver they're
>   not counted in qstats/base_stats (and by my reading of netdev.yaml
>   they shouldn't be, even if we could).

rstats have no clear semantics on modern devices: some drivers count
at the MAC (potentially including VF traffic), some drivers count after
XDP (i.e. don't count XDP_DROP!?!).

We should maintain the qstat semantics as packets intended for a given
netdev, with rx-packets being packets which got delivered to the host
and picked up by the driver.

> * Traffic is passing during the test.  qstats will be up to date,
>   whereas MAC stats, being up to 1s stale, could be far behind.

That's a bug; we should add an env.wait_hw_stats_settle() call in the
right spots in the test.
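
To illustrate why waiting for the stats to settle matters, here is a
minimal sketch with a simulated clock. FakeDevice and FakeEnv are
hypothetical stand-ins, not the selftest library, and the 1 second DMA
period is taken from the description quoted above. The real helper
would presumably sleep rather than advance a fake clock.

```python
class FakeDevice:
    """Hypothetical device whose MAC stats refresh only on a periodic DMA."""

    def __init__(self, dma_period=1.0):
        self.dma_period = dma_period
        self.now = 0.0
        self.last_dma = 0.0
        self.live_rx_packets = 0  # what software-maintained qstats would see
        self.mac_rx_packets = 0   # last snapshot DMA'd from the hardware

    def receive(self, count):
        # Software counters update immediately; the MAC snapshot does not.
        self.live_rx_packets += count

    def advance(self, seconds):
        # Move the simulated clock and refresh the snapshot if a period passed.
        self.now += seconds
        if self.now - self.last_dma >= self.dma_period:
            self.mac_rx_packets = self.live_rx_packets
            self.last_dma = self.now


class FakeEnv:
    """Hypothetical test environment wrapping the fake device."""

    def __init__(self, dev):
        self.dev = dev

    def wait_hw_stats_settle(self):
        # Stand-in for sleeping one full stats refresh period.
        self.dev.advance(self.dev.dma_period)


dev = FakeDevice()
env = FakeEnv(dev)
dev.receive(100)

# Without waiting, the MAC snapshot lags the software counters.
stale_gap = dev.live_rx_packets - dev.mac_rx_packets

# After settling, the two views agree (assuming no further traffic).
env.wait_hw_stats_settle()
settled_gap = dev.live_rx_packets - dev.mac_rx_packets
```

Note that settling only helps when traffic has stopped; it does not
address the reset or filter-drop cases.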

> Any of these will cause the stats.pkt_byte_sum selftest to fail.
> Which side do I need to change, qstats or rtstats?  Or is the test
>  being too strict?

The test is too strict, but I'm not sure what to do about it. It has
proven useful in the past: mlx5 has "misc queues" for PTP, for example,
and the test caught that they were added to rstats but not to the base
stats. IDK what to do about drivers which use MAC stats for rstats :(
The fact that the test fails doesn't mean they misuse qstats.

> On a related note, I notice that the stat_cmp() function within that
>  selftest returns the first nonzero delta it finds in the stats, so
>  that if (say) tx-packets goes forwards but rx-packets goes backwards,
>  it will return >0 causing the rx-packets delta to be ignored.  Is
>  this intended behaviour, or should I submit a patch?

Looks like a bug.
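
A minimal sketch of the behaviour described in the question, next to
one possible stricter comparison. This is not the selftest's actual
code: the function names, the key list, and the "any backwards step
wins" rule are assumptions for illustration only.

```python
KEYS = ['tx-packets', 'tx-bytes', 'rx-packets', 'rx-bytes']


def stat_cmp_first_delta(newer, older):
    """Behaviour as described: return the first nonzero delta found."""
    for key in KEYS:
        if newer[key] != older[key]:
            return newer[key] - older[key]
    return 0


def stat_cmp_strict(newer, older):
    """Hypothetical fix: report a backwards step on any key first."""
    deltas = [newer[key] - older[key] for key in KEYS]
    negatives = [d for d in deltas if d < 0]
    if negatives:
        return negatives[0]
    positives = [d for d in deltas if d > 0]
    return positives[0] if positives else 0


# tx counters go forwards while rx counters go backwards.
newer = {'tx-packets': 12, 'tx-bytes': 1200, 'rx-packets': 7, 'rx-bytes': 700}
older = {'tx-packets': 10, 'tx-bytes': 1000, 'rx-packets': 9, 'rx-bytes': 900}

first = stat_cmp_first_delta(newer, older)  # positive: rx regression hidden
strict = stat_cmp_strict(newer, older)      # negative: rx regression reported
```

With the first variant the tx-packets delta masks the rx-packets
regression; the stricter variant surfaces it.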
