netdev - Re: Per-queue stats question

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240815180127.GP632411@kernel.org>
Date: Thu, 15 Aug 2024 19:01:27 +0100
From: Simon Horman <horms@...nel.org>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: Per-queue stats question

On Thu, Aug 15, 2024 at 06:11:42PM +0100, Edward Cree wrote:
> I'm working on adding netdev_stat_ops support to sfc, and finding that
>  the expectations of the selftest around the relation between qstats
>  and rtnl stats are difficult for us to meet.  I'm not sure whether it
>  is our existing rtnl stats or the qstats I'm adding that have the
>  wrong semantics.
> 
> sfc fills in rtnl_link_stats64 with MAC stats from the firmware (or
>  'vadaptor stats' if using SR-IOV).  These count packets (or bytes)
>  since last FW boot/reset (for instance, "ethtool --reset $dev all"
>  clears them).  (Also, for reasons I'm still investigating, while the
>  interface is administratively down they read as zero, then jump back
>  to what they were on "ip link set up".)  Moreover, the counts are
>  updated by periodic DMA, so can be up to 1 second stale.
> The queue stats, meanwhile, are maintained in software, and count
>  since ifup (efx_start_channels()), so that they can be reset on
>  reconfiguration; the base_stats count since driver probe
>  (efx_alloc_channels()).
> 
> Thus, as it stands, it is possible for qstats and rtstats to disagree,
>  in both directions.  For example:
> * Driver is unloaded and then loaded again.  base_stats will reset,
>   but MAC stats won't.
> * ethtool reset.  MAC stats will reset, but base_stats won't.
> * Traffic is passing during the test.  qstats will be up to date,
>   whereas MAC stats, being up to 1s stale, could be far behind.
> * RX filter drops (e.g. unwanted destination MAC address).  These are
>   counted in MAC stats but since they never reach the driver they're
>   not counted in qstats/base_stats (and by my reading of netdev.yaml
>   they shouldn't be, even if we could).
> 
> Any of these will cause the stats.pkt_byte_sum selftest to fail.
> Which side do I need to change, qstats or rtstats?  Or is the test
>  being too strict?

Maybe speaking out of turn (or through my hat) but my opinion is that the
test is too strict.

This is because I believe that in practice there can be constraints which
make it impractical or undesirable for stats to be fetched directly from
hardware. And that this leads to some level of caching in software. Which
in turn leads to some scope for stats to be out of date.

I think that is fine, so long as the stats don't lag too much. Where, IMHO
too much might be say more than 0.1s, or something like that. Because, in
my experience, that is fine as a user-experience: either a user, looking at
stats, or a process analysing stats for usage patterns.

> 
> On a related note, I notice that the stat_cmp() function within that
>  selftest returns the first nonzero delta it finds in the stats, so
>  that if (say) tx-packets goes forwards but rx-packets goes backwards,
>  it will return >0 causing the rx-packets delta to be ignored.  Is
>  this intended behaviour, or should I submit a patch?
> 
> -ed
>