[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a4bea058-c5cd-4d8f-ab0f-cd637ccb3969@lunn.ch>
Date: Tue, 18 Jun 2024 20:19:10 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Sean Anderson <sean.anderson@...ux.dev>
Cc: Radhey Shyam Pandey <radhey.shyam.pandey@....com>,
netdev@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Michal Simek <michal.simek@....com>,
Jakub Kicinski <kuba@...nel.org>,
Russell King <linux@...linux.org.uk>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
linux-kernel@...r.kernel.org,
"David S . Miller" <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/3] net: xilinx: axienet: Add statistics support
On Tue, Jun 18, 2024 at 01:03:47PM -0400, Sean Anderson wrote:
> Hi Andrew,
>
> On 6/11/24 11:36, Sean Anderson wrote:
> > On 6/10/24 20:26, Andrew Lunn wrote:
> >>> +static u64 axienet_stat(struct axienet_local *lp, enum temac_stat stat)
> >>> +{
> >>> + return u64_stats_read(&lp->hw_stats[stat]);
> >>> +}
> >>> @@ -1695,6 +1760,35 @@ axienet_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> >>> stats->tx_packets = u64_stats_read(&lp->tx_packets);
> >>> stats->tx_bytes = u64_stats_read(&lp->tx_bytes);
> >>> } while (u64_stats_fetch_retry(&lp->tx_stat_sync, start));
> >>> +
> >>> + if (!(lp->features & XAE_FEATURE_STATS))
> >>> + return;
> >>> +
> >>> + do {
> >>> + start = u64_stats_fetch_begin(&lp->hw_stat_sync);
> >>> + stats->rx_length_errors =
> >>> + axienet_stat(lp, STAT_RX_LENGTH_ERRORS);
> >>
> >> I'm i reading this correctly. You are returning the counters from the
> >> last refresh period. What is that? 2.5Gbps would wrapper around a 32
> >> byte counter in 13 seconds. I hope these statistics are not 13 seconds
> >> out of date?
> >
> > By default we use a 1 Hz refresh period. You can of course configure this
> > up to 13 seconds, but we refuse to raise it further since we risk missing
> > a wrap-around. It's configurable by userspace so they can determine how
> > out-of-date they like their stats (vs how often they want to wake up the
> > CPU).
> >
> >> Since axienet_stats_update() also uses the lp->hw_stat_sync, i don't
> >> see why you cannot read the hardware counter value and update to the
> >> latest value.
> >
> > We would need to synchronize against updates to hw_last_counter. Imagine
> > a scenario like
> >
> > CPU 1 CPU 2
> > __axienet_device_reset()
> > axienet_stats_update()
> > axienet_stat()
> > u64_stats_read()
> > axienet_ior()
> > /* device reset */
> > hw_last_counter = 0
> > stats->foo = ... - hw_last_counter[...]
> >
> > and now we have a glitch in the counter values, since we effectively are
> > double-counting the current counter value. Alternatively, we could read
> > the counter after reset but before hw_last_counter was updated and get a
> > glitch due to underflow.
>
> Does this make sense to you? If it does, I'll send v2 with just the mutex
> change and the variable rename pointed out by Simon.
What you have is O.K. I just think you can do better.
As you point out, there is a potential race. There are a few
synchronisation mechanisms for that.
Often you arrange to have exclusive access to a data structure, so you
know it cannot change while you use it. But as you pointed out, you
are not in a context which can block on a mutex.
Another mechanism is to know if the data structure has changed while
you where using it. If it has, throw away what you have, and start
again. That is what lp->hw_stat_sync etc is all about. You loop while
its value changes, indicating something made changes. You should be
able to use this when returning statistics to user space, to return
the real current statistics, not old cached values.
Andrew
Powered by blists - more mailing lists