[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e05bed50-ef3f-466c-92e9-913b08bbc86c@intel.com>
Date: Tue, 27 Feb 2024 11:29:02 +0100
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <edumazet@...gle.com>, <pabeni@...hat.com>,
<amritha.nambiar@...el.com>, <danielj@...dia.com>, <mst@...hat.com>,
<michael.chan@...adcom.com>, <sdf@...gle.com>, <vadim.fedorenko@...ux.dev>
Subject: Re: [PATCH net-next 1/3] netdev: add per-queue statistics
On 2/26/24 22:10, Jakub Kicinski wrote:
> The ethtool-nl family does a good job exposing various protocol
> related and IEEE/IETF statistics which used to get dumped under
> ethtool -S, with creative names. Queue stats don't have a netlink
> API, yet, and remain a lion's share of ethtool -S output for new
> drivers. Not only is that bad because the names differ driver to
> driver but it's also bug-prone. Intuitively drivers try to report
> only the stats for active queues, but querying ethtool stats
> involves multiple system calls, and the number of stats is
> read separately from the stats themselves. Worse still when user
> space asks for values of the stats, it doesn't inform the kernel
> how big the buffer is. If number of stats increases in the meantime
> kernel will overflow user buffer.
>
> Add a netlink API for dumping queue stats. Queue information is
> exposed via the netdev-genl family, so add the stats there.
> Support per-queue and sum-for-device dumps. Latter will be useful
> when subsequent patches add more interesting common stats than
> just bytes and packets.
>
> The API does not currently distinguish between HW and SW stats.
> The expectation is that the source of the stats will either not
> matter much (good packets) or be obvious (skb alloc errors).
>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> Documentation/netlink/specs/netdev.yaml | 84 +++++++++
> Documentation/networking/statistics.rst | 17 +-
> include/linux/netdevice.h | 3 +
> include/net/netdev_queues.h | 54 ++++++
> include/uapi/linux/netdev.h | 19 +++
> net/core/netdev-genl-gen.c | 12 ++
> net/core/netdev-genl-gen.h | 2 +
> net/core/netdev-genl.c | 217 ++++++++++++++++++++++++
> tools/include/uapi/linux/netdev.h | 19 +++
> 9 files changed, 426 insertions(+), 1 deletion(-)
>
I like the series, thank you very much!
[...]
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index 8b8ed4e13d74..d633347eeda5 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -4,6 +4,60 @@
>
> #include <linux/netdevice.h>
>
> +struct netdev_queue_stats_rx {
> + u64 bytes;
> + u64 packets;
> +};
> +
> +struct netdev_queue_stats_tx {
> + u64 bytes;
> + u64 packets;
> +};
> +
> +/**
> + * struct netdev_stat_ops - netdev ops for fine grained stats
> + * @get_queue_stats_rx: get stats for a given Rx queue
> + * @get_queue_stats_tx: get stats for a given Tx queue
> + * @get_base_stats: get base stats (not belonging to any live instance)
> + *
> + * Query stats for a given object. The values of the statistics are undefined
> + * on entry (specifically they are *not* zero-initialized). Drivers should
> + * assign values only to the statistics they collect. Statistics which are not
> + * collected must be left undefined.
> + *
> + * Queue objects are not necessarily persistent, and only currently active
> + * queues are queried by the per-queue callbacks. This means that per-queue
> + * statistics will not generally add up to the total number of events for
> + * the device. The @get_base_stats callback allows filling in the delta
> + * between events for currently live queues and overall device history.
> + * When the statistics for the entire device are queried, first @get_base_stats
> + * is issued to collect the delta, and then a series of per-queue callbacks.
> + * Only statistics which are set in @get_base_stats will be reported
> + * at the device level, meaning that unlike in queue callbacks, setting
> + * a statistic to zero in @get_base_stats is a legitimate thing to do.
> + * This is because @get_base_stats has a second function of designating which
> + * statistics are in fact correct for the entire device (e.g. when history
> + * for some of the events is not maintained, and reliable "total" cannot
> + * be provided).
> + *
> + * Device drivers can assume that when collecting total device stats,
> + * the @get_base_stats and subsequent per-queue calls are performed
> + * "atomically" (without releasing the rtnl_lock).
> + *
> + * Device drivers are encouraged to reset the per-queue statistics when
> + * number of queues change. This is because the primary use case for
> + * per-queue statistics is currently to detect traffic imbalance.
I get it, but encouraging users to reset those on queue-count-change
seems to cover that case too. I'm fine though :P
> + */
> +struct netdev_stat_ops {
> + void (*get_queue_stats_rx)(struct net_device *dev, int idx,
> + struct netdev_queue_stats_rx *stats);
> + void (*get_queue_stats_tx)(struct net_device *dev, int idx,
> + struct netdev_queue_stats_tx *stats);
> + void (*get_base_stats)(struct net_device *dev,
> + struct netdev_queue_stats_rx *rx,
> + struct netdev_queue_stats_tx *tx);
> +};
> +
> /**
> * DOC: Lockless queue stopping / waking helpers.
> *
[...]
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index fd98936da3ae..0fbd666f2b79 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -8,6 +8,7 @@
> #include <net/xdp.h>
> #include <net/xdp_sock.h>
> #include <net/netdev_rx_queue.h>
> +#include <net/netdev_queues.h>
> #include <net/busy_poll.h>
>
> #include "netdev-genl-gen.h"
> @@ -469,6 +470,222 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> return skb->len;
> }
>
> +#define NETDEV_STAT_NOT_SET (~0ULL)
> +
> +static void
> +netdev_nl_stats_add(void *_sum, const void *_add, size_t size)
nit: this declaration fits in one line
> +{
> + const u64 *add = _add;
> + u64 *sum = _sum;
> +
> + while (size) {
> + if (*add != NETDEV_STAT_NOT_SET && *sum != NETDEV_STAT_NOT_SET)
> + *sum += *add;
> + sum++;
> + add++;
> + size -= 8;
> + }
> +}
> +
Powered by blists - more mailing lists