lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZVs7MthVYdIBq9lz@lore-desk>
Date: Mon, 20 Nov 2023 11:55:46 +0100
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: 黄杰 <huangjie.albert@...edance.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	"David S. Miller" <davem@...emloft.net>,
	Toshiaki Makita <toshiaki.makita1@...il.com>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH net] veth: fix ethtool statistical errors

> Lorenzo Bianconi <lorenzo@...nel.org> 于2023年11月20日周一 17:52写道:
> >
> > > Lorenzo Bianconi <lorenzo@...nel.org> 于2023年11月17日周五 17:26写道:
> > > >
> > > > > if peer->real_num_rx_queues > 1, the ethtool -s command for
> > > > > veth network device will display some error statistical values.
> > > > > The value of tx_idx is reset with each iteration, so even if
> > > > > peer->real_num_rx_queues is greater than 1, the value of tx_idx
> > > > > will remain constant. This results in incorrect statistical values.
> > > > > To fix this issue, assign the value of pp_idx to tx_idx.
> > > > >
> > > > > Fixes: 5fe6e56776ba ("veth: rely on peer veth_rq for ndo_xdp_xmit accounting")
> > > > > Signed-off-by: Albert Huang <huangjie.albert@...edance.com>
> > > > > ---
> > > > >  drivers/net/veth.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > > > > index 0deefd1573cf..3a8e3fc5eeb5 100644
> > > > > --- a/drivers/net/veth.c
> > > > > +++ b/drivers/net/veth.c
> > > > > @@ -225,7 +225,7 @@ static void veth_get_ethtool_stats(struct net_device *dev,
> > > > >       for (i = 0; i < peer->real_num_rx_queues; i++) {
> > > > >               const struct veth_rq_stats *rq_stats = &rcv_priv->rq[i].stats;
> > > > >               const void *base = (void *)&rq_stats->vs;
> > > > > -             unsigned int start, tx_idx = idx;
> > > > > +             unsigned int start, tx_idx = pp_idx;
> > > > >               size_t offset;
> > > > >
> > > > >               tx_idx += (i % dev->real_num_tx_queues) * VETH_TQ_STATS_LEN;
> > > > > --
> > > > > 2.20.1
> > > > >
> > > >
> > > > Hi Albert,
> > > >
> > > > Can you please provide more details about the issue you are facing?
> > > > In particular, what is the number of configured tx and rx queues for both
> > > > peers?
> > >
> > > Hi, Lorenzo
> > > I found this because I wanted to add more echo information in ethttool(for veth,
> > > but I found that the information was incorrect. That's why I paid
> > > attention here.
> >
> > ack. Could you please share the veth pair tx/rx queue configuration?
> >
> 
> dev: tx --->4.  rx--->4
> peer: tx--->1 rx---->1
> 
> Could the following code still be problematic? pp_idx not updated correctly.
> page_pool_stats:
> veth_get_page_pool_stats(dev, &data[pp_idx]);

Thx for pointing this out. This part is a bit tricky but I think I can see the
issue now. Since we have just one peer rx queue, when we run ndo_xdp_xmit
pointer on dev, we will squash all dev xmit queues on the single peer rx one
(where we do do the accounting) [0].
The issue is ethtool will display all dev xmit queues so we need to set pp_idx
properly in veth_get_ethtool_stats().
Can you please take a look to the patch below?

Regards,
Lorenzo

[0] https://github.com/LorenzoBianconi/net-next/blob/master/drivers/net/veth.c#L417

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9980517ed8b0..8607eb8cf458 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -236,8 +236,8 @@ static void veth_get_ethtool_stats(struct net_device *dev,
 				data[tx_idx + j] += *(u64 *)(base + offset);
 			}
 		} while (u64_stats_fetch_retry(&rq_stats->syncp, start));
-		pp_idx = tx_idx + VETH_TQ_STATS_LEN;
 	}
+	pp_idx = idx + dev->real_num_tx_queues * VETH_TQ_STATS_LEN;
 
 page_pool_stats:
 	veth_get_page_pool_stats(dev, &data[pp_idx]);

> 
> BR
> Albert
> 
> > Rergards,
> > Lorenzo
> >
> > >
> > > > tx_idx is the index of the current (local) tx queue and it must restart from
> > > > idx in each iteration otherwise we will have an issue when
> > > > peer->real_num_rx_queues is greater than dev->real_num_tx_queues.
> > > >
> > > OK. I don't know if this is a known issue.
> > >
> > > BR
> > > Albert
> > >
> > >
> > > > Regards,
> > > > Lorenzo

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ