lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBXB97rCp2ghvVGkZAuC+2a4mnMnjsywRLK+nL0+n+s2A@mail.gmail.com>
Date: Fri, 7 Jun 2024 17:16:05 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Daniel Jurgens <danielj@...dia.com>
Cc: Jakub Kicinski <kuba@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "jasowang@...hat.com" <jasowang@...hat.com>, 
	"xuanzhuo@...ux.alibaba.com" <xuanzhuo@...ux.alibaba.com>, 
	"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>, 
	"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>, 
	"abeni@...hat.com" <abeni@...hat.com>, Parav Pandit <parav@...dia.com>
Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters

On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens <danielj@...dia.com> wrote:
>
> > From: Jakub Kicinski <kuba@...nel.org>
> > Sent: Friday, February 2, 2024 10:01 AM
> > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters
> >
> > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote:
> > > > Can you say more? I'm curious what's your use case.
> > >
> > > I'm not working at Nvidia, so my point of view may differ from theirs.
> > > From what I can tell is that those two counters help me narrow down
> > > the range if I have to diagnose/debug some issues.
> >
> > right, i'm asking to collect useful debugging tricks, nothing against the patch
> > itself :)
> >
> > > 1) I sometimes notice that if some irq is held too long (say, one
> > > simple case: output of printk printed to the console), those two
> > > counters can reflect the issue.
> > > 2) Similarly in virtio net, recently I traced such counters the
> > > current kernel does not have and it turned out that one of the output
> > > queues in the backend behaves badly.
> > > ...
> > >
> > > Stop/wake queue counters may not show directly the root cause of the
> > > issue, but help us 'guess' to some extent.
> >
> > I'm surprised you say you can detect stall-related issues with this.
> > I guess virtio doesn't have BQL support, which makes it special.
> > Normal HW drivers with BQL almost never stop the queue by themselves.
> > I mean - if they do, and BQL is active, then the system is probably
> > misconfigured (queue is too short). This is what we use at Meta to detect
> > stalls in drivers with BQL:
> >
> > https://lore.kernel.org/all/20240131102150.728960-3-leitao@debian.org/
> >
> > Daniel, I think this may be a good enough excuse to add per-queue stats to
> > the netdev genl family, if you're up for that. LMK if you want more info,
> > otherwise I guess ethtool -S is fine for now.
>
> For now, I would like to go with ethtool -S. But I can take on the netdev level as a background task. Are you suggesting adding them to rtnl_link_stats64?

Hello Daniel, Jakub,

Sorry to revive this thread. I wonder why not use this patch like mlnx
driver does instead of adding statistics into the yaml file? Are we
gradually using or adding more fields into the yaml file to replace
the 'ethtool -S' command?

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ