lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200825175306.377be2f4@kicinski-fedora-PC1C0HJN>
Date:   Tue, 25 Aug 2020 17:53:06 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     David Awogbemila <awogbemila@...gle.com>
Cc:     Yangchun Fu <yangchun@...gle.com>, netdev@...r.kernel.org,
        Kuo Zhao <kuozhao@...gle.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH net-next 05/18] gve: Add Gvnic stats AQ command and
 ethtool show/set-priv-flags.

On Tue, 25 Aug 2020 17:06:27 -0700 David Awogbemila wrote:
> On Tue, Aug 25, 2020 at 9:46 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > On Tue, 25 Aug 2020 08:46:12 -0700 David Awogbemila wrote:  
> > > > > +     // stats from NIC
> > > > > +     RX_QUEUE_DROP_CNT               = 65,
> > > > > +     RX_NO_BUFFERS_POSTED            = 66,
> > > > > +     RX_DROPS_PACKET_OVER_MRU        = 67,
> > > > > +     RX_DROPS_INVALID_CHECKSUM       = 68,  
> > > >
> > > > Most of these look like a perfect match for members of struct
> > > > rtnl_link_stats64. Please use the standard stats to report the errors,
> > > > wherever possible.  
> > > These stats are based on the NIC stats format which don't exactly
> > > match rtnl_link_stats64.
> > > I'll add some clarification in the description and within the comments.  
> >
> > You must report standard stats. Don't be lazy and just dump everything
> > in ethtool -S and expect the user to figure out the meaning of your
> > strings.  
> Apologies for responding a week later, I'll try to respond more quickly.

Thanks!

> I could use some help figuring out the use of rtnl_link_stats64 here.
> These 4 stats are per-queue stats written by the NIC. It looks like
> rtnl_link_stats64 is meant to sum stats for the entire device? Is the
> requirement here simply to use the member names in rtnl_link_stats64
> when reporting stats via ethtool? Thanks.

If these are per queue you can report them per queue in ethtool (name
is up to you), but you must report the sum over all queues in
rtnl_link_stats64.

FWIW the mapping I'd suggest is probably:

RX_QUEUE_DROP_CNT         -> rx_dropped
RX_NO_BUFFERS_POSTED      -> rx_missed_errors
RX_DROPS_PACKET_OVER_MRU  -> rx_length_errors
RX_DROPS_INVALID_CHECKSUM -> rx_crc_errors

The no-buffers-posted stat is unfortunately slightly disputable between
rx_missed_errors and rx_fifo_errors (even rx_over_errors). I'd go for missed.

> > > > > +static int gve_set_priv_flags(struct net_device *netdev, u32 flags)
> > > > > +{
> > > > > +     struct gve_priv *priv = netdev_priv(netdev);
> > > > > +     u64 ori_flags, new_flags;
> > > > > +     u32 i;
> > > > > +
> > > > > +     ori_flags = READ_ONCE(priv->ethtool_flags);
> > > > > +     new_flags = ori_flags;
> > > > > +
> > > > > +     for (i = 0; i < GVE_PRIV_FLAGS_STR_LEN; i++) {
> > > > > +             if (flags & BIT(i))
> > > > > +                     new_flags |= BIT(i);
> > > > > +             else
> > > > > +                     new_flags &= ~(BIT(i));
> > > > > +             priv->ethtool_flags = new_flags;
> > > > > +             /* set report-stats */
> > > > > +             if (strcmp(gve_gstrings_priv_flags[i], "report-stats") == 0) {
> > > > > +                     /* update the stats when user turns report-stats on */
> > > > > +                     if (flags & BIT(i))
> > > > > +                             gve_handle_report_stats(priv);
> > > > > +                     /* zero off gve stats when report-stats turned off */
> > > > > +                     if (!(flags & BIT(i)) && (ori_flags & BIT(i))) {
> > > > > +                             int tx_stats_num = GVE_TX_STATS_REPORT_NUM *
> > > > > +                                     priv->tx_cfg.num_queues;
> > > > > +                             int rx_stats_num = GVE_RX_STATS_REPORT_NUM *
> > > > > +                                     priv->rx_cfg.num_queues;
> > > > > +                             memset(priv->stats_report->stats, 0,
> > > > > +                                    (tx_stats_num + rx_stats_num) *
> > > > > +                                    sizeof(struct stats));  
> > > >
> > > > I don't quite get why you need the knob to disable some statistics.
> > > > Please remove or explain this in the cover letter. Looks unnecessary.  
> > > We use this to give the guest the option of disabling stats reporting
> > > through ethtool set-priv-flags. I'll update the cover letter.  
> >
> > I asked you why you reply a week later with "I want to give user the
> > option. I'll update the cover letter." :/ That's quite painful for the
> > reviewer. Please just provide the justification.  
> I apologize for the pain; it certainly wasn't intended :) .
> Just to clarify, stats will always be available to the user via ethtool.
> This is only giving users the option of disabling the reporting of
> stats from the driver to the virtual NIC should the user decide they
> do not want to share driver stats with Google as a matter of privacy.

Okay, so this is for the to-hypervisor direction. Hopefully the patch
split will make this clearer.

> > > > > @@ -880,6 +953,10 @@ static void gve_handle_status(struct gve_priv *priv, u32 status)
> > > > >               dev_info(&priv->pdev->dev, "Device requested reset.\n");
> > > > >               gve_set_do_reset(priv);
> > > > >       }
> > > > > +     if (GVE_DEVICE_STATUS_REPORT_STATS_MASK & status) {
> > > > > +             dev_info(&priv->pdev->dev, "Device report stats on.\n");  
> > > >
> > > > How often is this printed?  
> > > Stats reporting is disabled by default. But when enabled, this would
> > > only get printed whenever the virtual NIC detects
> > > an issue and triggers a report-stats request.  
> >
> > What kind of issue? Something serious? Packet drops?  
> Sorry, to correct myself, this would get printed only at the moments
> when the device switches report-stats on, not on every stats report.
> After that, it would not get printed until it is switched off and then
> on again, so rarely.
> It would get switched on if there is a networking issue such as packet
> drops and help us investigate a stuck queue for example.

Reporting of the stats is a one-shot:

+	if (gve_get_do_report_stats(priv)) {
+		gve_handle_report_stats(priv);
+		gve_clear_do_report_stats(priv);
+	}

So the hypervisor implements some rate limiting on the requests?

In general, I don't think users will want these messages to keep
popping up. A ethtool -S statistic for the number of times reporting
happened should be sufficient. Or if you really want them - have a
verbose version of the priv flag, maybe?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ