netdev - Re: [PATCH] net: mvpp2: add ethtool GOP statistics

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20171103113721.61685dfd@xps13>
Date:   Fri, 3 Nov 2017 11:37:21 +0100
From:   Miquel RAYNAL <miquel.raynal@...e-electrons.com>
To:     Florian Fainelli <f.fainelli@...il.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
        Antoine Tenart <antoine.tenart@...e-electrons.com>,
        Gregory Clement <gregory.clement@...e-electrons.com>,
        Nadav Haklai <nadavh@...vell.com>, netdev@...r.kernel.org,
        Stefan Chulski <stefanc@...vell.com>
Subject: Re: [PATCH] net: mvpp2: add ethtool GOP statistics

Hi Florian,

> > +static u64 mvpp2_read_count(struct mvpp2_port *port, unsigned int
> > offset) +{
> > +	bool reg_is_64b =
> > +		(offset == MVPP2_MIB_GOOD_OCTETS_RCVD_LOW) ||
> > +		(offset == MVPP2_MIB_GOOD_OCTETS_SENT_LOW);  
> 
> This does not scale very well, put that in your statistics structure
> and define a member "reg_is_64b" there such that you can pass a
> pointer to one of these members here, and check, on per-counter basis
> whether this is needed or not.

I completely agree, I will use a third parameter in the statistics
structure as you suggest.

> 
> > +	void __iomem *base;
> > +	u64 val;
> > +
> > +	if (port->priv->hw_version == MVPP21)
> > +		base = port->priv->lms_base +
> > MVPP21_MIB_COUNTERS_OFFSET +
> > +		       port->gop_id * MVPP21_MIB_COUNTERS_PORT_SZ;
> > +	else
> > +		base = port->priv->iface_base +
> > MVPP22_MIB_COUNTERS_OFFSET +
> > +		       port->gop_id * MVPP22_MIB_COUNTERS_PORT_SZ;
> > +
> > +	val = readl(base + offset);
> > +	if (reg_is_64b)
> > +		val += (u64)readl(base + offset + 4) << 32;  
> 
> So the value gets latched when the higher part gets read last?

When counters are 64-bit, they use two 32-bit registers to store the
value. If the lower 32 bits are at address X, then the higher 32 bits
are at X+4.

> 
> > +
> > +	return val;
> > +}
> > +
> > +struct mvpp2_ethtool_statistics {
> > +	unsigned int offset;
> > +	const char string[ETH_GSTRING_LEN];  
> 
> Add your reg_is_64b member here too.

Of course, yes.

> 
> > +};
> > +
> > +/* Due to the fact that software statistics and hardware
> > statistics are, by
> > + * design, incremented at different moments in the chain of packet
> > processing,
> > + * it is very likely that incoming packets could have been dropped
> > after being
> > + * counted by hardware but before reaching software statistics
> > (most probably
> > + * multicast packets), and in the oppposite way, during
> > transmission, FCS bytes
> > + * are added in between as well as TSO skb will be split and
> > header bytes added.
> > + */  
> 
> OK, not sure what to make of that comment.

For me it is a problem if when doing 'ifconfig eth0' and ' ethtool -S
eth0' gives you totally different statistics. This comment explains why
as I asked myself how the statistics could evolve so differently. Hence
the comment. Do you want me to get rid of it? For now I have added this
to make it clearer:

  * Hence, statistics gathered from userspace with ifconfig (software)
  * and ethtool (hardware) cannot be compared.
  */

> > +static void mvpp2_gather_hw_statistics(struct work_struct *work)
> > +{
> > +	struct delayed_work *del_work = to_delayed_work(work);
> > +	struct mvpp2 *priv = container_of(del_work, struct mvpp2,
> > stats_work);
> > +	struct mvpp2_port *port;
> > +	u64 *pstats;
> > +	int i, j;
> > +
> > +	for (i = 0; i < priv->port_count; i++) {
> > +		if (!priv->port_list[i])
> > +			continue;
> > +
> > +		port = priv->port_list[i];
> > +		pstats = port->ethtool_stats;  
> 
> port->ethtool_stats was allocated this way:
> 
> port->ethtool_stats = kzalloc(sizeof(mvpp2_ethtool_stats)) instead of
> ARRAY_SIZE(mvpp2_ethtool_stats) * sizeof(u64)
> 
> This is probably working right now because mvpp2_ethtool_stats is much
> bigger than ARRAY_SIZE(mvpp2_ethtool_stats) * sizeof(u64).

My mistake, sorry about that.

> 
> 
> > +		for (j = 0; j < ARRAY_SIZE(mvpp2_ethtool_stats);
> > j++)
> > +			*pstats++ += mvpp2_read_count(
> > +				port,
> > mvpp2_ethtool_stats[j].offset);  
> 
> You might want to look into the helper functions from
> include/linux/u64_stats_sync.h to safely add a 32-bit quantity to a
> 64-bit quantity on 32-bit hosts.

I looked at the header, but I do not think this applies here because
once the first register in the list is read (at the lowest address),
counters are freezed until the last register is read (hardware is
still counting but will not update the registers we read until then).
So I guess there is no need for it here, what do you think?


> >  static void mvpp2_port_reset(struct mvpp2_port *port)
> >  {
> >  	u32 val;
> > +	int i;  
> 
> unsigned int i

Ok.


> > @@ -7613,13 +7788,19 @@ static int mvpp2_port_probe(struct
> > platform_device *pdev, port->base = priv->iface_base +
> > MVPP22_GMAC_BASE(port->gop_id); }
> >  
> > -	/* Alloc per-cpu stats */
> > +	/* Alloc per-cpu and ethtool stats */
> >  	port->stats = netdev_alloc_pcpu_stats(struct
> > mvpp2_pcpu_stats); if (!port->stats) {
> >  		err = -ENOMEM;
> >  		goto err_free_irq;
> >  	}
> >  
> > +	port->ethtool_stats = kzalloc(sizeof(mvpp2_ethtool_stats),
> > GFP_KERNEL);
> > +	if (!port->ethtool_stats) {
> > +		err = -ENOMEM;
> > +		goto err_free_stats;
> > +	}  
> 
> Should not the above be kcalloc(sizeof(u64),
> ARRAY_SIZE(mvpp2_ethtool_stats), GFP_KERNEL)? That is, an array of
> ARRAY_SIZE(mvpp2_ethtool_stats) elements, each sizeof(u64) bytes wide?

Yes it is, corrected.


> > (!priv->port_list) { @@ -8023,6 +8207,20 @@ static int
> > mvpp2_probe(struct platform_device *pdev) goto err_mg_clk;
> >  	}
> >  
> > +	/* Statistics must be gathered regularly because some of
> > them (like
> > +	 * packets counters) are 32-bit registers and could
> > overflow quite
> > +	 * quickly. For instance, a 10Gb link used at full
> > bandwidth with the
> > +	 * smallest packets (64B) will overflow a 32-bit counter
> > in less than
> > +	 * 30 seconds. Then, use a workqueue to fill 64-bit
> > counters.
> > +	 */
> > +	priv->stats_queue =
> > create_singlethread_workqueue("mvpp2_hw_stats");
> > +	if (!priv->stats_queue) {
> > +		err = -ENOMEM;
> > +		goto err_mg_clk;
> > +	}  
> 
> If I have multiple of these network devices in my system, it would be
> nice to have an unique identifier after mvpp22_hw_stats to help
> differentiate them (and possibly change their scheduling priorities),
> how about using "mvpp2_hw_stats/%d"?

Ok.

> 
> > +
> > +	INIT_DELAYED_WORK(&priv->stats_work,
> > mvpp2_gather_hw_statistics); +
> >  	/* Initialize ports */
> >  	i = 0;
> >  	for_each_available_child_of_node(dn, port_node) {
> > @@ -8033,6 +8231,10 @@ static int mvpp2_probe(struct
> > platform_device *pdev) }
> >  
> >  	platform_set_drvdata(pdev, priv);
> > +
> > +	queue_delayed_work(priv->stats_queue, &priv->stats_work,
> > +			   MVPP2_MIB_COUNTERS_STATS_DELAY);  
> 
> If the network interface is not used (ndo_open is not called) we have
> this workqueue running for nothing because the statistics should not
> even increase, and this is just creating unnecessary system activity
> for nothing.

That is right, thank you for pointing it.

> 
> > +
> >  	return 0;
> >  
> >  err_mg_clk:
> > @@ -8053,6 +8255,18 @@ static int mvpp2_remove(struct
> > platform_device *pdev) struct device_node *port_node;
> >  	int i = 0;
> >  
> > +	/* This work recall himself within a delay. If the
> > cancellation returned
> > +	 * a non-zero value, it means a work is still running. In
> > that case, use
> > +	 * use the flush (returns when the running work will be
> > done) and cancel
> > +	 * the new work that was just submitted to the queue but
> > not started yet
> > +	 * due to the delay.
> > +	 */
> > +	if (!cancel_delayed_work(&priv->stats_work)) {
> > +		flush_workqueue(priv->stats_queue);
> > +		cancel_delayed_work(&priv->stats_work);
> > +	}  
> 
> Similarly, this needs to be moved to the ndo_stop() function.

Sure.

> 
> > +	destroy_workqueue(priv->stats_queue);
> > +
> >  	for_each_available_child_of_node(dn, port_node) {
> >  		if (priv->port_list[i])
> >  			mvpp2_port_remove(priv->port_list[i]);
> >   
> 
> 

Thanks for reviewing.

Best regards,
Miquèl