lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 5 Aug 2008 09:52:28 -0700
From:	"Ronciak, John" <john.ronciak@...el.com>
To:	Neil Horman <nhorman@...driver.com>,
	Ben Hutchings <bhutchings@...arflare.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	"Allan, Bruce W" <bruce.w.allan@...el.com>
Subject: RE: [PATCH] catch up device stats when multicast > total frames

The total packet count in the driver was done so that the count is always right even in between stats updates.  Believe it or not there are some people that think this is an important thing for some reason.  We don't understand that but agreed to do this to correct what they were seeing with the stats being updated every 2 seconds.

The problem that seems to be happening here is that the BMC is consuming frames that the driver never sees.  This I think is what is messing up the counts.  The problem we have is that we aren't sure if all the BMC's are doing the same thing.  We can't seem to find that information out.  So we end up with a fix like that that prevents the calculations from going negative.

I'm not sure what the right thing to do is.  I suggested in the past (in the BZ) that the total HW packet count be used for this calculation but Niel didn't like that for some reason, at least I think that's why it wasn't used.  I suggested that the HW counts be used when looking for these MC frame counts and not use the real time counts it should work correctly without regard to which BMC is being used.  It would always calculate correctly.  The calculation could come out of the update stats routine.

BTW, why did this move from the RH BZ to here?


Cheers,
John
-----------------------------------------------------------
167 days until the end of the error!
-----------------------------------------------------------
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety.", Benjamin Franklin 1755

-----Original Message-----
From: Neil Horman [mailto:nhorman@...driver.com]
Sent: Tuesday, August 05, 2008 6:16 AM
To: Ben Hutchings
Cc: netdev@...r.kernel.org; e1000-devel@...ts.sourceforge.net; Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Ronciak, John
Subject: Re: [PATCH] catch up device stats when multicast > total frames

On Tue, Aug 05, 2008 at 01:15:29PM +0100, Ben Hutchings wrote:
> Neil Horman wrote:
> > Hey-
> >     REcently observed a problem wherein, if a BMC or other IPMI device is
> > attached to a NIC, multicast frames can be consumed by the aformentioned device
> > without ever being seen by the driver.  Since multicast frames are counted in
> > the hardware and the total frame counter is counted in the driver napi routine,
>
> I'd be surprised if the hardware does not also maintain a total frame
> counter.  If not, you can possibly calculate the total as good + bad
> packets, or unicast + multicast + broadcast + bad, or something like that.
>

It does, however,  for whatever reason the drriver relies on a software
computed total packets counter rather than the hardware based on (possibly for
the same reason, in that it counts frames the driver never actually sees).  I
asked John R. in the origional bug report why e1000 might do this, and while
neither of us were sure, that seemed to be the consensus.

> [...]
> > --- a/drivers/net/e1000/e1000_main.c
> > +++ b/drivers/net/e1000/e1000_main.c
> > @@ -3761,6 +3761,8 @@ e1000_update_stats(struct e1000_adapter *adapter)
> >     /* Fill out the OS statistics structure */
> >     adapter->net_stats.multicast = adapter->stats.mprc;
> >     adapter->net_stats.collisions = adapter->stats.colc;
> > +   if (adapter->net_stats.rx_packets < adapter->net_stats.multicast)
> > +           adapter->net_stats.rx_packets = adapter->net_stats.multicast;
> >
> >     /* Rx Errors */
> >
> [...]
>
> This is a botch - it means the numbers can't be so obviously wrong, but
> doesn't make them correct.
>
I agree, but as it stands currently the numbers are obviously wrong, and I don't
see a way to make them undenyably correct.  We could use the hardware counter
instead, but then we would be counting frames that various IPMI endpoints might
consume, so that wouldn't be quite right.  We could fix up the sofware counter
based on the hardware counter, but that doesn't seem right either.  We could
inspect the packet in the driver, and do a software multicast frame counter, but
that would be a big performance hit.  So there is no great fix to this problem.
This at least makes the stats sane, doesn't impact performance, and doesn't
affect stats for which IPMI reception isn't an issue.

John, you stopped commenting in the origional bug report, do you have any
further thoughts on this as a fix to the issue at hand?

Neil


--
/****************************************************
 * Neil Horman <nhorman@...driver.com>
 * Software Engineer, Red Hat
 ****************************************************/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ