Date:	Wed, 18 Dec 2013 17:22:31 +1100
From:	David Gibson <david@...son.dropbear.id.au>
To:	Manish Chopra <manish.chopra@...gic.com>
Cc:	Sony Chacko <sony.chacko@...gic.com>,
	Rajesh Borundia <rajesh.borundia@...gic.com>,
	netdev <netdev@...r.kernel.org>,
	"snagarka@...hat.com" <snagarka@...hat.com>,
	"tcamuso@...hat.com" <tcamuso@...hat.com>,
	"vdasgupt@...hat.com" <vdasgupt@...hat.com>
Subject: Re: [0/2] netxen: bug fix and diagnostics for possible (hardware?)
 bug

On Tue, Dec 17, 2013 at 09:50:52PM +0000, Manish Chopra wrote:
> >-----Original Message-----
> >From: David Gibson [mailto:david@...son.dropbear.id.au]
> >Sent: Tuesday, December 17, 2013 10:53 AM
> >To: Manish Chopra; Sony Chacko; Rajesh Borundia
> >Cc: netdev; snagarka@...hat.com; tcamuso@...hat.com;
> >vdasgupt@...hat.com
> >Subject: [0/2] netxen: bug fix and diagnostics for possible (hardware?) bug
> >
> >At Red Hat, we've hit a couple of customer cases with crashes in the netxen driver
> >due to list corruption.  This seems to be very rarely triggered, and unfortunately
> >the dumps we have don't have enough information to be certain of the cause,
> >although we have a possible theory.
> >
> >I'm therefore suggesting a patch to add some sanity checking, which should help
> >to at least localize and mitigate the problem when someone hits it in the future.
> >Please let me know if there's a better approach to doing this.
> >
> >That's 2/2.  1/2 is a fix for a clear bug I spotted along the way, but not one that
> >could cause the symptoms we've seen.
> 
> David,
> 
> Having these checks in the data path (Rx path) may have some performance
> impact. It would be better to find the root cause than to add sanity
> checks.

Obviously, but this was the best way I could think of to narrow down
the root cause (or at least to tell a driver bug from a firmware
bug).

> We will get back to you on this.

If you have a better idea for locating the root cause, please let me
know.  I have access to a vmcore which I can poke around in.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
