netdev - Re: PROBLEM: A set of networking related oopses


Date:	Thu, 24 Apr 2008 17:25:59 +0300
From:	Tuomas Jormola <tj@...itudo.net>
To:	Jarek Poplawski <jarkao2@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: PROBLEM: A set of networking related oopses

Hi again,

On Sun, Mar 09, 2008 at 06:31:22PM +0100, Jarek Poplawski wrote:
> On Sun, Mar 09, 2008 at 06:58:47PM +0200, Tuomas Jormola wrote:
> ...
> > there be new oopses, I will replace the old card with a newer Intel 
> > gigabit card that I have laying around, and put it in a different PCI 
> > slot.
> 
> The link I gave you described similar problem just with e1000.
> The next message after this thread looks alike (e1000 driver).
> So, you shouldn't hurry with this change. Just set this affinity
> for both cards and check if it's respected.
I've now run my system for about a month with the following
configuration: I replaced the very old e100 card with a newer e1000 PCI
card and set the affinity so that the interrupts of both the e1000e and
e1000 cards are handled by a single CPU. This is working very well.

(17:15:13)(tj@...kti)(~)$ grep eth /proc/interrupts 
 18:   88113407       3780   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb6, eth0
217:    9710797       4297   PCI-MSI-edge      eth1

(This is after about 8 days of uptime; the affinity was set
automatically by a local init script.)
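
The init script itself is not included in this message. As a rough
sketch only of what such a script might look like: the function names
and the PROC_ROOT variable below are made up for illustration, and the
only kernel interfaces assumed are /proc/interrupts and the hex CPU
bitmask in /proc/irq/<N>/smp_affinity.

```shell
#!/bin/sh
# Sketch: pin the IRQs of all eth* interfaces to one CPU.
# PROC_ROOT is a hypothetical override so the functions can be
# dry-run against a copy of /proc instead of the real thing.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Print the IRQ number of every /proc/interrupts line that
# mentions an eth device (the first field is "<irq>:").
eth_irqs() {
    awk '/eth[0-9]+/ { sub(/:$/, "", $1); print $1 }' "$PROC_ROOT/interrupts"
}

# Print the hex bitmask that selects a single CPU
# (CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, ...).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Write the single-CPU mask into smp_affinity for each eth IRQ.
# Needs root when run against the real /proc.
pin_eth_irqs() {
    mask=$(cpu_mask "$1")
    for irq in $(eth_irqs); do
        echo "$mask" > "$PROC_ROOT/irq/$irq/smp_affinity"
    done
}
```

Calling "pin_eth_irqs 0" at the end of the script would route the eth
interrupts to CPU 0. Note that a shared line such as IRQ 18 above moves
the other devices on that line along with eth0.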

And with this, I've gotten rid of the OOPSes I had earlier. But is this
really a feasible long-term solution to the problem? I.e. if you're
getting networking-related OOPSes with an SMP kernel on a box with two
or more CPUs, the first thing you should do is switch off interrupt
load balancing between the CPUs by issuing some obscure statement on
the command line? I don't think that's very friendly advice for
so-called regular users... Is there no way to work around it on the
kernel side?

Also, after installing the e1000 card, I've gotten a few of these dumps
(see attachments) from the e1000 driver: about a dozen incidents over
roughly a month. Sometimes there are 3 incidents in a day, sometimes a
week passes with everything normal.

Thanks,

-- 
Tuomas Jormola <tj@...itudo.net>

View attachment "e1000-hang1.txt" of type "text/plain" (3658 bytes)

View attachment "e1000-hang2.txt" of type "text/plain" (2800 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (190 bytes)