Message-Id: <200803121341.54096.mitov@issp.bas.bg>
Date: Wed, 12 Mar 2008 13:41:53 +0200
From: Marin Mitov <mitov@...p.bas.bg>
To: Jeff Garzik <jeff@...zik.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: net: tx timeouts with skge, 8139too, dmfe drivers/NICs
On Monday 25 February 2008 10:53:01 pm you wrote:
> > As far as this happens with 3 different NICs/drivers could it be
> > a problem in the (common for all of them) networking subsystem?
>
> A TX timeout (like hardware timeouts, in general) is a very generic
> behavior, with many causes.
>
> In general, when you see timeouts with varied hardware and drivers,
> you're almost always dealing with a problem with interrupt delivery, or
> a generic system problem, rather than bugs in the network stack or all
> three drivers.
Well, this gave me a direction for investigation.
Using printk in various parts of the skge driver, and modifying it to
collect additional statistics (read via ethtool -S eth0), I made the
following observations when it freezes:
1. Interrupts are generated (the status register shows pending
interrupts, and they are NOT masked), but irq_handler is NOT invoked.
2. /proc/interrupts shows that while skge is working, both CPUs
receive IRQs. When skge freezes, NO CPU receives skge's interrupts:
CPU[0] still receives all other IRQs except skge's, and CPU[1] receives
no IRQs above the line (see below), only the LOC: and RES: interrupts
below it.
# cat /proc/interrupts
           CPU0       CPU1
  0:         85          1  IO-APIC-edge     timer
  1:      34078          9  IO-APIC-edge     i8042
  6:          1          4  IO-APIC-edge     floppy
  7:        216          1  IO-APIC-edge     parport0
  8:          0          1  IO-APIC-edge     rtc
  9:          0          0  IO-APIC-fasteoi  acpi
 12:     893003    1390080  IO-APIC-edge     i8042
 14:      59682     286628  IO-APIC-edge     ide0
 15:    5458527         12  IO-APIC-edge     ide1
 16:   60547054          1  IO-APIC-fasteoi  mga@pci:0000:01:00.0
 17:    1634623     914447  IO-APIC-fasteoi  sata_via
 18:       7768          7  IO-APIC-fasteoi  sata_promise
 19:          0          0  IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 20:     535380          1  IO-APIC-fasteoi  VIA8237
 21:   30780380   31448992  IO-APIC-fasteoi  eth0
---------line added by me----------------------------------
NMI:          0          0  Non-maskable interrupts
LOC:  154311126  154736178  Local timer interrupts
RES:    1325239    2423719  Rescheduling interrupts
CAL:      40893        456  function call interrupts
TLB:      52651      29184  TLB shootdowns
TRM:          0          0  Thermal event interrupts
SPU:          0          0  Spurious interrupts
ERR:          0
MIS:          0
It looks as though IRQs are somehow disabled (at the IO-APIC/LAPIC?)
at and below some priority level.
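To catch the symptom in item 2 (an IRQ row whose per-CPU counters stop growing while the NIC still reports pending interrupts), one could compare two snapshots of /proc/interrupts. The following is only a minimal user-space sketch, not part of skge or any existing kernel tool: struct irq_row, parse_irq_line(), delivered_mask() and the MAX_CPUS cap of 8 are hypothetical names chosen for illustration, and the caller is assumed to know how many CPU columns the file has.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 8 /* hypothetical cap on CPU columns handled */

/* One parsed row of /proc/interrupts: the label before the colon
 * (e.g. "21" or "LOC") and the per-CPU counters that follow it. */
struct irq_row {
    char label[16];
    unsigned long long count[MAX_CPUS];
    int ncpus; /* number of counters actually parsed */
};

/* Parse a line such as " 21:   30780380   31448992  IO-APIC-fasteoi  eth0".
 * Reads at most ncpus numeric columns after the colon and stops at the
 * first non-numeric field (the chip type). Returns 0 on success, or -1
 * if the line has no usable "label:" prefix (e.g. the CPU0/CPU1 header). */
static int parse_irq_line(const char *line, int ncpus, struct irq_row *row)
{
    const char *colon = strchr(line, ':');
    const char *p = line;
    size_t len;
    int i;

    if (colon == NULL)
        return -1;
    while (p < colon && isspace((unsigned char)*p))
        p++;
    len = (size_t)(colon - p);
    if (len == 0 || len >= sizeof(row->label))
        return -1;
    memcpy(row->label, p, len);
    row->label[len] = '\0';

    p = colon + 1;
    row->ncpus = 0;
    for (i = 0; i < ncpus && i < MAX_CPUS; i++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p) /* no numeric column left (e.g. "ERR:  0") */
            break;
        row->count[i] = v;
        row->ncpus++;
        p = end;
    }
    return 0;
}

/* Bitmask of CPUs whose counter for this IRQ grew between two snapshots
 * of the same row. A result of 0 means no CPU received it in between. */
static unsigned delivered_mask(const struct irq_row *before,
                               const struct irq_row *after)
{
    unsigned mask = 0;
    int n = before->ncpus < after->ncpus ? before->ncpus : after->ncpus;

    for (int i = 0; i < n; i++)
        if (after->count[i] > before->count[i])
            mask |= 1u << i;
    return mask;
}
```

In use, one would read the "21:" row twice a few seconds apart and pass both parsed rows to delivered_mask(); a mask of 0 while the driver's status register still shows pending interrupts would match the "no CPU receives skge's interrupts" observation above.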
I should also mention that after a freeze, ifconfig down/up (plus
restoring the routing info) does NOT solve the problem, while
rmmod/modprobe of the driver makes it work again.
So I moved the request_irq()/free_irq() calls from the driver's
probe()/release() methods to its open()/stop() methods. With this change,
when skge freezes, ifconfig down/up makes it work again (no rmmod/modprobe
needed).
That makes me think that skge's IRQ is somehow being disabled OUTSIDE the
driver, and that free_irq()/request_irq() clears the problem. Am I wrong?
Is that possible? How could it happen?
Any comments/suggestions/patches welcome.
Regards
Marin Mitov