Message-ID: <4A16D5F6.8040000@windriver.com>
Date: Fri, 22 May 2009 12:42:30 -0400
From: "Hong H. Pham" <hong.pham@...driver.com>
To: David Miller <davem@...emloft.net>
CC: netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts
David Miller wrote:
> I wonder if the spurious interrupts trigger exactly at the
>
> nw64(LD_IM0(LDN_RXDMA(rp->rx_channel)), 0);
>
> in niu_poll_core().
>
> Can you run one more test? Supplement the debugging output
> with:
>
> "%pS", get_irq_regs()->tpc
>
> so we can see where the program counter is at the time of
> the spurious interrupt?

The tpc at the time of the spurious interrupt is niu_poll+0x99c.
Looking this address up, it's at this line in niu_ldg_rearm():

	nw64(LDG_IMGMT(lp->ldg_num), val);

Since the timer is also reprogrammed when the LDG is rearmed,
interrupts should not have been generated immediately after
writing to LDG_IMGMT.

The tpc also showed interrupts happening in net_rx_action. In
this case the LDG had been rearmed, but the timer prevented
interrupt delivery until after niu_poll was done.

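To make the rearm step concrete, here is a minimal user-space sketch of how
the value written to LDG_IMGMT might be composed. The field layout (ARM at
bit 31, a 6-bit timer in the low bits) and the helper name
ldg_imgmt_value() are assumptions for illustration, not copied from the
driver; check niu.h before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout; verify against niu.h. */
#define LDG_IMGMT_ARM   0x0000000080000000ULL
#define LDG_IMGMT_TIMER 0x000000000000003fULL

/* Hypothetical helper: build the value a rearm would write to LDG_IMGMT. */
static uint64_t ldg_imgmt_value(uint8_t timer, bool arm)
{
	/* The timer count occupies the low bits of the register. */
	uint64_t val = (uint64_t)timer & LDG_IMGMT_TIMER;

	/* Arming and reloading the timer happen in the same write. */
	if (arm)
		val |= LDG_IMGMT_ARM;
	return val;
}
```

Under these assumptions, a single write both arms the group and restarts the
timer, which is why an interrupt landing right at the write is unexpected.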
> Meanwhile, even if we go with your patch to fix this, we can't
> use it as-is. Let me explain.
>
> Suppose that we get this spurious interrupt right after we unmask the
> interrupt and right before napi_complete(). Your change will make us
> re-mask the interrupts, but without scheduling NAPI.
>
> So once the napi_complete() happens, if no further interrupts trigger
> in that LDG, we'll never process those interrupt events cleared by
> your new code. See what I mean?
Understood.
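
The race described in the quoted text above can be replayed with a toy state
model. This is only a sketch: struct ldg_model, replay_race() and
ldg_is_stuck() are hypothetical names, and the three flags stand in for
hardware and NAPI state that the real driver tracks differently. The
recheck_after_complete flag models the idea of rechecking for pending events
after napi_complete(), raised in the next quoted paragraph.

```c
#include <stdbool.h>

/* Toy state for one LDG; not the driver's data structures. */
struct ldg_model {
	bool masked;         /* interrupts for the group are masked */
	bool napi_scheduled; /* a poll is queued or running */
	bool event_pending;  /* hardware has an unprocessed event */
};

/*
 * Replay the sequence: the poll loop unmasks, a spurious interrupt
 * re-masks without scheduling NAPI, then napi_complete() runs.
 * Optionally recheck for pending events afterwards.
 */
static struct ldg_model replay_race(bool recheck_after_complete)
{
	struct ldg_model m = {
		.masked = true,
		.napi_scheduled = true,
		.event_pending = false,
	};

	m.masked = false;         /* poll loop unmasks the group */

	m.event_pending = true;   /* interrupt fires inside the window */
	m.masked = true;          /* handler re-masks, does not schedule */

	m.napi_scheduled = false; /* napi_complete() */

	if (recheck_after_complete && m.event_pending)
		m.napi_scheduled = true; /* reschedule to drain the event */

	return m;
}

/* Stuck: an event is pending, the group is masked, nobody will poll. */
static bool ldg_is_stuck(const struct ldg_model *m)
{
	return m->event_pending && m->masked && !m->napi_scheduled;
}
```

In this model, replay_race(false) ends in the stuck state, while
replay_race(true) leaves NAPI scheduled so the pending event gets processed.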
> I don't know how to fix this, it's full of races. I suppose we could
> recheck if events are pending in the LDG after we do the
> napi_complete() and reschedule NAPI again if so. But that might be
> expensive (several register reads, just to check something that's not
> going to happen most of the time).
>
> I'm also wondering why we see this on Niagara-2 and not on PCI-E
> cards. If the interrupts that go into the NCU unit of Niagara-2 are
> levelled interrupts, and somehow the ARM bit is not implemented
> correctly in the NIU logic when hooked up to NCU instead of PCI-E
> logic, that could explain things.
>
> I bet that our Linux driver is the only one that bangs on the LDG
> mask registers like this.

I tried the test on a T5440, which has a PCI-E NIU (4 x 1Gb) card.
I could not reproduce the spurious interrupts there. So this bug seems
to be limited to XAUI NIU cards, which would also make it a Niagara-2
specific problem.

Regards,
Hong
[ 2226.589782] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589800] tpc = <niu_poll+0x99c/0xc20>
[ 2226.589814] LD_IM0 = 0x0000000000000003 [ldf_mask=0x03]
[ 2226.589826] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589855] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589867] tpc = <niu_poll+0x99c/0xc20>
[ 2226.589878] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589890] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589915] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589927] tpc = <niu_poll+0x99c/0xc20>
[ 2226.589938] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589950] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589974] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589986] tpc = <niu_poll+0x99c/0xc20>
[ 2226.589996] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.590008] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.380931] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.380949] tpc = <niu_poll+0x99c/0xc20>
[ 2229.380962] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.380974] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381003] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381015] tpc = <niu_poll+0x99c/0xc20>
[ 2229.381026] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381038] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381063] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381075] tpc = <niu_poll+0x99c/0xc20>
[ 2229.381086] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381097] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381122] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381134] tpc = <niu_poll+0x99c/0xc20>
[ 2229.381145] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381156] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.743967] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.743983] tpc = <net_rx_action+0x138/0x260>
[ 2236.743996] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744008] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744034] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744046] tpc = <net_rx_action+0x138/0x260>
[ 2236.744058] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744070] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744095] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744107] tpc = <net_rx_action+0x138/0x260>
[ 2236.744118] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744130] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744155] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744167] tpc = <net_rx_action+0x138/0x260>
[ 2236.744178] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744190] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
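
For sorting a longer capture by where the interrupt landed, the tpc lines
above can be split into symbol, offset and size. The following is a small
user-space sketch, assuming the line format shown in this log;
struct tpc_info and parse_tpc_line() are hypothetical names, not part of
the attached patch.

```c
#include <stdio.h>
#include <string.h>

struct tpc_info {
	char sym[64];         /* symbol name, e.g. "niu_poll" */
	unsigned long offset; /* offset into the symbol */
	unsigned long size;   /* total size of the symbol */
};

/* Returns 0 on success, -1 if the line has no parsable tpc field. */
static int parse_tpc_line(const char *line, struct tpc_info *out)
{
	/* Skip the printk timestamp by locating the field itself. */
	const char *p = strstr(line, "tpc = <");

	if (!p)
		return -1;
	/* %lx accepts the leading "0x" on both hex numbers. */
	if (sscanf(p, "tpc = <%63[^+]+%lx/%lx>", out->sym,
		   &out->offset, &out->size) != 3)
		return -1;
	return 0;
}
```

Counting the parsed symbols per LDG would separate the interrupts that hit
niu_poll from those delivered later in net_rx_action.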
View attachment "niu-instrument-ldg-interrupt.patch" of type "text/plain" (2470 bytes)