Message-ID: <4A16D5F6.8040000@windriver.com>
Date:	Fri, 22 May 2009 12:42:30 -0400
From:	"Hong H. Pham" <hong.pham@...driver.com>
To:	David Miller <davem@...emloft.net>
CC:	netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts

David Miller wrote:
> I wonder if the spurious interrupts trigger exactly at the
> 
> 		nw64(LD_IM0(LDN_RXDMA(rp->rx_channel)), 0);
> 
> in niu_poll_core().
> 
> Can you run one more test?  Supplement the debugging output
> with:
> 
> 	"%pS", get_irq_regs()->tpc
> 
> so we can see where the program counter is at the time of
> the spurious interrupt?

The tpc at the time of the spurious interrupt is niu_poll+0x99c.
That address corresponds to this line in niu_ldg_rearm():

   nw64(LDG_IMGMT(lp->ldg_num), val);

Since the timer is also reprogrammed when the LDG is rearmed,
interrupts should not have been generated immediately after
writing to LDG_IMGMT.
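
For reference, here is a minimal sketch of how the value written to LDG_IMGMT is put together, and how the arm/timer fields shown in the log at the end of this mail are decoded. The two mask values are what I believe niu.h defines, so treat them as assumptions. The helper names are made up for illustration, and uint64_t stands in for the kernel's u64 so the sketch builds in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout (believed to match niu.h): ARM is bit 31,
 * TIMER is the low 6 bits of the register. */
#define LDG_IMGMT_ARM   0x0000000080000000ULL
#define LDG_IMGMT_TIMER 0x000000000000003fULL

/* Hypothetical helper: the value a rearm would write. The timer count is
 * always included; the ARM bit is added only when re-enabling the LDG. */
static uint64_t ldg_imgmt_compose(unsigned int timer, int on)
{
	uint64_t val;

	assert(timer <= LDG_IMGMT_TIMER);	/* must fit in the timer field */
	val = timer;
	if (on)
		val |= LDG_IMGMT_ARM;
	return val;
}

/* Hypothetical helpers: split a register value back into its fields. */
static unsigned int ldg_imgmt_arm(uint64_t val)
{
	return (val & LDG_IMGMT_ARM) ? 1 : 0;
}

static unsigned int ldg_imgmt_timer(uint64_t val)
{
	return (unsigned int)(val & LDG_IMGMT_TIMER);
}
```

With these assumptions, a register value of 0 decodes to arm=0 and timer=0, which is what every LDG_IMGMT line in the log shows.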

The tpc also showed interrupts happening in net_rx_action.  In
this case the LDG had been rearmed, but the timer prevented
interrupt delivery until after niu_poll was done.

> Meanwhile, even if we go with your patch to fix this, we can't
> use it as-is.  Let me explain.
> 
> Suppose that we get this spurious interrupt right after we unmask the
> interrupt and right before napi_complete().  Your change will make us
> re-mask the interrupts, but without scheduling NAPI.
> 
> So once the napi_complete() happens, if no further interrupts trigger
> in that LDG, we'll never process those interrupt events cleared by
> your new code.  See what I mean?

Understood.

> I don't know how to fix this, it's full of races.  I suppose we could
> recheck if events are pending in the LDG after we do the
> napi_complete() and reschedule NAPI again if so.  But that might be
> expensive (several register reads, just to check something that's not
> going to happen most of the time).
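
To make the race and the suggested recheck concrete, here is a small userspace model. It is only a sketch under stated assumptions: none of these names exist in niu.c, the three flags stand in for the LDG status registers, the LD_IM0 mask and the NAPI scheduled state, and it does not touch real hardware or the real NAPI API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one LDG; not the driver's data structure. */
struct ldg_model {
	bool events_pending;	/* stands in for reading the LDG status registers */
	bool irq_masked;	/* stands in for the LD_IM0 mask bits */
	bool napi_scheduled;	/* stands in for the NAPI scheduled state */
};

/* Behaves like a schedule request: a no-op if already scheduled. */
static bool model_napi_schedule(struct ldg_model *l)
{
	if (l->napi_scheduled)
		return false;
	l->napi_scheduled = true;
	return true;
}

/* Interrupt arriving while the poll is still running: the handler
 * re-masks, but the schedule attempt is a no-op, so the event stays
 * pending with nobody signed up to process it. */
static void model_irq_during_poll(struct ldg_model *l)
{
	l->events_pending = true;
	l->irq_masked = true;
	(void)model_napi_schedule(l);
}

/* End of poll. With recheck set, look for pending events after
 * completing and schedule again. Returns true if another poll was
 * scheduled. */
static bool model_poll_finish(struct ldg_model *l, bool recheck)
{
	assert(l->napi_scheduled);	/* only a scheduled instance completes */
	l->napi_scheduled = false;	/* models napi_complete() */
	if (recheck && l->events_pending)
		return model_napi_schedule(l);
	return false;
}
```

Without the recheck, the model ends up masked with an event pending and nothing scheduled. With it, another poll is scheduled. The cost mentioned in the quoted text corresponds to the events_pending test, which in the real driver would be several register reads.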

> I'm also wondering why we see this on Niagara-2 and not on PCI-E
> cards.  If the interrupts that go into the NCU unit of Niagara-2 are
> levelled interrupts, and somehow the ARM bit is not implemented
> correctly in the NIU logic when hooked up to NCU instead of PCI-E
> logic, that could explain things.
> 
> I bet that our Linux driver is the only one that bangs on the LDG
> mask registers like this.

I tried the test on a T5440, which has a PCI-E NIU (4 x 1Gb) card.
I could not reproduce the spurious interrupts.  So this bug seems
to be limited to XAUI NIU cards, which also makes it a Niagara-2
specific problem.

Regards,
Hong

[ 2226.589782] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589800]   tpc      = <niu_poll+0x99c/0xc20>
[ 2226.589814]   LD_IM0   = 0x0000000000000003 [ldf_mask=0x03]
[ 2226.589826]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589855] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589867]   tpc      = <niu_poll+0x99c/0xc20>
[ 2226.589878]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589890]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589915] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589927]   tpc      = <niu_poll+0x99c/0xc20>
[ 2226.589938]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589950]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589974] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589986]   tpc      = <niu_poll+0x99c/0xc20>
[ 2226.589996]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.590008]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.380931] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.380949]   tpc      = <niu_poll+0x99c/0xc20>
[ 2229.380962]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.380974]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381003] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381015]   tpc      = <niu_poll+0x99c/0xc20>
[ 2229.381026]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381038]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381063] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381075]   tpc      = <niu_poll+0x99c/0xc20>
[ 2229.381086]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381097]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381122] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381134]   tpc      = <niu_poll+0x99c/0xc20>
[ 2229.381145]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381156]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.743967] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.743983]   tpc      = <net_rx_action+0x138/0x260>
[ 2236.743996]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744008]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744034] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744046]   tpc      = <net_rx_action+0x138/0x260>
[ 2236.744058]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744070]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744095] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744107]   tpc      = <net_rx_action+0x138/0x260>
[ 2236.744118]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744130]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744155] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744167]   tpc      = <net_rx_action+0x138/0x260>
[ 2236.744178]   LD_IM0   = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744190]   LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]


View attachment "niu-instrument-ldg-interrupt.patch" of type "text/plain" (2470 bytes)
