netdev - Re: [PATCH 0/1] NIU: fix spurious interrupts

Message-Id: <20090522.010849.89655675.davem@davemloft.net>
Date:	Fri, 22 May 2009 01:08:49 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	hong.pham@...driver.com
Cc:	netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts

From: "Hong H. Pham" <hong.pham@...driver.com>
Date: Thu, 21 May 2009 20:40:06 -0400

> Posted below is a log with the fix.

Thank you.

> What's interesting (baffling?) is that interrupts are being received
> with the LD interrupt mask set or cleared.  The mask also changes
> in between interrupts.  The mask always changes from 3 to 0, and never
> from 0 to 3.

The "3 --> 0" transition is made by niu_poll_core() as we are
about to napi_complete() and rearm the LDG.

But yes, this log doesn't make any sense.  Neither the masks
nor the ARM bit appears to be working.

I wonder if the spurious interrupts trigger exactly at the

		nw64(LD_IM0(LDN_RXDMA(rp->rx_channel)), 0);

in niu_poll_core().

Can you run one more test?  Supplement the debugging output
with:

	"%pS", (void *) get_irq_regs()->tpc

so that we can see where the program counter was at the time of
the spurious interrupt.

Meanwhile, even if we go with your patch to fix this, we can't
use it as-is.  Let me explain.

Suppose that we get this spurious interrupt right after we unmask the
interrupt and right before napi_complete().  Your change will make us
re-mask the interrupts, but without scheduling NAPI.

So once the napi_complete() happens, if no further interrupts trigger
in that LDG, we'll never process those interrupt events cleared by
your new code.  See what I mean?
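To make the lost-event window concrete, here is a minimal userspace model of the sequence above. It is a hypothetical sketch, not driver code: `struct ldg_model`, `patched_irq()` and `poll_finish()` are invented for illustration, and the model assumes the patch simply re-masks the LD and returns without scheduling NAPI when NAPI is still marked as scheduled. Mask values follow the log (3 = masked, 0 = unmasked).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one LDG with a single RX DMA logical device. */
#define LD_MASKED   3u
#define LD_UNMASKED 0u

struct ldg_model {
	unsigned int ld_mask;      /* models the LD_IM0 value */
	unsigned int pending;      /* work waiting in the ring */
	bool napi_scheduled;       /* true until napi_complete() */
	unsigned int processed;    /* work handled by the poll loop */
};

/* Assumed behaviour of the patch: always re-mask, but do not
 * schedule NAPI if it is already marked as scheduled. */
static void patched_irq(struct ldg_model *g)
{
	g->ld_mask = LD_MASKED;
	if (g->napi_scheduled)
		return;                /* NAPI is not scheduled again */
	g->napi_scheduled = true;
}

/* End of one poll pass: drain work, unmask (the "3 --> 0" step),
 * optionally take an interrupt in the window, then napi_complete().
 * Returns true if work is left with NAPI idle and the LD masked,
 * i.e. nothing will raise another interrupt for it. */
static bool poll_finish(struct ldg_model *g, bool irq_in_window)
{
	assert(g->napi_scheduled);     /* only the poll loop calls this */
	g->processed += g->pending;
	g->pending = 0;
	g->ld_mask = LD_UNMASKED;
	if (irq_in_window) {
		g->pending += 1;           /* new work arrives ... */
		patched_irq(g);            /* ... and its interrupt lands here */
	}
	g->napi_scheduled = false;     /* napi_complete(); LDG rearm not modelled */
	return g->pending > 0 && !g->napi_scheduled && g->ld_mask == LD_MASKED;
}
```

Calling `poll_finish()` with `irq_in_window` false leaves the LD unmasked and nothing stranded; with it true, one unit of work is left behind a masked LD with NAPI idle, which is the stall described above.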

I don't know how to fix this; it's full of races.  I suppose we could
recheck if events are pending in the LDG after we do the
napi_complete() and reschedule NAPI again if so.  But that might be
expensive (several register reads, just to check something that's not
going to happen most of the time).
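The recheck idea can be sketched with the same kind of hypothetical model. Everything here is illustrative: `ldg_events_pending()` stands in for whatever status reads the driver would need, and the cost of three register reads per check is an assumption, not a measured figure. The point is that the check runs on every completion, not only in the rare racy case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; mask values follow the log (3 masked, 0 unmasked). */
#define LD_MASKED   3u
#define LD_UNMASKED 0u

struct ldg_model {
	unsigned int ld_mask;      /* models the LD_IM0 value */
	unsigned int pending;      /* work waiting in the ring */
	bool napi_scheduled;       /* true until napi_complete() */
	unsigned int reg_reads;    /* extra MMIO reads spent on the recheck */
};

/* Same assumed patch behaviour: re-mask, skip scheduling if NAPI runs. */
static void patched_irq(struct ldg_model *g)
{
	g->ld_mask = LD_MASKED;
	if (!g->napi_scheduled)
		g->napi_scheduled = true;
}

/* Hypothetical stand-in for reading the LDG's event status. */
static bool ldg_events_pending(struct ldg_model *g)
{
	g->reg_reads += 3;             /* assumed cost per check */
	return g->pending != 0;
}

static void poll_finish_recheck(struct ldg_model *g, bool irq_in_window)
{
	g->pending = 0;                /* poll drains current work */
	g->ld_mask = LD_UNMASKED;
	if (irq_in_window) {
		g->pending += 1;
		patched_irq(g);
	}
	g->napi_scheduled = false;     /* napi_complete() */
	if (ldg_events_pending(g))     /* paid on every completion */
		g->napi_scheduled = true;  /* reschedule NAPI to pick it up */
}
```

In this model the quiet path and the racy path both pay the same three reads, while only the racy path benefits, which is the overhead concern raised above.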

I'm also wondering why we see this on Niagara-2 and not on PCI-E
cards.  If the interrupts that go into the NCU unit of Niagara-2 are
level-triggered interrupts, and somehow the ARM bit is not implemented
correctly in the NIU logic when hooked up to NCU instead of PCI-E
logic, that could explain things.
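A toy model of the level-triggered half of that hypothesis shows why a non-gating ARM bit would look like a stream of spurious interrupts. This is a hypothetical sketch: `struct line_model`, `tick()` and `count_irqs()` are invented, the line is sampled once per "tick", and the model says nothing about how the PCI-E path actually signals interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical level-triggered line feeding the interrupt controller. */
struct line_model {
	bool level;        /* device condition still asserted */
	bool armed;        /* LDG ARM bit */
	bool arm_honored;  /* false models the suspected broken gating */
};

/* Sample the line once; returns true if an interrupt is delivered.
 * Delivery disarms the group, as the handler would before NAPI runs. */
static bool tick(struct line_model *l)
{
	bool fire = l->level && (l->armed || !l->arm_honored);

	if (fire)
		l->armed = false;
	return fire;
}

/* Count interrupts delivered while the level stays asserted. */
static unsigned int count_irqs(bool arm_honored, unsigned int ticks)
{
	struct line_model l = { true, true, arm_honored };
	unsigned int n = 0;

	for (unsigned int i = 0; i < ticks; i++)
		n += tick(&l);
	return n;
}
```

With the ARM bit honored, a held level produces one interrupt per arming; if the gate is ignored, every sample delivers another interrupt even though the handler has already disarmed the group.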

I bet that our Linux driver is the only one that bangs on the LDG
mask registers like this.