netdev - Re: [PATCH 0/1] NIU: fix spurious interrupts


Message-Id: <20090519.150156.115978100.davem@davemloft.net>
Date:	Tue, 19 May 2009 15:01:56 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	hong.pham@...driver.com
Cc:	netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts

From: "Hong H. Pham" <hong.pham@...driver.com>
Date: Tue, 19 May 2009 17:52:15 -0400

> Unfortunately I don't have a PCIe NIU card to test in an x86 box.
> If the hang does not happen on x86 (which is my suspicion), that
> would rule out a problem with the NIU chip.  That would mean there's
> some interaction between the NIU and sun4v hypervisor that's causing
> the spurious interrupts.

I am still leaning towards the NIU chip, or our programming of
it, as causing this behavior.

Although it's possible that the interrupt logic inside of
Niagara-T2, or how it's hooked up to the internal NIU ASIC
inside of the CPU, might be to blame, I don't consider it
likely given the basic gist of the behavior you see.

To quote section 17.3.2 of the UltraSPARC-T2 manual:

	An interrupt will only be issued if the timer is zero,
	the arm bit is set, and one or more LD's in the LDG, have
	their flags set and not masked.

which confirms our understanding of how this should work.
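
To make that condition concrete, here is a minimal userspace sketch
of the rule as quoted.  The struct and its field names are
hypothetical and do not reflect the real register layout or anything
in niu.c; it only models "timer is zero, arm bit set, at least one
unmasked LD flag".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of one logical device group (LDG).  These
 * field names are illustrative only, not the driver's registers. */
struct ldg_state {
	unsigned int timer;	/* interrupt timer value */
	bool arm;		/* arm bit */
	uint64_t flags;		/* one bit per LD: flag is set */
	uint64_t mask;		/* one bit per LD: 1 means masked */
};

/* True when the quoted condition for issuing an interrupt holds:
 * timer is zero, arm bit is set, and some LD flag is set and not
 * masked. */
static bool ldg_should_interrupt(const struct ldg_state *s)
{
	return s->timer == 0 && s->arm && (s->flags & ~s->mask) != 0;
}
```

If the hardware follows this rule, an interrupt arriving while NAPI
is still polling implies that something set the arm bit early.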

Can you test something, Hong?  Simply trigger the hung case
and, when it happens, read the LDG registers to see whether
the ARM bit is set and what the LDG mask bits say.

There might be a bug somewhere that causes us to call
niu_ldg_rearm() improperly.  In particular I'm looking
at the test done in niu_interrupt():

	if (likely(v0 & ~((u64)1 << LDN_MIF)))
		niu_schedule_napi(np, lp, v0, v1, v2);
	else
		niu_ldg_rearm(np, lp, 1);

If we call niu_ldg_rearm() on an LDG being serviced by NAPI
before that poll sequence calls napi_complete(), we could
definitely see this weird behavior.  And whatever causes
that would be the bug to fix.
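
A rough userspace model of that ordering problem is sketched here.
Everything in it is hypothetical (the struct, the model_*() helpers,
the bad_rearms counter) and none of it is driver code.  It assumes a
poll is "in flight" from the moment NAPI is scheduled until
napi_complete(), and it counts any rearm that lands inside that
window.

```c
#include <stdbool.h>

/* Hypothetical per-LDG bookkeeping, for illustration only. */
struct ldg_model {
	bool armed;		/* models the LDG arm bit */
	bool napi_polling;	/* poll scheduled, not yet completed */
	unsigned int bad_rearms;/* rearms seen while a poll was in flight */
};

/* Models the niu_schedule_napi() branch of the interrupt handler.
 * Assumption: the LDG is left unarmed while NAPI owns it. */
static void model_schedule_napi(struct ldg_model *m)
{
	m->armed = false;
	m->napi_polling = true;
}

/* Models a rearm.  A rearm during an in-flight poll is the
 * suspected bug, so count it. */
static void model_rearm(struct ldg_model *m)
{
	if (m->napi_polling)
		m->bad_rearms++;
	m->armed = true;
}

/* Models the end of the poll: complete first, then rearm. */
static void model_napi_complete(struct ldg_model *m)
{
	m->napi_polling = false;
	model_rearm(m);
}
```

In the correct sequence (schedule, then complete) bad_rearms stays at
zero.  A stray model_rearm() between the two leaves the group armed
while the poll is still running, which is the state in which another
interrupt could be issued.  A similar counter or WARN in the real
rearm path might help locate the caller, though that is only a
suggestion.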

Thanks!
