Message-Id: <20090518.220911.102225532.davem@davemloft.net>
Date: Mon, 18 May 2009 22:09:11 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: hong.pham@...driver.com
Cc: netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts
From: "Hong H. Pham" <hong.pham@...driver.com>
Date: Mon, 11 May 2009 15:00:52 -0400
> I've tracked down a hang on a SPARC64 system (a Netra T5220 with 64 strands)
> whenever the NIU is handling lots of receive traffic. The hang is
> reproducible by running iperf with multiple TCP streams (eg. iperf -P16 ...),
> with the SPARC box being the listener.
>
> I've found that it's possible for an RX DMA interrupt to be triggered
> while NAPI is in progress. When this happens, spurious interrupts will
> keep being regenerated which will cause the CPU to hang. It's too busy
> servicing the spurious interrupts, and the NIU NAPI handler (or anything
> else on that CPU) never gets a chance to run.
>
> In niu_schedule_napi(), if the logical device interrupt is unconditionally
> masked out by calling __niu_fastpath_interrupt(), the hang goes away.
Thanks for tracking down this problem, but I want to understand
why this even happens. As far as I can tell it shouldn't.
When we are done polling, the order of events is:
1) unmask LDG interrupt(s)
2) napi_complete()
3) rearm LDG interrupt(s)
The interrupts should not be sent again until that rearm operation,
which is after NAPI is completed. So the condition you are hitting
does not seem possible.
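A minimal sketch of that reasoning as a toy state model, not the NIU driver code. The struct, its fields, and the function names are hypothetical, and the delivery rule (an interrupt needs a pending event, an unmasked group, and an armed group, with delivery disarming the group) is an assumption about how the hardware is expected to behave:

```c
#include <stdbool.h>

/* Toy model of one logical device group (LDG). Hypothetical; this is
 * not the real NIU register layout. */
struct ldg_model {
    bool masked;        /* interrupt masked by the driver */
    bool armed;         /* assumed to clear when an interrupt is delivered */
    bool event_pending; /* e.g. an RX DMA event is flagged */
    bool napi_running;  /* a poll is in progress */
};

/* Assumed delivery rule: pending event, unmasked, and armed. */
static bool ldg_would_interrupt(const struct ldg_model *g)
{
    return g->event_pending && !g->masked && g->armed;
}

/* Interrupt delivery: the group is disarmed, then the handler masks it
 * and schedules NAPI. */
static void ldg_take_interrupt(struct ldg_model *g)
{
    g->armed = false;
    g->masked = true;
    g->napi_running = true;
}

/* End of polling, in the order listed above. Returns true if, under
 * this model, an interrupt could fire while NAPI is still running. */
static bool ldg_finish_poll(struct ldg_model *g)
{
    bool fired_during_napi;

    g->masked = false;                          /* 1) unmask */
    fired_during_napi = ldg_would_interrupt(g); /* still disarmed here */
    g->napi_running = false;                    /* 2) napi_complete() */
    g->armed = true;                            /* 3) rearm */
    return fired_during_napi;
}
```

Under these assumptions ldg_finish_poll() returns false even with an event still pending, because the group stays disarmed until step 3. If the chip can raise the interrupt without the rearm, the assumed delivery rule is what breaks.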
Matheos, can the chip violate this? If an RX event is reported
in an LDG, and the LDG is masked and then unmasked, the interrupt
should not appear until the LDG is also rearmed, right?