Message-Id: <20090518.220911.102225532.davem@davemloft.net>
Date:	Mon, 18 May 2009 22:09:11 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	hong.pham@...driver.com
Cc:	netdev@...r.kernel.org, matheos.worku@....com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts

From: "Hong H. Pham" <hong.pham@...driver.com>
Date: Mon, 11 May 2009 15:00:52 -0400

> I've tracked down a hang on a SPARC64 system (a Netra T5220 with 64 strands)
> whenever the NIU is handling lots of receive traffic.  The hang is
> reproducible by running iperf with multiple TCP streams (e.g. iperf -P16 ...),
> with the SPARC box being the listener.
> 
> I've found that it's possible for an RX DMA interrupt to be triggered
> while NAPI is in progress.  When this happens, spurious interrupts will
> keep being regenerated, which will cause the CPU to hang.  It's too busy
> servicing the spurious interrupts, and the NIU NAPI handler (or anything
> else on that CPU) never gets a chance to run.
> 
> In niu_schedule_napi(), if the logical device interrupt is unconditionally
> masked out by calling __niu_fastpath_interrupt(), the hang goes away.
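The failure described above can be illustrated with a toy model. This is not niu.c code. It assumes two things, both taken from the report rather than confirmed elsewhere. First, the chip raises an RX interrupt while NAPI is already scheduled. Second, the handler masks the logical device only when NAPI scheduling succeeds. The `storm()` function and its `max_irqs` cap are hypothetical.

```python
def storm(unconditional_mask: bool, max_irqs: int = 1000) -> int:
    """Count interrupts taken before the CPU can get back to the NAPI poll.

    Hypothetical model, not driver code. The interrupt arrives while NAPI is
    already scheduled, so a "schedule succeeded" check always fails. Unless
    masking is unconditional, the logical device stays unmasked and the
    interrupt keeps firing. max_irqs bounds the otherwise endless storm.
    """
    napi_already_scheduled = True  # the interrupt arrived mid-poll
    masked = False
    irqs = 0
    while not masked and irqs < max_irqs:
        irqs += 1  # CPU services an interrupt; the poll cannot run meanwhile
        if unconditional_mask or not napi_already_scheduled:
            masked = True  # mask the logical device interrupt


    return irqs


print(storm(unconditional_mask=False), storm(unconditional_mask=True))  # prints "1000 1"
```

With conditional masking, the model only stops because of the artificial cap. That matches the reported hang. Masking unconditionally stops it after one interrupt, which matches the workaround.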

Thanks for tracking down this problem, but I want to understand
why this even happens.  As far as I can tell it shouldn't.

When we are done polling, the order of events is:

1) unmask LDG interrupt(s)
2) napi_complete()
3) rearm LDG interrupt(s)

The interrupts should not be sent again until that rearm operation,
which is after NAPI is completed.  So the condition you are hitting
does not seem possible.
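The ordering argument above can be checked against a small state model. This is a sketch, not the niu driver. It assumes three rules:

- An interrupt is delivered only when the logical device is unmasked *and* the LDG is armed.
- Delivery disarms the LDG.
- The handler masks the device before scheduling NAPI.

`LdgModel` and `run_poll_completion()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LdgModel:
    """Hypothetical model of one logical device group (LDG), not niu.c."""
    masked: bool = False
    armed: bool = True
    napi_active: bool = False
    delivered: int = 0    # interrupts delivered in total
    during_napi: int = 0  # interrupts delivered while NAPI was still running

    def rx_event(self) -> None:
        if self.masked or not self.armed:
            return  # held off by the mask or by the disarmed group
        self.armed = False  # assumption: delivery disarms the LDG
        self.delivered += 1
        if self.napi_active:
            self.during_napi += 1
        self.masked = True  # in this model the handler masks, then schedules NAPI
        self.napi_active = True


def run_poll_completion() -> LdgModel:
    g = LdgModel()
    g.rx_event()           # first RX event: interrupt taken, NAPI scheduled
    g.rx_event()           # more traffic during the poll: suppressed
    g.masked = False       # 1) unmask LDG interrupt(s)
    g.rx_event()           # group still disarmed, so still suppressed
    g.napi_active = False  # 2) napi_complete()
    g.armed = True         # 3) rearm LDG interrupt(s)
    g.rx_event()           # only now can the next interrupt fire
    return g


g = run_poll_completion()
print(g.delivered, g.during_napi)  # prints "2 0"
```

Under these rules, no interrupt can be delivered while NAPI is active, because the group stays disarmed until after step 2. That is why a chip which does raise one would be violating the model.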

Matheos, can the chip violate this?  If an RX event is reported
in an LDG, and the LDG is masked and then unmasked, the interrupt
should not appear again until the LDG is also rearmed, right?