Date: Tue, 19 Jul 2011 15:24:00 -0700
From: Ben Greear <greearb@...delatech.com>
To: john stultz <johnstul@...ibm.com>
CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
    "Rafael J. Wysocki" <rjw@...k.pl>,
    Maciej Rutecki <maciej.rutecki@...il.com>,
    Thomas Gleixner <tglx@...utronix.de>,
    Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: BUG spinlock lockup, rtc related, 3.0-rc7+

On 07/19/2011 03:17 PM, john stultz wrote:
> On Wed, Jul 13, 2011 at 10:29 AM, Ben Greear<greearb@...delatech.com> wrote:
>> This is on the same nfs testing machine I've been posting about. This
>> has some additional nfs patches included, running tests to mount, do io,
>> unmount
>> over and over again. Seems that the NFS bugs might be finally fixed, but
>> system is still un-stable in general when under load.
>>
>> This info was printed after several other warnings that I previously posted
>> to lkml.
>>
>> This one appears to lock up the machine pretty badly though...can't ssh into
>> it anymore, and similar messages keep spewing every few minutes.
>>
>> I *think* the BUG at the end of this email is the important part, but
>> maybe it's just a symptom of something else...
>
> Huh. So does this trigger frequently, or was this just a one time
> thing?

I suspect the latter. It seems I have been hitting a lot of rcu-boost
locking issues on this system with my nfs mount/unmount testing. The
system was having various lockups and bugs, but I don't think I saw this
particular one more than once or perhaps twice.

I plan to run some more tests with the rcu-boost locking fixes applied
to the kernel shortly.

At the time I reported this, I wasn't aware of the rcu boost bugs, but
perhaps that is root cause here as well...I don't know enough about the
code in question to make an educated guess.

> From the looks of it, there's the btserver process (on cpu4) which
> during exit is caught up spinning trying to get the hrtimer base lock
> from hrtimer_cancel() in rtc_irq_set_state() when cleaning up from
> rtc_device_release().
>
> Meanwhile, On cpu0, a rtc periodic timer has fired and we're stuck in
> rtc_handle_legacy_irq(), likely waiting for the irq_task_lock held by
> cpu4 in rtc_irq_set_state().
>
> The rest of the cpus are idle, with the exception of the one that
> detected the stall from the normal timer tick.
>
> Hrmm.. It sounds like a circular lock between the rtc->irq_task_lock
> and the hrtimer base lock.
>
> rtc_irq_set_state: Grab irq_task_lock -> call hrtimer_cancel -> grab
> hrtimer_base_lock
>
> IRQ: grab hrtimer_base_lock -> run timers -> rtc_handle_legacy_irq ->
> grab irq_task_lock
>
> But looking at __run_hrtimer(), the base lock should be released
> before the timer is run.
>
> So I'm not really sure what would be gumming up things here.
>
> Thomas: Any thoughts? There shouldn't be an issue calling
> hrtimer_cancel or other hrtimer operations from an hrtimer handler
> right?
>
> thanks
> -john

--
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
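
The lock-ordering argument in the quoted analysis can be sketched as a small user-space model. This is only an illustration, not kernel code: the types and helpers (`struct lock_model`, `model_lock`, `model_unlock`, `path_set_state`, `path_timer_irq`, `ordering_can_deadlock`) are hypothetical names invented here, and only the two lock names and the call sequences are taken from the message. The model records which lock was taken while another was held and reports an inversion when a later path takes them in the opposite order.

```c
#include <stdbool.h>
#include <string.h>

/* Lock names follow the thread; the model around them is hypothetical. */
enum lock_id { IRQ_TASK_LOCK, HRTIMER_BASE_LOCK, NR_LOCKS };

struct lock_model {
    bool held[NR_LOCKS];            /* locks held by the path being replayed */
    bool order[NR_LOCKS][NR_LOCKS]; /* order[a][b]: b was taken while a was held */
};

static void model_init(struct lock_model *m)
{
    memset(m, 0, sizeof(*m));
}

/* Take lock l; return true if this inverts an order recorded earlier. */
static bool model_lock(struct lock_model *m, enum lock_id l)
{
    bool inversion = false;

    for (int h = 0; h < NR_LOCKS; h++) {
        if (!m->held[h] || h == (int)l)
            continue;
        if (m->order[l][h])     /* some earlier path took h while holding l */
            inversion = true;
        m->order[h][l] = true;  /* remember: l taken while holding h */
    }
    m->held[l] = true;
    return inversion;
}

static void model_unlock(struct lock_model *m, enum lock_id l)
{
    m->held[l] = false;
}

/* rtc_irq_set_state: irq_task_lock, then hrtimer_cancel takes the base lock. */
static bool path_set_state(struct lock_model *m)
{
    bool inv = model_lock(m, IRQ_TASK_LOCK);

    inv |= model_lock(m, HRTIMER_BASE_LOCK);
    model_unlock(m, HRTIMER_BASE_LOCK);
    model_unlock(m, IRQ_TASK_LOCK);
    return inv;
}

/*
 * Timer interrupt: base lock, then the callback takes irq_task_lock.
 * drop_base_lock selects whether the base lock is released around the
 * callback, as the message says __run_hrtimer() should do.
 */
static bool path_timer_irq(struct lock_model *m, bool drop_base_lock)
{
    bool inv = model_lock(m, HRTIMER_BASE_LOCK);

    if (drop_base_lock)
        model_unlock(m, HRTIMER_BASE_LOCK);
    inv |= model_lock(m, IRQ_TASK_LOCK);    /* rtc_handle_legacy_irq */
    model_unlock(m, IRQ_TASK_LOCK);
    if (drop_base_lock)
        inv |= model_lock(m, HRTIMER_BASE_LOCK);
    model_unlock(m, HRTIMER_BASE_LOCK);
    return inv;
}

/* Replay both paths and report whether their lock orders conflict. */
bool ordering_can_deadlock(bool drop_base_lock)
{
    struct lock_model m;
    bool inv;

    model_init(&m);
    inv = path_set_state(&m);
    inv |= path_timer_irq(&m, drop_base_lock);
    return inv;
}
```

Calling `ordering_can_deadlock(false)` replays the circular ordering first suspected in the message and reports an inversion; `ordering_can_deadlock(true)` replays the ordering with the base lock dropped around the callback and reports none, which matches the observation that a plain lock-order cycle should not occur. The model checks lock order only. It does not capture `hrtimer_cancel()` waiting for a running handler to finish, so it cannot confirm or rule out other explanations for the lockup.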