Message-ID: <42930.166.70.238.46.1219406040.squirrel@webmail.wolfmountaingroup.com>
Date: Fri, 22 Aug 2008 05:54:00 -0600 (MDT)
From: jmerkey@...fmountaingroup.com
To: "Stefan Richter" <stefanr@...6.in-berlin.de>
Cc: "Nick Piggin" <nickpiggin@...oo.com.au>,
jmerkey@...fmountaingroup.com, paulmck@...ux.vnet.ibm.com,
"Peter Zijlstra" <peterz@...radead.org>,
linux-kernel@...r.kernel.org,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"David Howells" <dhowells@...hat.com>
Subject: Re: [ANNOUNCE] mdb: Merkey's Linux Kernel Debugger 2.6.27-rc4 released
> On 22 Aug, Nick Piggin wrote:
>> On Friday 22 August 2008 00:09, Stefan Richter wrote:
>>> Nick Piggin wrote:
>>> > On Thursday 21 August 2008 22:26, jmerkey@...fmountaingroup.com wrote:
>>> >> It's simple to reproduce. Take away the volatile declaration for the
>>> >> rlock_t structure in mdb-ia32.c (rlock_t debug_lock) in all code
>>> >> references and watch the thing lock up in SMP with multiple processors
>>> >> in the debugger, each stuck with its own local copy of debug_lock.
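For illustration, the pattern in question looks roughly like this (a
sketch, not the actual mdb-ia32.c source; the wait loop is simplified
and is only meant to show what gcc does to a non-volatile spin):

typedef struct _rlock {
        volatile long cpu;              /* owning processor, -1 if free */
        volatile unsigned long count;   /* recursion depth */
        unsigned long flags;            /* saved interrupt flags */
} rlock_t;

static void debug_lock_wait(rlock_t *rlock, int cpu)
{
        /*
         * Simplified wait loop -- the real code needs an atomic
         * test-and-set to actually acquire the lock.  The point is
         * the compiler's view: without volatile on .cpu (or a
         * compiler barrier such as cpu_relax() in the loop body),
         * gcc may load rlock->cpu once and spin forever on the
         * register-cached copy.
         */
        while (rlock->cpu != -1 && rlock->cpu != cpu)
                cpu_relax();
}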
>>> >
>>> > You should disable preempt before getting the processor id. Can't see any
>>> > other possible bugs, but you should be able to see from the disassembly
>>> > pretty easily.
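In code, that means something like the following (a sketch; the
debug_lock()/debug_unlock() calling convention and the lock name are
assumptions, not taken from the mdb patch):

static rlock_t debug_rlock;             /* hypothetical lock instance */

static void debugger_enter(void)
{
        int cpu;

        preempt_disable();              /* pin this task to its CPU */
        cpu = smp_processor_id();       /* id is stable from here on */
        debug_lock(&debug_rlock, cpu);
        /* ... debugger critical section ... */
        debug_unlock(&debug_rlock, cpu);
        preempt_enable();
}

If the id is read before preemption is disabled, the task can migrate
and hand debug_lock() the wrong processor number.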
>>>
>>> debug_lock() is AFAICS only called from contexts which have preemption
>>> disabled. Last time around I recommended that Jeff document this
>>> requirement on the calling context.
>>
>> I'm not talking about where debug_lock gets called, I'm talking about
>> where the processor id is derived that eventually filters down to
>> debug_lock.
>
> You are right, I replied too fast. debug_unlock() retrieves the
> processor id itself, but debug_lock() does not.
>
>>> But even though preemption is disabled, debug_lock() is still incorrect
>>> as I mentioned in my other post a minute ago. It corrupts its .flags
>>> and .count members. (Or maybe it coincidentally doesn't as long as
>>> volatile is around.)
>>
>> I don't think so. And flags should only be restored by the processor
>> that saved it because the spinlock should disable preemption, right?
>
> OK; the .count seems alright due to restrictions of the calling
> contexts. About .flags: Jeff, can the following happen?
>
> - Context A on CPU A has interrupts enabled. Enters debug_lock(),
> thus disables its interrupts. (Saves its flags in rlock->flags with
> the plan to enable interrupts later when leaving debug_unlock()
> provided it does so as last holder.)
Interrupts should already be off here.
>
> - Context B on CPU B happens to have interrupts disabled. Enters
> debug_lock(), overwrites rlock->flags with its different value.
> (Spins on the rlock which is held by CPU A.)
Interrupts are disabled on both CPUs here (or at least should be).
>
> - Context A on CPU A leaves debug_unlock. Doesn't re-enable its
> interrupts as planned, since rlock->flags is the one from CPU B.
It should be benign, since both processors have interrupts off here.
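Spelled out as code, the pattern Stefan is describing looks like this
(field and function names assumed; a sketch rather than the mdb
source):

static void debug_lock_broken(rlock_t *rlock, int cpu)
{
        /*
         * Saving the flags into the *shared* lock structure is the
         * bug: a CPU that is merely spinning here overwrites the
         * flags the current holder saved.
         */
        local_irq_save(rlock->flags);
        /* ... spin until the lock is acquired ... */
}

static void debug_unlock_broken(rlock_t *rlock, int cpu)
{
        /* ... release the lock ... */
        local_irq_restore(rlock->flags);        /* may be CPU B's flags,
                                                 * not CPU A's */
}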
Stefan,
The flags handling needs fixing, you are right. However, since this case
only occurs when two processors both take an int1 exception (or some
exception other than an NMI), and interrupts are disabled here anyway,
it's benign. That being said, it needs to be perfect, so I moved the
flags to a stack variable.
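Roughly like this, with each caller passing down a flags word that
lives on its own stack (the signatures are my sketch, not necessarily
what the patch will look like):

static void debug_lock(rlock_t *rlock, int cpu, unsigned long *flags)
{
        local_irq_save(*flags);         /* stored on the caller's stack */
        /* ... spin until the lock is acquired ... */
}

static void debug_unlock(rlock_t *rlock, int cpu, unsigned long *flags)
{
        /* ... release the lock ... */
        local_irq_restore(*flags);      /* each CPU restores exactly the
                                         * flags it saved */
}

so a caller does:

        unsigned long flags;

        debug_lock(&debug_rlock, cpu, &flags);
        /* ... */
        debug_unlock(&debug_rlock, cpu, &flags);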
I will post a new patch series correcting the flags issue, along with
code fragments showing the gcc breakage with debug_lock, later tonight.
I have a lot to get done at omega8 on appliance releases this week, and,
turning 48 this year, I am slower than I used to be.
:-)
Jeff