[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0609301713460.3952@g5.osdl.org>
Date:	Sat, 30 Sep 2006 17:25:12 -0700 (PDT)
From:	Linus Torvalds <torvalds@...l.org>
To:	Andi Kleen <ak@...e.de>
cc:	Eric Rannaud <eric.rannaud@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, Andrew Morton <akpm@...l.org>,
	nagar@...son.ibm.com, Chandra Seetharaman <sekharan@...ibm.com>,
	Jan Beulich <jbeulich@...ell.com>
Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)
On Sun, 1 Oct 2006, Andi Kleen wrote:
> 
> On i386 it is simpler (only one interrupt stack and one process stack)
> However there can be still nasty corner cases, like the temporary NMI stacks
> that were added recently. It could be probably all handled in a state machine,
> but it would be ugly (at least I heard complaints about the x86 code that
> does it) 
No, I don't understand AT ALL why you are so hung up about this state 
machine thing. It's not true. There's simply no state machine involved, 
although from a theoretical standpoint you can obviously always implement 
just about _anything_ as a state machine.
> > Read the x86 code. Please. The one _before_ you added unwinding.
> 
> It's still the same if you disable unwinding.
I think your problem is that you think that "unwinding" needs to handle 
all the page crossing. That's incorrect, and that just results in stupid 
and unworkable code.
Instead (and this is why I was trying to point you to the original 
pre-unwinding code on i386), what you should do is to see it as two 
totally independent phases:
 - you have an outer loop that loops around the pages (since the _kernel_ 
   controls the stack nesting at that level). This is the loop I quoted at 
   you.
 - you have a _separate_ "unwinder()" for each page. It only unwinds 
   within that one page, and if the frame moves away from the page, it 
   immediately just returns that address, but it knows that it cannot be a 
   "valid" unwind address within that page.
That separate "unwind within one page" can well be using the dwarf info: 
it only really needs to verify that
 - it stays within the same page
 - the unwind info results in an aligned pointer at a strictly higher 
   address.
those two tests are trivial, and _guarantee_ that we don't access any 
half-way untrustworthy pointer.
See? No state machine. And notice how the dwarf info absolutely does _not_ 
need to know about the magic page-crossing events like interrupts, 
exceptions or anything else. Very much on purpose.
This is what we used to do with %ebp following (at least on x86), and what 
I tried to explain. Nothing complicated. And it's easy to set up, and the 
dawrf unwinder (which depends on complex data structures) can be trivially 
verified (ie it just stops immediately when it doesn't understand 
something, and "crosses a page" is such an event).
And the page-crossing events don't need to know _anything_ about the dwarf 
format (or %ebp, or any other unwinding details), and it can just depend 
on the chain of pages (which is trivial to set up - if the exception pages 
aren't already using the same format as the interrupt stack pages, it 
should at least not be hard to make them do that).
And the page-level unwinding data format is trivial. I don't think we even 
bothered verifying it on x86, but I guess some simple sanity-checking even 
there might be worth it (but unlike any dwarf unwinding, this is _not_ at 
all complicated, and there are absolutely _zero_ issues about compiler and 
linker versions etc etc).
			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
