linux-kernel - Re: [BUG] Oops on boot (probably ACPI related)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200609272250.21925.ak@suse.de>
Date:	Wed, 27 Sep 2006 22:50:21 +0200
From:	Andi Kleen <ak@...e.de>
To:	Linus Torvalds <torvalds@...l.org>
Cc:	Kyle McMartin <kyle@...isc-linux.org>,
	linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
	akpm@...l.org, jbeulich@...ell.com
Subject: Re: [BUG] Oops on boot (probably ACPI related)

On Wednesday 27 September 2006 22:35, Linus Torvalds wrote:
> 
> On Wed, 27 Sep 2006, Linus Torvalds wrote:
> > 
> > On Wed, 27 Sep 2006, Andi Kleen wrote:
> > > 
> > > I expect this patch to fix it.
> > 
> > Andrew, Kyle, can you verify?
> 
> Not that it really matters. Andi sure as hell pinpointed a real problem 
> with the new and broken inline asm. That's almost certainly the bug that 
> crept in during the recent rewrite.
> 
> HOWEVER, now that I look more closely at the rewrite, I'm really wondering 
> whether the rewrite was worth it at all. It generates smaller code, but at 
> the expense of
> 
>  - the actual cache-footprint is bigger
>  - the branch will now be mis-predicted by default

It doesn't matter much because these days this stuff is all out of lined
anyways and in a single function. And the dynamic branch predictor
in all modern CPUs will usually cache the decision (unlocked) there.

(Actually there is something dumb  left -- on a non preempt kernel
spin_unlock caller is larger than doing it inline. But that is left
for fixing later)

> The fact that rewinders have problems is fairly immaterial. Maybe we 
> should just take this as a hint that all the stupid rewinding code was 
> wrong in the first place, and we should stop doing that? We can go back 
> to just printing out our stacktrace guesse

> 
> 		Linus
> 
s, that has worked for us for a 
> long time, and the stack unwinding simply looks _fundamentally_ flawed.

Unfortunately Linux is a lot more complex than it was in the early days.

> So I have a real urge to just revert that change anyway.
> 
> Are there any _real_ advantages to this broken unwinding code that has had 
> more bugs that Windows XP?

I thought for a long time we didn't need it either, but these days with all 
these callbacks in some parts of the kernel (driver model, others) and you 
get a oops with 60+ entries it is just too much trouble to figure it out manually.

I admit when I took the code I didn't realize that dwarf2 has these
problems (not supporting out of line sections is clearly a spec
bug and would even hit gcc generated code). But we don't have 
that many out of line sections anyways, so it's not that big an issue. 

And all the people who process a lot of oopses (e.g. Andrew, Ingo, others) tend
to use frame pointers by default anyways. They already voted with their feet.
And the unwinder certainly gives better code than frame pointers. The mispredicted
branches you're worrying about are nothing against frame pointers 
(e.g. on K8 FP tends to stall the CPU on each function call slightly)

Anyways, in theory it would be possible to keep the out of line sections
and define some own dwarf2 extension that allows us to express them.
Jan might have some thoughts on it. But I didn't think it was worth
it for these cases due to the reasons above.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/