Message-ID: <20090821115847.GE24647@elte.hu>
Date:	Fri, 21 Aug 2009 13:58:47 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	linux-tip-commits@...r.kernel.org,
	Arjan van de Ven <arjan@...radead.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>,
	Kyle McMartin <kyle@...artin.ca>, Greg KH <gregkh@...e.de>,
	linux-kernel@...r.kernel.org, hpa@...or.com, mingo@...hat.com,
	torvalds@...ux-foundation.org, catalin.marinas@....com,
	jens.axboe@...cle.com, fweisbec@...il.com, stable@...nel.org,
	srostedt@...hat.com, tglx@...utronix.de
Subject: Re: [tip:tracing/urgent] tracing: Fix too large stack usage in
	do_one_initcall()


* Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:

> On Fri, 2009-08-21 at 13:14 +0200, Ingo Molnar wrote:
> 
> > > There's a lot of fat functions on that stack trace, but
> > > the largest of all is do_one_initcall(). This is due to
> > > the boot trace entry variables being on the stack.
> > > 
> > > Fixing this is relatively easy, initcalls are fundamentally
> > > serialized, so we can move the local variables to file scope.
> > > 
> > > Note that this large stack footprint was present for a
> > > couple of months already - what pushed my system over
> > > the edge was the addition of kmemleak to the call-chain:
> > > 
> > >   6)     3328      36   allocate_slab+0xb1/0x100
> > >   7)     3292      36   new_slab+0x1c/0x160
> > >   8)     3256      36   __slab_alloc+0x133/0x2b0
> > >   9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
> > >  10)     3216     108   create_object+0x28/0x250
> > >  11)     3108      40   kmemleak_alloc+0x81/0xc0
> > >  12)     3068      24   kmem_cache_alloc+0x162/0x1d0
> > >  13)     3044      52   scsi_pool_alloc_command+0x29/0x70
> > > 
> > > This pushes the total to ~3800 bytes, only a tiny bit
> > > more was needed to corrupt the on-kernel-stack thread_info.
> > > 
> > > The fix reduces the stack footprint from 572 bytes
> > > to 28 bytes.
> > 
> > btw., it will just take two more features like kmemleak to trigger 
> > hard to debug stack overflows again on 32-bit. We are right at the 
> > edge and this situation is not really fixable in a reliable way 
> > anymore.
> > 
> > So i think we should be more drastic and solve the real problem: we 
> > should drop 4K stacks and 8K combo-stacks on 32-bit, and go 
> > exclusively to 8K split stacks on 32-bit.
> > 
> > I.e. the stack size will be 'unified' too between 64-bit and 32-bit 
> > to a certain degree: process stacks will be 8K on both 64-bit and 
> > 32-bit x86, IRQ stacks will be separate. (on 64-bit we also have the 
> > IST stacks for certain exceptions that further isolates things)
> > 
> > This will simplify the 32-bit situation quite a bit, remove a 
> > contentious config option and make the kernel more robust in 
> > general. 8K combo stacks are not safe due to irq nesting and 4K 
> > isolated stacks are not enough. 8K isolated stacks are the way to go.
> > 
> > Opinions?
> 
> I'm obviously all in favour of merging the i386 and x86_64 stack 
> code. Esp after having had to look at the i386 stuff recently.

ok.

> Now I don't think that unifying all this requires the sizes to be 
> the same between them, because x86_64 typically has larger stack 
> footprint due to it being 64 bit. If we need to bump 32 bit stack 
> sizes, then we're likely to also need a bump in 64 bit as well at 
> some point soon.

Well, 64-bit is larger, but not twice as large. Here are the factors 
('+' increases stack footprint, '.' is neutral, '-' decreases it):

 + pointers are 2x as large
 + alignment can cause 4 byte holes
 . other data is generally the same size
 - it has less register pressure so fewer stack spills

So it's far from 2x size.

Btw., i've measured this precisely: head to head, the same .config 
triggers the following worst-case stack footprint on the critical path:

 32-bit:  0)     3704      52   __change_page_attr+0xb8/0x290
 64-bit:  0)     5672     112   __change_page_attr+0xc1/0x2f0

So 64-bit has roughly +50% stack footprint: 5672 vs. 3704 bytes, 
about 1.53x. (same compiler, etc.)

And since 64-bit runs on larger hardware and gets stress-tested more 
these days than 32-bit, i think it's time to flip it around: now the 
pressure is to keep things within the 64-bit 8K stack, not the other 
way around.

	Ingo