Message-ID: <20090525025353.GA2580@elte.hu>
Date:	Mon, 25 May 2009 04:53:53 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Pekka J Enberg <penberg@...helsinki.fi>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>, Jeff Garzik <jgarzik@...ox.com>,
	Alexander Viro <viro@....linux.org.uk>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [GIT PULL] scheduler fixes


* Yinghai Lu <yinghai@...nel.org> wrote:

> Pekka J Enberg wrote:
> > On Mon, 18 May 2009, Linus Torvalds wrote:
> >>>> I hate that stupid bootmem allocator. I suspect we seriously 
> >>>> over-use it, and that we _should_ be able to do the SL*B init 
> >>>> earlier.
> >>> Hm, tempting thought - not sure how to pull it off though.
> >> As far as I can recall, one of the things that historically made us want 
> >> to use the bootmem allocator even relatively late was that the real SLAB 
> >> allocator had to wait until all the node information etc was initialized. 
> >>
> >> That's pretty damn late. And I wonder if SLUB (and SLOB) might not need a 
> >> lot less initialization, and work much earlier. Something like that might 
> >> be the final nail in the coffin for SLAB, and convince me to just say 
> >> 'we don't support it any more'.
> > 
> > Ingo, here's a patch that boots UMA+SMP+SLUB x86-64 kernel on qemu all 
> > the way to userspace. It probably breaks a bunch of things for now, but
> > something for you to play with if you want.
> > 
> 
> Updated against tip/master. Also added a change to cpupri_init;
> otherwise we get:
> [    0.000000] Memory: 523096612k/537526272k available (10461k kernel code, 656156k absent, 13773504k reserved, 7186k data, 2548k init)
> [    0.000000] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=8
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0xaf/0xee()
> [    0.000000] Hardware name: Sun Fire X4600 M2
> [    0.000000] Modules linked in:
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-rc6-tip-01778-g0afdd0f-dirty #259
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff810a0274>] ? lockdep_trace_alloc+0xaf/0xee
> [    0.000000]  [<ffffffff81075ab0>] warn_slowpath_common+0x88/0xcb
> [    0.000000]  [<ffffffff81075b15>] warn_slowpath_null+0x22/0x38
> [    0.000000]  [<ffffffff810a0274>] lockdep_trace_alloc+0xaf/0xee
> [    0.000000]  [<ffffffff8110301b>] kmem_cache_alloc_node+0x38/0x14d
> [    0.000000]  [<ffffffff813ec548>] ? alloc_cpumask_var_node+0x4a/0x10a
> [    0.000000]  [<ffffffff8109eb61>] ? lockdep_init_map+0xb9/0x564
> [    0.000000]  [<ffffffff813ec548>] alloc_cpumask_var_node+0x4a/0x10a
> [    0.000000]  [<ffffffff813ec62c>] alloc_cpumask_var+0x24/0x3a
> [    0.000000]  [<ffffffff819e6306>] cpupri_init+0x7f/0x112
> [    0.000000]  [<ffffffff819e5a30>] init_rootdomain+0x72/0xb7
> [    0.000000]  [<ffffffff821facce>] sched_init+0x109/0x660
> [    0.000000]  [<ffffffff82203082>] ? kmem_cache_init+0x193/0x1b2
> [    0.000000]  [<ffffffff821dfd7a>] start_kernel+0x218/0x3f3
> [    0.000000]  [<ffffffff821df2a9>] x86_64_start_reservations+0xb9/0xd4
> [    0.000000]  [<ffffffff821df3b2>] x86_64_start_kernel+0xee/0x109
> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
> 
> Works on an 8-socket NUMA amd64 box.
> 
> YH
> 
> ---
>  init/main.c           |   28 ++++++++++++++++------------
>  kernel/irq/handle.c   |   23 ++++++++---------------
>  kernel/sched.c        |   34 +++++++++++++---------------------
>  kernel/sched_cpupri.c |    9 ++++++---
>  mm/slub.c             |   17 ++++++++++-------
>  5 files changed, 53 insertions(+), 58 deletions(-)
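The lockdep warning in the quoted log comes from allocation tracing: sched_init() runs in start_kernel() while interrupts are still disabled, and an allocation whose mask permits both sleeping and filesystem reclaim (GFP_KERNEL, for example) is reported when made in that state. That appears to be what cpupri_init() hit once it ran after kmem_cache_init(). The user-space model below is a sketch of that check under stated assumptions: the SKETCH_GFP_* bits and their values, would_warn() and the boot-time masking helpers are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gfp flag bits; the values are arbitrary. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_WAIT   0x1u  /* caller may sleep, e.g. in reclaim */
#define SKETCH_GFP_IO     0x2u  /* caller may start I/O */
#define SKETCH_GFP_FS     0x4u  /* caller may recurse into filesystem code */
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOWAIT 0x0u  /* must neither sleep nor enter reclaim */

/*
 * Models the shape of the lockdep check: only an allocation that can
 * both wait and enter filesystem reclaim is reported, and only when
 * interrupts are disabled.
 */
static bool would_warn(sketch_gfp_t mask, bool irqs_disabled)
{
	if (!(mask & SKETCH_GFP_WAIT))
		return false;   /* cannot block in reclaim at all */
	if (!(mask & SKETCH_GFP_FS))
		return false;   /* reclaim would not take FS locks */
	return irqs_disabled;
}

/* Global mask of bits callers may use; restrictive until IRQs are on. */
static sketch_gfp_t boot_allowed_mask = SKETCH_GFP_NOWAIT;

/* Strip the bits that are not yet allowed from a requested mask. */
static sketch_gfp_t effective_mask(sketch_gfp_t requested)
{
	return requested & boot_allowed_mask;
}

/* Called once interrupts are enabled: every bit becomes legal. */
static void sketch_boot_irqs_enabled(void)
{
	boot_allowed_mask = ~0u;
}
```

Two remedies fall out of the model: pass a non-sleeping mask from early callers, or mask the disallowed bits globally until interrupts are enabled so callers can keep their usual flags. Later kernels took the second route with gfp_allowed_mask.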

Very nice!

Would it be possible to restructure things to move kmalloc init to 
before IRQ init as well? We have a couple of uglinesses there too.
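Moving kmalloc init ahead of IRQ init is an ordering constraint: any stage that allocates (sparse-irq descriptors, cpumask_var_t fields) has to run after the allocator is up. A toy checker for such constraints might look like the following; the stage list and the deps[] table are hypothetical and only loosely named after start_kernel() phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot stages, loosely named after start_kernel() phases. */
enum stage { ST_MEM_MAP, ST_KMALLOC, ST_IRQ, ST_SCHED, ST_TIMERS, ST_COUNT };

/* deps[s]: bitmask of the stages that must already have run before s. */
static const unsigned int deps[ST_COUNT] = {
	[ST_MEM_MAP] = 0,
	[ST_KMALLOC] = 1u << ST_MEM_MAP,
	[ST_IRQ]     = 1u << ST_KMALLOC,   /* e.g. sparse-irq descriptors */
	[ST_SCHED]   = 1u << ST_KMALLOC,   /* e.g. cpumask_var_t in cpupri */
	[ST_TIMERS]  = (1u << ST_KMALLOC) | (1u << ST_IRQ),
};

/* True if every stage in order[] runs after all of its dependencies. */
static bool order_is_valid(const enum stage *order, size_t n)
{
	unsigned int done = 0;

	for (size_t i = 0; i < n; i++) {
		enum stage s = order[i];

		assert(s < ST_COUNT);
		if ((deps[s] & done) != deps[s])
			return false;   /* a prerequisite has not run yet */
		done |= 1u << s;
	}
	return true;
}
```

Putting ST_IRQ ahead of ST_KMALLOC makes order_is_valid() return false, which is the situation that today forces bootmem fallbacks.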

Conceptually, memory should be the first thing a kernel sets up. It 
does not need IRQs, timers, the scheduler or any of the 
IO facilities and abstractions. All of them need memory though - and 
as Linux scales to more and more hardware via the same single image, 
so will we get more and more dynamic concepts like cpumask_var_t and 
sparse-irqs, which want to allocate very early.
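For context on why such structures allocate early: with CONFIG_CPUMASK_OFFSTACK, cpumask_var_t is a pointer, and alloc_cpumask_var() obtains the storage from the slab allocator (the quoted trace shows that path, through kmem_cache_alloc_node()). A user-space analogue, assuming calloc() in place of kmalloc and a CPU count chosen by the caller at run time, could look like this; struct sketch_cpumask and the sketch_* helpers are illustrative names, not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Illustrative off-stack mask: the storage lives on the heap. */
struct sketch_cpumask {
	unsigned int nbits;
	unsigned long *words;
};

/* Allocate a zeroed mask for nbits CPUs; false if allocation fails. */
static bool sketch_alloc_mask(struct sketch_cpumask *m, unsigned int nbits)
{
	size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	m->nbits = nbits;
	m->words = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	return m->words != NULL;
}

static void sketch_free_mask(struct sketch_cpumask *m)
{
	free(m->words);
	m->words = NULL;
}

static void sketch_set_cpu(struct sketch_cpumask *m, unsigned int cpu)
{
	assert(cpu < m->nbits);
	m->words[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool sketch_test_cpu(const struct sketch_cpumask *m, unsigned int cpu)
{
	assert(cpu < m->nbits);
	return (m->words[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}
```

Because the mask exists only after an allocation, any boot code that builds one (such as the root domain's cpupri masks) depends on the allocator being initialized first.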

setup_arch() is one huge function that sets up all architecture 
details at once - but if we split a separate setup_arch_mem() out of 
it, and left the rest in setup_arch (and moved it further down), we 
could remove much of bootmem (especially the ugly uses).
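A minimal sketch of the proposed split, assuming a three-phase order: setup_arch_mem() (the name suggested above) only records usable memory, the general allocator comes up next, and the rest of setup_arch() runs afterwards and can allocate directly. Apart from setup_arch_mem(), every name here (sketch_kmem_init(), sketch_kmalloc(), setup_arch_rest(), irq_table) is hypothetical, and the "allocator" is only a bump pointer over a static buffer standing in for RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Word-aligned static buffer standing in for RAM reported by firmware. */
static unsigned long fake_ram[512];

static unsigned char *ram_base;
static size_t ram_size, ram_used;
static bool kmem_ready;

/* Phase 1: record usable memory only; no IRQs, timers or scheduler. */
static void setup_arch_mem(void)
{
	ram_base = (unsigned char *)fake_ram;
	ram_size = sizeof(fake_ram);
}

/* Phase 2: bring up the general-purpose allocator on that memory. */
static void sketch_kmem_init(void)
{
	assert(ram_base != NULL);   /* must follow setup_arch_mem() */
	ram_used = 0;
	kmem_ready = true;
}

/* Trivial bump allocator standing in for kmalloc(); NULL when full. */
static void *sketch_kmalloc(size_t size)
{
	const size_t align = sizeof(unsigned long);
	size_t start = (ram_used + align - 1) & ~(align - 1);

	assert(kmem_ready);         /* allocating before phase 2 is a bug */
	if (start > ram_size || size > ram_size - start)
		return NULL;
	ram_used = start + size;
	return ram_base + start;
}

/* Phase 3: remaining arch setup uses the allocator, not bootmem. */
static void *irq_table;         /* hypothetical early consumer */

static void setup_arch_rest(void)
{
	irq_table = sketch_kmalloc(256);
}

/* The reordered boot sequence. */
static void sketch_start_kernel(void)
{
	setup_arch_mem();
	sketch_kmem_init();
	setup_arch_rest();
}
```

The asserts encode the point of the split: an allocation attempted before the allocator is ready fails loudly instead of needing a bootmem fallback path.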

This might even be realistically doable, and we could thus librarize 
bootmem and eliminate it from x86, at least. Perhaps.

	Ingo