Message-Id: <1372133667.2776.145@driftwood>
Date: Mon, 24 Jun 2013 23:14:27 -0500
From: Rob Landley <rob@...dley.net>
To: Nathan Zimmer <nzimmer@....com>
Cc: holt@....com, travis@....com, nzimmer@....com, tglx@...utronix.de,
mingo@...hat.com, hpa@...or.com, yinghai@...nel.org,
akpm@...ux-foundation.org, gregkh@...uxfoundation.org,
x86@...nel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC 1/2] x86_64, mm: Delay initializing large portion of memory

On 06/21/2013 11:25:33 AM, Nathan Zimmer wrote:
> On a 16TB system it can takes upwards of two hours to boot the system with
> about 60% of the time being spent initializing memory. This patch delays
> initializing a large portion of memory until after the system is booted.
> This can significantly reduce the time it takes the boot the system down
> to the 15 to 30 minute range.

Why is this conditional? Initialize the minimum amount of memory to
bring up each NUMA node, and then have each processor initialize its
own memory. I would have thought it was already doing this...

> + delay_mem_init=B:M:n:l:h
> + This delays the initialization of a large portion of
> + memory by inserting it into the "absent" memory list.
> + This allows the system to boot up much faster at the
> + expense of the time needed to add this absent memory
> + after the system has booted. That however can be done
> + in parallel with other operations.

This seems like a giant advertisement primarily aimed at repeating why
you think we need to merge the patch, not explaining what it is or how
to use it.

I would rephrase:

    Defer memory initialization until after SMP init (so
    large memory ranges can be initialized in parallel) by
    moving memory not needed during boot to the "absent" list.

And I repeat: why do we need to micromanage this? It sounds like all
NUMA systems should do something like this. (Single-threaded memory
initialization in an SMP system is kind of weird.)

> + Format: B:M:n:l:h
> + (1 << B) is the block size (bsize)
> + ['0' indicates use the default 128M]
> + (1 << M) is the address space per node
> + (n * bsize) is minimum sized node memory to slice
> + (l * bisze) is low memory to leave on node
> + (h * bisze) is high memory to leave on node

I don't understand this in the slightest. I understand "low memory to
leave on the node", I have no idea why there are four other parameters.
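
For reference, the arithmetic the quoted format text seems to describe can be sketched as follows. This is only an illustration of the documented formulas, not code from the patch: the function name, the dictionary keys, and the assumption that every field is a plain decimal integer are all hypothetical, and what the kernel actually does with the resulting values is not stated in the quoted text.

```python
# Illustrative decoder for the quoted "B:M:n:l:h" format.
# Not taken from the patch; names and parsing rules are assumptions.

DEFAULT_BLOCK_SHIFT = 27  # 1 << 27 is 128M, the default the text gives for B == 0


def parse_delay_mem_init(arg):
    """Turn a B:M:n:l:h string into byte sizes, per the quoted formulas."""
    fields = arg.split(":")
    if len(fields) != 5:
        raise ValueError("expected five colon-separated fields, got %r" % arg)

    # Assumption: each field is a decimal integer.
    b, m, n, l, h = (int(f) for f in fields)

    # (1 << B) is the block size; '0' selects the 128M default.
    bsize = 1 << (b if b != 0 else DEFAULT_BLOCK_SHIFT)

    return {
        "block_size": bsize,           # (1 << B)
        "node_address_space": 1 << m,  # (1 << M)
        "min_node_size": n * bsize,    # (n * bsize)
        "low_keep": l * bsize,         # (l * bsize)
        "high_keep": h * bsize,        # (h * bsize)
    }
```

With the made-up value `0:40:8:2:1`, this yields 128M blocks, 1T of address space per node, a 1G minimum node size, 256M of low memory and 128M of high memory left on each node. It shows what the numbers are, but not why a user would need to tune all five of them.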
> +config DELAY_MEM_INIT
> + bool "Delay memory initialization"
> + depends on EFI && MEMORY_HOTPLUG_SPARSE
> + ---help---
> + This option delays initializing a large portion of memory
> + until after the system is booted. This can significantly
> + reduce the time it takes the boot the system when there
> + is a significant amount of memory present. Systems with
> + 8TB or more of memory benefit the most.

I can see an SMP phone wanting to use this to shave a quarter second
off its boot time. Your "large portion of memory" description is a bit
myopic.

Rob