Message-ID: <20130425133825.GA25089@sgi.com>
Date: Thu, 25 Apr 2013 08:38:25 -0500
From: Cliff Wickman <cpw@....com>
To: HATAYAMA Daisuke <d.hatayama@...fujitsu.com>
Cc: ebiederm@...ssion.com, vgoyal@...hat.com,
kumagai-atsushi@....nes.nec.co.jp, lisa.mitchell@...com,
kexec@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/8] kdump, vmcore: support mmap() on /proc/vmcore
On Fri, Apr 05, 2013 at 12:04:02AM +0000, HATAYAMA Daisuke wrote:
> Currently, reads of /proc/vmcore go through read_oldmem(), which calls
> ioremap/iounmap once per page. For example, with 1GB of memory,
> ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
> times. This causes a big performance degradation.
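
For scale, the call count quoted above can be reproduced with a short
sketch. This is only illustrative arithmetic in Python; the helper name
remap_calls is hypothetical and "1GB" is taken to mean 2^30 bytes with
4KB pages.

```python
PAGE_SIZE = 4096  # 4KB pages, as in the example above (assumption)

def remap_calls(mem_bytes, page_size=PAGE_SIZE):
    """Number of per-page ioremap/iounmap pairs needed to read mem_bytes."""
    # Round up so that a trailing partial page still costs one call.
    return (mem_bytes + page_size - 1) // page_size

GIB = 1 << 30
TIB = 1 << 40

print(remap_calls(1 * GIB))  # 262144
print(remap_calls(2 * TIB))  # 536870912
```

On a 2TB machine the same per-page scheme would need roughly half a
billion map/unmap pairs, which is why the per-page cost dominates.
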
>
> In particular, the current main user of this mmap() is makedumpfile,
> which not only reads memory from /proc/vmcore but also does other
> processing like filtering, compression and IO work.
>
> To address the issue, this patch implements mmap() on /proc/vmcore to
> improve read performance.
>
> Benchmark
> =========
>
> You can see two benchmarks on terabyte-memory systems. Both show about
> 40 seconds on a 2TB system. This is almost equal to the performance of
> experimental kernel-side memory filtering.
>
> - makedumpfile mmap() benchmark, by Jingbai Ma
> https://lkml.org/lkml/2013/3/27/19
>
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
> https://lkml.org/lkml/2013/3/26/914
>
> ChangeLog
> =========
>
> v3 => v4)
>
> - Rebase 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 2nd kernel just as in v1. Allocate
> vmcore objects per page. => See [PATCH 5/8]
> - Map memory referenced by a PT_LOAD entry directly even if the start or
> end of the region is not page-aligned; it is no longer copied as in
> v3. As a result, holes outside OS memory are visible from
> /proc/vmcore. => See [PATCH 7/8]
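
To illustrate the last item: a minimal sketch of how rounding a region
out to page boundaries produces the "holes" mentioned above. This is
not the code from the patch; the function name page_align_region and
the 4KB page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size

def page_align_region(start, size, page_size=PAGE_SIZE):
    """Round a [start, start+size) region outward to page boundaries.

    Returns (aligned_start, aligned_end, head_hole, tail_hole), where the
    holes are the extra bytes exposed before and after the real region.
    """
    end = start + size
    aligned_start = start - (start % page_size)    # round down
    aligned_end = -(-end // page_size) * page_size  # round up
    return aligned_start, aligned_end, start - aligned_start, aligned_end - end

# A region starting 0x200 bytes into a page and ending 0x700 bytes into one.
a_start, a_end, head, tail = page_align_region(0x1200, 0x2500)
print(hex(a_start), hex(a_end), hex(head), hex(tail))
# 0x1000 0x4000 0x200 0x900
```

The head and tail bytes are what becomes visible through /proc/vmcore
once the region is mapped directly instead of being copied.
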
>
> v2 => v3)
>
> - Rebase 3.9-rc3.
> - Copy program headers separately from e_phoff in ELF note segment
> buffer. Now there's no risk of allocating huge memory if the program
> header table is positioned after the memory segments.
> - Add cleanup patch that removes unnecessary variable.
> - Fix wrong use of the variable holding the buffer size that is
> configurable at runtime. Instead, use the variable that has the
> original buffer size.
>
> v1 => v2)
>
> - Clean up the existing code: use e_phoff, and remove the assumption
> on PT_NOTE entries.
> - Fix potential bug where the ELF header size is not included in the
> exported vmcoreinfo size.
> - Divide patch modifying read_vmcore() into two: clean-up and primary
> code change.
> - Put ELF note segments in page-size boundary on the 1st kernel
> instead of copying them into the buffer on the 2nd kernel.
>
> Test
> ====
>
> This patch set is based on v3.9-rc7.
>
> Tested on x86-64 and x86-32, each with 1GB and over-4GB memory environments.
>
> ---
>
> HATAYAMA Daisuke (8):
> vmcore: support mmap() on /proc/vmcore
> vmcore: treat memory chunks referenced by PT_LOAD program header entries in
>   page-size boundary in vmcore_list
> vmcore: count holes generated by round-up operation for page boundary for size
>   of /proc/vmcore
> vmcore: copy ELF note segments in the 2nd kernel per page vmcore objects
> vmcore: Add helper function vmcore_add()
> vmcore, procfs: introduce MEM_TYPE_CURRENT_KERNEL flag to distinguish objects
>   copied in 2nd kernel
> vmcore: clean up read_vmcore()
> vmcore: allocate buffer for ELF headers on page-size alignment
>
>
> fs/proc/vmcore.c | 349 ++++++++++++++++++++++++++++++++---------------
> include/linux/proc_fs.h | 8 +
> 2 files changed, 245 insertions(+), 112 deletions(-)
>
> --
>
> Thanks.
> HATAYAMA, Daisuke

This is a very important patch set (patches 1 - 8) for speeding up the
kdump process.

We have found the mmap interface to /proc/vmcore to be about 80x faster
than the read interface. That is, doing mmap's and copying data (in
pieces the size of page structures) transfers all of /proc/vmcore about
80 times faster than reading it.
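
For readers unfamiliar with the two access patterns, here is a minimal
userspace sketch in Python. It is not makedumpfile's code: the function
names, the 256-unit window, and the use of an ordinary temporary file as
a stand-in for /proc/vmcore are all assumptions for illustration. It
only shows that both paths return the same bytes; the speedup reported
above comes from avoiding per-page ioremap/iounmap in the kernel, which
an ordinary file will not reproduce.

```python
import mmap
import os
import tempfile

# mmap offsets must be multiples of this value.
GRANULARITY = mmap.ALLOCATIONGRANULARITY

def copy_by_read(path, chunk=4096):
    """Read the file with one small read() call per chunk."""
    out = bytearray()
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            out += buf
    return bytes(out)

def copy_by_mmap(path, window_units=256):
    """Map the file one window at a time and copy out of each mapping."""
    out = bytearray()
    size = os.path.getsize(path)
    window = window_units * GRANULARITY
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                out += m[:]  # copy the mapped window into our buffer
            offset += length
    return bytes(out)

if __name__ == "__main__":
    # Stand-in data file; a real run would point at /proc/vmcore instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(3 * 1024 * 1024 + 123))
    try:
        print(copy_by_read(tmp.name) == copy_by_mmap(tmp.name))
    finally:
        os.unlink(tmp.name)
```

The window size is a tuning knob: larger windows mean fewer mmap calls
at the cost of a larger mapping held at any one time.
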

This greatly speeds up the capture of a kdump, as the scan of page
structures takes the bulk of the time in dumping the OS on a machine
with terabytes of memory.

We would very much like to see this set make it into the 3.10 release.

Acked-by: Cliff Wickman <cpw@....com>
-Cliff
--
Cliff Wickman
SGI
cpw@....com
(651) 683-3824