Message-ID: <20130603152718.5ba4d05f@holzheu>
Date: Mon, 3 Jun 2013 15:27:18 +0200
From: Michael Holzheu <holzheu@...ux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Zhang Yanfei <zhangyanfei.yes@...il.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
HATAYAMA Daisuke <d.hatayama@...fujitsu.com>,
Jan Willeke <willeke@...ibm.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 0/2] kdump/mmap: Fix mmap of /proc/vmcore for s390
On Fri, 31 May 2013 12:01:58 -0400
Vivek Goyal <vgoyal@...hat.com> wrote:
> On Fri, May 31, 2013 at 04:21:27PM +0200, Michael Holzheu wrote:
> > On Thu, 30 May 2013 16:38:47 -0400
> > Vivek Goyal <vgoyal@...hat.com> wrote:
> >
> > > On Wed, May 29, 2013 at 01:51:44PM +0200, Michael Holzheu wrote:
> > >
[...]
> > For zfcpdump currently we add a load from [0, HSA_SIZE] where
> > p_offset equals p_paddr. Therefore we can't distinguish in
> > copy_oldmem_page() whether we read from oldmem (HSA) or newmem. The
> > range [0, HSA_SIZE] is used twice. As a workaround we could use an
> > artificial p_offset for the HSA memory chunk that is not used by
> > the 1st kernel physical memory. This is not really beautiful, but
> > probably doable.
>
> Ok, zfcpdump is a problem because the HSA memory region is in addition
> to the regular memory address space.
Right, and the HSA memory is accessed through a read() interface and
can't be mapped directly.
[...]
> If you decide not to do that, agreed that copy_oldmem_page() needs to
> differentiate between references to HSA memory and references to new
> memory. I guess in that case we will have to go with the original
> proposal of using arch functions to access and read headers.
Let me think about that a bit more ...
[...]
> > If copy_oldmem_page() now also must be able to copy to vmalloc
> > memory, we would have to add new code for that:
> >
> > * oldmem -> newmem (real): Use direct memcpy_real()
> > * oldmem -> newmem (vmalloc): Use intermediate buffer with
> > memcpy_real()
> > * newmem -> newmem: Use memcpy()
> >
> > What do you think?
>
> Yep, looks like you will have to do something like that.
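The three cases quoted above could be modeled roughly as follows; this is a plain userspace sketch, and the enum names are made up for illustration, not real s390 kernel interfaces:

```c
/* Userspace model of the copy-path decision for copy_oldmem_page();
 * the names are illustrative only. */
enum copy_path {
	PATH_MEMCPY_REAL,    /* oldmem -> newmem (real):    direct memcpy_real() */
	PATH_BOUNCE_BUFFER,  /* oldmem -> newmem (vmalloc): intermediate buffer  */
	PATH_MEMCPY,         /* newmem -> newmem:           plain memcpy()       */
};

/* The two predicates stand in for the real address-range checks the
 * s390 code would have to do. */
static enum copy_path select_copy_path(int src_is_oldmem, int dst_is_vmalloc)
{
	if (!src_is_oldmem)
		return PATH_MEMCPY;
	return dst_is_vmalloc ? PATH_BOUNCE_BUFFER : PATH_MEMCPY_REAL;
}
```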
>
> Can't we map HSA frames temporarily, copy data and tear down the
> mapping?
Yes, we would have to create a *temporary* mapping (see suggestion
below). We do not have enough memory to copy the complete HSA.
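The page-by-page approach can be sketched in userspace like this; read_hsa() is a stand-in for the real HSA read() interface, and the point is that only one bounce page is needed rather than memory for the whole HSA:

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Fake HSA backing store so the sketch is self-contained. */
static char fake_hsa[4 * PAGE_SIZE];

static void fake_read_hsa(void *buf, size_t off, size_t len)
{
	memcpy(buf, fake_hsa + off, len);
}

/* Model: copy an HSA range to a destination that cannot be written
 * directly, going page by page through one bounce buffer instead of
 * copying the complete HSA at once. */
static void copy_via_bounce(void *dst, size_t src_off, size_t count,
			    void (*read_hsa)(void *buf, size_t off, size_t len))
{
	static char bounce[PAGE_SIZE];
	size_t done = 0;

	while (done < count) {
		size_t chunk = count - done;

		if (chunk > PAGE_SIZE)
			chunk = PAGE_SIZE;
		read_hsa(bounce, src_off + done, chunk);   /* oldmem -> bounce */
		memcpy((char *)dst + done, bounce, chunk); /* bounce -> newmem */
		done += chunk;
	}
}
```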
> If not, how would remap_pfn_range() work with the HSA region when
> /proc/vmcore is mmap()ed?
I am no memory management expert, so I discussed that with Martin
Schwidefsky (s390 architecture maintainer). Perhaps something like
the following could work:
After mmap_vmcore() is called, the HSA pages are initially not mapped
in the page tables. So when user space accesses those parts
of /proc/vmcore, a fault is generated. We implement a mechanism that,
on such a fault, copies the HSA data to a new page in the page cache
and creates a mapping for it. Since the page is allocated in the page
cache, the kernel can release it again later under memory pressure.
Our current idea for such an implementation:
* Create new address space (struct address_space) for /proc/vmcore.
* Implement new vm_operations_struct "vmcore_mmap_ops" with
new vmcore_fault() ".fault" callback for /proc/vmcore.
* Set vma->vm_ops to vmcore_mmap_ops in mmap_vmcore().
* The vmcore_fault() function will get a new page cache page,
copy the HSA page into it, and add it to the vmcore address space.
To see how this could work, we looked into the functions
filemap_fault() in "mm/filemap.c" and relay_buf_fault() in
"kernel/relay.c".
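As a rough userspace model of that fault flow (the function and array names are made up; the real implementation would use find_or_create_page() on the new vmcore address space and return a struct page from the .fault callback):

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NR_PAGES  8   /* size of the modeled HSA region, in pages */

/* Fake HSA contents and a tiny "page cache" keyed by page offset. */
static char fake_hsa_src[NR_PAGES * PAGE_SIZE];
static char *page_cache[NR_PAGES];   /* NULL = not yet faulted in */

/* Model of the proposed vmcore_fault(): on first access the page is
 * not mapped, so allocate a page-cache page, fill it from the HSA and
 * remember it; later faults on the same offset hit the cached page. */
static char *vmcore_fault_model(unsigned long pgoff)
{
	if (pgoff >= NR_PAGES)
		return NULL;             /* would be VM_FAULT_SIGBUS */
	if (!page_cache[pgoff]) {
		page_cache[pgoff] = malloc(PAGE_SIZE);
		/* copy the HSA data into the freshly allocated page */
		memcpy(page_cache[pgoff],
		       fake_hsa_src + pgoff * PAGE_SIZE, PAGE_SIZE);
	}
	return page_cache[pgoff];
}
```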
What do you think?
Michael