linux-kernel - Re: [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20171127154249.39e60ecf72019216f2f1782d@linux-foundation.org>
Date:   Mon, 27 Nov 2017 15:42:49 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Mike Rapoport <rppt@...ux.vnet.ibm.com>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-api@...r.kernel.org, criu@...nvz.org,
        Arnd Bergmann <arnd@...db.de>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Triplett <josh@...htriplett.org>,
        Jann Horn <jannh@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Andrei Vagin <avagin@...nvz.org>,
        Andrei Vagin <avagin@...tuozzo.com>
Subject: Re: [PATCH v4 2/4] vm: add a syscall to map a process memory into a
 pipe

On Mon, 27 Nov 2017 09:19:39 +0200 Mike Rapoport <rppt@...ux.vnet.ibm.com> wrote:

> From: Andrei Vagin <avagin@...tuozzo.com>
> 
> It is a hybrid of process_vm_readv() and vmsplice().
> 
> vmsplice can map memory from a current address space into a pipe.
> process_vm_readv can read memory of another process.
> 
> A new system call can map memory of another process into a pipe.
> 
> ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
>                         unsigned long nr_segs, unsigned int flags)
> 
> All arguments are identical with vmsplice except pid which specifies a
> target process.
> 
> Currently if we want to dump a process memory to a file or to a socket,
> we can use process_vm_readv() + write(), but it works slow, because data
> are copied into a temporary user-space buffer.
> 
> A second way is to use vmsplice() + splice(). It is more effective,
> because data are not copied into a temporary buffer, but here is another
> problem. vmsplice works with the currect address space, so it can be
> used only if we inject our code into a target process.
> 
> The second way suffers from a few other issues:
> * a process has to be stopped to run a parasite code
> * a number of pipes is limited, so it may be impossible to dump all
>   memory in one iteration, and we have to stop process and inject our
>   code a few times.
> * pages in pipes are unreclaimable, so it isn't good to hold a lot of
>   memory in pipes.
> 
> The introduced syscall allows to use a second way without injecting any
> code into a target process.
> 
> My experiments shows that process_vmsplice() + splice() works two time
> faster than process_vm_readv() + write().
>
> It is particularly useful on a pre-dump stage. On this stage we enable a
> memory tracker, and then we are dumping  a process memory while a
> process continues work. On the first iteration we are dumping all
> memory, and then we are dumpung only modified memory from a previous
> iteration.  After a few pre-dump operations, a process is stopped and
> dumped finally. The pre-dump operations allow to significantly decrease
> a process downtime, when a process is migrated to another host.

What is the overall improvement in a typical dumping operation?

Does that improvement justify the addition of a new syscall, and all
that this entails?  If so, why?

Are there any other applications of this syscall?