Date:   Thu, 1 Jun 2017 10:09:09 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Mike Rapoport <rppt@...ux.vnet.ibm.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        linux-mm <linux-mm@...ck.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

On Thu 01-06-17 09:53:02, Mike Rapoport wrote:
> On Tue, May 30, 2017 at 04:39:41PM +0200, Michal Hocko wrote:
> > On Tue 30-05-17 16:04:56, Andrea Arcangeli wrote:
> > > 
> > > While UFFDIO_COPY is certainly not a major slowdown, it is likely
> > > measurable at the microbenchmark level because it adds a kernel
> > > enter/exit to every 4k memcpy. It's not hard to imagine that being
> > > measurable. How that impacts the total precopy time I don't know;
> > > it would need to be benchmarked to be sure.
> > 
> > Yes, please!
> 
> I've run a simple test (below) that fills 1G of memory either with memcpy
> or with ioctl(UFFDIO_COPY) in 4K chunks.
> The machine I used has two "Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz" and
> 128G of RAM.
> I've averaged the elapsed time reported by /usr/bin/time over 100 runs,
> and here is what I got:
> 
> memcpy with THP on: 0.3278 sec
> memcpy with THP off: 0.5295 sec
> UFFDIO_COPY: 0.44 sec

I assume that the standard deviation is small?
 
> That said, for the CRIU use case UFFDIO_COPY seems faster than disabling
> THP and then doing memcpy.

That is a bit surprising. I didn't think that the userfault syscall
(ioctl) could be faster than a regular #PF, but the fact that
__mcopy_atomic bypasses the page fault path and can be optimized for
the anon case suggests that we save some cycles on each page, so the
cumulative savings can become visible.
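For reference, the test program referred to above as "(below)" is not
reproduced in this excerpt. A minimal sketch of what the UFFDIO_COPY
variant of such a fill loop might look like follows; the fill pattern,
the absence of timing code, and the error handling style are
assumptions of this sketch, not details taken from the original test:

#include <linux/userfaultfd.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

#define AREA_SIZE (1UL << 30)   /* 1G destination area */
#define CHUNK     4096UL        /* copy granularity, one base page */

int main(void)
{
        /* Open the userfaultfd and negotiate the API version. */
        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
        if (uffd < 0) {
                perror("userfaultfd");
                return 1;
        }

        struct uffdio_api api = { .api = UFFD_API };
        if (ioctl(uffd, UFFDIO_API, &api)) {
                perror("UFFDIO_API");
                return 1;
        }

        /* Anonymous destination; nothing is populated yet. The "THP off"
         * memcpy variant would instead madvise(dst, AREA_SIZE,
         * MADV_NOHUGEPAGE) here and then memcpy() into dst directly. */
        char *dst = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (dst == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* Register the whole range for missing-page handling so that
         * UFFDIO_COPY is allowed to populate it. */
        struct uffdio_register reg = {
                .range = { .start = (unsigned long)dst, .len = AREA_SIZE },
                .mode = UFFDIO_REGISTER_MODE_MISSING,
        };
        if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
                perror("UFFDIO_REGISTER");
                return 1;
        }

        /* A single 4K source page, reused for every copy. */
        static char src[CHUNK];
        memset(src, 0xa5, CHUNK);       /* arbitrary fill pattern */

        /* Fill 1G in 4K chunks. Each iteration is one kernel enter/exit
         * via ioctl(UFFDIO_COPY), which is the per-page cost under
         * discussion; the memcpy variants instead take a #PF on first
         * touch of each page (or huge page, with THP on). */
        for (unsigned long off = 0; off < AREA_SIZE; off += CHUNK) {
                struct uffdio_copy copy = {
                        .dst = (unsigned long)dst + off,
                        .src = (unsigned long)src,
                        .len = CHUNK,
                        .mode = 0,
                };
                if (ioctl(uffd, UFFDIO_COPY, &copy)) {
                        perror("UFFDIO_COPY");
                        return 1;
                }
        }

        close(uffd);
        return 0;
}

The memcpy variants would skip the userfaultfd setup entirely and just
write dst in 4K chunks, with or without the MADV_NOHUGEPAGE call noted
above to disable THP for the area.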

-- 
Michal Hocko
SUSE Labs
