linux-kernel - Re: [PATCH v2 RESEND] x86: optimize memcpy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180622013049.GA12505@gmail.com>
Date:   Fri, 22 Jun 2018 03:30:49 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Mikulas Patocka <mpatocka@...hat.com>
Cc:     Mike Snitzer <snitzer@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Dan Williams <dan.j.williams@...el.com>,
        device-mapper development <dm-devel@...hat.com>,
        X86 ML <x86@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 RESEND] x86: optimize memcpy_flushcache


* Mikulas Patocka <mpatocka@...hat.com> wrote:

> On Thu, 21 Jun 2018, Ingo Molnar wrote:
> 
> > 
> > * Mike Snitzer <snitzer@...hat.com> wrote:
> > 
> > > From: Mikulas Patocka <mpatocka@...hat.com>
> > > Subject: [PATCH v2] x86: optimize memcpy_flushcache
> > > 
> > > In the context of constant short length stores to persistent memory,
> > > memcpy_flushcache suffers from a 2% performance degradation compared to
> > > explicitly using the "movnti" instruction.
> > > 
> > > Optimize 4, 8, and 16 byte memcpy_flushcache calls to explicitly use the
> > > movnti instruction with inline assembler.
> > 
> > Linus requested asm optimizations to include actual benchmarks, so it would be 
> > nice to describe how this was tested, on what hardware, and what the before/after 
> > numbers are.
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> It was tested on 4-core skylake machine with persistent memory being 
> emulated using the memmap kernel option. The dm-writecache target used the 
> emulated persistent memory as a cache and sata SSD as a backing device. 
> The patch results in 2% improved throughput when writing data using dd.
> 
> I don't have access to the machine anymore.

I think this information is enough, but do we know how well memmap emulation 
represents true persistent memory speed and cache management characteristics?
It might be representative - but I don't know for sure, nor probably most
readers of the changelog.

So could you please put all this into an updated changelog, and also add a short 
description that outlines exactly which codepaths end up using this method in a 
typical persistent memory setup? All filesystem ops - or only reads, etc?

Thanks,

	Ingo