linux-kernel - RE: [PATCHV2 3/3] x86, ras: Add mcsafe_memcpy() function to recover from machine checks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <94D0CD8314A33A4D9D801C0FE68B40295BE9F290@G4W3202.americas.hpqcorp.net>
Date:	Tue, 15 Dec 2015 19:19:58 +0000
From:	"Elliott, Robert (Persistent Memory)" <elliott@....com>
To:	Borislav Petkov <bp@...en8.de>,
	Dan Williams <dan.j.williams@...el.com>
CC:	"Luck, Tony" <tony.luck@...el.com>,
	linux-nvdimm <linux-nvdimm@...1.01.org>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linux MM <linux-mm@...ck.org>,
	Andy Lutomirski <luto@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: RE: [PATCHV2 3/3] x86, ras: Add mcsafe_memcpy() function to recover
 from machine checks



> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@...ts.01.org] On Behalf
> Of Borislav Petkov
> Sent: Tuesday, December 15, 2015 12:39 PM
> To: Dan Williams <dan.j.williams@...el.com>
> Cc: Luck, Tony <tony.luck@...el.com>; linux-nvdimm <linux-
> nvdimm@...1.01.org>; X86 ML <x86@...nel.org>; linux-
> kernel@...r.kernel.org; Linux MM <linux-mm@...ck.org>; Andy Lutomirski
> <luto@...nel.org>; Andrew Morton <akpm@...ux-foundation.org>; Ingo Molnar
> <mingo@...nel.org>
> Subject: Re: [PATCHV2 3/3] x86, ras: Add mcsafe_memcpy() function to
> recover from machine checks
> 
> On Tue, Dec 15, 2015 at 10:35:49AM -0800, Dan Williams wrote:
> > Correction we have MOVNTDQA, but that requires saving the fpu state
> > and marking the memory as WC, i.e. probably not worth it.
> 
> Not really. Last time I tried an SSE3 memcpy in the kernel like glibc
> does, it wasn't worth it. The enhanced REP; MOVSB is hands down faster.

Reading from NVDIMM, rep movsb is efficient, but it 
fills the CPU caches with the NVDIMM addresses. For
large data moves (not uncommon for storage) this
will crowd out more important cacheable data.

For normal block device reads made through the pmem
block device driver, this CPU cache consumption is
wasteful, since it is unlikely the application will
ask pmem to read the same addresses anytime soon.
Due to the historic long latency of storage devices,
applications don't re-read from storage again; they
save the results.  So, the streaming-load
instructions are beneficial:
* movntdqa (16-byte xmm registers) 
* vmovntdqa (32-byte ymm registers)
* vmovntdqa (64-byte zmm registers)

Dan Williams wrote:
> Correction we have MOVNTDQA, but that requires
> saving the fpu state and marking the memory as WC
> i.e. probably not worth it.

Although the WC memory type is described in the SDM
in the most detail:
    "An implementation may also make use of the
    non-temporal hint associated with this instruction
    if the memory source is WB (write back) memory
    type. ... may optimize cache reads generated by 
    (V)MOVNTDQA on WB memory type to reduce cache 
    evictions."

For applications doing loads from mmap() DAX memory, 
the CPU cache usage could be worthwhile, because
applications expect mmap() regions to consist of
traditional writeback-cached memory and might do
lots of loads/stores.

Writing to the NVDIMM requires either:
* non-temporal stores; or
* normal stores + cache flushes + fences

movnti is OK for small transfers, but these are
better for bulk moves:
* movntdq (16-byte xmm registers)
* vmovntdq (32-byte ymm registers)
* vmovntdq (64-byte zmm registers)

---
Robert Elliott, HPE Persistent Memory

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/