linux-kernel - RE: CPU data cache across reboot/kexec for pmem/dax devices

Message-ID: <65c687fd30cf2_afa4294bc@dwillia2-xfh.jf.intel.com.notmuch>
Date: Fri, 9 Feb 2024 12:15:57 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Dan Williams
	<dan.j.williams@...el.com>
CC: linux-kernel <linux-kernel@...r.kernel.org>, Dave Hansen
	<dave.hansen@...ux.intel.com>, <linux-cxl@...r.kernel.org>,
	<nvdimm@...ts.linux.dev>, Vishal Verma <vishal.l.verma@...el.com>, Dave Jiang
	<dave.jiang@...el.com>, rostedt <rostedt@...dmis.org>, "Masami Hiramatsu
 (Google)" <mhiramat@...nel.org>
Subject: RE: CPU data cache across reboot/kexec for pmem/dax devices

Mathieu Desnoyers wrote:
> Hi Dan,
> 
> In the context of extracting user-space trace data when the kernel crashes,
> the LTTng user-space tracer recommends using nvdimm/pmem to reserve an area
> of physical (volatile) RAM at boot (memmap=nn[KMG]!ss[KMG]), and use the
> resulting device to create/mount a dax-enabled fs (e.g. ext4).
> 
> We then use this filesystem to mmap() the shared memory files for the tracer.
> 
> I want to make sure that the very last events from the userspace tracer written
> to the memory mapped buffers (mmap()) by userspace are present after a
> warm-reboot (or kexec/kdump).
> 
> Note that the LTTng user-space tracer (LTTng-UST) does *not* issue any clflush
> (or equivalent pmem_persist() from libpmem) for performance reasons: ring buffer
> data is usually overwritten many times before the system actually crashes, and
> the only thing we really need to make sure is that the cache lines are not
> invalidated without write back.
> 
> So I understand that the main use-case for pmem is nvdimm, and that in order to
> guarantee persistence of the data on power off an explicit pmem_persist() is
> needed after each "transaction", but for the needs of tracing, is there some
> kind of architectural guarantee that the data present in the cpu data cache
> is not invalidated prior to write back in each of those scenarios?
>
> - reboot with BIOS explicitly not clearing memory,

This one gives me pause, because a trip through the BIOS typically means
lots of resets and other low level magic, so this would likely require
pushing dirty data out of CPU caches prior to entering the BIOS code
paths.

So this either needs explicit cache flushing or mapping the memory with
write-through semantics. The latter is not supported in the stack
today.
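
As a rough illustration of the explicit-flush option, here is a minimal
sketch in C. It assumes the trace buffer is a MAP_SHARED mapping of a file
on a dax-mounted filesystem, where msync() is expected to write back the
dirty CPU cache lines for the range. The helper names
trace_buffer_writeback() and trace_demo(), and the path passed to the
latter, are hypothetical and not part of LTTng-UST or libpmem. Since the
tracer avoids per-event flushes for performance, something like this would
more plausibly run once, just before a planned reboot.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to write back the bytes in
 * [addr, addr + len) of a MAP_SHARED mapping. msync() needs a
 * page-aligned start address, so round addr down to its page.
 */
static int trace_buffer_writeback(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    size_t span = len + ((uintptr_t)addr - start);

    return msync((void *)start, span, MS_SYNC);
}

/*
 * Hypothetical usage: map a file, store one "event", write it back.
 * On a real setup `path` would live on the dax-mounted filesystem.
 * Returns 0 on success, -1 on failure.
 */
static int trace_demo(const char *path)
{
    const size_t len = 4096;
    static const char event[] = "last event";

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf + 100, event, sizeof(event));
    int ret = trace_buffer_writeback(buf + 100, sizeof(event));

    munmap(buf, len);
    return ret;
}
```

On a non-dax file the same call simply schedules ordinary page writeback,
so the sketch runs anywhere, but only the dax case matches the scenario
discussed here.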

> - kexec/kdump.

This should maintain the state of CPU caches. As far as the CPU is
concerned it is just long jumping into a new kernel in memory without
resetting any CPU cache state.