linux-kernel - Re: [PATCH v4] memregion: Add cpu_cache_invalidate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3ce6ef93-2f47-eda3-f974-0cfea7f43766@intel.com>
Date:   Thu, 27 Oct 2022 09:18:13 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>, linux-cxl@...r.kernel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>, nvdimm@...ts.linux.dev,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] memregion: Add cpu_cache_invalidate_memregion()
 interface

On 10/25/22 13:05, Dan Williams wrote:
> Users must first call cpu_cache_has_invalidate_memregion() to know whether
> this functionality is available on the architecture. Only enable it on
> x86-64 via the wbinvd() hammer. Hypervisors are not supported as TDX
> guests may trigger a virtualization exception and may need proper handling
> to recover. See:
That sentence doesn't quite parse correctly.  Does it need to be "and
may trigger..."?

> This global cache invalidation facility,
> cpu_cache_invalidate_memregion(), is exported to modules since NVDIMM
> and CXL support can be built as a module. However, the intent is that
> this facility is not available outside of specific "device-memory" use
> cases. To that end the API is scoped to a new "DEVMEM" module namespace
> that only applies to the NVDIMM and CXL subsystems.

Oh, thank $DEITY you're trying to limit the amount of code that has
access to this thing.

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 67745ceab0db..b68661d0633b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -69,6 +69,7 @@ config X86
>  	select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
>  	select ARCH_HAS_ACPI_TABLE_UPGRADE	if ACPI
>  	select ARCH_HAS_CACHE_LINE_SIZE
> +	select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION  if X86_64

What is 64-bit only about this?

I don't expect to have a lot of NVDIMMs or CXL devices on 32-bit
kernels, but it would be nice to remove this if it's not strictly
needed.  Or, to add a changelog nugget that says:

	Restrict this to X86_64 kernels.  It would probably work on 32-
	bit, but there is no practical reason to use 32-bit kernels and
	no one is testing them.

>  	select ARCH_HAS_CURRENT_STACK_POINTER
>  	select ARCH_HAS_DEBUG_VIRTUAL
>  	select ARCH_HAS_DEBUG_VM_PGTABLE	if !X86_PAE
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 97342c42dda8..8650bb6481a8 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -330,6 +330,21 @@ void arch_invalidate_pmem(void *addr, size_t size)
>  EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
>  #endif
>  
> +#ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
> +bool cpu_cache_has_invalidate_memregion(void)
> +{
> +	return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
> +}
> +EXPORT_SYMBOL_NS_GPL(cpu_cache_has_invalidate_memregion, DEVMEM);
> +
> +int cpu_cache_invalidate_memregion(int res_desc)
> +{
> +	wbinvd_on_all_cpus();
> +	return 0;
> +}

Does this maybe also deserve a:

	WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion());

in case one of the cpu_cache_invalidate_memregion() paths missed a
cpu_cache_has_invalidate_memregion() check?

> +/**
> + * cpu_cache_invalidate_memregion - drop any CPU cached data for
> + *     memregions described by @res_desc
> + * @res_desc: one of the IORES_DESC_* types
> + *
> + * Perform cache maintenance after a memory event / operation that
> + * changes the contents of physical memory in a cache-incoherent manner.
> + * For example, device memory technologies like NVDIMM and CXL have
> + * device secure erase, or dynamic region provision features where such
> + * semantics.

s/where/with/ ?

> + * Limit the functionality to architectures that have an efficient way
> + * to writeback and invalidate potentially terabytes of memory at once.
> + * Note that this routine may or may not write back any dirty contents
> + * while performing the invalidation. It is only exported for the
> + * explicit usage of the NVDIMM and CXL modules in the 'DEVMEM' symbol
> + * namespace.
> + *
> + * Returns 0 on success or negative error code on a failure to perform
> + * the cache maintenance.
> + */

WBINVD is a scary beast.  But, there's also no better alternative in the
architecture.  I don't think any of my comments above are deal breakers,
so from the x86 side:

Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>