Message-ID: <20251125134755.GMaSWzi-_vZwdkFcdp@fat_crate.local>
Date: Tue, 25 Nov 2025 14:47:55 +0100
From: Borislav Petkov <bp@...en8.de>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
	akpm@...ux-foundation.org, david@...nel.org,
	dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
	mjguzik@...il.com, luto@...nel.org, peterz@...radead.org,
	tglx@...utronix.de, willy@...radead.org, raghavendra.kt@....com,
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH v9 4/7] x86/mm: Simplify clear_page_*

On Fri, Nov 21, 2025 at 12:23:49PM -0800, Ankur Arora wrote:
> +/**
> + * clear_page() - clear a page using a kernel virtual address.
> + * @addr: address of kernel page
> + *
> + * Switch between three implementations of page clearing based on CPU
> + * capabilities:
> + *
> + *  - __clear_pages_unrolled(): the oldest, slowest and universally
> + *    supported method. Zeroes via 8-byte MOV instructions unrolled 8x
> + *    to write a 64-byte cacheline in each loop iteration.
> + *
> + *  - "REP; STOSQ": really old CPUs had crummy REP implementations.
> + *    Vendor CPU setup code sets 'REP_GOOD' on CPUs where REP can be
> + *    trusted. The instruction writes 8 bytes per REP iteration but
> + *    CPUs can internally batch these together and do larger writes.
> + *
> + *  - "REP; STOSB": CPUs that enumerate 'ERMS' have an improved STOS
> + *    implementation that is less picky about alignment and where
> + *    STOSB (1 byte at a time) is actually faster than STOSQ (8 bytes
> + *    at a time).
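
FWIW, in plain C the unrolled variant above is roughly the sketch below -
illustrative only, the real __clear_pages_unrolled() is hand-written asm:

	/*
	 * Rough C equivalent of __clear_pages_unrolled(): eight 8-byte
	 * stores per iteration, i.e. one 64-byte cacheline zeroed per
	 * loop. volatile only to keep the compiler from collapsing
	 * this into a memset() call.
	 */
	static void clear_page_unrolled_sketch(void *addr)
	{
		volatile unsigned long *p = addr;
		unsigned long len;

		for (len = PAGE_SIZE; len >= 64; len -= 64, p += 8) {
			p[0] = 0; p[1] = 0; p[2] = 0; p[3] = 0;
			p[4] = 0; p[5] = 0; p[6] = 0; p[7] = 0;
		}
	}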

Please put here in BIG RED LETTERS something along the lines of:

"The inline asm has a CALL instruction and usually that is a no-no due to the
compiler not knowing that there's a CALL inside the asm and thus won't track
callee-clobbered registers but in this case, all the callee clobbereds by
__clear_pages_unrolled() are part of the inline asm register specification so
that is fine.

Just don't assume that you can call *any* function from inside asm due to the
above."

> + *
> + * Does absolutely no exception handling.
> + */
> +static inline void clear_page(void *addr)
>  {
> +	u64 len = PAGE_SIZE;
>  	/*
>  	 * Clean up KMSAN metadata for the page being cleared. The assembly call
> -	 * below clobbers @page, so we perform unpoisoning before it.
> +	 * below clobbers @addr, so we perform unpoisoning before it.

s/we //

>  	 */
> -	kmsan_unpoison_memory(page, PAGE_SIZE);
> -	alternative_call_2(clear_page_orig,
> -			   clear_page_rep, X86_FEATURE_REP_GOOD,
> -			   clear_page_erms, X86_FEATURE_ERMS,
> -			   "=D" (page),
> -			   "D" (page),
> -			   "cc", "memory", "rax", "rcx");
> +	kmsan_unpoison_memory(addr, len);
> +	asm volatile(ALTERNATIVE_2("call __clear_pages_unrolled",
> +				   "shrq $3, %%rcx; rep stosq", X86_FEATURE_REP_GOOD,
> +				   "rep stosb", X86_FEATURE_ERMS)
> +			: "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> +			: "a" (0)
> +			: "cc", "memory");
>  }

With that:

Reviewed-by: Borislav Petkov (AMD) <bp@...en8.de>

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
