Message-ID: <78aab15e-5bc2-47cc-ac1c-5a348bff0e17@oracle.com>
Date: Thu, 3 Jul 2025 15:33:16 +0200
From: Vegard Nossum <vegard.nossum@...cle.com>
To: David Laight <david.laight.linux@...il.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Andy Lutomirski <luto@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Ard Biesheuvel <ardb@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Xiongwei Song <xiongwei.song@...driver.com>,
Xin Li <xin3.li@...el.com>, "Mike Rapoport (IBM)" <rppt@...nel.org>,
Brijesh Singh <brijesh.singh@....com>,
Michael Roth <michael.roth@....com>, Tony Luck <tony.luck@...el.com>,
Alexey Kardashevskiy <aik@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jonathan Corbet <corbet@....net>, Sohil Mehta <sohil.mehta@...el.com>,
Ingo Molnar <mingo@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Daniel Sneddon <daniel.sneddon@...ux.intel.com>,
Kai Huang <kai.huang@...el.com>, Sandipan Das <sandipan.das@....com>,
Breno Leitao <leitao@...ian.org>,
Rick Edgecombe
<rick.p.edgecombe@...el.com>,
Alexei Starovoitov <ast@...nel.org>, Hou Tao <houtao1@...wei.com>,
Juergen Gross <jgross@...e.com>, Kees Cook <kees@...nel.org>,
Eric Biggers <ebiggers@...gle.com>, Jason Gunthorpe <jgg@...pe.ca>,
"Masami Hiramatsu (Google)" <mhiramat@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Luis Chamberlain <mcgrof@...nel.org>, Yuntao Wang <ytcoode@...il.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Tejun Heo <tj@...nel.org>, Changbin Du <changbin.du@...wei.com>,
Huang Shijie <shijie@...amperecomputing.com>,
Geert Uytterhoeven <geert+renesas@...der.be>,
Namhyung Kim <namhyung@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-efi@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
On 03/07/2025 14:15, David Laight wrote:
> On Thu, 3 Jul 2025 13:39:57 +0300
> "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
>> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
>>> On Tue, 1 Jul 2025 12:58:31 +0300
>>> "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
>>>> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
>>>> index a508e4a8c66a..47b613690f84 100644
>>>> --- a/arch/x86/lib/clear_page_64.S
>>>> +++ b/arch/x86/lib/clear_page_64.S
>>>> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
>>>> EXPORT_SYMBOL_GPL(clear_page_erms)
>>>>
>>>> /*
>>>> - * Default clear user-space.
>>>> + * Default memset.
>>>> * Input:
>>>> * rdi destination
>>>> + * rsi scratch
>>>> * rcx count
>>>> - * rax is zero
>>>> + * al is value
>>>> *
>>>> * Output:
>>>> * rcx: uncleared bytes or 0 if successful.
>>>> + * rdx: clobbered
>>>> */
>>>> SYM_FUNC_START(rep_stos_alternative)
>>>> ANNOTATE_NOENDBR
>>>> +
>>>> + movzbq %al, %rsi
>>>> + movabs $0x0101010101010101, %rax
>>>> +
>>>> + /* RDX:RAX = RAX * RSI */
>>>> + mulq %rsi
>>>
>>> NAK - you can't do that here.
>>> Neither %rsi nor %rdx can be trashed.
>>> The function has a very explicit calling convention.
That's why we have the clobbers... see below
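(As an aside, the movzbq/movabs/mulq sequence in the quoted hunk just
broadcasts the fill byte into every byte of a 64-bit word; a minimal C
sketch of the equivalent computation -- not the kernel code itself --
would be:

  #include <stdint.h>

  /* 0xab -> 0xabababababababab. The largest possible product,
   * 0xff * 0x0101010101010101, is exactly 0xffffffffffffffff, so the
   * high (RDX) half of the mulq result is always zero, although the
   * register is still written architecturally. */
  static uint64_t broadcast_byte(uint8_t v)
  {
          return (uint64_t)v * 0x0101010101010101ULL;
  }
)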
>> What calling convention? We change the only caller to conform to this.
>
> The one that is implicit in:
>
>>>> + asm volatile("1:\n\t"
>>>> + ALT_64("rep stosb",
>>>> + "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
>>>> + "2:\n\t"
>>>> + _ASM_EXTABLE_UA(1b, 2b)
>>>> + : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
>>>> + : "a" ((uint8_t)v)
>
> The called function is only allowed to change the registers that
> 'rep stosb' uses - except it can access (but not change)
> all of %rax - not just %al.
>
> See: https://godbolt.org/z/3fnrT3x9r
> In particular note that 'do_mset' must not change %rax.
>
> This is very specific and is done so that the compiler can use
> all the registers.
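(To restate the concern above as code: a hedged sketch, not the actual
godbolt example. With "a" declared as a plain input and no clobbers, the
compiler assumes %rax -- and every other register it is not told about --
still holds whatever it put there when the asm finishes:

  static void fill_sketch(void *addr, unsigned char v, unsigned long len)
  {
          asm volatile("call rep_stos_helper"      /* hypothetical helper */
                       : "+c" (len), "+D" (addr)   /* only RCX/RDI declared changed */
                       : "a" (v));                 /* input only: assumed preserved */
          /* the compiler may keep live values in %rax/%rsi/%rdx across this */
  }

If the helper modifies any undeclared register, the caller's state is
silently corrupted.)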
I feel like you trimmed off the clobbers from the asm() in the context
above. For reference, it is:
+ : "memory", _ASM_SI, _ASM_DX);
I'm not saying this can't be optimized, but that doesn't seem to be your
complaint -- you say "the called function is only allowed to change
...", but this is not true when we have the clobbers, right?
This is exactly what I fixed with my v7 fixlet to this patch:
https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@oracle.com/
Vegard