Message-ID: <12E72F30-EF1D-4A1B-9D71-3A7FD5AF343C@zytor.com>
Date: Fri, 25 Oct 2024 10:12:59 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Uros Bizjak <ubizjak@...il.com>, Dave Hansen <dave.hansen@...el.com>
CC: x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] x86/ioperm: Use atomic64_inc_return() in ksys_ioperm()
On October 24, 2024 9:20:01 AM PDT, Uros Bizjak <ubizjak@...il.com> wrote:
>On Thu, Oct 24, 2024 at 5:21 PM Dave Hansen <dave.hansen@...el.com> wrote:
>>
>> On 10/7/24 01:33, Uros Bizjak wrote:
>> > Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
>> > to use the optimized implementation and ease register pressure around
>> > the primitive for targets that implement an optimized variant.
>>
>> Ease register pressure at the end of a syscall?
>>
>> I'll accept that we're doing this just as a matter of hygiene. But it's
>> a stretch to say there are any performance concerns whatsoever at the
>> end of the ioperm() syscall.
>>
>> So what is the real reason for this patch?
>
>Please see the code dumps for i386, a target that implements atomic64_inc_return():
>
> 1a9: 8d 04 95 04 00 00 00 lea 0x4(,%edx,4),%eax
> 1b0: b9 00 00 00 00 mov $0x0,%ecx
> 1b1: R_386_32 .bss
> 1b5: 89 43 0c mov %eax,0xc(%ebx)
> 1b8: 31 d2 xor %edx,%edx
> 1ba: b8 01 00 00 00 mov $0x1,%eax
> 1bf: e8 fc ff ff ff call 1c0 <ksys_ioperm+0xa8>
> 1c0: R_386_PC32 atomic64_add_return_cx8
> 1c4: 89 03 mov %eax,(%ebx)
> 1c6: 89 53 04 mov %edx,0x4(%ebx)
>
>vs. improved:
>
> 1a9: 8d 04 95 04 00 00 00 lea 0x4(,%edx,4),%eax
> 1b0: be 00 00 00 00 mov $0x0,%esi
> 1b1: R_386_32 .bss
> 1b5: 89 43 0c mov %eax,0xc(%ebx)
> 1b8: e8 fc ff ff ff call 1b9 <ksys_ioperm+0xa1>
> 1b9: R_386_PC32 atomic64_inc_return_cx8
> 1bd: 89 03 mov %eax,(%ebx)
> 1bf: 89 53 04 mov %edx,0x4(%ebx)
>
>There is no need to initialize the %eax/%edx register pair before the
>"call" to the atomic64_inc_return() function. The "call" is not an ABI
>function call, but an asm volatile (which BTW lacks
>ASM_CALL_CONSTRAINT), so there are no ABI guarantees about which
>registers are call-preserved and which are call-clobbered.
>
>Oh, this is the "return" variant - the function indeed returns the
>new value in the %eax/%edx pair, so the difference is only in the
>redundant register initialization. I can reword the commit message
>for this case to mention that the initialization of the register
>pair is spared before the call.
>
>Uros.
>
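(For context, the change under discussion boils down to roughly the sketch below; "seq" is a stand-in name, not the actual variable touched by the patch:)

#include <linux/atomic.h>

static atomic64_t seq = ATOMIC64_INIT(0);	/* stand-in counter */

static s64 bump_seq(void)
{
	/* before the patch: generic add-and-return of the constant 1 */
	/* return atomic64_add_return(1, &seq); */

	/* after the patch: dedicated increment-and-return; on i386 the
	 * out-of-line atomic64_inc_return_cx8 helper is reached without
	 * having to set up %eax/%edx at the call site first */
	return atomic64_inc_return(&seq);
}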
What does ASM_CALL_CONSTRAINT actually do *in the kernel*, *for x86*? There isn't a redzone in the kernel, and there *can't* be, because asynchronous events can clobber data below the stack pointer at any time.
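(For reference, arch/x86/include/asm/asm.h defines it as roughly the following; the call site underneath is a made-up example, not code from this patch:)

/*
 * ASM_CALL_CONSTRAINT is just an extra "+r" output operand tying the asm
 * statement to the stack pointer, so the compiler knows the asm contains
 * a call and will not schedule it before the frame is set up.
 */
register unsigned long current_stack_pointer asm("rsp");	/* "esp" on 32-bit */
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

/* hypothetical call site carrying the constraint: */
static inline unsigned long do_helper_call(unsigned long arg)
{
	unsigned long ret;

	asm volatile("call my_asm_helper"		/* made-up helper */
		     : "=a" (ret), ASM_CALL_CONSTRAINT
		     : "D" (arg)
		     : "memory");
	return ret;
}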
With FRED that is no longer true and we could use the redzone in the kernel, but such a kernel would not be able to boot on a legacy CPU/VMM, and it is only applicable to 64 bits.
This by itself is a good enough reason to be careful about it, to be sure, but one of the reasons I'm asking is that older versions of gcc treat "asm goto" as incompatible with output constraints.
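(A rough sketch of how that incompatibility bites; CONFIG_CC_HAS_ASM_GOTO_OUTPUT and the helper name below are illustrative, quoted from memory:)

static inline int try_helper(void)
{
	int ret = 0;

#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
	/* newer compilers: output operands and goto labels can coexist,
	 * so an output like ASM_CALL_CONSTRAINT is usable here as well */
	asm goto("call my_asm_helper\n\t"		/* made-up helper */
		 "jnz %l[fail]"
		 : "=a" (ret), ASM_CALL_CONSTRAINT
		 : : "memory" : fail);
#else
	/* older gcc: an "asm goto" may not have output operands at all,
	 * so neither "=a" (ret) nor ASM_CALL_CONSTRAINT can be used */
	asm goto("call my_asm_helper\n\t"
		 "jnz %l[fail]"
		 : : : "eax", "memory" : fail);
#endif
	return ret;
fail:
	return -1;
}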