Message-ID: <12E72F30-EF1D-4A1B-9D71-3A7FD5AF343C@zytor.com>
Date: Fri, 25 Oct 2024 10:12:59 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Uros Bizjak <ubizjak@...il.com>, Dave Hansen <dave.hansen@...el.com>
CC: x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] x86/ioperm: Use atomic64_inc_return() in ksys_ioperm()
On October 24, 2024 9:20:01 AM PDT, Uros Bizjak <ubizjak@...il.com> wrote:
>On Thu, Oct 24, 2024 at 5:21 PM Dave Hansen <dave.hansen@...el.com> wrote:
>>
>> On 10/7/24 01:33, Uros Bizjak wrote:
>> > Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
>> > to use the optimized implementation and ease register pressure around
>> > the primitive for targets that implement an optimized variant.
>>
>> Ease register pressure at the end of a syscall?
>>
>> I'll accept that we're doing this just as a matter of hygiene. But it's
>> a stretch to say there are any performance concerns whatsoever at the
>> end of the ioperm() syscall.
>>
>> So what is the real reason for this patch?
>
>Please see the code dumps for i386, a target that implements atomic64_inc_return():
>
> 1a9: 8d 04 95 04 00 00 00 lea 0x4(,%edx,4),%eax
> 1b0: b9 00 00 00 00 mov $0x0,%ecx
> 1b1: R_386_32 .bss
> 1b5: 89 43 0c mov %eax,0xc(%ebx)
> 1b8: 31 d2 xor %edx,%edx
> 1ba: b8 01 00 00 00 mov $0x1,%eax
> 1bf: e8 fc ff ff ff call 1c0 <ksys_ioperm+0xa8>
> 1c0: R_386_PC32 atomic64_add_return_cx8
> 1c4: 89 03 mov %eax,(%ebx)
> 1c6: 89 53 04 mov %edx,0x4(%ebx)
>
>vs. improved:
>
> 1a9: 8d 04 95 04 00 00 00 lea 0x4(,%edx,4),%eax
> 1b0: be 00 00 00 00 mov $0x0,%esi
> 1b1: R_386_32 .bss
> 1b5: 89 43 0c mov %eax,0xc(%ebx)
> 1b8: e8 fc ff ff ff call 1b9 <ksys_ioperm+0xa1>
> 1b9: R_386_PC32 atomic64_inc_return_cx8
> 1bd: 89 03 mov %eax,(%ebx)
> 1bf: 89 53 04 mov %edx,0x4(%ebx)
>
>There is no need to initialize the %eax/%edx register pair before the
>"call" to the atomic64_inc_return() function. The "call" is not an ABI
>function call, but an asm volatile (which BTW lacks
>ASM_CALL_CONSTRAINT), so there are no ABI guarantees about which
>registers are call-preserved and which are call-clobbered.
>
>Oh, this is the "return" variant - the function indeed returns the
>new value in the %eax/%edx pair, so the difference is only in the
>redundant register initialization. I can reword the commit message
>for this case to mention that the initialization of the register
>pair is spared before the call.
>
>Uros.
>
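(For context, the change under discussion boils down to roughly the sketch below; "seq" is a stand-in name, not the actual variable touched by the patch:)

#include <linux/atomic.h>

static atomic64_t seq = ATOMIC64_INIT(0);	/* stand-in counter */

static s64 bump_seq(void)
{
	/* before the patch: generic add-and-return of the constant 1 */
	/* return atomic64_add_return(1, &seq); */

	/* after the patch: dedicated increment-and-return; on i386 the
	 * out-of-line atomic64_inc_return_cx8 helper is reached without
	 * having to set up %eax/%edx at the call site first */
	return atomic64_inc_return(&seq);
}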
What does ASM_CALL_CONSTRAINT actually do *in the kernel*, *for x86*? There isn't a redzone in the kernel, and there *can't* be, because asynchronous events can clobber data below the stack pointer at any time.
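(For reference, arch/x86/include/asm/asm.h defines it as roughly the following; the call site underneath is a made-up example, not code from this patch:)

/*
 * ASM_CALL_CONSTRAINT is just an extra "+r" output operand tying the asm
 * statement to the stack pointer, so the compiler knows the asm contains
 * a call and will not schedule it before the frame is set up.
 */
register unsigned long current_stack_pointer asm("rsp");	/* "esp" on 32-bit */
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

/* hypothetical call site carrying the constraint: */
static inline unsigned long do_helper_call(unsigned long arg)
{
	unsigned long ret;

	asm volatile("call my_asm_helper"		/* made-up helper */
		     : "=a" (ret), ASM_CALL_CONSTRAINT
		     : "D" (arg)
		     : "memory");
	return ret;
}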
With FRED that is no longer true and we could use the redzone in the kernel, but such a kernel would not be able to boot on a legacy CPU/VMM, and it is only applicable to 64 bits.
This by itself is a good enough reason to be careful about it, to be sure, but one of the reasons I'm asking is that older versions of gcc treat "asm goto" as incompatible with output constraints.
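(A rough sketch of how that incompatibility bites; CONFIG_CC_HAS_ASM_GOTO_OUTPUT and the helper name below are illustrative, quoted from memory:)

static inline int try_helper(void)
{
	int ret = 0;

#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
	/* newer compilers: output operands and goto labels can coexist,
	 * so an output like ASM_CALL_CONSTRAINT is usable here as well */
	asm goto("call my_asm_helper\n\t"		/* made-up helper */
		 "jnz %l[fail]"
		 : "=a" (ret), ASM_CALL_CONSTRAINT
		 : : "memory" : fail);
#else
	/* older gcc: an "asm goto" may not have output operands at all,
	 * so neither "=a" (ret) nor ASM_CALL_CONSTRAINT can be used */
	asm goto("call my_asm_helper\n\t"
		 "jnz %l[fail]"
		 : : : "eax", "memory" : fail);
#endif
	return ret;
fail:
	return -1;
}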