Message-ID: <CAFULd4ahPq+ui5=mx0VHNkfQrqRB2n-HgmXFA1UyNFYtZaCeEA@mail.gmail.com>
Date: Tue, 16 Dec 2025 17:30:55 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: David Laight <david.laight.linux@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH RESEND] x86/asm/32: Modernize _memcpy()

On Tue, Dec 16, 2025 at 2:14 PM David Laight
<david.laight.linux@...il.com> wrote:

> > 00e778b0 <memcpy>:
> >   e778b0:     55                      push   %ebp
> >   e778b1:     89 e5                   mov    %esp,%ebp
> >   e778b3:     83 ec 08                sub    $0x8,%esp
> >   e778b6:     89 75 f8                mov    %esi,-0x8(%ebp)
> >   e778b9:     89 d6                   mov    %edx,%esi
> >   e778bb:     89 ca                   mov    %ecx,%edx
> >   e778bd:     89 7d fc                mov    %edi,-0x4(%ebp)
> >   e778c0:     c1 e9 02                shr    $0x2,%ecx
> >   e778c3:     89 c7                   mov    %eax,%edi
> >   e778c5:     f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
> >   e778c7:     83 e2 03                and    $0x3,%edx
> >   e778ca:     74 04                   je     e778d0 <memcpy+0x20>
> >   e778cc:     89 d1                   mov    %edx,%ecx
> >   e778ce:     f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
> >   e778d0:     8b 75 f8                mov    -0x8(%ebp),%esi
> >   e778d3:     8b 7d fc                mov    -0x4(%ebp),%edi
> >   e778d6:     89 ec                   mov    %ebp,%esp
> >   e778d8:     5d                      pop    %ebp
> >   e778d9:     c3                      ret
> >
> > due to a better register allocation, avoiding the call-saved
> > %ebx register.
>
> That might be semi-random.

Not really; the compiler has more freedom to choose a better register allocation.

> > +     unsigned long ecx = n >> 2;
> > +
> > +     asm volatile("rep movsl"
> > +                  : "+D" (edi), "+S" (esi), "+c" (ecx)
> > +                  : : "memory");
> > +     ecx = n & 3;
> > +     if (ecx)
> > +             asm volatile("rep movsb"
> > +                          : "+D" (edi), "+S" (esi), "+c" (ecx)
> > +                          : : "memory");
> >       return to;
> >  }
> >

> This version seems to generate better code still:
> see https://godbolt.org/z/78cq97PPj
>
> void *__memcpy(void *to, const void *from, unsigned long n)
> {
>         unsigned long ecx = n >> 2;
>
>         asm volatile("rep movsl"
>                      : "+D" (to), "+S" (from), "+c" (ecx)
>                      : : "memory");
>         ecx = n & 3;
>         if (ecx)
>                 asm volatile("rep movsb"
>                              : "+D" (to), "+S" (from), "+c" (ecx)
>                              : : "memory");
>         return (char *)to - n;

I don't think trading the move from EAX to a temporary for an additional
subtraction is a win.
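
For comparison, here is a minimal, self-contained sketch of the two
approaches side by side. The names memcpy_tmp() and memcpy_sub(), the
temporaries dst and src, and the non-x86 fallback are assumptions made for
illustration only; they are not taken from the patch. Both variants must
return the original destination pointer. They differ only in how that
pointer is preserved across the asm.

```c
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
/* Variant A: the asm advances temporaries, so 'to' stays intact. */
void *memcpy_tmp(void *to, const void *from, unsigned long n)
{
	void *dst = to;			/* hypothetical temporary */
	const void *src = from;		/* hypothetical temporary */
	unsigned long ecx = n >> 2;	/* number of whole 32-bit words */

	__asm__ volatile("rep movsl"
			 : "+D" (dst), "+S" (src), "+c" (ecx)
			 : : "memory");
	ecx = n & 3;			/* 0..3 trailing bytes */
	if (ecx)
		__asm__ volatile("rep movsb"
				 : "+D" (dst), "+S" (src), "+c" (ecx)
				 : : "memory");
	return to;
}

/* Variant B: the asm advances 'to' itself; subtract n to recover the start. */
void *memcpy_sub(void *to, const void *from, unsigned long n)
{
	unsigned long ecx = n >> 2;

	__asm__ volatile("rep movsl"
			 : "+D" (to), "+S" (from), "+c" (ecx)
			 : : "memory");
	ecx = n & 3;
	if (ecx)
		__asm__ volatile("rep movsb"
				 : "+D" (to), "+S" (from), "+c" (ecx)
				 : : "memory");
	return (char *)to - n;
}
#else
/* Assumed fallback so the sketch still builds on non-x86 targets. */
void *memcpy_tmp(void *to, const void *from, unsigned long n)
{
	return memcpy(to, from, n);
}

void *memcpy_sub(void *to, const void *from, unsigned long n)
{
	char *end = (char *)memcpy(to, from, n) + n;

	return end - n;
}
#endif
```

Variant A needs an extra register copy of the destination before the
string instructions run, while variant B needs a subtraction after them.
Which of the two is cheaper depends on the surrounding register
allocation, so it is best judged from the generated code.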

BR,
Uros.
