Message-ID: <CAFULd4ahPq+ui5=mx0VHNkfQrqRB2n-HgmXFA1UyNFYtZaCeEA@mail.gmail.com>
Date: Tue, 16 Dec 2025 17:30:55 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: David Laight <david.laight.linux@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH RESEND] x86/asm/32: Modernize _memcpy()
On Tue, Dec 16, 2025 at 2:14 PM David Laight
<david.laight.linux@...il.com> wrote:
> > 00e778b0 <memcpy>:
> > e778b0: 55 push %ebp
> > e778b1: 89 e5 mov %esp,%ebp
> > e778b3: 83 ec 08 sub $0x8,%esp
> > e778b6: 89 75 f8 mov %esi,-0x8(%ebp)
> > e778b9: 89 d6 mov %edx,%esi
> > e778bb: 89 ca mov %ecx,%edx
> > e778bd: 89 7d fc mov %edi,-0x4(%ebp)
> > e778c0: c1 e9 02 shr $0x2,%ecx
> > e778c3: 89 c7 mov %eax,%edi
> > e778c5: f3 a5 rep movsl %ds:(%esi),%es:(%edi)
> > e778c7: 83 e2 03 and $0x3,%edx
> > e778ca: 74 04 je e778d0 <memcpy+0x20>
> > e778cc: 89 d1 mov %edx,%ecx
> > e778ce: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
> > e778d0: 8b 75 f8 mov -0x8(%ebp),%esi
> > e778d3: 8b 7d fc mov -0x4(%ebp),%edi
> > e778d6: 89 ec mov %ebp,%esp
> > e778d8: 5d pop %ebp
> > e778d9: c3 ret
> >
> > due to a better register allocation, avoiding the call-saved
> > %ebx register.
>
> That might be semi-random.
Not really; the compiler has more freedom to choose a better register allocation.
> > + unsigned long ecx = n >> 2;
> > +
> > + asm volatile("rep movsl"
> > + : "+D" (edi), "+S" (esi), "+c" (ecx)
> > + : : "memory");
> > + ecx = n & 3;
> > + if (ecx)
> > + asm volatile("rep movsb"
> > + : "+D" (edi), "+S" (esi), "+c" (ecx)
> > + : : "memory");
> > return to;
> > }
> >
> This version seems to generate better code still:
> see https://godbolt.org/z/78cq97PPj
>
> void *__memcpy(void *to, const void *from, unsigned long n)
> {
> unsigned long ecx = n >> 2;
>
> asm volatile("rep movsl"
> : "+D" (to), "+S" (from), "+c" (ecx)
> : : "memory");
> ecx = n & 3;
> if (ecx)
> asm volatile("rep movsb"
> : "+D" (to), "+S" (from), "+c" (ecx)
> : : "memory");
> return (char *)to - n;
I don't think that additional subtraction outweighs a move from EAX to
a temporary.
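
For reference, the two return-value strategies discussed above can be put side by side. This is only a minimal sketch, not the kernel code: copy_and_advance(), copy_saved_dst() and copy_recompute_dst() are hypothetical names, and the non-x86 fallback loop is an assumption added so the sketch builds anywhere.

```c
/*
 * Hypothetical helper (not in the patch): copy n bytes and return the
 * destination pointer advanced past the copied region, i.e. the value
 * left in the "+D" operand after the rep movs instructions.
 */
static void *copy_and_advance(void *to, const void *from, unsigned long n)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned long cnt = n >> 2;	/* number of whole 32-bit words */

	__asm__ volatile("rep movsl"
			 : "+D" (to), "+S" (from), "+c" (cnt)
			 : : "memory");
	cnt = n & 3;			/* 0..3 trailing bytes */
	if (cnt)
		__asm__ volatile("rep movsb"
				 : "+D" (to), "+S" (from), "+c" (cnt)
				 : : "memory");
	return to;
#else
	/* Assumed fallback so the sketch also builds off x86. */
	char *d = to;
	const char *s = from;

	while (n--)
		*d++ = *s++;
	return d;
#endif
}

/* Variant A: the original destination stays live and is returned as-is. */
static void *copy_saved_dst(void *to, const void *from, unsigned long n)
{
	copy_and_advance(to, from, n);
	return to;
}

/* Variant B: recompute the original destination from the advanced pointer. */
static void *copy_recompute_dst(void *to, const void *from, unsigned long n)
{
	return (char *)copy_and_advance(to, from, n) - n;
}
```

Both variants return the same pointer; the difference is only whether the compiler keeps a copy of the original destination live across the asm or emits a subtraction afterwards.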
BR,
Uros.