Message-ID: <ZO9lmGoMDh10jdsk@1wt.eu>
Date: Wed, 30 Aug 2023 17:51:52 +0200
From: Willy Tarreau <w@....eu>
To: Ammar Faizi <ammarfaizi2@...weeb.org>
Cc: Alviro Iskandar Setiawan <alviro.iskandar@...weeb.org>,
Thomas Weißschuh <linux@...ssschuh.net>,
Nicholas Rosenberg <inori@...x.org>,
Michael William Jonathan <moe@...weeb.org>,
GNU/Weeb Mailing List <gwml@...r.gnuweeb.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for
`memset()`
On Wed, Aug 30, 2023 at 10:44:53PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> > Then "xchg %esi, %eax" is just one byte with no memory access ;-)
>
> Perfect!
>
> Now I got this, shorter than "movl %esi, %eax":
> ```
> 0000000000001500 <memset>:
> 1500: 96 xchg %eax,%esi
> 1501: 48 89 d1 mov %rdx,%rcx
> 1504: 57 push %rdi
> 1505: f3 aa rep stos %al,%es:(%rdi)
> 1507: 58 pop %rax
> 1508: c3 ret
> ```
>
> Unfortunately, the xchg trick doesn't yield smaller machine code for
> %rdx, %rcx. Lol.
That's expected: historically "xchg ax, regX" was a single-byte 0x9X
opcode on the 8086, and it kept the same encoding when it was extended to
32 bits, like many instructions (note that NOP is encoded as xchg ax,ax,
i.e. 0x90). The short form only exists when one operand is the
accumulator, so it stays short only if you can sacrifice the other
register, or restore it later using yet another xchg. For rcx/rdx a
push/pop pair could do it, as these should also be single-byte 0x5X
opcodes even in long mode unless I'm mistaken. Thus if you absolutely want
to squeeze out that 9th byte and end up with an 8-byte function you could
probably do:
    xchg %eax, %esi     1
    push %rdx           1
    pop  %rcx           1
    push %rdi           1
    rep  stosb          2
    pop  %rax           1
    ret                 1
    -------------  Total: 8 bytes :-)
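
For anyone who wants to try the sequence above, here is a minimal sketch that wraps it in a top-level asm block so it can be called from C. The name tiny_memset, the self-check helper and the portable fallback are assumptions made for illustration and are not taken from the nolibc patch. It assumes a GNU-compatible toolchain on x86-64 Linux (System V calling convention).

```c
#include <assert.h>
#include <stddef.h>

/* tiny_memset is a hypothetical name, chosen to avoid clashing with libc. */
void *tiny_memset(void *dst, int c, size_t len);

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__) && \
    !defined(__ILP32__)
/*
 * System V AMD64 arguments: dst in %rdi, c in %esi, len in %rdx.
 * The ABI guarantees the direction flag is clear on function entry,
 * so "rep stosb" walks forward. Encodings are shown on the right.
 */
__asm__ (
    ".pushsection .text\n"
    ".globl tiny_memset\n"
    ".type tiny_memset, @function\n"
    "tiny_memset:\n"
    "    xchg %eax, %esi\n"   /* 96     fill byte ends up in %al      */
    "    push %rdx\n"         /* 52     len -> %rcx via the stack ... */
    "    pop  %rcx\n"         /* 59     ... 2 bytes instead of 3      */
    "    push %rdi\n"         /* 57     save dst for the return value */
    "    rep stosb\n"         /* f3 aa  store %al, %rcx times         */
    "    pop  %rax\n"         /* 58     return the original dst       */
    "    ret\n"               /* c3                                   */
    ".size tiny_memset, .-tiny_memset\n"
    ".popsection\n"
);
#else
/* Fallback for other targets: a plain byte loop, same semantics. */
void *tiny_memset(void *dst, int c, size_t len)
{
    unsigned char *p = dst;

    while (len--)
        *p++ = (unsigned char)c;
    return dst;
}
#endif

/* Fill the middle of a buffer and check that the edges are untouched. */
int tiny_memset_selfcheck(void)
{
    unsigned char buf[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
    void *ret = tiny_memset(buf + 2, 0x5a, 4);

    assert(ret == buf + 2);                    /* returns dst, like memset() */
    assert(buf[1] == 1 && buf[6] == 1);        /* bytes outside the range    */
    assert(buf[2] == 0x5a && buf[5] == 0x5a);  /* bytes inside the range     */
    return 0;
}
```

Calling tiny_memset_selfcheck() from a test program exercises the function, and disassembling the resulting object with objdump -d should show the byte count of the asm version.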
Willy