Date:   Wed, 30 Aug 2023 17:51:52 +0200
From:   Willy Tarreau <w@....eu>
To:     Ammar Faizi <ammarfaizi2@...weeb.org>
Cc:     Alviro Iskandar Setiawan <alviro.iskandar@...weeb.org>,
        Thomas Weißschuh <linux@...ssschuh.net>,
        Nicholas Rosenberg <inori@...x.org>,
        Michael William Jonathan <moe@...weeb.org>,
        GNU/Weeb Mailing List <gwml@...r.gnuweeb.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for
 `memset()`

On Wed, Aug 30, 2023 at 10:44:53PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> > Then "xchg %esi, %eax" is just one byte with no memory access ;-)
> 
> Perfect!
> 
> Now I got this, shorter than "movl %esi, %eax":
> ```
> 0000000000001500 <memset>:
>     1500: 96          xchg   %eax,%esi
>     1501: 48 89 d1    mov    %rdx,%rcx
>     1504: 57          push   %rdi
>     1505: f3 aa       rep stos %al,%es:(%rdi)
>     1507: 58          pop    %rax
>     1508: c3          ret
> ```
> 
> Unfortunately, the xchg trick doesn't yield smaller machine code for
> %rdx, %rcx. Lol.

That's expected: historically "xchg ax, regX" was a single-byte 0x9X
on the 8086, and it kept the same encoding when extended to 32 bits, like
many instructions (note that NOP is encoded as xchg ax,ax). It remains
short when you can sacrifice the other register, or restore it later using
yet another xchg. For rcx/rdx a push/pop could do it, as they should also
be a single-byte 0x5X even in long mode unless I'm mistaken. Thus if you
absolutely want to squeeze out that 9th byte to end up with an 8-byte
function you could probably do:

    xchg %eax, %esi      1
    push %rdx            1
    pop %rcx             1
    push %rdi            1
    rep stosb            2
    pop %rax             1
    ret                  1
    ------------- Total: 8 bytes :-)

Willy
