Date:   Thu, 29 Nov 2018 22:20:11 +0100
From:   Arnd Bergmann <arnd@...db.de>
To:     jose.abreu@...opsys.com
Cc:     David Laight <David.Laight@...lab.com>,
        "open list:SYNOPSYS ARC ARCHITECTURE" 
        <linux-snps-arc@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Vineet Gupta <vineet.gupta1@...opsys.com>,
        alexey.brodkin@...opsys.com, Joao Pinto <joao.pinto@...opsys.com>,
        Vitor Soares <vitor.soares@...opsys.com>
Subject: Re: [PATCH v2] ARC: io.h: Implement reads{x}()/writes{x}()

On Thu, Nov 29, 2018 at 5:14 PM Jose Abreu <jose.abreu@...opsys.com> wrote:

> --->8---
> static noinline void test_readsl(char *buf, int len)
> {
>         readsl((void __iomem *)0xdeadbeef, buf, len);
> }
> --->8---
>
> And the disassembly:
> --->8---
> 00000e88 <test_readsl>:
>  e88:    breq.d r1,0,eac <0xeac>       /* if (count) */
>  e8c:    and r2,r0,3
>
>  e90:    mov_s lp_count,r1            /* r1 = count */
>  e92:    brne r2,0,eb0 <0xeb0>        /* if (bptr % ((t) / 8)) */
>
>  e96:    sub r0,r0,4
>  e9a:    nop_s
>
>  e9c:    lp eac <0xeac>                /* first loop */
>  ea0:    ld r2,[0xdeadbeef]
>  ea8:    st.a r2,[r0,4]
>  eac:    j_s [blink]
>  eae:    nop_s
>
>  eb0:    lp ed6 <0xed6>                /* second loop */
>  eb4:    ld r2,[0xdeadbeef]
>  ebc:    lsr r5,r2,8
>  ec0:    lsr r4,r2,16
>  ec4:    lsr r3,r2,24
>  ec8:    stb_s r2,[r0,0]
>  eca:    stb r5,[r0,1]
>  ece:    stb r4,[r0,2]
>  ed2:    stb_s r3,[r0,3]
>  ed4:    add_s r0,r0,4
>  ed6:    j_s [blink]
>
> --->8---
>
> See how the alignment check added in this version is done at 0xe92
> (<test_readsl+0xa>), which then branches to one of two different loops.
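
The control flow described above can be approximated in portable C. This is a hypothetical userspace sketch, not the code from the patch: `sketch_readsl()` and `put_unaligned32()` are made-up names, a `volatile uint32_t` stands in for the MMIO register at 0xdeadbeef, and `memcpy()` stands in for the kernel's put_unaligned(). On a core without unaligned word stores, the slow path is presumably what turns into the four `stb` instructions in the second loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for put_unaligned(): memcpy() lets the compiler
 * choose byte stores or a single word store depending on the target. */
static void put_unaligned32(uint32_t val, void *p)
{
	memcpy(p, &val, sizeof(val));
}

/* Read 'count' 32-bit values from one (simulated) MMIO register into buf,
 * mirroring the disassembly: return early if count is 0, then take a fast
 * loop for 4-byte aligned buffers or a slow loop that makes no alignment
 * assumption about the destination. */
static void sketch_readsl(const volatile uint32_t *reg, void *buf, size_t count)
{
	if (!count)
		return;

	if (((uintptr_t)buf % sizeof(uint32_t)) == 0) {
		uint32_t *p = buf;        /* aligned: plain word stores */
		while (count--)
			*p++ = *reg;
	} else {
		unsigned char *p = buf;   /* unaligned: put_unaligned-style stores */
		while (count--) {
			put_unaligned32(*reg, p);
			p += sizeof(uint32_t);
		}
	}
}
```

Passing `buf + 1` (for some byte array `buf`) takes the second loop, while a `uint32_t` array takes the first one, matching the branch on `r0 & 3` at 0xe92.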

This looks good to me. I wonder what the result is for CPUs
that /do/ support unaligned accesses. Normally put_unaligned()
should fall back to a simple store in that case, but I'm not
sure it can fold the two stores back into one and skip the
alignment check. Probably not worth overoptimizing for that
case (the MMIO access latency should be much higher than
anything you could gain here), but I'm still curious about
how well our get/put_unaligned macros work.
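
For reference, one common way to write put_unaligned() is the packed-struct idiom used by some of the kernel's generic unaligned helpers. The version below is a hypothetical userspace analogue: `struct una_u32`, `put_unaligned_u32()` and `get_unaligned_u32()` are illustrative names, and `__attribute__((packed))` assumes GCC or Clang. Because the packed struct has an alignment of 1, the compiler may not assume the pointer is aligned; on targets that allow unaligned word stores it can typically emit a single store, and on stricter targets it falls back to narrower accesses. Whether it can then also notice that both loops are identical and drop the alignment check is the open question above.

```c
#include <stdint.h>

/* Hypothetical packed wrapper: alignment 1, so accesses through it make
 * no assumption about the alignment of the underlying address. */
struct una_u32 {
	uint32_t x;
} __attribute__((packed));

/* Like the kernel, this type pun relies on the compiler tolerating it;
 * the kernel builds with -fno-strict-aliasing. */
static inline void put_unaligned_u32(uint32_t val, void *p)
{
	((struct una_u32 *)p)->x = val;
}

static inline uint32_t get_unaligned_u32(const void *p)
{
	return ((const struct una_u32 *)p)->x;
}
```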

       Arnd
