Date:   Fri, 30 Nov 2018 14:44:13 +0100
From:   Arnd Bergmann <arnd@...db.de>
To:     jose.abreu@...opsys.com
Cc:     David Laight <David.Laight@...lab.com>,
        "open list:SYNOPSYS ARC ARCHITECTURE" 
        <linux-snps-arc@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Vineet Gupta <vineet.gupta1@...opsys.com>,
        alexey.brodkin@...opsys.com, Joao Pinto <joao.pinto@...opsys.com>,
        Vitor Soares <vitor.soares@...opsys.com>
Subject: Re: [PATCH v2] ARC: io.h: Implement reads{x}()/writes{x}()

On Fri, Nov 30, 2018 at 9:57 AM Jose Abreu <jose.abreu@...opsys.com> wrote:
> On 29-11-2018 21:20, Arnd Bergmann wrote:
> > On Thu, Nov 29, 2018 at 5:14 PM Jose Abreu <jose.abreu@...opsys.com> wrote:
> >> See how the if condition added in this version is checked in
> >> <test_readsl+0xe92> and then it takes two different loops.
> > This looks good to me. I wonder what the result is for CPUs
> > that /do/ support unaligned accesses. Normally put_unaligned()
> > should fall back to a simple store in that case, but I'm not
> > sure it can fold the two stores back into one and skip the
> > alignment check. Probably not worth overoptimizing for that
> > case (the MMIO access latency should be much higher than
> > anything you could gain here), but I'm still curious about
> > how well our get/put_unaligned macros work.
>
> Here is disassembly for an ARC CPU that supports unaligned accesses:
>
> -->8---
> 00000d48 <test_readsl>:
>  d48:    breq_s r1,0,28            /* if (count) */
>  d4a:    tst    r0,0x3
>  d4e:    bne_s 32                /* if (bptr % ((t) / 8)) */
>
>  d50:    ld r2,[0xdeadbeef]        /* first loop */
>  d58:    sub_s r1,r1,0x1
>  d5a:    tst_s r1,r1
>  d5c:    bne.d -12
>  d60:    st.ab r2,[r0,4]
>
>  d64:    dmb    0x1                    /* common exit point */
>  d68:    j_s    [blink]
>  d6a:    nop_s
>
>  d6c:    ld r2,[0xdeadbeef]        /* second loop */
>  d74:    sub_s r1,r1,0x1
>  d76:    tst_s r1,r1
>  d78:    bne.d -12
>  d7c:    st.ab r2,[r0,4]
>
>  d80:    b_s -28                    /* jmp to 0xd64 */
>  d82:    nop_s
> --->8---
>
> Notice how first and second loop are exactly equal ...

Ok, so it's halfway there: it managed to optimize the unaligned
case correctly, but it failed to notice that both sides are now
identical.

      Arnd
