linux-kernel - RE: [PATCH v5 1/3] riscv: optimized memcpy


Message-ID: <63ab9e73cb58404c99e271b11b2b5bf5@AcuMS.aculab.com>
Date:   Fri, 1 Oct 2021 08:06:13 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Vineet Gupta' <vgupta@...nel.org>,
        Matteo Croce <mcroce@...ux.microsoft.com>,
        Guo Ren <guoren@...nel.org>
CC:     linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        "Paul Walmsley" <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Atish Patra <atish.patra@....com>,
        "Emil Renner Berthing" <kernel@...il.dk>,
        Akira Tsukamoto <akira.tsukamoto@...il.com>,
        Drew Fustini <drew@...gleboard.org>,
        Bin Meng <bmeng.cn@...il.com>, Christoph Hellwig <hch@....de>,
        Philipp Tomsich <philipp.tomsich@...ll.eu>
Subject: RE: [PATCH v5 1/3] riscv: optimized memcpy

...
> BTW off topic (but relevant to this patchset), I strongly feel that
> routines like memset/memcpy are better coded in assembly for really
> water tight instruction scheduling and ease of further optimizing (e.g.
> use of CMO.zero etc as experimented by Philipp). What is blocking you
> from optimizing the asm version ? You are leaving the fate of these
> critical routines in the hand of compiler - this can lead to performance
> shenanigans on a big gcc upgrade.

You also need to worry about the cost of short transfers.
A few cycles there could make a much bigger difference
than something that speeds up long transfers.
Short ones are likely to be fairly common.
I doubt the loop unrolling optimisation in gcc is actually
any good for loops that might only run a few times.
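
To make the short-transfer point concrete, here is a minimal userspace sketch
in C, not the code from the patch. The name sketch_memcpy and the 16-byte
threshold SHORT_COPY_MAX are assumptions for illustration only; a real
threshold would have to be measured on the target. Short copies return before
any of the alignment setup that the bulk path needs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHORT_COPY_MAX 16	/* assumed threshold, not a measured value */

/* Hypothetical name: illustrates a short-length fast path only. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Short transfers: plain byte loop, no alignment setup at all. */
	if (n <= SHORT_COPY_MAX) {
		while (n--)
			*d++ = *s++;
		return dst;
	}

	/* Setup cost for long transfers: byte-copy until dst is word aligned. */
	while ((uintptr_t)d % sizeof(unsigned long)) {
		*d++ = *s++;
		n--;
	}

	/*
	 * Bulk path: one word per iteration. The fixed-size memcpy() calls
	 * are the portable way to express a possibly misaligned word load;
	 * compilers normally turn them into single load/store instructions.
	 */
	while (n >= sizeof(unsigned long)) {
		unsigned long w;

		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		d += sizeof(w);
		s += sizeof(w);
		n -= sizeof(w);
	}

	/* Tail: whatever is left after the last whole word. */
	while (n--)
		*d++ = *s++;
	return dst;
}
```

Whether the early return is a net win depends on how often short lengths occur
and on how well the length test is predicted, so the threshold needs measuring
rather than guessing.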

Fortunately the kernel doesn't get 'hit by' gcc unrolling
loops into the AVX instructions.
The setup costs for that (and I-cache footprint) are horrid.
Although I suspect it is that optimisation that 'broke'
code that used misaligned pointers on overlapping data.
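
As a general illustration of why copy order matters for overlapping buffers
(not a reconstruction of the specific gcc or libc change suspected above), the
sketch below contrasts a forward-only byte copy with a direction-aware one.
Both function names are hypothetical. Code that calls memcpy() on overlapping
data is relying on one particular copy order, which any change of
implementation can silently alter; memmove() is the routine that guarantees
the second behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: always copies from the lowest address upward. */
void *forward_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/*
 * Hypothetical helper: picks the copy direction so that overlapping
 * regions survive, which is what memmove() promises and memcpy() does not.
 */
void *overlap_safe_copy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* No harmful overlap: dst is below src, or past the end of src. */
	if ((uintptr_t)d <= (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n)
		return forward_copy(dst, src, n);

	/* dst starts inside [src, src + n): copy from the top down. */
	while (n--)
		d[n] = s[n];
	return dst;
}
```

With a buffer holding "abcdefgh", forward_copy(buf + 2, buf, 6) leaves
"abababab" because it re-reads bytes it has already overwritten, while
overlap_safe_copy(buf + 2, buf, 6) leaves "ababcdef".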

It is a general problem with the 'one size fits all' memcpy().

	David
