Date:   Tue, 22 Jun 2021 04:07:47 +0300
From:   Nick Kossifidis <mick@....forth.gr>
To:     Matteo Croce <mcroce@...ux.microsoft.com>
Cc:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Atish Patra <atish.patra@....com>,
        Emil Renner Berthing <kernel@...il.dk>,
        Akira Tsukamoto <akira.tsukamoto@...il.com>,
        Drew Fustini <drew@...gleboard.org>,
        Bin Meng <bmeng.cn@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v3 3/3] riscv: optimized memset

On 2021-06-17 18:27, Matteo Croce wrote:
> +
> +void *__memset(void *s, int c, size_t count)
> +{
> +	union types dest = { .u8 = s };
> +
> +	if (count >= MIN_THRESHOLD) {
> +		const int bytes_long = BITS_PER_LONG / 8;

You could make 'const int bytes_long = BITS_PER_LONG / 8;' and 'const 
int mask = bytes_long - 1;' from your memcpy patch visible to memset as 
well (static const...) and use them here. 'mask' would be better named 
'word_mask'.
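
Something along these lines, as a rough userspace sketch. The 
BITS_PER_LONG stand-in and the idea of a shared header are assumptions 
for illustration; in the kernel BITS_PER_LONG already exists and the 
constants could live wherever both memcpy and memset can see them.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG (assumption). */
#ifndef BITS_PER_LONG
#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))
#endif

/* Shared by memcpy and memset instead of being redefined in each. */
static const int bytes_long = BITS_PER_LONG / 8;
static const int word_mask = BITS_PER_LONG / 8 - 1;

/* The mask trick only works when the word size is a power of two. */
static_assert(((BITS_PER_LONG / 8) & (BITS_PER_LONG / 8 - 1)) == 0,
	      "word size must be a power of two");
```

word_mask is spelled out from BITS_PER_LONG rather than from bytes_long 
because a const variable is not a constant expression in standard C at 
file scope.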

> +		unsigned long cu = (unsigned long)c;
> +
> +		/* Compose an ulong with 'c' repeated 4/8 times */
> +		cu |= cu << 8;
> +		cu |= cu << 16;
> +#if BITS_PER_LONG == 64
> +		cu |= cu << 32;
> +#endif
> +

You don't need to create cu here. You'll fill the dest buffer with 'c' 
anyway, so once you have written enough 'c's to be able to read an 
aligned word full of them back from dest, you can just grab that word 
and keep filling dest with it.
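
To illustrate what I mean, here is a minimal userspace sketch, not a 
drop-in replacement for the patch. The name memset_sketch is 
hypothetical, and memcpy() is used for the word load/store only to stay 
within standard C aliasing rules; the kernel version would go through 
the existing union instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of the idea, not the patch's __memset. */
static void *memset_sketch(void *s, int c, size_t count)
{
	unsigned char *d = s;
	const size_t bytes_long = sizeof(unsigned long);
	const size_t word_mask = bytes_long - 1;

	/* Byte-fill until the destination is word aligned. */
	for (; count && ((uintptr_t)d & word_mask); count--)
		*d++ = (unsigned char)c;

	if (count >= bytes_long) {
		unsigned char *first = d;
		unsigned long word;
		size_t i;

		assert(((uintptr_t)d & word_mask) == 0);

		/* Fill the first aligned word one byte at a time... */
		for (i = 0; i < bytes_long; i++)
			*d++ = (unsigned char)c;
		count -= bytes_long;

		/* ...then read it back: every byte of it is now 'c'. */
		memcpy(&word, first, bytes_long);

		/* Keep filling dest with that word. */
		for (; count >= bytes_long; count -= bytes_long, d += bytes_long)
			memcpy(d, &word, bytes_long);
	}

	/* Byte-fill whatever is left at the tail. */
	for (; count; count--)
		*d++ = (unsigned char)c;

	return s;
}
```

This trades the shift/or sequence for a few extra byte stores and one 
load, with no BITS_PER_LONG conditional.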

> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +		/* Fill the buffer one byte at time until the destination
> +		 * is aligned on a 32/64 bit boundary.
> +		 */
> +		for (; count && dest.uptr % bytes_long; count--)

You could reuse & mask here instead of % bytes_long.

> +			*dest.u8++ = c;
> +#endif

I noticed you also used CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS in your 
memcpy patch. Is it worth it here? To begin with, riscv doesn't set it, 
and even if it did, we are talking about a loop that runs only a few 
times to reach the alignment boundary (7 times in the worst case). I 
don't think we gain much here, even for archs that have efficient 
unaligned access.
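
To put a number on that loop, here is a small userspace sketch. 
head_bytes is a hypothetical helper, not something from the patch. It 
also shows why & mask can stand in for % bytes_long: the two agree 
whenever the word size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes the head loop has to write
 * before addr reaches the next word boundary.
 */
static size_t head_bytes(uintptr_t addr)
{
	const uintptr_t bytes_long = sizeof(unsigned long);
	const uintptr_t word_mask = bytes_long - 1;

	/* Only valid because bytes_long is a power of two. */
	assert((bytes_long & word_mask) == 0);
	assert((addr % bytes_long) == (addr & word_mask));

	/* 0 when already aligned, at most bytes_long - 1 otherwise. */
	return (bytes_long - (addr & word_mask)) & word_mask;
}
```

The result never exceeds bytes_long - 1, which is 7 with 64-bit longs 
and 3 with 32-bit longs.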
