Message-ID: <87f2cf0e98c5c5560cfb591b4f4b29c8@mailhost.ics.forth.gr>
Date:   Tue, 22 Jun 2021 03:14:59 +0300
From:   Nick Kossifidis <mick@....forth.gr>
To:     Matteo Croce <mcroce@...ux.microsoft.com>
Cc:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Atish Patra <atish.patra@....com>,
        Emil Renner Berthing <kernel@...il.dk>,
        Akira Tsukamoto <akira.tsukamoto@...il.com>,
        Drew Fustini <drew@...gleboard.org>,
        Bin Meng <bmeng.cn@...il.com>,
        David Laight <David.Laight@...lab.com>,
        Guo Ren <guoren@...nel.org>
Subject: Re: [PATCH v3 1/3] riscv: optimized memcpy

Hello Matteo and thanks for the patch,

On 2021-06-17 18:27, Matteo Croce wrote:
> 
> @@ -0,0 +1,91 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * String functions optimized for hardware which doesn't
> + * handle unaligned memory accesses efficiently.
> + *
> + * Copyright (C) 2021 Matteo Croce
> + */
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +
> +/* Minimum size for a word copy to be convenient */
> +#define MIN_THRESHOLD (BITS_PER_LONG / 8 * 2)
> +
> +/* convenience union to avoid cast between different pointer types */
> +union types {
> +	u8 *u8;

You are using a type as a name; I'd go with as_bytes/as_ulong/as_uptr, 
which makes it easier for the reader to understand what you are trying 
to do (see the sketch after my next comment).

> +	unsigned long *ulong;
> +	uintptr_t uptr;
> +};
> +
> +union const_types {
> +	const u8 *u8;
> +	unsigned long *ulong;
> +};
> +

I suggest you define those unions inside the function body; no one else 
is using them.
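
Putting both suggestions together, a minimal sketch (the as_* member 
names are mine, just for illustration):

	void *__memcpy(void *dest, const void *src, size_t count)
	{
		/* convenience unions, local to their only user */
		union {
			const u8 *as_bytes;
			const unsigned long *as_ulong;
		} s = { .as_bytes = src };
		union {
			u8 *as_bytes;
			unsigned long *as_ulong;
			uintptr_t as_uptr;
		} d = { .as_bytes = dest };

		/* ... rest of the function unchanged ... */
	}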

> +void *__memcpy(void *dest, const void *src, size_t count)
> +{
> +	const int bytes_long = BITS_PER_LONG / 8;
> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +	const int mask = bytes_long - 1;
> +	const int distance = (src - dest) & mask;

Why not unsigned ints?
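
Both values are non-negative by construction, so e.g. (sketch):

	const unsigned int mask = bytes_long - 1;
	const unsigned int distance = (src - dest) & mask;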

> +#endif
> +	union const_types s = { .u8 = src };
> +	union types d = { .u8 = dest };
> +
> +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS

If you want to be compliant with memcpy you should check for overlapping 
regions here, since "The memory areas must not overlap", and then do 
nothing about it, because according to POSIX overlapping regions lead to 
undefined behavior. That's why recent libc implementations use memmove 
in any case (memcpy is an alias of memmove), which is the suggested 
approach.
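
A minimal sketch of that approach, assuming an overlap-safe __memmove() 
implementation is available:

	/* callers get defined behavior even for overlapping buffers */
	void *memcpy(void *dest, const void *src, size_t count)
		__weak __alias(__memmove);
	EXPORT_SYMBOL(memcpy);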

> +	if (count < MIN_THRESHOLD)
> +		goto copy_remainder;
> +
> +	/* copy a byte at time until destination is aligned */
> +	for (; count && d.uptr & mask; count--)
> +		*d.u8++ = *s.u8++;
> +

You should check for !IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) here.
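
i.e. guard the misaligned path with something like (sketch):

	/* the shift-and-merge loop below assumes little-endian layout */
	if (!IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) && distance) {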

> +	if (distance) {
> +		unsigned long last, next;
> +
> +		/* move s backward to the previous alignment boundary */
> +		s.u8 -= distance;

It'd help to explain here that, since s is distance bytes ahead of d 
and d reached the alignment boundary above, s is now aligned as well, 
but the data needs to be shifted to compensate for distance in order 
to do a word-by-word copy.
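
A worked example may help: on 64-bit little-endian with distance = 3, 
after s.u8 -= 3 the source pointer is word-aligned but each source word 
carries 3 stale low-order bytes, so the loop below effectively computes

	d.ulong[0] = last >> (3 * 8) | next << ((8 - 3) * 8);

i.e. the 5 useful high-order bytes of the previous source word fill the 
low half of the destination word, and the 3 low-order bytes of the next 
source word supply the rest.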

> +
> +		/* 32/64 bit wide copy from s to d.
> +		 * d is aligned now but s is not, so read s alignment wise,
> +		 * and do proper shift to get the right value.
> +		 * Works only on Little Endian machines.
> +		 */

This comment is misleading, because s is aligned at this point; 
otherwise s.ulong[0]/[1] below would result in an unaligned access.
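
Maybe reword it along these lines (suggestion):

	/* 32/64 bit wide copy from s to d.
	 * Both pointers are aligned now, but the source data is shifted
	 * by distance bytes relative to the destination, so merge two
	 * consecutive source words with the proper shifts to get each
	 * value. Works only on little-endian machines.
	 */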

> +		for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
> +			last = next;
> +			next = s.ulong[1];
> +
> +			d.ulong[0] = last >> (distance * 8) |
> +				     next << ((bytes_long - distance) * 8);
> +
> +			d.ulong++;
> +			s.ulong++;
> +		}
> +
> +		/* restore s with the original offset */
> +		s.u8 += distance;
> +	} else
> +#endif
> +	{
> +		/* if the source and dest lower bits are the same, do a simple
> +		 * 32/64 bit wide copy.
> +		 */

A while() loop would make more sense here.
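
i.e. (sketch):

	while (count >= bytes_long) {
		*d.ulong++ = *s.ulong++;
		count -= bytes_long;
	}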

> +		for (; count >= bytes_long; count -= bytes_long)
> +			*d.ulong++ = *s.ulong++;
> +	}
> +
> +	/* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */
> +	goto copy_remainder;
> +
> +copy_remainder:
> +	while (count--)
> +		*d.u8++ = *s.u8++;
> +
> +	return dest;
> +}
> +EXPORT_SYMBOL(__memcpy);
> +
> +void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> +EXPORT_SYMBOL(memcpy);
