linux-kernel - RE: [PATCH v12 4/5] riscv: Add checksum library

Message-ID: <DM8PR11MB5751D7F4F8D7297198452723B896A@DM8PR11MB5751.namprd11.prod.outlook.com>
Date: Wed, 20 Dec 2023 10:28:04 +0000
From: "Wang, Xiao W" <xiao.w.wang@...el.com>
To: Charlie Jenkins <charlie@...osinc.com>, Palmer Dabbelt
	<palmer@...belt.com>, Conor Dooley <conor@...nel.org>, Samuel Holland
	<samuel.holland@...ive.com>, David Laight <David.Laight@...lab.com>, "Evan
 Green" <evan@...osinc.com>, "linux-riscv@...ts.infradead.org"
	<linux-riscv@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arch@...r.kernel.org"
	<linux-arch@...r.kernel.org>
CC: Paul Walmsley <paul.walmsley@...ive.com>, Albert Ou
	<aou@...s.berkeley.edu>, Arnd Bergmann <arnd@...db.de>, Conor Dooley
	<conor.dooley@...rochip.com>
Subject: RE: [PATCH v12 4/5] riscv: Add checksum library



> -----Original Message-----
> From: Charlie Jenkins <charlie@...osinc.com>
> Sent: Wednesday, December 13, 2023 10:11 AM
> To: Palmer Dabbelt <palmer@...belt.com>; Conor Dooley
> <conor@...nel.org>; Samuel Holland <samuel.holland@...ive.com>; David
> Laight <David.Laight@...lab.com>; Wang, Xiao W <xiao.w.wang@...el.com>;
> Evan Green <evan@...osinc.com>; linux-riscv@...ts.infradead.org;
> linux-kernel@...r.kernel.org; linux-arch@...r.kernel.org
> Cc: Paul Walmsley <paul.walmsley@...ive.com>; Albert Ou
> <aou@...s.berkeley.edu>; Arnd Bergmann <arnd@...db.de>; Conor Dooley
> <conor.dooley@...rochip.com>
> Subject: Re: [PATCH v12 4/5] riscv: Add checksum library
> 
> On Tue, Dec 12, 2023 at 05:18:41PM -0800, Charlie Jenkins wrote:
> > Provide 32-bit and 64-bit versions of do_csum. When compiled for 32-bit,
> > do_csum loads from the buffer in groups of 32 bits; when compiled for
> > 64-bit, it loads in groups of 64 bits.
> >
> > Additionally, provide a RISC-V optimized implementation of csum_ipv6_magic.
> >
> > Signed-off-by: Charlie Jenkins <charlie@...osinc.com>
> > Acked-by: Conor Dooley <conor.dooley@...rochip.com>
> > Reviewed-by: Xiao Wang <xiao.w.wang@...el.com>
> > ---
> >  arch/riscv/include/asm/checksum.h |  13 +-
> >  arch/riscv/lib/Makefile           |   1 +
> >  arch/riscv/lib/csum.c             | 326 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 339 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 2fcf864186e7..3fa04ff1eda8 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -12,6 +12,17 @@
> >
> >  #define ip_fast_csum ip_fast_csum
> >
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > +			const struct in6_addr *daddr,
> > +			__u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> >  /* Define riscv versions of functions before importing asm-generic/checksum.h */
> >  #include <asm-generic/checksum.h>
> >
> > @@ -69,7 +80,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> >  			.option pop"
> >  			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> >  		}
> > -		return csum >> 16;
> > +		return (__force __sum16) (csum >> 16);

I notice that this type cast was introduced after v10. This change should go into patch 3/5.

BRs,
Xiao
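
For readers unfamiliar with the annotation: __sum16 is a sparse "bitwise"
type, so returning a plain integer expression as __sum16 draws a sparse
warning unless the cast is marked __force. The following is a minimal
userspace sketch of that pattern, not the code from the patch. The macros
are defined as no-ops here (as they are in the kernel when sparse is not
running), and sum16_t and top_half_as_sum16() are hypothetical names used
only for illustration.

```c
#include <stdint.h>

/* Userspace stand-ins (assumption): outside of sparse these expand to nothing. */
#ifndef __bitwise
#define __bitwise
#endif
#ifndef __force
#define __force
#endif

/* Hypothetical stand-in for the kernel's __sum16 type. */
typedef uint16_t __bitwise sum16_t;

/*
 * Return the upper 16 bits of a 32-bit value as a checksum type.
 * The __force cast tells sparse that the integer-to-bitwise
 * conversion is intentional; it does not change the generated code.
 */
static sum16_t top_half_as_sum16(uint32_t csum)
{
	return (__force sum16_t)(csum >> 16);
}
```

With a normal compiler the cast is a plain narrowing conversion, so
top_half_as_sum16(0xABCD1234) yields 0xABCD.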

[...]
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Will do a light-weight address alignment if buff is misaligned, unless
> > + * cpu supports fast misaligned accesses.
> > + */
> > +unsigned int do_csum(const unsigned char *buff, int len)
> > +{
> > +	if (unlikely(len <= 0))
> > +		return 0;
> > +
> > +	/*
> > +	 * Significant performance gains can be seen by not doing alignment
> > +	 * on machines with fast misaligned accesses.
> > +	 *
> > +	 * There is some duplicate code between the "with_alignment" and
> > +	 * "no_alignment" implementations, but the overlap is too awkward to be
> > +	 * able to fit in one function without introducing multiple static
> > +	 * branches. The largest chunk of overlap was delegated into the
> > +	 * do_csum_common function.
> > +	 */
> > +	if (static_branch_likely(&fast_misaligned_access_speed_key))
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	if (((unsigned long)buff & OFFSET_MASK) == 0)
> > +		return do_csum_no_alignment(buff, len);
> > +
> > +	return do_csum_with_alignment(buff, len);
> > +}
> >
> > --
> > 2.43.0
> >
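
The dispatch order in do_csum above can be sketched in userspace as
follows. This is an illustration under assumptions, not the kernel code:
the definition of OFFSET_MASK is elided from this mail and is assumed here
to be sizeof(unsigned long) - 1, the static key is modelled as a plain int
parameter, and the enum and pick_csum_path() are hypothetical names.

```c
#include <stdint.h>

/* Assumption: the mask selects the low bits below the machine word size. */
#define OFFSET_MASK (sizeof(unsigned long) - 1)

/* Hypothetical labels for the three outcomes of the dispatch. */
enum csum_path {
	CSUM_EMPTY,          /* len <= 0, checksum is 0 */
	CSUM_NO_ALIGNMENT,   /* word loads straight from buff */
	CSUM_WITH_ALIGNMENT, /* align buff first, then word loads */
};

/*
 * Mirror the order of checks in do_csum: empty input first, then the
 * "fast misaligned access" shortcut, then the runtime alignment test.
 * fast_misaligned stands in for the static branch on
 * fast_misaligned_access_speed_key.
 */
static enum csum_path pick_csum_path(const unsigned char *buff, int len,
				     int fast_misaligned)
{
	if (len <= 0)
		return CSUM_EMPTY;

	if (fast_misaligned)
		return CSUM_NO_ALIGNMENT;

	if (((uintptr_t)buff & OFFSET_MASK) == 0)
		return CSUM_NO_ALIGNMENT;

	return CSUM_WITH_ALIGNMENT;
}
```

Only a misaligned buffer on a machine without fast misaligned accesses
reaches the slower aligning path; every other non-empty input takes the
no-alignment path.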
> 
> There is potentially a code size concern here. These changes require
> alternatives, which increase the resulting binary size. The
> bloat-o-meter script reports that the do_csum function grows to more than
> twice its size with this patch:
> 
> Function                                     old     new   delta
> do_csum                                      238     514    +276
> 
> The other functions are harder to measure because they get inlined or
> are not included in generic code. However, do_csum is the most
> affected because of the misaligned access behavior.
> 
> The performance improvements afforded by alternatives (with the Zbb
> extension) and by the misaligned access checking are significant. In
> my testing, these optimizations alone account for a performance
> improvement of over 20%.
> 
> - Charlie
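
As background on why loading in word-sized groups is valid at all: the
ones' complement sum can be accumulated in wide words and folded down at
the end, giving the same 16-bit result as summing 16-bit words one at a
time. The sketch below illustrates that idea in portable userspace C. It
is not the do_csum from the patch (whose body is elided in this mail), it
uses memcpy instead of direct word loads, and csum16_reference() and
csum16_wide() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: add native-endian 16-bit words, folding the carry each step. */
static uint16_t csum16_reference(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 0) {
		uint16_t w = 0;
		size_t n = len < sizeof(w) ? len : sizeof(w);

		memcpy(&w, buf, n);	/* a trailing odd byte is zero-padded */
		sum += w;
		sum = (sum & 0xffff) + (sum >> 16);
		buf += n;
		len -= n;
	}
	return (uint16_t)sum;
}

/* Same sum, accumulated 64 bits at a time with end-around carry. */
static uint16_t csum16_wide(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;

	while (len > 0) {
		uint64_t w = 0;
		size_t n = len < sizeof(w) ? len : sizeof(w);

		memcpy(&w, buf, n);	/* a short tail is zero-padded */
		sum += w;
		if (sum < w)		/* unsigned overflow: wrap the carry */
			sum++;
		buf += n;
		len -= n;
	}

	/* Fold 64 -> 32 -> 16 bits; two steps each absorb the carry. */
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Both functions return the unfolded-and-uncomplemented 16-bit sum in native
byte order, so they can be compared directly on the same buffer. The wide
version does one add per eight bytes, which is the property the 64-bit
do_csum relies on.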