Message-ID: <DM8PR11MB5751D7F4F8D7297198452723B896A@DM8PR11MB5751.namprd11.prod.outlook.com>
Date: Wed, 20 Dec 2023 10:28:04 +0000
From: "Wang, Xiao W" <xiao.w.wang@...el.com>
To: Charlie Jenkins <charlie@...osinc.com>, Palmer Dabbelt
<palmer@...belt.com>, Conor Dooley <conor@...nel.org>, Samuel Holland
<samuel.holland@...ive.com>, David Laight <David.Laight@...lab.com>, "Evan
Green" <evan@...osinc.com>, "linux-riscv@...ts.infradead.org"
<linux-riscv@...ts.infradead.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-arch@...r.kernel.org"
<linux-arch@...r.kernel.org>
CC: Paul Walmsley <paul.walmsley@...ive.com>, Albert Ou
<aou@...s.berkeley.edu>, Arnd Bergmann <arnd@...db.de>, Conor Dooley
<conor.dooley@...rochip.com>
Subject: RE: [PATCH v12 4/5] riscv: Add checksum library
> -----Original Message-----
> From: Charlie Jenkins <charlie@...osinc.com>
> Sent: Wednesday, December 13, 2023 10:11 AM
> To: Palmer Dabbelt <palmer@...belt.com>; Conor Dooley
> <conor@...nel.org>; Samuel Holland <samuel.holland@...ive.com>; David
> Laight <David.Laight@...lab.com>; Wang, Xiao W <xiao.w.wang@...el.com>;
> Evan Green <evan@...osinc.com>; linux-riscv@...ts.infradead.org;
> linux-kernel@...r.kernel.org; linux-arch@...r.kernel.org
> Cc: Paul Walmsley <paul.walmsley@...ive.com>; Albert Ou
> <aou@...s.berkeley.edu>; Arnd Bergmann <arnd@...db.de>; Conor Dooley
> <conor.dooley@...rochip.com>
> Subject: Re: [PATCH v12 4/5] riscv: Add checksum library
>
> On Tue, Dec 12, 2023 at 05:18:41PM -0800, Charlie Jenkins wrote:
> > Provide a 32-bit and 64-bit version of do_csum. When compiled for 32-bit,
> > it loads from the buffer in groups of 32 bits, and when compiled for
> > 64-bit, it loads in groups of 64 bits.
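A minimal portable sketch of the word-at-a-time idea, assuming a 64-bit accumulator and plain C rather than the patch's actual code: each 8-byte load is added with an end-around carry, the tail is zero-padded in place, and the result is folded to 16 bits. The names `fold64_sketch`, `csum_words_sketch` and `csum_ref_sketch` are hypothetical, and `memcpy` stands in for the (possibly misaligned) loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit end-around-carry accumulator to a 16-bit ones' complement sum. */
static unsigned int fold64_sketch(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* 64 -> 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* absorb that carry */
	sum = (sum & 0xffffu) + (sum >> 16);		/* 32 -> 17 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* absorb that carry */
	assert(sum <= 0xffff);
	return (unsigned int)sum;
}

/* Hypothetical word-at-a-time version: 8-byte loads, end-around carry. */
static unsigned int csum_words_sketch(const unsigned char *buff, size_t len)
{
	uint64_t sum = 0, w;

	while (len >= sizeof(w)) {
		memcpy(&w, buff, sizeof(w));	/* alignment-safe 64-bit load */
		sum += w;
		sum += (sum < w);		/* end-around carry */
		buff += sizeof(w);
		len -= sizeof(w);
	}
	if (len) {
		w = 0;
		memcpy(&w, buff, len);		/* zero-padded tail, bytes keep their lanes */
		sum += w;
		sum += (sum < w);
	}
	return fold64_sketch(sum);
}

/* Reference: sum native-endian 16-bit words one at a time. */
static unsigned int csum_ref_sketch(const unsigned char *buff, size_t len)
{
	uint32_t sum = 0;
	uint16_t h;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		memcpy(&h, buff + i, sizeof(h));
		sum += h;
		sum = (sum & 0xffffu) + (sum >> 16);
	}
	if (len & 1) {
		h = 0;
		memcpy(&h, buff + len - 1, 1);	/* odd trailing byte, zero-padded */
		sum += h;
		sum = (sum & 0xffffu) + (sum >> 16);
	}
	return fold64_sketch(sum);
}
```

Because 0xFFFF divides 2^64 - 1, summing 64-bit words with end-around carry and then folding yields the same value as summing 16-bit words, which is what lets the 64-bit build consume 8 bytes per load.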
> >
> > Additionally, provide a riscv-optimized implementation of csum_ipv6_magic.
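For readers less familiar with csum_ipv6_magic: it adds the IPv6 pseudo-header (source address, destination address, 32-bit upper-layer length, three zero bytes and the next-header value, as laid out in RFC 8200 section 8.1) to a running partial sum and returns the complemented 16-bit result. Below is a hypothetical portable sketch that accumulates in 64 bits; `struct ipv6_addr_sketch` and `ipv6_magic_sketch` are made-up stand-ins for `struct in6_addr` and the real function, not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr: 16 raw address bytes. */
struct ipv6_addr_sketch {
	uint8_t b[16];
};

/* Add with end-around carry on a 64-bit accumulator. */
static uint64_t add_carry64_sketch(uint64_t sum, uint64_t v)
{
	sum += v;
	return sum + (sum < v);
}

/*
 * Checksum of the IPv6 pseudo-header plus an incoming partial sum,
 * returned complemented like csum_ipv6_magic().
 */
static uint16_t ipv6_magic_sketch(const struct ipv6_addr_sketch *saddr,
				  const struct ipv6_addr_sketch *daddr,
				  uint32_t len, uint8_t proto, uint32_t sum)
{
	/* Length in network byte order, then 3 zero bytes and next header. */
	const unsigned char tail[8] = {
		(unsigned char)(len >> 24), (unsigned char)(len >> 16),
		(unsigned char)(len >> 8), (unsigned char)len,
		0, 0, 0, proto,
	};
	uint64_t acc = sum, w;
	int i;

	for (i = 0; i < 16; i += 8) {
		memcpy(&w, saddr->b + i, sizeof(w));
		acc = add_carry64_sketch(acc, w);
		memcpy(&w, daddr->b + i, sizeof(w));
		acc = add_carry64_sketch(acc, w);
	}
	memcpy(&w, tail, sizeof(w));
	acc = add_carry64_sketch(acc, w);

	/* Fold 64 -> 32 -> 16 bits, absorbing each carry. */
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffu) + (acc >> 16);
	acc = (acc & 0xffffu) + (acc >> 16);
	assert(acc <= 0xffff);
	return (uint16_t)~acc;
}
```

Treating the length and protocol bytes as one more 64-bit word works because the ones' complement sum depends only on each byte's position within its 16-bit lane, not on how the bytes are grouped into loads.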
> >
> > Signed-off-by: Charlie Jenkins <charlie@...osinc.com>
> > Acked-by: Conor Dooley <conor.dooley@...rochip.com>
> > Reviewed-by: Xiao Wang <xiao.w.wang@...el.com>
> > ---
> > arch/riscv/include/asm/checksum.h | 13 +-
> > arch/riscv/lib/Makefile | 1 +
> > arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 339 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 2fcf864186e7..3fa04ff1eda8 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -12,6 +12,17 @@
> >
> > #define ip_fast_csum ip_fast_csum
> >
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > + const struct in6_addr *daddr,
> > + __u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> > /* Define riscv versions of functions before importing asm-generic/checksum.h */
> > #include <asm-generic/checksum.h>
> >
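The `#define do_csum do_csum` line is the usual way an architecture tells generic code that it supplies its own version: generic code typically guards its fallback with a matching `#ifndef`, which is also why the riscv definitions have to be visible before the generic header is included. A self-contained sketch of the idiom, using a hypothetical `csum_hook` name and a placeholder body that only adds bytes (not a real checksum):

```c
#include <assert.h>

static int which_impl;	/* 1 = arch override ran, 2 = generic fallback ran */

/* ---- "arch header": provide an override and advertise it ---- */
static unsigned int csum_hook(const unsigned char *buff, int len)
{
	unsigned int s = 0;

	assert(buff != 0 || len <= 0);
	which_impl = 1;
	while (len-- > 0)
		s += *buff++;	/* placeholder body, not a real checksum */
	return s;
}
#define csum_hook csum_hook	/* same idiom as "#define do_csum do_csum" */

/* ---- "generic code": supply a fallback only if nobody else did ---- */
#ifndef csum_hook
static unsigned int csum_hook(const unsigned char *buff, int len)
{
	(void)buff;
	(void)len;
	which_impl = 2;
	return 0;
}
#endif
```

A self-referential macro expands to its own name, so call sites are unchanged; the only effect is that `#ifndef csum_hook` becomes false and the fallback is skipped.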
> > @@ -69,7 +80,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> > .option pop"
> > : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > }
> > - return csum >> 16;
> > + return (__force __sum16) (csum >> 16);
I notice that this type conversion was introduced after v10. This change should go into patch 3/5.
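For context on the cast: in the kernel `__sum16` is a sparse `__bitwise` type, so returning a plain integer as `__sum16` needs `__force` to keep sparse quiet, while a normal C compiler ignores both annotations. The `>> 16` works because adding a 32-bit value to itself rotated by 16 bits leaves the end-around-carry sum of its two halves in the upper half. A plain-C sketch with hypothetical names, where a `uint16_t` typedef stands in for `__sum16`:

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for the kernel's sparse-annotated __sum16. */
typedef uint16_t sum16_sketch;

static uint32_t ror32_sketch(uint32_t x, unsigned int r)
{
	assert(r > 0 && r < 32);	/* avoid undefined full-width shifts */
	return (x >> r) | (x << (32 - r));
}

/*
 * Fold a 32-bit partial sum to 16 bits and complement it.
 * With csum = hi:lo, csum + ror32(csum, 16) holds hi + lo plus the carry
 * out of (lo + hi) in its upper 16 bits: the ones' complement sum.
 */
static sum16_sketch fold_csum_sketch(uint32_t csum)
{
	uint32_t folded = (csum + ror32_sketch(csum, 16)) >> 16;

	/* In the kernel, this conversion is where (__force __sum16) goes. */
	return (sum16_sketch)~folded;
}
```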
BRs,
Xiao
[...]
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Will do a light-weight address alignment if buff is misaligned, unless
> > + * cpu supports fast misaligned accesses.
> > + */
> > +unsigned int do_csum(const unsigned char *buff, int len)
> > +{
> > + if (unlikely(len <= 0))
> > + return 0;
> > +
> > + /*
> > + * Significant performance gains can be seen by not doing alignment
> > + * on machines with fast misaligned accesses.
> > + *
> > + * There is some duplicate code between the "with_alignment" and
> > + * "no_alignment" implementations, but the overlap is too awkward to be
> > + * able to fit in one function without introducing multiple static
> > + * branches. The largest chunk of overlap was delegated into the
> > + * do_csum_common function.
> > + */
> > + if (static_branch_likely(&fast_misaligned_access_speed_key))
> > + return do_csum_no_alignment(buff, len);
> > +
> > + if (((unsigned long)buff & OFFSET_MASK) == 0)
> > + return do_csum_no_alignment(buff, len);
> > +
> > + return do_csum_with_alignment(buff, len);
> > +}
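The dispatch above can be modelled in portable C. This is an illustration under stated assumptions, not the patch's code: a plain `bool` stands in for the `fast_misaligned_access_speed_key` static branch, `OFFSET_MASK_SKETCH` assumes 8-byte words, and all names are hypothetical. An in-kernel version can load the aligned word that contains `buff` (it cannot cross a page boundary) and discard the leading bytes; portable C may not read before the buffer, so the sketch copies the head bytes into the lanes they would occupy in that word and then continues with aligned loads. Because the sum is then taken on the aligned grid, an odd start address swaps the two bytes of the result, so they are swapped back at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumes 8-byte words, as on a 64-bit build (hypothetical name). */
#define OFFSET_MASK_SKETCH ((uintptr_t)sizeof(uint64_t) - 1)

/* Plain flag standing in for the fast_misaligned_access_speed_key static branch. */
static bool fast_misaligned_sketch;

static uint64_t add_carry_sketch(uint64_t sum, uint64_t v)
{
	sum += v;
	return sum + (sum < v);		/* end-around carry */
}

static unsigned int fold16_sketch(uint64_t s)
{
	s = (s & 0xffffffffu) + (s >> 32);
	s = (s & 0xffffffffu) + (s >> 32);
	s = (s & 0xffffu) + (s >> 16);
	s = (s & 0xffffu) + (s >> 16);
	return (unsigned int)s;
}

/* Add whole 8-byte words, then a zero-padded tail, to the accumulator. */
static uint64_t sum_words_sketch(uint64_t sum, const unsigned char *p, size_t len)
{
	uint64_t w;

	for (; len >= sizeof(w); p += sizeof(w), len -= sizeof(w)) {
		memcpy(&w, p, sizeof(w));
		sum = add_carry_sketch(sum, w);
	}
	if (len) {
		w = 0;
		memcpy(&w, p, len);
		sum = add_carry_sketch(sum, w);
	}
	return sum;
}

static unsigned int csum_no_alignment_sketch(const unsigned char *buff, size_t len)
{
	return fold16_sketch(sum_words_sketch(0, buff, len));
}

static unsigned int csum_with_alignment_sketch(const unsigned char *buff, size_t len)
{
	size_t off = (uintptr_t)buff & OFFSET_MASK_SKETCH;
	size_t head = sizeof(uint64_t) - off;
	uint64_t w = 0;
	unsigned int r;

	if (head > len)
		head = len;
	/* Place the head bytes in the lanes they occupy in the aligned word. */
	memcpy((unsigned char *)&w + off, buff, head);
	assert(head == len ||
	       ((uintptr_t)(buff + head) & OFFSET_MASK_SKETCH) == 0);
	r = fold16_sketch(sum_words_sketch(add_carry_sketch(0, w),
					   buff + head, len - head));
	/* An odd start moves every byte to the other half of its 16-bit lane. */
	if (off & 1)
		r = ((r & 0xffu) << 8) | (r >> 8);
	return r;
}

static unsigned int do_csum_sketch(const unsigned char *buff, int len)
{
	if (len <= 0)
		return 0;
	if (fast_misaligned_sketch ||
	    ((uintptr_t)buff & OFFSET_MASK_SKETCH) == 0)
		return csum_no_alignment_sketch(buff, (size_t)len);
	return csum_with_alignment_sketch(buff, (size_t)len);
}
```

Both paths return the same value for any start offset, which is what allows machines with fast misaligned accesses to skip the head handling entirely and let the hardware absorb the misaligned loads.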
> >
> > --
> > 2.43.0
> >
>
> There is potentially a code size concern here. These changes rely on
> alternatives, which increases the resulting binary size. The
> bloat-o-meter script reports that the do_csum function more than doubles
> in size with this patch:
>
> Function old new delta
> do_csum 238 514 +276
>
> The other functions are harder to measure because they get inlined or
> are not included in generic code. However, do_csum is the most
> affected, because of the misaligned access handling.
>
> The performance improvements afforded by alternatives (with the Zbb
> extension) and by the misaligned access checking are significant. In
> my testing, these optimizations alone contribute over a 20% performance
> improvement.
>
> - Charlie