Message-ID: <1395667527.12610.32.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Mon, 24 Mar 2014 06:25:27 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Laight <David.Laight@...LAB.COM>
Cc: David Miller <davem@...emloft.net>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"hkchu@...gle.com" <hkchu@...gle.com>,
"mwdalton@...gle.com" <mwdalton@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next] net: optimize csum_replace2()
On Mon, 2014-03-24 at 10:22 +0000, David Laight wrote:
> From: Eric Dumazet <edumazet@...gle.com>
> >
> > When replacing one 16-bit value with another in an IP header, we can
> > adjust the IP checksum with the simple operation described in RFC 1624,
> > as David reminded us.
> >
> > csum_partial() is a complex function on x86_64, not really suited
> > for a small number of checksummed bytes.
> >
> > I spotted csum_partial() among the top 20 most cycle-consuming
> > functions (more than 1%) in a GRO workload, which was rather
> > unexpected.
> >
> > The caller was inet_gro_complete() doing a csum_replace2() when
> > building the new IP header for the GRO packet.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> > include/net/checksum.h | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/checksum.h b/include/net/checksum.h
> > index 37a0e24adbe7..a28f4e0f6251 100644
> > --- a/include/net/checksum.h
> > +++ b/include/net/checksum.h
> > @@ -69,6 +69,19 @@ static inline __wsum csum_sub(__wsum csum, __wsum addend)
> > return csum_add(csum, ~addend);
> > }
> >
> > +static inline __sum16 csum16_add(__sum16 csum, __be16 addend)
> > +{
> > + u16 res = (__force u16)csum;
>
> Shouldn't that be u32?
Why? We compute 16-bit checksums here.
>
> > + res += (__force u16)addend;
> > + return (__force __sum16)(res + (res < (__force u16)addend));
Note how the carry is propagated, and how we return 16 bits anyway.
Using u32 would force us to do a fold back to 16 bits, which is more
expensive.
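If it helps, here is a minimal standalone sketch of the same trick,
outside the kernel tree (plain stdint types instead of __sum16/__be16,
and the helper names are mine), showing that the (res < addend) form
re-injects the end-around carry, while the u32 variant needs the extra
fold step:

#include <stdint.h>
#include <stdio.h>

/* 16-bit ones-complement add: if the 16-bit addition wraps, res ends
 * up smaller than addend, so (res < addend) is exactly the carry bit
 * and we add it back in.
 */
static uint16_t add16(uint16_t csum, uint16_t addend)
{
	uint16_t res = csum + addend;

	return res + (res < addend);
}

/* The u32 alternative: accumulate in 32 bits, then fold the carry
 * back into the low 16 bits with an extra step.
 */
static uint16_t add16_via_u32(uint16_t csum, uint16_t addend)
{
	uint32_t res = (uint32_t)csum + addend;

	return (res & 0xffff) + (res >> 16);	/* the fold */
}

int main(void)
{
	/* Both agree: 0xffff + 0x0001 wraps around to 0x0001 */
	printf("%#x %#x\n", add16(0xffff, 1), add16_via_u32(0xffff, 1));
	return 0;
}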
> > +}
> > +
> > +static inline __sum16 csum16_sub(__sum16 csum, __be16 addend)
> > +{
> > + return csum16_add(csum, ~addend);
> > +}
> > +
> > static inline __wsum
> > csum_block_add(__wsum csum, __wsum csum2, int offset)
> > {
> > @@ -112,9 +125,15 @@ static inline void csum_replace4(__sum16 *sum, __be32 from, __be32 to)
> > *sum = csum_fold(csum_partial(diff, sizeof(diff), ~csum_unfold(*sum)));
> > }
> >
> > -static inline void csum_replace2(__sum16 *sum, __be16 from, __be16 to)
> > +/* Implements RFC 1624 (Incremental Internet Checksum)
> > + * 3. Discussion states :
> > + * HC' = ~(~HC + ~m + m')
> > + * m : old value of a 16bit field
> > + * m' : new value of a 16bit field
> > + */
> > +static inline void csum_replace2(__sum16 *sum, __be16 old, __be16 new)
> > {
> > - csum_replace4(sum, (__force __be32)from, (__force __be32)to);
> > + *sum = ~csum16_add(csum16_sub(~(*sum), old), new);
> > }
>
> It might be clearer to just say:
> *sum = ~csum16_add(csum16_add(~*sum, ~old), new);
> or even:
> *sum = ~csum16_add(csum16_add(*sum ^ 0xffff, old ^ 0xffff), new);
> which might remove some mask instructions, especially if all the
> intermediate values are left larger than 16 bits.
We already have csum_add() and csum_sub(); I added csum16_add() and
csum16_sub() for symmetry and completeness.
For Linux developers, it is quite common to use csum_sub(x, y) instead
of csum_add(x, ~y).
You can use whatever code matches your taste.
RFC 1624 writes ~m, not m ^ 0xffff.
But again, if you prefer m ^ 0xffff, that's up to you ;)
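For completeness, a quick userspace check of the RFC 1624 identity
HC' = ~(~HC + ~m + m') against a full recomputation (a standalone
sketch, not the kernel code: abstract 16-bit words, the checksum kept
out of the summed data, and my own helper names):

#include <stdint.h>
#include <stdio.h>

/* 16-bit ones-complement add with end-around carry. */
static uint16_t add16(uint16_t csum, uint16_t addend)
{
	uint16_t res = csum + addend;

	return res + (res < addend);
}

/* Reference: ones-complement sum of all words, then invert. */
static uint16_t csum_full(const uint16_t *words, int n)
{
	uint16_t sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum = add16(sum, words[i]);
	return ~sum;
}

int main(void)
{
	uint16_t hdr[4] = { 0x4500, 0x0054, 0x1c46, 0x4000 };
	uint16_t hc = csum_full(hdr, 4);	/* old checksum HC */
	uint16_t m = hdr[2];			/* old field value m */
	uint16_t m_new = 0xbeef;		/* new field value m' */

	/* RFC 1624, section 3: HC' = ~(~HC + ~m + m') */
	uint16_t hc_new = ~add16(add16(~hc, ~m), m_new);

	hdr[2] = m_new;
	printf("incremental %#x, recomputed %#x\n",
	       hc_new, csum_full(hdr, 4));	/* both print 0xbbbb */
	return 0;
}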