[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALx6S34mS=v3A4mmuHeKoeyjYLb0wJfMe=ZrZ9-BF=ntu1kmTw@mail.gmail.com>
Date: Wed, 3 Jul 2024 08:56:20 -0700
From: Tom Herbert <tom@...bertland.com>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: "Greenwalt, Paul" <paul.greenwalt@...el.com>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Jesse Brandeburg <jesse.brandeburg@...el.com>, cai.huoqing@...ux.dev,
Linux Kernel Network Developers <netdev@...r.kernel.org>, Felipe Magno de Almeida <felipe@...anda.io>,
Justin Iurman <justin.iurman@...ege.be>, Alexander Lobakin <aleksander.lobakin@...el.com>,
"Samudrala, Sridhar" <sridhar.samudrala@...el.com>
Subject: Re: [PATCH net-next v2 0/7] drivers: Fix drivers doing TX csum
offload with EH
On Wed, Jul 3, 2024 at 8:03 AM Przemek Kitszel
<przemyslaw.kitszel@...el.com> wrote:
>
> On 7/3/24 16:38, Tom Herbert wrote:
> >
> >
> > On Wed, Jul 3, 2024, 7:20 AM Greenwalt, Paul <paul.greenwalt@...el.com
> > <mailto:paul.greenwalt@...el.com>> wrote:
> >
> >
> >
> > On 7/2/2024 3:31 AM, Przemek Kitszel wrote:
> > > On 7/1/24 21:55, Tom Herbert wrote:
> > >> Several NICs would seem to support protocol specific TX checksum
> > offload
> > >> and allow for cases where an IPv6 packet contains extension headers.
> > >> When deciding whether to offload a packet, ipv6_skip_exthdr is
> > called
> > >> to skip extension headers. The problem is that if a packet
> > contains an
> > >> IPv6 Routing Header then protocol specific checksum offload
> > can't work,
> > >> the destination IP address in the IPv6 header is not the same
> > one that
> > >> is used in the pseudo header for TCP or UDP. The correct address is
> > >> derived from the last segment in the routing list (which itself
> > might
> > >> be obfuscated so that a device could even read it).
> > >
> > > feels like there is a missing "not" after "could" - with it
> > added, reads
> > > fine (not a request to change, just being verbose about assumptions)
> > >
> > >>
> > >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be
> > >> called in lieu of ipv6_skip_exthdr. If a routing header is
> > present in
> > >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than
> > >> zero, this is an indication to the driver that TX checksum offload
> > >> is not viable and it should call skb_checksum_help instead of
> > >> offloading the checksum.
> > >>
> > >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly
> > >> to call ipv6_skip_exthdr_no_rthdr.
> > >>
> > >> Testing: The code compiles, but is otherwise untested due to lack of
> > >> NIC hardware. It would be appreciated if someone with access to the
> > >> hardware could test.
> > >
> > > we could test intel ones (except fm10k) via @Tony's tree
> >
> >
> > Awesome! If you need any help let me know.
> >
> > >
> > >>
> > >> v2: Fixed uninitialized variable in exthdrs_core.c
> > >>
> > >> Tom Herbert (7):
> > >> ipv6: Add ipv6_skip_exthdr_no_rthdr
> > >> i40e: Don't do TX csum offload with routing header present
> > >> iavf: Don't do TX csum offload with routing header present
> > >> ice: Don't do TX csum offload with routing header present
> > >
> > > sidenote:
> > > our HW is supporting (among others) a GCO check-summing mode
> > described
> > > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not
> > > yet provided patches for that, and I don't even know if this mode
> > > will be used (CC @Paul)
> > >
> >
> > We will be adding support for GCO "Checksum 16 with pseudo Headers" to
> > the ice driver. It will be off by default.
> >
> >
> > I'm not sure what that means.
>
> IPv6 Routing Headers render (simple?) HW-offloaded checksumming wrong,
> but not for the "no pseudo Header"-checksum. I have no idea how such
> checksum will be useful, and we don't have plans to implement it, so
> this is not much relevant. But that is just one mode that you could
> config our (new) HW.
>
> > Can ICE just provide checksum-complete?
> > It's by far the simplest, most robust method with the most flexibility
> > for users. :-)
>
> sorry, could you please elaborate?
>
> Paul will implement GCO for ice and otherwise my understanding was that
> our checksum is fine. Is there a room for improvement?
Przemek,
No, there's plenty of room for improvement :-). This is protocol
specific checksum offload versus protocol agnostic checksum offload,
and the opinion of the community has been clear for a long time:
protocol agnostic checksum offload is the preferred method and
protocol specific checksum offload is deprecated. This is true for
both transmit and receive checksum offload. For background see Dave
Miller's presentation on this from 2016:
https://www.youtube.com/watch?v=6VgmazGwL_Y.
Protocol agnostic checksum offload isn't just a to have, it addresses
many bugs in protocol specific checksum offload. This patch set
addresses one obvious bug, but there are others. For instance, in IETF
there is a proposal in spring WG to do SRv6 without a routing header
that would make the checksum incorrect on the wire. This will break
protocol specific TX checksum offload and there's nothing to key on in
the packet like an RH so that a driver would know the offload will
fail. I'm really not sure how we could fix this without major surgery
in the stack. Use protocol agnostic checksum offload and it "just
works" (the proposal to purposely send a bad checksum on the wire
without a RH is a bad idea in general, but I'm not sure we'll be able
to stop it in IETF).
And not to pick on the ICE driver, but please take a look at the
function ice_tx_csum. This function is called on every packet just to
evaluate whether the device is going to be able to offload the packet.
Basically, it parses the packet on transmit to make sure that the
device will be able to parse the packet (this is where we need to call
ipv6_skip_exthdr_no_rthdr). This function is 180 LOC! If the device
properly supports protocol agnostic checksum offload all that is
needed is to write the start offset and checksum offset into the
receive descriptor. Maybe there's some checks on the offset values,
but I can't see that needing more than ten LOC-- it's much less
susceptible to bugs, more robust, and works with a much wider set of
protocol combinations.
BTW, this patch set is the first in a series to formally deprecate and
remove protocol specific checksum offloads from the core kernel. IMO,
we've waited long enough! My intent is to remove CHECKSUM_UNNECESSARY,
NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM. (note comment in skbuff.h "New
devices should use %NETIF_F_HW_CSUM to indicate checksum offload
capability."). Helper functions will be provided to support legacy
devices.
Tom
>
> >
> > Tom
> >
> >
> > >> idpf: Don't do TX csum offload with routing header present
> > >> hinic: Don't do TX csum offload with routing header present
> > >> fm10k: Don't do TX csum offload with routing header present
> > >>
> > >> drivers/net/ethernet/huawei/hinic/hinic_tx.c | 23 +++++++++++----
> > >> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 ++++--
> > >> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 ++++++---------
> > >> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 20 ++++++-------
> > >> drivers/net/ethernet/intel/ice/ice_txrx.c | 22 ++++++---------
> > >> .../ethernet/intel/idpf/idpf_singleq_txrx.c | 28
> > +++++++++----------
> > >> include/net/ipv6.h | 17 +++++++++--
> > >> net/ipv6/exthdrs_core.c | 25
> > ++++++++++++-----
> > >> 8 files changed, 98 insertions(+), 68 deletions(-)
> > >>
> > >
> > > I have reviewed the patches and they conform to commit
> > message/intent,
> > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com
> > <mailto:przemyslaw.kitszel@...el.com>>
> > > (for the series)
> >
>
Powered by blists - more mailing lists