[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66b0e0d3c2119_2f5edf294c1@willemb.c.googlers.com.notmuch>
Date: Mon, 05 Aug 2024 10:25:23 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Sitnicki <jakub@...udflare.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>,
kernel-team@...udflare.com,
syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
Subject: Re: [PATCH net v2 1/2] gso: Skip bad offload detection when device
supports requested GSO
Jakub Sitnicki wrote:
> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote:
> > On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have intentionally allowed UDP GSO packets marked
> >> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
> >> checksummed by a software fallback when the egress device lacks these
> >> features.
> >>
> >> What was not taken into consideration is that a CHECKSUM_NONE skb can be
> >> handed over to the GSO stack also when the egress device advertises the
> >> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
> >>
> >> This can happen in two situations, which we detect in __ip_append_data()
> >> and __ip6_append_data():
> >>
> >> 1) when there are IPv6 extension headers present, or
> >> 2) when the tunnel device does not advertise checksum offload.
> >>
> >> Note that in the latter case we have a nonsensical device configuration.
> >> Device support for UDP segmentation offload requires checksum offload in
> >> hardware as well.
> >>
> >> Syzbot has discovered the first case, producing a warning as below:
> >>
> >> ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
> >> WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
> >> Modules linked in:
> >> CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> >> RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
> >> [...]
> >> Call Trace:
> >> <TASK>
> >> __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
> >> skb_gso_segment include/net/gso.h:83 [inline]
> >> validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
> >> __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
> >> neigh_output include/net/neighbour.h:542 [inline]
> >> ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
> >> ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
> >> ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
> >> udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
> >> udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
> >> sock_sendmsg_nosec net/socket.c:730 [inline]
> >> __sock_sendmsg+0xef/0x270 net/socket.c:745
> >> ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
> >> ___sys_sendmsg net/socket.c:2639 [inline]
> >> __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
> >> __do_sys_sendmmsg net/socket.c:2754 [inline]
> >> __se_sys_sendmmsg net/socket.c:2751 [inline]
> >> __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
> >> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >> [...]
> >> </TASK>
> >>
> >> We are hitting the bad offload warning because when an egress device is
> >> capable of handling segmentation offload requested by
> >> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
> >> any segment skbs and return NULL. See the skb_gso_ok() branch in
> >> {__udp,tcp,sctp}_gso_segment helpers.
> >>
> >> To fix it, skip bad offload detection when gso_segment has returned
> >> nothing. We know that in such case the egress device supports the desired
> >> GSO offload, which implies that it can fill in L4 checksums. Hence we don't
> >> need to check the skb->ip_summed value, which reflects the egress device
> >> checksum capabilities.
> >>
> >> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
> >> Reported-by: syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
> >> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
> >> Signed-off-by: Jakub Sitnicki <jakub@...udflare.com>
> >
> > Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> >
> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
> > being ignored and devices are trusted to always be able to checksum
> > offload when they can segment offload -- even when the device does not
> > advertise checksum offload.
> >
> > I think we should have a follow-on that makes advertising
> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the
> > NETIF_F_*_CSUM bits set (handwaving over what happens when only
> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
>
> I agree. I've also gained some clarity as to how the fix should
> look. Let's circle back to it, if we still think it's relevant once we
> hash out the fix.
>
> After spending some quality time debugging the addded regression test
> [1], I've realized this fix is wrong.
>
> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices
> with no checksum offload"), I've opened up the UDP_SEGMENT API to two
> uses, which I think should not be allowed:
>
> 1. Hardware USO for IPv6 dgrams with extension headers
>
> Previously that led to -EIO, because __ip6_append_data won't annotate
> such packets as CHECKSUM_PARTIAL.
>
> I'm guessing that we do this because some drivers that advertise csum
> offload can't actually handle checksumming when extension headers are
> present.
>
> Extension headers are not part of IPv6 pseudo header, but who knows what
> limitations NIC firmwares have.
>
> Either way, changing it just like that sounds risky, so I think we need
> to fall back to software USO with software checksum in this case.
>
> Alternatively, we could catch it in the udp layer and error out with EIO
> as ealier. But that shifts some burden onto the user space (detect and
> segment before sendmsg()).
>
> 2. Hardware USO when hardware csum is unsupported / disabled
>
> That sounds like a pathological device configuration case, but since it
> is possible today with some drivers to disable csum offload for one IP
> version and not the other, it seems safest to just handle that
> gracefully.
>
> I don't know why one might want to do that. Perhaps as a workaround for
> some firmware bug while waiting for a fix?
I doubt that this is actually used. But today it can be configured.
Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum
offload (in netdev_fix_features). I doubt that that will break any
real user.
> In this scenario I think we also need to fall back to software USO and
> checksum.
>
> Code-wise that could look like below. WDYT?
Since this only affects USO, can we fix this is in __udp_gso_segment.
Basically, not taking the NETIF_F_GSO_ROBUST path.
skb_segment is so complicated already. Whatever we can do to avoid
adding to that.
> [1] I feel silly for submitting a broken patch. I've been running a
> stale test prog in a VM. It seems that virtme-ng with a read+write
> overlay for rootfs played a trick on me. Changes to the host files are
> not (always?) visible to the guest.
>
> ---8<---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 83f8cd8aa2d1..819173807c81 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4806,7 +4806,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> goto perform_csum_check;
>
> if (!sg) {
> - if (!csum) {
> + /* Fall back to software checksum for segments if:
> + * 1) device can't checksum this network protocol, OR
> + * 2) we consider the packet to be not checksummable in
> + * hardware, for example IPv6 extension headers are
> + * present.
> + */
> + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
> if (!nskb->remcsum_offload)
> nskb->ip_summed = CHECKSUM_NONE;
> SKB_GSO_CB(nskb)->csum =
> @@ -4896,7 +4902,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> nskb->truesize += nskb->data_len;
>
> perform_csum_check:
> - if (!csum) {
> + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
> if (skb_has_shared_frag(nskb) &&
> __skb_linearize(nskb))
> goto err;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index bc8a9da750fe..0a34b418b83c 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -282,7 +282,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
> skb_transport_header(gso_skb)))
> return ERR_PTR(-EINVAL);
>
> - if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
> + /* Pass-through to device for segmentation only if:
> + * 1) we consider the packet checksummable in hardware, that is no
> + * IPv6 extension headers present, AND
> + * 2) device supports checksum offload for this network protocol
> + * (NETIF_F_{IP,IPV6}_CSUM or NETIF_F_HW_CSUM), AND
> + * 3) device supports the requested GSO kind.
> + */
> + if (gso_skb->ip_summed == CHECKSUM_PARTIAL &&
> + skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
> /* Packet is from an untrusted source, reset gso_segs. */
> skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
> mss);
Powered by blists - more mailing lists