Message-ID: <CAF=yD-JK+Jnjb5_r3rk+PMwV3cWHTHQHau8CQJ27aSaEQLZxQQ@mail.gmail.com>
Date: Fri, 26 Jul 2024 09:58:38 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Willem de Bruijn <willemb@...gle.com>, kernel-team@...udflare.com, 
	syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY
 early on on output

On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>
> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> >> doesn't support checksum offload. This was done to satisfy the offload
> >> checks in the gso stack.
> >>
> >> However, when sending a UDP GSO packet from a tunnel device, we will go
> >> through the TX path and the GSO offload twice. Once for the tunnel device,
> >> which acts as a passthru for GSO packets, and once for the underlying
> >> egress device.
> >>
> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> >> offload checks still happen on transmit from a tunnel device. So if the skb
> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> >> warning from the gso stack.
> >
> > I don't entirely understand. The check should not hit on pass through,
> > where segs == skb:
> >
> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> >                                     !IS_ERR(segs)))
> >                 skb_warn_bad_offload(skb);
> >
>
> That's something I should have explained better. Let me try to shed some
> light on it now. We're hitting the skb_warn_bad_offload warning because
> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>
> And that's because we bail out of __udp_gso_segment early when we
> detect that the tunnel device is capable of tx-udp-segmentation
> (GSO_UDP_L4):
>
>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>                                                              mss);
>                 return NULL;
>         }

Oh I see. Thanks.
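
For reference, the gso_segs reset in that early-return path is a plain
ceiling division of the UDP payload length by the MSS. A minimal sketch of
the arithmetic in Python, using the payload size (3000) and UDP_SEGMENT
value (145) from the repro further down. The helper names are made up for
illustration, and it assumes gso_skb->len covers the UDP header plus
payload at that point:

```python
UDP_HDR_LEN = 8  # sizeof(struct udphdr)


def div_round_up(n, d):
    """Integer ceiling division, mirroring the kernel's DIV_ROUND_UP()."""
    return (n + d - 1) // d


def gso_segs(skb_len, mss):
    """Segment count for a UDP GSO skb whose len includes the UDP header.

    Hypothetical helper: models only the gso_segs reset quoted above.
    """
    return div_round_up(skb_len - UDP_HDR_LEN, mss)


def segment_sizes(payload_len, mss):
    """Payload sizes of the datagrams that segmentation would produce."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


payload, mss = 3000, 145
print(gso_segs(payload + UDP_HDR_LEN, mss))   # number of segments
print(segment_sizes(payload, mss)[-1])        # size of the trailing segment
```

For the repro that gives 21 segments, the last one carrying 100 bytes.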

> It has not occurred to me before, but in the spirit of commit
> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>
>         if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>                 skb_warn_bad_offload(skb);

That looks sensible to me. And nicer than the ip_summed conversion in
udp_send_skb.
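
To see what the tightened check changes, here is a rough truth-table sketch
in Python. It is only a model of the two conditions quoted in this thread:
the Segs enum and the function names are invented for illustration, and
skb_needs_check() is reduced to a boolean input.

```python
from enum import Enum
from itertools import product


class Segs(Enum):
    """Possible outcomes of skb_mac_gso_segment(), as discussed above."""
    SAME_SKB = "segs == skb"
    NULL = "segs == NULL"
    ERR = "IS_ERR(segs)"
    LIST = "new segment list"


def warns_current(segs, needs_check):
    # segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs))
    return segs is not Segs.SAME_SKB and needs_check and segs is not Segs.ERR


def warns_proposed(segs, needs_check):
    # segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(...))
    return (segs is not Segs.SAME_SKB
            and segs not in (Segs.ERR, Segs.NULL)
            and needs_check)


# Collect every input combination where the two checks disagree.
changed = [(segs, needs_check)
           for segs, needs_check in product(Segs, (False, True))
           if warns_current(segs, needs_check) != warns_proposed(segs, needs_check)]

for segs, needs_check in changed:
    print(f"{segs.value}, needs_check={needs_check}: warning no longer fires")
```

In this model the only combination that changes is segs == NULL with
skb_needs_check() true, which is the case hit by the repro. Whether that
case ever carries a useful diagnostic is the open question.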

> That would be an alternative. Though I'm not sure I fully understand the
> consequences of such a change yet. Namely, whether we would be
> losing some diagnostics from the bad offload warning.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>
> >> Today this can occur in two situations, which we check for in
> >> __ip_append_data() and __ip6_append_data():
> >>
> >> 1) when the tunnel device does not advertise checksum offload, or
> >> 2) when there are IPv6 extension headers present.
> >>
> >> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> >> path, when still in the udp layer, since we need to have ip_summed set up
> >> correctly for GSO processing by tunnel devices.
> >
> > The previous patch converted segments post segmentation to
> > CHECKSUM_UNNECESSARY, which is fine as they had
> > already been checksummed in software, and CHECKSUM_NONE
> > packets on egress are common.
> >
> > This creates GSO packets without CHECKSUM_PARTIAL.
> > Segmentation offload always requires checksum offload. So these
> > would be weird new packets. And having CHECKSUM_NONE (or
> > equivalent), but entering software checksumming is also confusing.
>
> I agree this is confusing to reason about. That is, a GSO packet with
> CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
> offload in software.

I was mistaken earlier. I was looking at this code just yesterday too, for

https://lore.kernel.org/netdev/20240726023359.879166-1-willemdebruijn.kernel@gmail.com/

We already set the GSO skb to CHECKSUM_NONE. So your suggestion is
not a significant change.

> Kind of related, I noticed that turning off tx-checksum-ip-generic with
> ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

I saw the same :)
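
A quick way to spot that state on a device is to scan `ethtool -k` output
for tx-udp-segmentation being on while tx-checksum-ip-generic is off. A
small sketch in Python; the function names are made up, the sample text is
hand-written rather than captured from a real device, and the check
deliberately ignores the other checksum features (tx-checksum-ipv4 and
friends):

```python
def parse_features(ethtool_k_output):
    """Map feature name -> bool from 'name: on|off [...]' lines."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        # Skip lines such as the "Features for <dev>:" banner.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def udp_gso_without_csum(features):
    """True if UDP segmentation is on but generic TX checksum is off."""
    return (features.get("tx-udp-segmentation", False)
            and not features.get("tx-checksum-ip-generic", False))


# Hand-written sample in the 'ethtool -k' layout, not real output.
SAMPLE = """\
Features for iptnl:
tx-checksum-ip-generic: off
tx-udp-segmentation: on
"""

print(udp_gso_without_csum(parse_features(SAMPLE)))
```

On a live system the input would come from something like
subprocess.run(["ethtool", "-k", "iptnl"], capture_output=True,
text=True).stdout.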

> > The crux is that I don't understand why the warning fires on tunnel
> > exit when no segmentation takes place there. Hopefully we can fix
> > in a way that does not introduce these weird GSO packets (but if
> > not, so be it).
>
> Attaching a self contained repro which I've been using to trace and
> understand the GSO code:
>
> ---8<---
>
> sh# cat repro-full.py
> #!/bin/env python
> #
> # `modprobe ip6_tunnel` might be needed.
> #
>
> import os
> import subprocess
> import shutil
> from socket import *
>
> UDP_SEGMENT = 103
>
> cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
> script = b"""
> link set dev lo up
>
> link add name sink mtu 1540 type dummy
> addr add dev sink fd11::2/48 nodad
> link set dev sink up
>
> tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
> link set dev iptnl mtu 1500
> addr add dev iptnl fd00::2/48 nodad
> link set dev iptnl up
> """
> proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
> proc.communicate(input=script)
>
> os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
> os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
>
> # Alternative to hopopts:
> # os.system("ethtool -K iptnl tx-checksum-ip-generic off")
>
> hopopts = b"\x00" * 8
> s = socket(AF_INET6, SOCK_DGRAM)
> s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
> s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
> s.sendto(b"x" * 3000, ("fd00::1", 9))
> sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  16)               |  __skb_gso_segment() {
>  16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.172 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        udp6_ufo_fragment() {
>  16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
>  16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
>  16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
>  16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
>  16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
>  16)               |    skb_warn_bad_offload() {
>  [...]
>  16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
>  16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
>  16)               |  __skb_gso_segment() {
>  16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.181 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        ip6ip6_gso_segment() {
>  16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
>  16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
>  16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
>  16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
>  16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
>  16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
> sh#
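
A note for reading the trace: the 0xdd86 returned by skb_network_protocol()
is presumably ETH_P_IPV6 (0x86DD) in network byte order, printed as a raw
integer. That reading assumes the trace was taken on a little-endian
machine. A quick check in Python:

```python
import struct

ETH_P_IPV6 = 0x86DD

# Big-endian (network order) bytes, as the protocol value is stored in the skb.
wire = struct.pack("!H", ETH_P_IPV6)

# The same two bytes read back as a little-endian 16-bit integer.
as_little_endian = struct.unpack("<H", wire)[0]

print(hex(as_little_endian))  # 0xdd86
```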
