Message-ID: <CAF=yD-JK+Jnjb5_r3rk+PMwV3cWHTHQHau8CQJ27aSaEQLZxQQ@mail.gmail.com>
Date: Fri, 26 Jul 2024 09:58:38 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemb@...gle.com>, kernel-team@...udflare.com,
syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY
early on on output
On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>
> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> >> doesn't support checksum offload. This was done to satisfy the offload
> >> checks in the gso stack.
> >>
> >> However, when sending a UDP GSO packet from a tunnel device, we will go
> >> through the TX path and the GSO offload twice. Once for the tunnel device,
> >> which acts as a passthru for GSO packets, and once for the underlying
> >> egress device.
> >>
> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> >> offload checks still happen on transmit from a tunnel device. So if the skb
> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> >> warning from the gso stack.
> >
> > I don't entirely understand. The check should not hit on pass through,
> > where segs == skb:
> >
> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> >                                     !IS_ERR(segs)))
> >                 skb_warn_bad_offload(skb);
> >
>
> That's something I should have explained better. Let me try to shed some
> light on it now. We're hitting the skb_warn_bad_offload warning because
> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>
> And that's because we bail out of __udp_gso_segment early when we
> detect that the tunnel device is capable of tx-udp-segmentation
> (GSO_UDP_L4):
>
>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>                                                              mss);
>                 return NULL;
>         }
Oh I see. Thanks.
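
To put numbers on that gso_segs reset, here is a small user-space sketch in Python. It is not kernel code: the helper names are made up for illustration, and it assumes gso_skb->len covers the UDP header plus payload at that point.

```python
UDP_HEADER_LEN = 8  # sizeof(struct udphdr)


def div_round_up(n, d):
    """Integer ceiling division, like the kernel's DIV_ROUND_UP()."""
    return (n + d - 1) // d


def gso_segs(skb_len, mss):
    """Segment count for a UDP GSO skb whose length includes the UDP header."""
    return div_round_up(skb_len - UDP_HEADER_LEN, mss)


# Numbers from the repro: 3000 bytes of payload, UDP_SEGMENT set to 145.
# 20 full segments of 145 bytes plus one trailing segment of 100 bytes.
print(gso_segs(3000 + UDP_HEADER_LEN, 145))  # prints 21
```
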
> It has not occurred to me before, but in the spirit of commit
> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>
> if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
> skb_warn_bad_offload(skb);
That looks sensible to me. And nicer than the ip_summed conversion in
udp_send_skb.
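
To spell out what the tightened check changes, a toy user-space model in Python. This is only a sketch of the two conditions quoted above, not kernel code; ErrPtr, warns_current and warns_proposed are hypothetical names, and skb_needs_check() is reduced to a boolean argument.

```python
class ErrPtr:
    """Stand-in for an ERR_PTR() return value."""

    def __init__(self, errno):
        self.errno = errno


def warns_current(segs, skb, needs_check):
    """segs != skb && skb_needs_check() && !IS_ERR(segs)"""
    return bool(segs is not skb and needs_check and not isinstance(segs, ErrPtr))


def warns_proposed(segs, skb, needs_check):
    """segs != skb && !IS_ERR_OR_NULL(segs) && skb_needs_check()"""
    is_err_or_null = segs is None or isinstance(segs, ErrPtr)
    return bool(segs is not skb and not is_err_or_null and needs_check)


skb = object()        # the GSO skb handed to the segmentation code
segments = object()   # a freshly built segment list, distinct from skb

cases = [
    ("pass through (segs == skb)", skb),
    ("no segments (segs == NULL)", None),
    ("error (IS_ERR(segs))", ErrPtr(22)),
    ("segmented in software", segments),
]
for name, segs in cases:
    print(name, warns_current(segs, skb, True), warns_proposed(segs, skb, True))
```

In this model the only case where the two conditions disagree is segs == NULL, which is the early return from __udp_gso_segment described above.
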
> That would be an alternative, though I'm not sure I fully understand the
> consequences of such a change yet. Namely, whether we would be losing
> some diagnostics from the bad offload warning.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>
> >> Today this can occur in two situations, which we check for in
> >> __ip_append_data() and __ip6_append_data():
> >>
> >> 1) when the tunnel device does not advertise checksum offload, or
> >> 2) when there are IPv6 extension headers present.
> >>
> >> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> >> path, when still in the udp layer, since we need to have ip_summed set up
> >> correctly for GSO processing by tunnel devices.
> >
> > The previous patch converted segments post segmentation to
> > CHECKSUM_UNNECESSARY, which is fine as they had
> > already been checksummed in software, and CHECKSUM_NONE
> > packets on egress are common.
> >
> > This creates GSO packets without CHECKSUM_PARTIAL.
> > Segmentation offload always requires checksum offload. So these
> > would be weird new packets. And having CHECKSUM_NONE (or
> > equivalent), but entering software checksumming is also confusing.
>
> I agree this is confusing to reason about. That is a GSO packet with
> CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
> offload in software.
I was mistaken earlier. I was looking at this code just yesterday too, for
https://lore.kernel.org/netdev/20240726023359.879166-1-willemdebruijn.kernel@gmail.com/
We already set the GSO skb to CHECKSUM_NONE. So your suggestion is
not a significant change.
> Kind of related, I noticed that turning off tx-checksum-ip-generic with
> ethtool doesn't disable tx-udp-segmentation. That looks like a bug.
I saw the same :)
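
One way to spot that combination on a device is to scan `ethtool -k` output. A rough sketch in Python, assuming the usual "name: on|off [fixed]" line format; the helper names and the sample text are made up for illustration, and it looks at tx-checksum-ip-generic only, ignoring the protocol-specific checksum features.

```python
def parse_features(text):
    """Map feature name to True/False from `ethtool -k <dev>` style output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip blank lines and the "Features for <dev>:" header
        state = value.split()[0]  # drops a trailing "[fixed]" marker, if any
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def gso_without_csum(features):
    """True if UDP segmentation is on while generic TX checksum offload is off."""
    return (features.get("tx-udp-segmentation", False)
            and not features.get("tx-checksum-ip-generic", False))


# Illustrative input only, not captured from a real device.
SAMPLE = """\
Features for sink:
tx-checksum-ip-generic: off
tx-udp-segmentation: on
"""
print(gso_without_csum(parse_features(SAMPLE)))
```
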
> > The crux is that I don't understand why the warning fires on tunnel
> > exit when no segmentation takes place there. Hopefully we can fix
> > in a way that does not introduce these weird GSO packets (but if
> > not, so be it).
>
> Attaching a self contained repro which I've been using to trace and
> understand the GSO code:
>
> ---8<---
>
> sh# cat repro-full.py
> #!/bin/env python
> #
> # `modprobe ip6_tunnel` might be needed.
> #
>
> import os
> import subprocess
> import shutil
> from socket import *
>
> UDP_SEGMENT = 103
>
> cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
> script = b"""
> link set dev lo up
>
> link add name sink mtu 1540 type dummy
> addr add dev sink fd11::2/48 nodad
> link set dev sink up
>
> tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
> link set dev iptnl mtu 1500
> addr add dev iptnl fd00::2/48 nodad
> link set dev iptnl up
> """
> proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
> proc.communicate(input=script)
>
> os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
> os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
>
> # Alternatively to hopopts:
> # os.system("ethtool -K iptnl tx-checksum-ip-generic off")
>
> hopopts = b"\x00" * 8
> s = socket(AF_INET6, SOCK_DGRAM)
> s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
> s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
> s.sendto(b"x" * 3000, ("fd00::1", 9))
> sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 16) | __skb_gso_segment() {
> 16) 0.288 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
> 16) 0.172 us | idle_cpu(); /* = 0x0 */
> 16) | skb_mac_gso_segment() {
> 16) 0.184 us | skb_network_protocol(); /* = 0xdd86 */
> 16) 0.161 us | __rcu_read_lock(); /* = 0x2 */
> 16) | ipv6_gso_segment() {
> 16) | rcu_read_lock_held() {
> 16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.514 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.152 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
> 16) | udp6_ufo_fragment() {
> 16) 0.237 us | __udp_gso_segment(); /* = 0x0 */
> 16) 0.727 us | } /* udp6_ufo_fragment = 0x0 */
> 16) 3.049 us | } /* ipv6_gso_segment = 0x0 */
> 16) 0.171 us | __rcu_read_unlock(); /* = 0x1 */
> 16) 4.748 us | } /* skb_mac_gso_segment = 0x0 */
> 16) | skb_warn_bad_offload() {
> [...]
> 16) ! 785.215 us | } /* skb_warn_bad_offload = 0x0 */
> 16) ! 800.986 us | } /* __skb_gso_segment = 0x0 */
> 16) | __skb_gso_segment() {
> 16) 0.394 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
> 16) 0.181 us | idle_cpu(); /* = 0x0 */
> 16) | skb_mac_gso_segment() {
> 16) 0.182 us | skb_network_protocol(); /* = 0xdd86 */
> 16) 0.178 us | __rcu_read_lock(); /* = 0x3 */
> 16) | ipv6_gso_segment() {
> 16) | rcu_read_lock_held() {
> 16) 0.155 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.556 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
> 16) | ip6ip6_gso_segment() {
> 16) + 22.176 us | ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
> 16) + 24.875 us | } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
> 16) + 27.416 us | } /* ipv6_gso_segment = 0xffffa00c03018c00 */
> 16) 0.230 us | __rcu_read_unlock(); /* = 0x2 */
> 16) + 29.065 us | } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
> 16) + 32.828 us | } /* __skb_gso_segment = 0xffffa00c03018c00 */
> sh#