Message-ID: <87h6ccl7mm.fsf@cloudflare.com>
Date: Fri, 26 Jul 2024 13:23:13 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
Abeni <pabeni@...hat.com>, Willem de Bruijn <willemb@...gle.com>,
kernel-team@...udflare.com,
syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY
early on on output
On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
> if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> !IS_ERR(segs)))
> skb_warn_bad_offload(skb);
>
That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out of __udp_gso_segment early when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

        if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
                /* Packet is from an untrusted source, reset gso_segs. */
                skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
                                                             mss);
                return NULL;
        }

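To make the gso_segs reset above concrete, here is the same arithmetic as a small Python sketch, using the numbers from the repro further down (3000-byte payload, UDP_SEGMENT set to 145). The helper names are made up for illustration, and it assumes the skb length at this point covers the 8-byte UDP header plus the payload:

```python
UDP_HDR_LEN = 8  # assumed to match sizeof(struct udphdr)


def div_round_up(n, d):
    """Integer ceiling division, mirroring the kernel's DIV_ROUND_UP()."""
    return (n + d - 1) // d


def gso_segs(skb_len, mss):
    """Segment count for a UDP GSO skb whose length includes the UDP header."""
    return div_round_up(skb_len - UDP_HDR_LEN, mss)


payload_len = 3000  # s.sendto(b"x" * 3000, ...) in the repro
mss = 145           # UDP_SEGMENT value in the repro
segs = gso_segs(payload_len + UDP_HDR_LEN, mss)
print(segs)
```

So for the repro packet the count is recomputed as 20 full segments of 145 bytes plus one short trailing segment, and the function still hands back NULL to its caller.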
It had not occurred to me before, but in the spirit of commit
8d74e9f88d65 ("net: avoid skb_warn_bad_offload on IS_ERR") [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

        if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
                skb_warn_bad_offload(skb);

That would be an alternative, though I'm not sure I fully understand the
consequences of such a change yet. Namely, whether we would be losing
some diagnostics from the bad offload warning.
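To see exactly which cases the tightened check would silence, the two conditions can be compared side by side. This is only a toy model in Python, not kernel code: the four sentinel values stand in for the possible outcomes of skb_mac_gso_segment, and needs_check stands in for the result of skb_needs_check(skb, tx_path):

```python
from itertools import product

SKB = "skb"        # segs == skb: passthrough, the same skb is handed back
NULL = None        # segs == NULL: no segments produced at this layer
ERR = "ERR_PTR"    # segs is an error pointer
SEGS = "segments"  # segs is a freshly built segment list


def warn_current(segs, needs_check):
    """Condition as quoted: segs != skb && needs_check && !IS_ERR(segs)."""
    return segs != SKB and needs_check and segs != ERR


def warn_proposed(segs, needs_check):
    """Tightened condition: additionally skip the warning when segs == NULL."""
    return segs != SKB and segs not in (ERR, NULL) and needs_check


# Collect every input combination where the two conditions disagree.
changed = [
    (segs, needs_check)
    for segs, needs_check in product([SKB, NULL, ERR, SEGS], [False, True])
    if warn_current(segs, needs_check) != warn_proposed(segs, needs_check)
]
print(changed)
```

In this model the only combination that changes is segs == NULL with the skb needing a checksum check, which is the case hit in the trace below. Whether that warning is ever useful for catching real misconfigurations is the open question.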
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.
I agree this is confusing to reason about. That is, a GSO packet marked
CHECKSUM_UNNECESSARY which has not yet undergone segmentation and
checksumming in software.
Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.
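For anyone who wants to check their own device for that combination, here is a rough Python sketch that parses `ethtool -k` output and flags it. The function names and the sample text are invented for illustration; the sample is hand-written in the usual "name: on|off [fixed]" layout, not captured from a real device:

```python
import subprocess


def parse_features(text):
    """Map feature name -> True/False from 'name: on|off [fixed]' lines."""
    feats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips the 'Features for <dev>:' header and blank lines
        feats[name.strip()] = value.split()[0] == "on"
    return feats


def read_features(dev):
    """Run 'ethtool -k <dev>' and parse its output (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-k", dev], capture_output=True,
                         text=True, check=True).stdout
    return parse_features(out)


def gso_without_csum(feats):
    """True if UDP segmentation offload is on while generic tx csum is off."""
    return (feats.get("tx-udp-segmentation", False)
            and not feats.get("tx-checksum-ip-generic", False))


# Hand-written sample, for illustration only.
sample = """Features for sink:
tx-checksum-ip-generic: off
tx-udp-segmentation: on
"""
print(gso_without_csum(parse_features(sample)))
```

On a live system, `gso_without_csum(read_features("sink"))` would run the same check against the real device.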
> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).
Attaching a self-contained repro which I've been using to trace and
understand the GSO code:
---8<---
sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#
import os
import subprocess
import shutil
from socket import *
UDP_SEGMENT = 103
cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up
link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up
tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)
os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")
hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
16) | __skb_gso_segment() {
16) 0.288 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
16) 0.172 us | idle_cpu(); /* = 0x0 */
16) | skb_mac_gso_segment() {
16) 0.184 us | skb_network_protocol(); /* = 0xdd86 */
16) 0.161 us | __rcu_read_lock(); /* = 0x2 */
16) | ipv6_gso_segment() {
16) | rcu_read_lock_held() {
16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.514 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.152 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
16) | udp6_ufo_fragment() {
16) 0.237 us | __udp_gso_segment(); /* = 0x0 */
16) 0.727 us | } /* udp6_ufo_fragment = 0x0 */
16) 3.049 us | } /* ipv6_gso_segment = 0x0 */
16) 0.171 us | __rcu_read_unlock(); /* = 0x1 */
16) 4.748 us | } /* skb_mac_gso_segment = 0x0 */
16) | skb_warn_bad_offload() {
[...]
16) ! 785.215 us | } /* skb_warn_bad_offload = 0x0 */
16) ! 800.986 us | } /* __skb_gso_segment = 0x0 */
16) | __skb_gso_segment() {
16) 0.394 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
16) 0.181 us | idle_cpu(); /* = 0x0 */
16) | skb_mac_gso_segment() {
16) 0.182 us | skb_network_protocol(); /* = 0xdd86 */
16) 0.178 us | __rcu_read_lock(); /* = 0x3 */
16) | ipv6_gso_segment() {
16) | rcu_read_lock_held() {
16) 0.155 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.556 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
16) | ip6ip6_gso_segment() {
16) + 22.176 us | ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
16) + 24.875 us | } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
16) + 27.416 us | } /* ipv6_gso_segment = 0xffffa00c03018c00 */
16) 0.230 us | __rcu_read_unlock(); /* = 0x2 */
16) + 29.065 us | } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
16) + 32.828 us | } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#