Message-ID: <87h6ccl7mm.fsf@cloudflare.com>
Date: Fri, 26 Jul 2024 13:23:13 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org,  "David S. Miller" <davem@...emloft.net>,  Eric
 Dumazet <edumazet@...gle.com>,  Jakub Kicinski <kuba@...nel.org>,  Paolo
 Abeni <pabeni@...hat.com>,  Willem de Bruijn <willemb@...gle.com>,
  kernel-team@...udflare.com,
  syzbot+e15b7e15b8a751a91d9a@...kaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY
 early on on output

On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@...udflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
>         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
>                                     !IS_ERR(segs)))
>                 skb_warn_bad_offload(skb);
>

That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out of __udp_gso_segment early when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
		/* Packet is from an untrusted source, reset gso_segs. */
		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
							     mss);
		return NULL;
	}
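
The gso_segs reset above is plain ceiling division of the UDP payload
length by the MSS. A minimal sketch of that arithmetic in Python (not
kernel code), assuming the values from the repro attached to this mail
(a 3000-byte payload and UDP_SEGMENT set to 145). The names
div_round_up and reset_gso_segs are made up for illustration:

```python
UDP_HDR_LEN = 8  # sizeof(struct udphdr)


def div_round_up(n, d):
    """Mirror of the kernel's DIV_ROUND_UP() for positive integers."""
    return (n + d - 1) // d


def reset_gso_segs(skb_len, mss):
    """Segment count for a UDP GSO skb whose length includes the UDP header."""
    return div_round_up(skb_len - UDP_HDR_LEN, mss)


# 3000 bytes of payload plus the UDP header, split into 145-byte segments:
# 20 full segments and one short trailing segment of 100 bytes.
print(reset_gso_segs(3000 + UDP_HDR_LEN, 145))  # prints 21
```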

It had not occurred to me before, but in the spirit of commit
8d74e9f88d65 ("net: avoid skb_warn_bad_offload on IS_ERR") [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

	if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
		skb_warn_bad_offload(skb);

That would be an alternative, though I'm not sure I fully understand
the consequences of such a change yet. Namely, whether we would be
losing some diagnostics from the bad offload warning.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
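
The difference between the current check and the tightened one can be
sketched as a small truth table. This is a Python model for
illustration only, not kernel code. The Csum and Segs enums and the
warns_current/warns_proposed helpers are hypothetical names, and
skb_needs_check here is modeled on the TX-path behavior described in
this thread (a warning is possible unless ip_summed is CHECKSUM_PARTIAL
or CHECKSUM_UNNECESSARY):

```python
from enum import Enum


class Csum(Enum):
    NONE = 0
    UNNECESSARY = 1
    COMPLETE = 2
    PARTIAL = 3


class Segs(Enum):
    """Possible outcomes of skb_mac_gso_segment(), as a model."""
    NULL = "NULL"
    SAME_SKB = "skb"
    ERR = "ERR_PTR"
    LIST = "segment list"


def skb_needs_check(ip_summed, tx_path=True):
    # Assumed model of the kernel helper of the same name.
    if tx_path:
        return ip_summed not in (Csum.PARTIAL, Csum.UNNECESSARY)
    return ip_summed is Csum.NONE


def warns_current(segs, ip_summed):
    # segs != skb && skb_needs_check(skb, tx_path) && !IS_ERR(segs)
    return (segs is not Segs.SAME_SKB
            and skb_needs_check(ip_summed)
            and segs is not Segs.ERR)


def warns_proposed(segs, ip_summed):
    # segs != skb && !IS_ERR_OR_NULL(segs) && skb_needs_check(skb, tx_path)
    return (segs is not Segs.SAME_SKB
            and segs not in (Segs.ERR, Segs.NULL)
            and skb_needs_check(ip_summed))


for segs in Segs:
    print(f"{segs.value:13} "
          f"current={warns_current(segs, Csum.NONE)!s:5} "
          f"proposed={warns_proposed(segs, Csum.NONE)}")
```

With ip_summed == CHECKSUM_NONE, the two variants differ only in the
segs == NULL row, which is the pass-through case hit on the tunnel
device.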

>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.

I agree this is confusing to reason about. That is, a GSO packet marked
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.

Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.
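
One way to spot that combination on a device is to parse the output of
`ethtool -k`. A rough sketch, assuming the usual "name: on|off" line
format of that command. The helper names parse_features,
device_features and gso_without_csum are made up for illustration:

```python
import subprocess


def parse_features(text):
    """Parse `ethtool -k` style output into a {feature: bool} dict."""
    feats = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        words = state.split()
        # Skip the "Features for <dev>:" header and anything unexpected.
        if sep and words and words[0] in ("on", "off"):
            feats[name.strip()] = words[0] == "on"
    return feats


def device_features(dev):
    """Run `ethtool -k <dev>` and return the parsed feature dict."""
    out = subprocess.run(["ethtool", "-k", dev], capture_output=True,
                         text=True, check=True).stdout
    return parse_features(out)


def gso_without_csum(feats):
    """True if UDP segmentation is on while generic TX checksumming is off."""
    return (feats.get("tx-udp-segmentation", False)
            and not feats.get("tx-checksum-ip-generic", False))
```

For example, gso_without_csum(device_features("iptnl")) could be
called after the `ethtool -K iptnl tx-checksum-ip-generic off` step to
see whether tx-udp-segmentation stayed enabled.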

> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).

Attaching a self-contained repro which I've been using to trace and
understand the GSO code:

---8<---

sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#

import os
import subprocess
import shutil
from socket import *

UDP_SEGMENT = 103

cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up

link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up

tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)

os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")

# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")

hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 16)               |  __skb_gso_segment() {
 16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.172 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        udp6_ufo_fragment() {
 16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
 16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
 16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
 16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
 16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
 16)               |    skb_warn_bad_offload() {
 [...]
 16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
 16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
 16)               |  __skb_gso_segment() {
 16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.181 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        ip6ip6_gso_segment() {
 16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
 16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
 16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
 16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
 16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
 16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#
