netdev - Recurring trace from tcp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <CFCC0DCC-4080-4195-9BA9-0860D9701805@fastly.com>
Date:	Fri, 29 May 2015 12:21:59 -0700
From:	Grant Zhang <gzhang@...tly.com>
To:	netdev@...r.kernel.org
Subject: Recurring trace from tcp_fragment()

We have multiple machines running into the following trace repeatedly. The trace shows up every couple of seconds on our production machines.

May 29 18:14:04 cache-fra1230 kernel:[3080455.796143] WARNING: CPU: 7 PID: 0 at net/ipv4/tcp_output.c:1082 tcp_fragment+0x2e4/0x2f0()
May 29 18:14:04 cache-fra1230 kernel:[3080455.796144] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan gpio_ich x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm ipmi_si ipmi_msghandler igb i2c_algo_bit ixgbe ptp isci libsas pps_core mdio
May 29 18:14:04 cache-fra1230 kernel:[3080455.796165] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W    3.14.39-fastly #6
May 29 18:14:04 cache-fra1230 kernel:[3080455.796166] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
May 29 18:14:04 cache-fra1230 kernel:[3080455.796168]  000000000000043a ffff885effce38f8 ffffffff81718716 0000000000000007
May 29 18:14:04 cache-fra1230 kernel:[3080455.796170]  0000000000000000 ffff885effce3938 ffffffff8106b54c ffff885e00000001
May 29 18:14:04 cache-fra1230 kernel:[3080455.796172]  ffff8864910c5c00 0000000000000144 ffff882400e2f000 0000000000000400
May 29 18:14:04 cache-fra1230 kernel:[3080455.796174] Call Trace:
May 29 18:14:04 cache-fra1230 kernel:[3080455.796175]  <IRQ>  [<ffffffff81718716>] dump_stack+0x46/0x58
May 29 18:14:04 cache-fra1230 kernel:[3080455.796184]  [<ffffffff8106b54c>] warn_slowpath_common+0x8c/0xc0
May 29 18:14:04 cache-fra1230 kernel:[3080455.796186]  [<ffffffff8106b59a>] warn_slowpath_null+0x1a/0x20
May 29 18:14:04 cache-fra1230 kernel:[3080455.796188]  [<ffffffff81678174>] tcp_fragment+0x2e4/0x2f0
May 29 18:14:04 cache-fra1230 kernel:[3080455.796191]  [<ffffffff8166edeb>] tcp_mark_head_lost+0xeb/0x290
May 29 18:14:04 cache-fra1230 kernel:[3080455.796193]  [<ffffffff8166fe28>] tcp_update_scoreboard+0x58/0x90
May 29 18:14:04 cache-fra1230 kernel:[3080455.796195]  [<ffffffff8167422d>] tcp_fastretrans_alert+0x75d/0xb30
May 29 18:14:04 cache-fra1230 kernel:[3080455.796197]  [<ffffffff816750d5>] tcp_ack+0xa15/0xf50
May 29 18:14:04 cache-fra1230 kernel:[3080455.796199]  [<ffffffff816765db>] tcp_rcv_state_process+0x25b/0xd60
May 29 18:14:04 cache-fra1230 kernel:[3080455.796202]  [<ffffffff8167f7e0>] tcp_v4_do_rcv+0x230/0x490
May 29 18:14:04 cache-fra1230 kernel:[3080455.796206]  [<ffffffff8165cbb0>] ? ip_rcv_finish+0x380/0x380
May 29 18:14:04 cache-fra1230 kernel:[3080455.796208]  [<ffffffff81681a93>] tcp_v4_rcv+0x803/0x850
May 29 18:14:04 cache-fra1230 kernel:[3080455.796210]  [<ffffffff8165cbb0>] ? ip_rcv_finish+0x380/0x380
May 29 18:14:04 cache-fra1230 kernel:[3080455.796214]  [<ffffffff8163c61d>] ? nf_hook_slow+0x7d/0x150
May 29 18:14:04 cache-fra1230 kernel:[3080455.796216]  [<ffffffff8165cbb0>] ? ip_rcv_finish+0x380/0x380
May 29 18:14:04 cache-fra1230 kernel:[3080455.796219]  [<ffffffff8165cc58>] ip_local_deliver_finish+0xa8/0x220
May 29 18:14:04 cache-fra1230 kernel:[3080455.796221]  [<ffffffff8165cf5b>] ip_local_deliver+0x4b/0x90
May 29 18:14:04 cache-fra1230 kernel:[3080455.796223]  [<ffffffff8165c951>] ip_rcv_finish+0x121/0x380
May 29 18:14:04 cache-fra1230 kernel:[3080455.796225]  [<ffffffff8165d226>] ip_rcv+0x286/0x380
May 29 18:14:04 cache-fra1230 kernel:[3080455.796228]  [<ffffffff8160c1d2>] __netif_receive_skb_core+0x512/0x640
May 29 18:14:04 cache-fra1230 kernel:[3080455.796230]  [<ffffffff8160c321>] __netif_receive_skb+0x21/0x70
May 29 18:14:04 cache-fra1230 kernel:[3080455.796232]  [<ffffffff8160c40b>] process_backlog+0x9b/0x170
May 29 18:14:04 cache-fra1230 kernel:[3080455.796234]  [<ffffffff8160c851>] net_rx_action+0x111/0x210
May 29 18:14:04 cache-fra1230 kernel:[3080455.796237]  [<ffffffff8106ff6f>] __do_softirq+0xef/0x2e0
May 29 18:14:04 cache-fra1230 kernel:[3080455.796239]  [<ffffffff81070335>] irq_exit+0x55/0x60
May 29 18:14:04 cache-fra1230 kernel:[3080455.796243]  [<ffffffff81729127>] do_IRQ+0x67/0x110
May 29 18:14:04 cache-fra1230 kernel:[3080455.796246]  [<ffffffff8171edaa>] common_interrupt+0x6a/0x6a
May 29 18:14:04 cache-fra1230 kernel:[3080455.796246]  <EOI>  [<ffffffff8100b8c0>] ? default_idle+0x20/0xe0
May 29 18:14:04 cache-fra1230 kernel:[3080455.796253]  [<ffffffff8100c07f>] arch_cpu_idle+0xf/0x20
May 29 18:14:04 cache-fra1230 kernel:[3080455.796256]  [<ffffffff810b5f60>] cpu_startup_entry+0x80/0x240
May 29 18:14:04 cache-fra1230 kernel:[3080455.796260]  [<ffffffff810c9846>] ? clockevents_config_and_register+0x26/0x30
May 29 18:14:04 cache-fra1230 kernel:[3080455.796264]  [<ffffffff8102d480>] start_secondary+0x190/0x1f0
May 29 18:14:04 cache-fra1230 kernel:[3080455.796265] ---[ end trace 707a3e5aca13730c ]---


We have seen the trace on both 3.10 and 3.14 kernels. My google search also indicates the very issue on 3.16 kernel:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771224

Sorry I haven't tried with the lastest upstream kernel--we only see the traces in production env and we stay a bit behind the latest kernel in production.

Using probe_events, I can see that tcp_fragment() is asked to fragment a skb which is already smaller than "len"(passed in by the caller tcp_mark_head_lost()), and as a result tcp_fragment() returns with -EINVAL at line 1071.

          <idle>-0     [005] d.s. 1267970.050938: myprobe: (tcp_fragment+0x0/0x2f0) skb_len=0x150 len=0x200 mss_now=0x200
          <idle>-0     [005] dNs. 1267970.051069: myretprobe: (tcp_mark_head_lost+0xeb/0x290 <- tcp_fragment) arg1=0xffffffea
          <idle>-0     [005] dNs. 1267970.051097: myprobe: (tcp_fragment+0x0/0x2f0) skb_len=0x150 len=0x400 mss_now=0x200
          <idle>-0     [005] dNs. 1267970.051183: myretprobe: (tcp_mark_head_lost+0xeb/0x290 <- tcp_fragment) arg1=0xffffffea

1061 int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
1062                  unsigned int mss_now)
1063 {
1064         struct tcp_sock *tp = tcp_sk(sk);
1065         struct sk_buff *buff;
1066         int nsize, old_factor;
1067         int nlen;
1068         u8 flags;
1069 
1070         if (WARN_ON(len > skb->len))
1071                 return -EINVAL;
1072 

The system otherwise runs fine though. Digging a bit into git log, it seems the original BUG_ON was added way back:
commit 3c05d92ed49f644d1f5a960fa48637d63b946016
Date:   Wed Sep 14 20:50:35 2005 -0700
    [TCP]: Compute in_sacked properly when we split up a TSO frame.
    
    The problem is that the SACK fragmenting code may incorrectly call
    tcp_fragment() with a length larger than the skb->len.  This happens
    when the skb on the transmit queue completely falls to the LHS of the
    SACK.
    
    And add a BUG() check to tcp_fragment() so we can spot this kind of
    error more quickly in the future.
    

---------------------------- net/ipv4/tcp_output.c ----------------------------
index c10e443..b018e31 100644
@@ -435,6 +435,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	int nsize, old_factor;
 	u16 flags;
 
+	BUG_ON(len >= skb->len);
+
 	nsize = skb_headlen(skb) - len;
 	if (nsize < 0)
 		nsize = 0;

My questions are: 
1. is this trace still related to the SACK fragmenting code? 
2. If this trace is benign, is it OK to remove or rate-limit the kernel message?

Thanks for any inputs,

Grant--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html