Message-ID: <EB8510AA7A943D43916A72C9B8F4181F62A08D3B@cvk038.intra.cvk.de>
Date: Tue, 15 Oct 2019 10:11:37 +0000
From: "Bartschies, Thomas" <Thomas.Bartschies@....de>
To: 'Eric Dumazet' <eric.dumazet@...il.com>,
'David Ahern' <dsahern@...il.com>,
"'netdev@...r.kernel.org'" <netdev@...r.kernel.org>
Subject: Re: Re: big ICMP requests get disrupted on IPSec tunnel activation
Hello Eric,
no problem, I've just done that. Here are the results; I hope they help. I also ran a parallel ftrace debug session, as I had tried before, but it yielded no conclusive information. The warning fires from the forwarding/fragmentation path (ip_forward -> ip_do_fragment -> fq_enqueue), as the trace below shows.
[Di Okt 15 11:51:06 2019] WARNING: CPU: 1 PID: 0 at net/sched/sch_fq.c:389 fq_enqueue+0x531/0x610 [sch_fq]
[Di Okt 15 11:51:06 2019] Modules linked in: twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_ssse3 sha512_generic af_key xt_CHECKSUM ipt_rpfilter xt_statistic xt_connlimit nf_conncount ip_set_hash_ip br_netfilter bridge stp llc xt_HL xt_hl ipt_REJECT nf_reject_ipv4 xt_REDIRECT xt_recent xt_realm xt_CT xt_NETMAP xt_MASQUERADE xt_esp ipt_ECN xt_ecn xt_comment ipt_CLUSTERIP ipt_ah xt_addrtype act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_set ip_set xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_sane nf_conntrack_netlink nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_conntrack_tftp
[Di Okt 15 11:51:06 2019] nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY xt_AUDIT iptable_raw iptable_nat nf_nat iptable_mangle iptable_filter vmw_vsock_vmci_transport vsock ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_log nfnetlink ip6table_filter ip6_tables sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf joydev vmw_balloon vmw_vmci i2c_piix4 sch_fq tcp_illinois binfmt_misc ip_tables vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi
[Di Okt 15 11:51:06 2019] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.18-200.fc30.x86_64 #1
[Di Okt 15 11:51:06 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[Di Okt 15 11:51:06 2019] RIP: 0010:fq_enqueue+0x531/0x610 [sch_fq]
[Di Okt 15 11:51:06 2019] Code: db 48 8b 3d 49 28 00 00 48 89 ee e8 39 ba 03 db 45 85 ed 75 d4 49 8b 2f e9 f1 fb ff ff 48 83 83 30 02 00 00 01 e9 fd fc ff ff <0f> 0b e9 83 fc ff ff 4c 89 ea 31 c0 e9 be fc ff ff 41 8b 46 08 39
[Di Okt 15 11:51:06 2019] RSP: 0018:ffffa2fe806f0a58 EFLAGS: 00010206
[Di Okt 15 11:51:06 2019] RAX: 0000000df8475800 RBX: ffff973638a32800 RCX: 0000000000000000
[Di Okt 15 11:51:06 2019] RDX: 000003df1924d162 RSI: 0000000000000004 RDI: ffffffffc02845b5
[Di Okt 15 11:51:06 2019] RBP: 15cdc829bb26f866 R08: 0000000000000000 R09: ffff97363afc4e00
[Di Okt 15 11:51:06 2019] R10: 0000000000000078 R11: ffff97363808b088 R12: ffff97363821ce00
[Di Okt 15 11:51:06 2019] R13: ffff973634620cb0 R14: 00000000000003cb R15: ffff973635965c70
[Di Okt 15 11:51:06 2019] FS: 0000000000000000(0000) GS:ffff97363ba80000(0000) knlGS:0000000000000000
[Di Okt 15 11:51:06 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Di Okt 15 11:51:06 2019] CR2: 00005616fa9f90b8 CR3: 0000000135084005 CR4: 00000000000606e0
[Di Okt 15 11:51:06 2019] Call Trace:
[Di Okt 15 11:51:06 2019] <IRQ>
[Di Okt 15 11:51:06 2019] ? fq_flow_purge+0x70/0x70 [sch_fq]
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] __dev_queue_xmit+0x45b/0x960
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] ip_finish_output2+0x2c8/0x580
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] ip_do_fragment+0x69d/0x7a0
[Di Okt 15 11:51:06 2019] ? __ip_flush_pending_frames.isra.0+0x80/0x80
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] ip_output+0x71/0xf0
[Di Okt 15 11:51:06 2019] ? ip_finish_output2+0x580/0x580
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] ip_forward+0x3c5/0x480
[Di Okt 15 11:51:06 2019] ? ip_defrag.cold+0x37/0x37
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] ip_rcv+0xbc/0xd0
[Di Okt 15 11:51:06 2019] ? ip_rcv_finish_core.isra.0+0x390/0x390
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] __netif_receive_skb_one_core+0x80/0x90
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] netif_receive_skb_internal+0x41/0xb0
[Di Okt 15 11:51:06 2019] ? return_to_handler+0x15/0x36
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] napi_gro_receive+0xf6/0x160
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] e1000_clean_rx_irq+0x2ac/0x570 [e1000]
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] e1000_clean+0x278/0x5f0 [e1000]
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] net_rx_action+0x148/0x3b0
[Di Okt 15 11:51:06 2019] ? napi_complete_done+0x110/0x110
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] __do_softirq+0xed/0x30e
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] irq_exit+0xf1/0x100
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] do_IRQ+0x81/0xe0
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] common_interrupt+0xf/0xf
[Di Okt 15 11:51:06 2019] </IRQ>
[Di Okt 15 11:51:06 2019] RIP: 0010:native_safe_halt+0xe/0x10
[Di Okt 15 11:51:06 2019] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 06 44 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 36 06 44 00 fb f4 <c3> 90 e8 eb 98 03 00 41 54 55 53 e8 f2 01 79 ff 65 8b 2d db 7f 64
[Di Okt 15 11:51:06 2019] RSP: 0018:ffffa2fe806abeb0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffd3
[Di Okt 15 11:51:06 2019] RAX: ffffffff9b9c7d60 RBX: 0000000000000001 RCX: 0000000000000000
[Di Okt 15 11:51:06 2019] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff9b1140d6
[Di Okt 15 11:51:06 2019] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff97363afc4e00
[Di Okt 15 11:51:06 2019] R10: 0000000000000c78 R11: ffff973638092c88 R12: ffff97363a84ddc0
[Di Okt 15 11:51:06 2019] R13: 0000000000000000 R14: 0000000000000000 R15: ffff97363a84ddc0
[Di Okt 15 11:51:06 2019] ? __cpuidle_text_start+0x8/0x8
[Di Okt 15 11:51:06 2019] ? do_idle+0x1f6/0x260
[Di Okt 15 11:51:06 2019] default_idle+0x1a/0x140
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] do_idle+0x1fb/0x260
[Di Okt 15 11:51:06 2019] ? cpu_startup_entry+0x19/0x20
[Di Okt 15 11:51:06 2019] ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019] cpu_startup_entry+0x19/0x20
[Di Okt 15 11:51:06 2019] start_secondary+0x17d/0x1d0
[Di Okt 15 11:51:06 2019] secondary_startup_64+0xa4/0xb0
[Di Okt 15 11:51:06 2019] ---[ end trace 34f95424acff5ddb ]---
Regards,
--
Thomas Bartschies
CVK IT Systeme
-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@...il.com]
Sent: Monday, 14 October 2019 17:32
To: Bartschies, Thomas <Thomas.Bartschies@....de>; 'David Ahern' <dsahern@...il.com>; 'netdev@...r.kernel.org' <netdev@...r.kernel.org>
Subject: Re: Re: big ICMP requests get disrupted on IPSec tunnel activation
On 10/14/19 7:02 AM, Bartschies, Thomas wrote:
> Hello,
>
> it took a while to build a test system for bisecting the issue. Finally, I've identified the patch that causes my problems.
> By the way, the fq packet scheduler is in use.
>
> It's
> [PATCH net-next] tcp/fq: move back to CLOCK_MONOTONIC
>
> In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks
> from MONOTONIC to TAI, in order to meet the choice done earlier for sch_etf packet scheduler.
>
> But sure enough, this broke some setups where the TAI clock jumps forward (by almost 50 years...), as reported by Leonard Crestez.
>
> If we want to converge later, we'll probably need to add an skb field to differentiate the clock bases, or a socket option.
>
> In the meantime, a UDP application will need to use the CLOCK_MONOTONIC
> base for its SCM_TXTIME timestamps if using the fq packet scheduler.
>
> Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
> Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
> Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
> Signed-off-by: Eric Dumazet <edumazet@...xxxxxxx>
> Reported-by: Leonard Crestez <leonard.crestez@...xxxx>
>
> ----
>
> After reverting it in a current 5.2.18 kernel, the problem disappears.
> There were some follow-up fixes for other issues caused by this patch.
> These fixed other, similar issues, but not mine. I've already tried to set the tstamp to zero in xfrm4_output.c, but with no luck so far. I'm pretty sure that reverting the clock patch isn't the proper solution for upstream, so in what other way can this be fixed?
Thanks a lot, Thomas, for this report!
I guess you could add a debug check in fq to let us know the call graph.
Something like the following:
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 98dd87ce15108cfe1c011da44ba32f97763776c8..2aa41a39e81b94f3b7092dc51b91829f5929634d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -380,9 +380,14 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 {
 	struct rb_node **p, *parent;
 	struct sk_buff *head, *aux;
+	s64 delay;
 
 	fq_skb_cb(skb)->time_to_send = skb->tstamp ?: ktime_get_ns();
 
+	/* We should really add a TCA_FQ_MAX_HORIZON at some point :( */
+	delay = fq_skb_cb(skb)->time_to_send - ktime_get_ns();
+	WARN_ON_ONCE(delay > 60 * NSEC_PER_SEC);
+
 	head = flow->head;
 	if (!head ||
 	    fq_skb_cb(skb)->time_to_send >= fq_skb_cb(flow->tail)->time_to_send) {
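As an aside, the quoted commit message above notes that a UDP application must supply SCM_TXTIME timestamps in the CLOCK_MONOTONIC base when fq is the qdisc. For readers of this archive, here is a minimal user-space sketch of what that looks like; it is an added illustration, not code posted in the thread, and the helper name send_paced() is invented (assumes Linux >= 4.19 headers for struct sock_txtime):

/*
 * Minimal sketch: pace one UDP packet with SCM_TXTIME, using the
 * CLOCK_MONOTONIC base that fq expects after the revert.
 */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>
#include <stdint.h>
#include <linux/net_tstamp.h>	/* struct sock_txtime */

#ifndef SO_TXTIME
#define SO_TXTIME	61
#define SCM_TXTIME	SO_TXTIME
#endif

static ssize_t send_paced(int fd, const void *buf, size_t len,
			  const struct sockaddr_in *dst, int64_t delay_ns)
{
	struct sock_txtime so_txtime = {
		.clockid = CLOCK_MONOTONIC,	/* must match fq's clock base */
	};
	char control[CMSG_SPACE(sizeof(uint64_t))] = { 0 };
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = { 0 };
	struct cmsghdr *cm;
	struct timespec ts;
	uint64_t txtime;

	if (setsockopt(fd, SOL_SOCKET, SO_TXTIME,
		       &so_txtime, sizeof(so_txtime)) < 0)
		return -1;

	/* Transmit time is absolute, in nanoseconds of the chosen clock. */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	txtime = (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec + delay_ns;

	msg.msg_name = (void *)dst;
	msg.msg_namelen = sizeof(*dst);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control;
	msg.msg_controllen = sizeof(control);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime));
	memcpy(CMSG_DATA(cm), &txtime, sizeof(txtime));

	return sendmsg(fd, &msg, 0);
}

If such an application took its txtime from CLOCK_REALTIME or CLOCK_TAI instead, the resulting time_to_send would sit decades ahead of ktime_get_ns() and would trip the 60-second check in the debug patch above.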
>
> ---
> [PATCH net] net: clear skb->tstamp in bridge forwarding path
>
> Matteo reported forwarding issues inside the linux bridge, if the enslaved interfaces use the fq qdisc.
>
> Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding
> paths"), we need to clear the tstamp field in the bridge forwarding path.
>
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future
> transmit time.")
> Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
> Reported-and-tested-by: Matteo Croce <mcroce@...hat.com>
> Signed-off-by: Paolo Abeni <pabeni@...hat.com>
>
> and
>
> net: clear skb->tstamp in forwarding paths
>
> Sergey reported that forwarding was no longer working if the fq packet scheduler was used.
>
> This is caused by the recent switch to EDT model, since incoming
> packets might have been timestamped by __net_timestamp()
>
> __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base.
>
> The fix is to clear skb->tstamp in forwarding paths.
>
> Fixes: 80b14dee ("net: Add a new socket option for a future transmit
> time.")
> Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Reported-by: Sergey Matyukevich <geomatsi@...il.com>
> Tested-by: Sergey Matyukevich <geomatsi@...il.com>
> Signed-off-by: David S. Miller <davem@...emloft.net>
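To make the clock-base mismatch concrete: per the quoted commit message, __net_timestamp() stamps received packets with ktime_get_real(), while fq schedules against ktime_get_ns(), i.e. CLOCK_MONOTONIC. The following standalone program, added here as an illustration rather than taken from the thread (the helper name ns_of() is invented), prints how far the other bases sit ahead of CLOCK_MONOTONIC; the gap is roughly the time from the Unix epoch to boot, matching the "almost 50 years" mentioned earlier:

/*
 * Shows the offset between the clock bases involved in this bug.
 * A timestamp taken in one base and compared against another looks
 * like a transmit time decades in the future to fq.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11	/* Linux-specific clock id */
#endif

static uint64_t ns_of(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	uint64_t mono = ns_of(CLOCK_MONOTONIC);
	uint64_t real = ns_of(CLOCK_REALTIME);
	uint64_t tai  = ns_of(CLOCK_TAI);

	/* fq compares time_to_send against ktime_get_ns(), which is
	 * CLOCK_MONOTONIC; a stamp from another base never "arrives".
	 */
	printf("monotonic now:        %llu ns\n", (unsigned long long)mono);
	printf("realtime - monotonic: ~%.1f years\n",
	       (double)(real - mono) / 1e9 / (3600.0 * 24 * 365));
	printf("tai - monotonic:      ~%.1f years\n",
	       (double)(tai - mono) / 1e9 / (3600.0 * 24 * 365));
	return 0;
}

An skb carrying such a stamp is held by fq until its apparent transmit time, which is why the pings in this report stall silently instead of producing an error.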
>
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
>
>
> -----Original Message-----
> From: Bartschies, Thomas
> Sent: Tuesday, 17 September 2019 09:28
> To: 'David Ahern' <dsahern@...il.com>; 'netdev@...r.kernel.org' <netdev@...r.kernel.org>
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
>
> Hello,
>
> Thanks for the suggestion. Running pmtu.sh with kernel versions 4.19, 4.20 and even 5.2.13 made no difference: all tests were successful every time.
>
> My external ping tests are still failing with the newer kernels, though. I ran the script after triggering my problem, to make sure any possible side effects were present.
>
> Please keep in mind that even while the ICMP requests stall, other connections, e.g. ssh or tracepath, still go through. I would expect all connection types to be affected if this were an MTU problem. Am I wrong?
>
> Any suggestions for more tests to isolate the cause?
>
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
>
> -----Original Message-----
> From: David Ahern [mailto:dsahern@...il.com]
> Sent: Friday, 13 September 2019 19:13
> To: Bartschies, Thomas <Thomas.Bartschies@....de>; 'netdev@...r.kernel.org' <netdev@...r.kernel.org>
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
>
> On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
>> Hello together,
>>
>> since kernel 4.20 we're observing a strange behaviour when sending big ICMP packets. An example is a packet size of 3000 bytes.
>> The packets should be forwarded by a Linux gateway (firewall) with multiple interfaces, which also acts as a VPN gateway.
>>
>> Test steps:
>> 1. Disable all iptables rules.
>> 2. Enable the VPN IPsec policies.
>> 3. Start a ping with a packet size of e.g. 3000 bytes from a client in the DMZ, passing through the machine and targeting another LAN machine.
>> 4. The ping works.
>> 5. Enable a VPN policy by sending pings from the gateway to a tunnel target. The system tries to create the tunnel.
>> 6. The ping from step 3 immediately stalls. No error messages; it just stops.
>> 7. Stop the ping from step 3 and start another one without the packet size parameter. It stalls as well.
>>
>> Result:
>> Connections from the client to other services on the LAN machine still work. Tracepath works. Only ICMP requests no longer pass the gateway: tcpdump sees them on the incoming interface, but not on the outgoing LAN interface. ICMP requests to any other target IP address in the LAN still work, until one uses a bigger packet size; then these alternative connections stall as well.
>>
>> Flushing the policy table has no effect. Flushing the conntrack table has no effect. Setting rp_filter to loose (2) has no effect.
>> Flushing the route cache has no effect.
>>
>> Only a reboot of the gateway restores normal behavior.
>>
>> What can be the cause? Is this a networking bug?
>>
>
> Some of these will most likely fail for other reasons, but can you run 'tools/testing/selftests/net/pmtu.sh' [1] on 4.19 and then on 4.20 and compare the results? Hopefully that will shed some light on the problem and can be used to bisect to the commit that caused the regression.
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh
>