lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 8 Dec 2022 07:44:43 -0800
From:   Breno Leitao <leitao@...ian.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>, leit@...com,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Paolo Abeni <pabeni@...hat.com>,
        David Ahern <dsahern@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RESEND net-next] tcp: socket-specific version of
 WARN_ON_ONCE()

On Wed, Dec 07, 2022 at 06:59:38PM +0100, Eric Dumazet wrote:
> On Wed, Dec 7, 2022 at 6:37 PM Breno Leitao <leitao@...ian.org> wrote:
> >
> > On Sat, Sep 03, 2022 at 09:42:43AM -0700, Eric Dumazet wrote:
> > > On Wed, Aug 31, 2022 at 6:38 AM Breno Leitao <leitao@...ian.org> wrote:
> > > >
> > > > There are cases where we need information about the socket during a
> > > > warning, so, it could help us to find bugs that happens that do not have
> > > > a easily repro.
> > > >
> > > > BPF congestion control algorithms can change socket state in unexpected
> > > > ways, leading to WARNings. Additional information about the socket state
> > > > is useful to identify the culprit.
> > >
> > > Well, this suggests we need to fix BPF side ?
> > >
> > > Not sure how this can happen, because TCP_BPF_IW has
> > >
> > > if (val <= 0 || tp->data_segs_out > tp->syn_data)
> > >      ret = -EINVAL;
> > > else
> > >     tcp_snd_cwnd_set(tp, val);
> >
> > I am not sure we are hitting this path, please check the stack below
> >
> > > It seems you already found the issue in an eBPF CC, can you share the details ?
> >
> > Sure, here is an example of what we are facing (with some obfuscation).
> > Remeber that there are users' BPF application running:
> >
> > [155375.750105] ------------[ cut here ]------------
> > [155375.759526] WARNING: CPU: 19 PID: 0 at net/ipv4/tcp.c:4552 tcp_sock_warn+0x6/0x20
> > [155375.774700] Modules linked in: vhost_net tun vhost vhost_iotlb tap virtio_net net_failover failover mpls_gso mpls_iptunnel mpls_router ip_tunnel fou ip6_udp_tunnel udp_tunnel sch_fq sunrpc bpf_preload tls act_gact cls_bpf tcp_diag inet_diag skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt kvm_intel iTCO_vendor_support kvm evdev ses irqbypass enclosure i2c_i801 i2c_smbus ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq button tpm_crb sch_fq_codel fuse sg nvme mpi3mr scsi_transport_sas nvme_core xhci_pci xhci_hcd loop efivarfs autofs4
> > [155375.874916] CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G S                5.12.0-0_XXXXXXXX_g0fed6f189e14 #1
> > [155375.898770] Hardware name: XXXX XXX XXXX XXXXX, BIOS BS_BIOS_XXX XX/XX/2022
> > [155375.916015] RIP: 0010:tcp_sock_warn+0x6/0x20
> > [155375.924755] Code: 4d 01 3e 85 c0 0f 84 57 ff ff ff 48 8b 0c 24 44 8b 01 eb 82 e8 eb b7 14 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 <0f> 0b 48 85 ff 0f 85 77 70 14 00 5b c3 66 66 2e 0f 1f 84 00 00 00
> > [155375.962518] RSP: 0018:ffffc90000d08988 EFLAGS: 00010246
> > [155375.973157] RAX: ffff88817432b5c0 RBX: ffff88828f748000 RCX: ffffc90000d08a34
> > [155375.987614] RDX: 0000000000000000 RSI: 00000000822be8c8 RDI: ffff88828f748000
> > [155376.002074] RBP: 0000000000000040 R08: ffffc90000d08a38 R09: 00000000822be334
> > [155376.016531] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> > [155376.030988] R13: ffffc90000d08a34 R14: 0000000000005546 R15: 0000000000000000
> > [155376.045441] FS:  0000000000000000(0000) GS:ffff88903f8c0000(0000) knlGS:0000000000000000
> > [155376.061804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [155376.073474] CR2: 00007fe407603080 CR3: 0000000b19993006 CR4: 00000000007706e0
> > [155376.087928] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [155376.102383] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [155376.116841] PKRU: 55555554
> > [155376.122427] Call Trace:
> > [155376.127497]  <IRQ>
> > [155376.131697]  tcp_fastretrans_alert+0x988/0x9e0
> > [155376.140774]  tcp_ack+0x8b4/0x1360
> > [155376.147581]  ? __cgroup_bpf_run_filter_skb+0x185/0x440
> > [155376.158037]  tcp_rcv_established+0x135/0x660
> > [155376.166755]  ? sk_filter_trim_cap+0xbc/0x220
> > [155376.175472]  tcp_v6_do_rcv+0xbe/0x3e0
> > [155376.182974]  tcp_v6_rcv+0xc01/0xc90
> > [155376.190128]  ip6_protocol_deliver_rcu+0xbd/0x450
> > [155376.199541]  ip6_input_finish+0x3d/0x60
> > [155376.207388]  ip6_input+0xb5/0xc0
> > [155376.214019]  ip6_sublist_rcv_finish+0x37/0x50
> > [155376.222912]  ip6_sublist_rcv+0x1dd/0x270
> > [155376.230935]  ipv6_list_rcv+0x113/0x140
> > [155376.238618]  __netif_receive_skb_list_core+0x1a0/0x210
> > [155376.249080]  netif_receive_skb_list_internal+0x186/0x2a0
> > [155376.259887]  gro_normal_list.part.171+0x19/0x40
> > [155376.269137]  napi_complete_done+0x65/0x150
> > [155376.277514]  bnxt_poll_p5+0x25b/0x2b0
> > [155376.285027]  ? tcp_write_xmit+0x278/0x1060
> > [155376.293398]  __napi_poll+0x25/0x120
> > [155376.300552]  net_rx_action+0x189/0x300
> > [155376.308227]  __do_softirq+0xbb/0x271
> > [155376.315554]  irq_exit_rcu+0x97/0xa0
> > [155376.322710]  common_interrupt+0x7f/0xa0
> > [155376.330566]  </IRQ>
> > [155376.334947]  asm_common_interrupt+0x1e/0x40
> > [155376.343499] RIP: 0010:cpuidle_enter_state+0xc2/0x340
> > [155376.353619] Code: 48 89 c5 0f 1f 44 00 00 31 ff e8 f9 8d 73 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 38 02 00 00 31 ff e8 b2 36 79 ff fb 45 85 f6 <0f> 88 e8 00 00 00 49 63 d6 48 2b 2c 24 48 6b ca 68 48 8d 04 52 48
> > [155376.391367] RSP: 0018:ffffc90000293e90 EFLAGS: 00000202
> > [155376.402004] RAX: ffff88903f8eaa80 RBX: ffffe8ffff4c6c00 RCX: 000000000000001f
> > [155376.416456] RDX: 00008d503c63f228 RSI: 000000005ba4b680 RDI: 0000000000000000
> > [155376.430910] RBP: 00008d503c63f228 R08: 0000000000000002 R09: 000000000002a300
> > [155376.445372] R10: 002435b781e62f4a R11: ffff88903f8e9a84 R12: 0000000000000001
> > [155376.459832] R13: ffffffff83621200 R14: 0000000000000001 R15: 0000000000000000
> > [155376.474307]  cpuidle_enter+0x29/0x40
> > [155376.481654]  do_idle+0x1bb/0x200
> > [155376.488297]  cpu_startup_entry+0x19/0x20
> > [155376.496320]  start_secondary+0x104/0x140
> > [155376.504342]  secondary_startup_64_no_verify+0xb0/0xbb
> > [155376.514629] ---[ end trace 9b428a0d7901c3ff ]---
> > [155376.524041] TCP: Socket Info: family=10 state=1 sport=12345 dport=57616 ccname=ned_tcp_dctcp01 cwnd=1
> > [155376.524045] TCP: saddr=XXXX:XXXX:XXXX:XXXX:XXXX:0000:00c8:0000 daddr=XXXX:XXXX:XXXX:XXXX:XXXX:0000:0288:00
> 
> Try to give us symbols with scripts/decode_stacktrace.sh , thanks.

Sorry, here it is:

 [749619.538804] WARNING: CPU: 19 PID: 0 at net/ipv4/tcp.c:4552 tcp_sock_warn+0x6/0x20
 [749619.553969] Modules linked in: sch_fq sunrpc bpf_preload tls act_gact cls_bpf tcp_diag inet_diag skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt iTCO_vendor_support kvm evdev ses irqbypass enclosure i2c_i801 i2c_smbus ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq button tpm_crb sch_fq_codel vhost_net tun tap vhost vhost_iotlb virtio_net net_failover failover mpls_gso mpls_iptunnel mpls_router ip_tunnel fou ip6_udp_tunnel udp_tunnel fuse sg nvme mpi3mr scsi_transport_sas nvme_core xhci_pci xhci_hcd loop efivarfs autofs4
 [749619.678066] Hardware name: XXXXX
 [749619.695308] RIP: tcp_sock_warn+0x6/0x20
 [749619.704034] Code: 4d 01 3e 85 c0 0f 84 57 ff ff ff 48 8b 0c 24 44 8b 01 eb 82 e8 eb b7 14 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 <0f> 0b 48 85 ff 0f 85 77 70 14 00 5b c3 66 66 2e 0f 1f 84 00 00 00
 All code
 ========
    0:	4d 01 3e             	add    %r15,(%r14)
    3:	85 c0                	test   %eax,%eax
    5:	0f 84 57 ff ff ff    	je     0xffffffffffffff62
    b:	48 8b 0c 24          	mov    (%rsp),%rcx
    f:	44 8b 01             	mov    (%rcx),%r8d
   12:	eb 82                	jmp    0xffffffffffffff96
   14:	e8 eb b7 14 00       	callq  0x14b804
   19:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)
   20:	00 00 00 00
   24:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   29:	53                   	push   %rbx
   2a:*	0f 0b                	ud2    		<-- trapping instruction
   2c:	48 85 ff             	test   %rdi,%rdi
   2f:	0f 85 77 70 14 00    	jne    0x1470ac
   35:	5b                   	pop    %rbx
   36:	c3                   	retq
   37:	66                   	data16
   38:	66                   	data16
   39:	2e                   	cs
   3a:	0f                   	.byte 0xf
   3b:	1f                   	(bad)
   3c:	84 00                	test   %al,(%rax)
 	...

 Code starting with the faulting instruction
 ===========================================
    0:	0f 0b                	ud2
    2:	48 85 ff             	test   %rdi,%rdi
    5:	0f 85 77 70 14 00    	jne    0x147082
    b:	5b                   	pop    %rbx
    c:	c3                   	retq
    d:	66                   	data16
    e:	66                   	data16
    f:	2e                   	cs
   10:	0f                   	.byte 0xf
   11:	1f                   	(bad)
   12:	84 00                	test   %al,(%rax)
 	...
 [749619.741779] RSP: 0018:ffffc90000d08988 EFLAGS: 00010246
 [749619.752436] RAX: ffff88814b57f5c0 RBX: ffff8881bd2540c0 RCX: ffffc90000d08a34
 [749619.766900] RDX: 0000000000000000 RSI: 00000000cda8f4af RDI: ffff8881bd2540c0
 [749619.781364] RBP: 0000000000000000 R08: ffffc90000d08a38 R09: 00000000cda8f44f
 [749619.795831] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 [749619.810300] R13: ffffc90000d08a34 R14: 0000000000011406 R15: 0000000000000000
 [749619.824788] FS:  0000000000000000(0000) GS:ffff88903f8c0000(0000) knlGS:0000000000000000
 [749619.841168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [749619.852857] CR2: 000000000007c2e9 CR3: 0000000b82412002 CR4: 00000000007706e0
 [749619.867331] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [749619.881800] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [749619.896260] PKRU: 55555554
 [749619.901859] Call Trace:
 [749619.906927]  <IRQ>
 [749619.911129] tcp_fastretrans_alert+0x988/0x9e0
 [749619.920222] ? kmem_cache_free+0x33c/0x3d0
 [749619.928606] tcp_ack+0x8b4/0x1360
 [749619.935425] ? __cgroup_bpf_run_filter_skb+0x185/0x440
 [749619.945910] tcp_rcv_established+0x2f3/0x660
 [749619.954639] ? sk_filter_trim_cap+0xbc/0x220
 [749619.963358] tcp_v6_do_rcv+0xbe/0x3e0
 [749619.970863] tcp_v6_rcv+0xc01/0xc90
 [749619.978029] ip6_protocol_deliver_rcu+0xbd/0x450
 [749619.987453] ip6_input_finish+0x3d/0x60
 [749619.995313] ip6_input+0xb5/0xc0
 [749620.001958] ip6_sublist_rcv_finish+0x37/0x50
 [749620.010851] ip6_sublist_rcv+0x1dd/0x270
 [749620.018877] ipv6_list_rcv+0x113/0x140
 [749620.026552] __netif_receive_skb_list_core+0x1a0/0x210
 [749620.037025] netif_receive_skb_list_internal+0x186/0x2a0
 [749620.047838] ? napi_gro_complete+0x6c/0xd0
 [749620.056215] gro_normal_list.part.171+0x19/0x40
 [749620.065471] napi_complete_done+0x65/0x150
 [749620.073856] bnxt_poll_p5+0x25b/0x2b0
 [749620.081369] __napi_poll+0x25/0x120
 [749620.088537] net_rx_action+0x189/0x300
 [749620.096224] __do_softirq+0xbb/0x271
 [749620.103571] irq_exit_rcu+0x97/0xa0
 [749620.110732] common_interrupt+0x7f/0xa0
 [749620.118595]  </IRQ>
 [749620.122964] asm_common_interrupt+0x1e/0x40
 [749620.131511] RIP: cpuidle_enter_state+0xc2/0x340
 [749620.141621] Code: 48 89 c5 0f 1f 44 00 00 31 ff e8 f9 8d 73 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 38 02 00 00 31 ff e8 b2 36 79 ff fb 45 85 f6 <0f> 88 e8 00 00 00 49 63 d6 48 2b 2c 24 48 6b ca 68 48 8d 04 52 48
 All code
 ========
    0:	48 89 c5             	mov    %rax,%rbp
    3:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
    8:	31 ff                	xor    %edi,%edi
    a:	e8 f9 8d 73 ff       	callq  0xffffffffff738e08
    f:	45 84 ff             	test   %r15b,%r15b
   12:	74 12                	je     0x26
   14:	9c                   	pushfq
   15:	58                   	pop    %rax
   16:	f6 c4 02             	test   $0x2,%ah
   19:	0f 85 38 02 00 00    	jne    0x257
   1f:	31 ff                	xor    %edi,%edi
   21:	e8 b2 36 79 ff       	callq  0xffffffffff7936d8
   26:	fb                   	sti
   27:	45 85 f6             	test   %r14d,%r14d
   2a:*	0f 88 e8 00 00 00    	js     0x118		<-- trapping instruction
   30:	49 63 d6             	movslq %r14d,%rdx
   33:	48 2b 2c 24          	sub    (%rsp),%rbp
   37:	48 6b ca 68          	imul   $0x68,%rdx,%rcx
   3b:	48 8d 04 52          	lea    (%rdx,%rdx,2),%rax
   3f:	48                   	rex.W
 
 Code starting with the faulting instruction
 ===========================================
    0:	0f 88 e8 00 00 00    	js     0xee
    6:	49 63 d6             	movslq %r14d,%rdx
    9:	48 2b 2c 24          	sub    (%rsp),%rbp
    d:	48 6b ca 68          	imul   $0x68,%rdx,%rcx
   11:	48 8d 04 52          	lea    (%rdx,%rdx,2),%rax
   15:	48                   	rex.W
 [749620.179375] RSP: 0018:ffffc90000293e90 EFLAGS: 00000202
 [749620.190016] RAX: ffff88903f8eaa80 RBX: ffffe8ffff4c6c00 RCX: 000000000000001f
 [749620.204491] RDX: 0002a9c665c30050 RSI: 000000005ba4b680 RDI: 0000000000000000
 [749620.218949] RBP: 0002a9c665c30050 R08: 0000000000000002 R09: 000000000002a300
 [749620.233403] R10: 00219fc7cf6dd4ca R11: ffff88903f8e9a84 R12: 0000000000000001
 [749620.247854] R13: ffffffff83621200 R14: 0000000000000001 R15: 0000000000000000
 [749620.262315] cpuidle_enter+0x29/0x40
 [749620.269649] do_idle+0x1bb/0x200
 [749620.276296] cpu_startup_entry+0x19/0x20
 [749620.284333] start_secondary+0x104/0x140
 [749620.292364] secondary_startup_64_no_verify+0xb0/0xbb
 [749620.302657] ---[ end trace 44f03a0d3a4e81fd ]---

> Also 5.12 is kind of old :/

Right. Unfortunately this problem is not easily reproducible, and, we
need millions of machines running some specific workload to reproduce
it.


That said, the motivation for this patch is to dump more TCP information
when we get such warnings, so, we have relevant data to understand more
about the issues (when it happens). I.e, this patch is not this problem
above, but, providing more information when we hit such warnings.

That said, this is patch is about creating the
TCP_SOCK_WARN_ON_ONCE() macro, and eventually move some of the
WARN_ON_ONCE() to TCP_SOCK_WARN_ON_ONCE(). (I.e, this is not a patch to solve
the case above.)

Anyway, I got some feedbacks from the patch, let me send a V2 of the
patch, which I expect to clarify the problem better.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ