Message-Id: <172708862651.3320223.2618494280244639290.git-patchwork-notify@kernel.org>
Date: Mon, 23 Sep 2024 10:50:26 +0000
From: patchwork-bot+netdevbpf@...nel.org
To: Josh Hunt <johunt@...mai.com>
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, netdev@...r.kernel.org, ncardwell@...gle.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net v3] tcp: check skb is non-NULL in tcp_rto_delta_us()
Hello:
This patch was applied to netdev/net.git (main)
by David S. Miller <davem@...emloft.net>:
On Tue, 10 Sep 2024 15:08:22 -0400 you wrote:
> We have some machines running stock Ubuntu 20.04.6 (the 5.4.0-174-generic
> kernel) that run ceph and recently hit a NULL pointer dereference in
> tcp_rearm_rto(). We initially hit it from the TLP path, but later also
> saw it from the RACK case. Here are examples of the oops
> messages we saw in each of those cases:
>
> Jul 26 15:05:02 rx [11061395.780353] BUG: kernel NULL pointer dereference, address: 0000000000000020
> Jul 26 15:05:02 rx [11061395.787572] #PF: supervisor read access in kernel mode
> Jul 26 15:05:02 rx [11061395.792971] #PF: error_code(0x0000) - not-present page
> Jul 26 15:05:02 rx [11061395.798362] PGD 0 P4D 0
> Jul 26 15:05:02 rx [11061395.801164] Oops: 0000 [#1] SMP NOPTI
> Jul 26 15:05:02 rx [11061395.805091] CPU: 0 PID: 9180 Comm: msgr-worker-1 Tainted: G W 5.4.0-174-generic #193-Ubuntu
> Jul 26 15:05:02 rx [11061395.814996] Hardware name: Supermicro SMC 2x26 os-gen8 64C NVME-Y 256G/H12SSW-NTR, BIOS 2.5.V1.2U.NVMe.UEFI 05/09/2023
> Jul 26 15:05:02 rx [11061395.825952] RIP: 0010:tcp_rearm_rto+0xe4/0x160
> Jul 26 15:05:02 rx [11061395.830656] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
> Jul 26 15:05:02 rx [11061395.849665] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
> Jul 26 15:05:02 rx [11061395.855149] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
> Jul 26 15:05:02 rx [11061395.862542] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
> Jul 26 15:05:02 rx [11061395.869933] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
> Jul 26 15:05:02 rx [11061395.877318] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
> Jul 26 15:05:02 rx [11061395.884710] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
> Jul 26 15:05:02 rx [11061395.892095] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
> Jul 26 15:05:02 rx [11061395.900438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 26 15:05:02 rx [11061395.906435] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
> Jul 26 15:05:02 rx [11061395.913822] PKRU: 55555554
> Jul 26 15:05:02 rx [11061395.916786] Call Trace:
> Jul 26 15:05:02 rx [11061395.919488]
> Jul 26 15:05:02 rx [11061395.921765] ? show_regs.cold+0x1a/0x1f
> Jul 26 15:05:02 rx [11061395.925859] ? __die+0x90/0xd9
> Jul 26 15:05:02 rx [11061395.929169] ? no_context+0x196/0x380
> Jul 26 15:05:02 rx [11061395.933088] ? ip6_protocol_deliver_rcu+0x4e0/0x4e0
> Jul 26 15:05:02 rx [11061395.938216] ? ip6_sublist_rcv_finish+0x3d/0x50
> Jul 26 15:05:02 rx [11061395.943000] ? __bad_area_nosemaphore+0x50/0x1a0
> Jul 26 15:05:02 rx [11061395.947873] ? bad_area_nosemaphore+0x16/0x20
> Jul 26 15:05:02 rx [11061395.952486] ? do_user_addr_fault+0x267/0x450
> Jul 26 15:05:02 rx [11061395.957104] ? ipv6_list_rcv+0x112/0x140
> Jul 26 15:05:02 rx [11061395.961279] ? __do_page_fault+0x58/0x90
> Jul 26 15:05:02 rx [11061395.965458] ? do_page_fault+0x2c/0xe0
> Jul 26 15:05:02 rx [11061395.969465] ? page_fault+0x34/0x40
> Jul 26 15:05:02 rx [11061395.973217] ? tcp_rearm_rto+0xe4/0x160
> Jul 26 15:05:02 rx [11061395.977313] ? tcp_rearm_rto+0xe4/0x160
> Jul 26 15:05:02 rx [11061395.981408] tcp_send_loss_probe+0x10b/0x220
> Jul 26 15:05:02 rx [11061395.985937] tcp_write_timer_handler+0x1b4/0x240
> Jul 26 15:05:02 rx [11061395.990809] tcp_write_timer+0x9e/0xe0
> Jul 26 15:05:02 rx [11061395.994814] ? tcp_write_timer_handler+0x240/0x240
> Jul 26 15:05:02 rx [11061395.999866] call_timer_fn+0x32/0x130
> Jul 26 15:05:02 rx [11061396.003782] __run_timers.part.0+0x180/0x280
> Jul 26 15:05:02 rx [11061396.008309] ? recalibrate_cpu_khz+0x10/0x10
> Jul 26 15:05:02 rx [11061396.012841] ? native_x2apic_icr_write+0x30/0x30
> Jul 26 15:05:02 rx [11061396.017718] ? lapic_next_event+0x21/0x30
> Jul 26 15:05:02 rx [11061396.021984] ? clockevents_program_event+0x8f/0xe0
> Jul 26 15:05:02 rx [11061396.027035] run_timer_softirq+0x2a/0x50
> Jul 26 15:05:02 rx [11061396.031212] __do_softirq+0xd1/0x2c1
> Jul 26 15:05:02 rx [11061396.035044] do_softirq_own_stack+0x2a/0x40
> Jul 26 15:05:02 rx [11061396.039480]
> Jul 26 15:05:02 rx [11061396.041840] do_softirq.part.0+0x46/0x50
> Jul 26 15:05:02 rx [11061396.046022] __local_bh_enable_ip+0x50/0x60
> Jul 26 15:05:02 rx [11061396.050460] _raw_spin_unlock_bh+0x1e/0x20
> Jul 26 15:05:02 rx [11061396.054817] nf_conntrack_tcp_packet+0x29e/0xbe0 [nf_conntrack]
> Jul 26 15:05:02 rx [11061396.060994] ? get_l4proto+0xe7/0x190 [nf_conntrack]
> Jul 26 15:05:02 rx [11061396.066220] nf_conntrack_in+0xe9/0x670 [nf_conntrack]
> Jul 26 15:05:02 rx [11061396.071618] ipv6_conntrack_local+0x14/0x20 [nf_conntrack]
> Jul 26 15:05:02 rx [11061396.077356] nf_hook_slow+0x45/0xb0
> Jul 26 15:05:02 rx [11061396.081098] ip6_xmit+0x3f0/0x5d0
> Jul 26 15:05:02 rx [11061396.084670] ? ipv6_anycast_cleanup+0x50/0x50
> Jul 26 15:05:02 rx [11061396.089282] ? __sk_dst_check+0x38/0x70
> Jul 26 15:05:02 rx [11061396.093381] ? inet6_csk_route_socket+0x13b/0x200
> Jul 26 15:05:02 rx [11061396.098346] inet6_csk_xmit+0xa7/0xf0
> Jul 26 15:05:02 rx [11061396.102263] __tcp_transmit_skb+0x550/0xb30
> Jul 26 15:05:02 rx [11061396.106701] tcp_write_xmit+0x3c6/0xc20
> Jul 26 15:05:02 rx [11061396.110792] ? __alloc_skb+0x98/0x1d0
> Jul 26 15:05:02 rx [11061396.114708] __tcp_push_pending_frames+0x37/0x100
> Jul 26 15:05:02 rx [11061396.119667] tcp_push+0xfd/0x100
> Jul 26 15:05:02 rx [11061396.123150] tcp_sendmsg_locked+0xc70/0xdd0
> Jul 26 15:05:02 rx [11061396.127588] tcp_sendmsg+0x2d/0x50
> Jul 26 15:05:02 rx [11061396.131245] inet6_sendmsg+0x43/0x70
> Jul 26 15:05:02 rx [11061396.135075] __sock_sendmsg+0x48/0x70
> Jul 26 15:05:02 rx [11061396.138994] ____sys_sendmsg+0x212/0x280
> Jul 26 15:05:02 rx [11061396.143172] ___sys_sendmsg+0x88/0xd0
> Jul 26 15:05:02 rx [11061396.147098] ? __seccomp_filter+0x7e/0x6b0
> Jul 26 15:05:02 rx [11061396.151446] ? __switch_to+0x39c/0x460
> Jul 26 15:05:02 rx [11061396.155453] ? __switch_to_asm+0x42/0x80
> Jul 26 15:05:02 rx [11061396.159636] ? __switch_to_asm+0x5a/0x80
> Jul 26 15:05:02 rx [11061396.163816] __sys_sendmsg+0x5c/0xa0
> Jul 26 15:05:02 rx [11061396.167647] __x64_sys_sendmsg+0x1f/0x30
> Jul 26 15:05:02 rx [11061396.171832] do_syscall_64+0x57/0x190
> Jul 26 15:05:02 rx [11061396.175748] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> Jul 26 15:05:02 rx [11061396.181055] RIP: 0033:0x7f1ef692618d
> Jul 26 15:05:02 rx [11061396.184893] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ca ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 fe ee ff ff 48
> Jul 26 15:05:02 rx [11061396.203889] RSP: 002b:00007f1ef4a26aa0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> Jul 26 15:05:02 rx [11061396.211708] RAX: ffffffffffffffda RBX: 000000000000084b RCX: 00007f1ef692618d
> Jul 26 15:05:02 rx [11061396.219091] RDX: 0000000000004000 RSI: 00007f1ef4a26b10 RDI: 0000000000000275
> Jul 26 15:05:02 rx [11061396.226475] RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000020
> Jul 26 15:05:02 rx [11061396.233859] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000084b
> Jul 26 15:05:02 rx [11061396.241243] R13: 00007f1ef4a26b10 R14: 0000000000000275 R15: 000055592030f1e8
> Jul 26 15:05:02 rx [11061396.248628] Modules linked in: vrf bridge stp llc vxlan ip6_udp_tunnel udp_tunnel nls_iso8859_1 amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper wmi_bmof ipmi_ssif input_leds joydev rndis_host cdc_ether usbnet mii ast drm_vram_helper ttm drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt ccp mac_hid ipmi_si ipmi_devintf ipmi_msghandler nft_ct sch_fq_codel nf_tables_set nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ramoops reed_solomon efi_pstore drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib ib_uverbs ib_core raid1 mlx5_core hid_generic pci_hyperv_intf crc32_pclmul tls usbhid ahci mlxfw bnxt_en libahci hid nvme i2c_piix4 nvme_core wmi
> Jul 26 15:05:02 rx [11061396.324334] CR2: 0000000000000020
> Jul 26 15:05:02 rx [11061396.327944] ---[ end trace 68a2b679d1cfb4f1 ]---
> Jul 26 15:05:02 rx [11061396.433435] RIP: 0010:tcp_rearm_rto+0xe4/0x160
> Jul 26 15:05:02 rx [11061396.438137] Code: 87 ca 04 00 00 00 5b 41 5c 41 5d 5d c3 c3 49 8b bc 24 40 06 00 00 eb 8d 48 bb cf f7 53 e3 a5 9b c4 20 4c 89 ef e8 0c fe 0e 00 <48> 8b 78 20 48 c1 ef 03 48 89 f8 41 8b bc 24 80 04 00 00 48 f7 e3
> Jul 26 15:05:02 rx [11061396.457144] RSP: 0018:ffffb75d40003e08 EFLAGS: 00010246
> Jul 26 15:05:02 rx [11061396.462629] RAX: 0000000000000000 RBX: 20c49ba5e353f7cf RCX: 0000000000000000
> Jul 26 15:05:02 rx [11061396.470012] RDX: 0000000062177c30 RSI: 000000000000231c RDI: ffff9874ad283a60
> Jul 26 15:05:02 rx [11061396.477396] RBP: ffffb75d40003e20 R08: 0000000000000000 R09: ffff987605e20aa8
> Jul 26 15:05:02 rx [11061396.484779] R10: ffffb75d40003f00 R11: ffffb75d4460f740 R12: ffff9874ad283900
> Jul 26 15:05:02 rx [11061396.492164] R13: ffff9874ad283a60 R14: ffff9874ad283980 R15: ffff9874ad283d30
> Jul 26 15:05:02 rx [11061396.499547] FS: 00007f1ef4a2e700(0000) GS:ffff987605e00000(0000) knlGS:0000000000000000
> Jul 26 15:05:02 rx [11061396.507886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 26 15:05:02 rx [11061396.513884] CR2: 0000000000000020 CR3: 0000003e450ba003 CR4: 0000000000760ef0
> Jul 26 15:05:02 rx [11061396.521267] PKRU: 55555554
> Jul 26 15:05:02 rx [11061396.524230] Kernel panic - not syncing: Fatal exception in interrupt
> Jul 26 15:05:02 rx [11061396.530885] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Jul 26 15:05:03 rx [11061396.660181] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
> [...]
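
A note for readers decoding the quoted oops: the faulting bytes `<48> 8b 78 20` encode `mov 0x20(%rax),%rdi`, RAX is 0, and CR2 is 0x20, which is consistent with reading a field at offset 0x20 through a NULL pointer. The following is a minimal user-space sketch in C of that pattern and of the kind of non-NULL guard the patch subject describes. The struct `fake_skb`, its layout, the helper `rto_delta_us_sketch()`, and the fallback value returned for a NULL head are all hypothetical illustrations, not the kernel's code; see the commit linked in the summary for the real change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued packet; NOT the kernel's struct sk_buff.
 * The padding is chosen only so that send_time_us lands at offset 0x20,
 * matching the fault address seen in CR2. */
struct fake_skb {
	unsigned char pad[0x20];
	uint64_t send_time_us;
};

static_assert(offsetof(struct fake_skb, send_time_us) == 0x20,
	      "field must sit at offset 0x20 for this illustration");

/* Hypothetical helper: microseconds left until the retransmit timer
 * would fire. head may be NULL when the retransmit queue is empty;
 * without the check, head->send_time_us would read address 0x20. */
static int64_t rto_delta_us_sketch(const struct fake_skb *head,
				   uint64_t rto_us, uint64_t now_us)
{
	if (head == NULL) {
		/* Assumed fallback for this sketch: report the full RTO. */
		return (int64_t)rto_us;
	}
	return (int64_t)(head->send_time_us + rto_us) - (int64_t)now_us;
}
```

With a packet sent at 1000 us, an RTO of 200000 us, and a current time of 1500 us, the helper returns 199500; with a NULL head it returns the assumed fallback of 200000 instead of faulting.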
Here is the summary with links:
- [net,v3] tcp: check skb is non-NULL in tcp_rto_delta_us()
https://git.kernel.org/netdev/net/c/c8770db2d544
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html