Message-ID: <3547078e-acdf-4189-9a1d-98f581896706@intel.com>
Date: Mon, 16 Sep 2024 13:09:21 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Ben Greear <greearb@...delatech.com>
CC: Jan Glaza <jan.glaza@...el.com>, Aleksandr Loktionov
	<aleksandr.loktionov@...el.com>, "intel-wired-lan@...ts.osuosl.org"
	<intel-wired-lan@...ts.osuosl.org>, netdev <netdev@...r.kernel.org>
Subject: Re: tcp_ack __list_del crash in 6.10.3+ hacks

On 9/16/24 12:32, Przemek Kitszel wrote:
> On 9/14/24 07:27, Ben Greear wrote:
>> Hello,
>>
>> We found this during a long duration network test where we are using
>> lots of wifi network devices in a single system, talking with
> 
> This will be really hard for us to reproduce, but we would still like to help.
> 
>> an intel 10g
> 
> You are more likely to get Intel's help if you also mail our IWL list
> (CCed, +Aleksandr for ixgbe expertise).
> 
> 
>> NIC in the same system (using vrfs and such).  The system ran around
>> 7 hours before it crashed.  Seems to be a null pointer in a list, but
>> I'm not having great luck understanding where exactly in the large
>> tcp_ack method this is happening.  Any suggestions for how to get more
>> relevant info out of gdb?

I would also enable kmemleak, lockdep, and UBSAN to get some easy debugging help.
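
As a rough starting point, those three map to the .config fragment below. This is only a sketch based on the mainline Kconfig symbol names, so please check them against your 6.10 tree, since dependencies may differ with your local changes. CONFIG_DEBUG_LIST is my own addition to the list above, on the assumption that its list sanity checks are relevant to a crash in __list_del().

```
# kmemleak
CONFIG_DEBUG_KMEMLEAK=y
# lockdep (enabled through lock correctness proving)
CONFIG_PROVE_LOCKING=y
# undefined behaviour sanitizer
CONFIG_UBSAN=y
# assumption: not named above, added because the crash is in list code
CONFIG_DEBUG_LIST=y
```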

>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>> #PF: supervisor write access in kernel mode

Could you share your virtualization config?

>> #PF: error_code(0x0002) - not-present page
>> PGD 115855067 P4D 115855067 PUD 283ed3067 PMD 0
>> Oops: Oops: 0002 [#1] PREEMPT SMP
>> CPU: 6 PID: 115673 Comm: btserver Tainted: G           O       6.10.3+ #57

So, what hacks do you have? Are they there to aid debugging or to enable
some of the wifi devices?

Unfortunately, I don't have any insightful comments, sorry.

>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>> RIP: 0010:tcp_ack+0x62e/0x1530
>> Code: 9c 24 80 05 00 00 0f 84 56 09 00 00 49 39 9c 24 50 06 00 00 0f 84 b2 04 00 00 48 8b 53 58 48 8b 43 60 48 89 df 48 8b 74 24 28 <48> 89 42 08 48 89 10 48 c7 43 60 00 00 00 00 48 c7 43 58 00 00 00
>> RSP: 0018:ffffc9000027c998 EFLAGS: 00010207
>> RAX: 0000000000000000 RBX: ffff8881226a8800 RCX: ffff8881226abe01
>> RDX: 0000000000000000 RSI: ffff888126a3d4c8 RDI: ffff8881226a8800
>> RBP: ffffc9000027ca28 R08: 000000000005edf6 R09: 0000000000000000
>> R10: 0000000000000008 R11: 0000000084d9074f R12: ffff888126a3d340
>> R13: 0000000000000004 R14: ffff8881226aac00 R15: 0000000000000000
>> FS:  00007efc82a2f7c0(0000) GS:ffff88845dd80000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000008 CR3: 0000000125477006 CR4: 00000000003706f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>   <IRQ>
>>   ? __die+0x1a/0x60
>>   ? page_fault_oops+0x150/0x500
>>   ? exc_page_fault+0x6f/0x160
>>   ? asm_exc_page_fault+0x22/0x30
>>   ? tcp_ack+0x62e/0x1530
>>   ? tcp_ack+0x5f1/0x1530
>>   ? tcp_schedule_loss_probe+0x101/0x1d0
>>   tcp_rcv_established+0x168/0x750
>>   tcp_v4_do_rcv+0x13f/0x270
>>   tcp_v4_rcv+0x1236/0x15f0
>>   ? udp_lib_lport_inuse+0x100/0x100
>>   ? raw_local_deliver+0xc8/0x250
>>   ip_protocol_deliver_rcu+0x1b/0x290
>>   ip_local_deliver_finish+0x6d/0x90
>>   ip_sublist_rcv_finish+0x2d/0x40
>>   ip_sublist_rcv+0x160/0x200
>>   ? __netif_receive_skb_core.constprop.0+0x30d/0xf80
>>   ip_list_rcv+0xca/0x120
>>   __netif_receive_skb_list_core+0x17f/0x1e0
>>   netif_receive_skb_list_internal+0x1c5/0x290
>>   napi_complete_done+0x69/0x180
>>   ixgbe_poll+0xd93/0x13d0 [ixgbe]
>>   __napi_poll+0x20/0x1a0
>>   net_rx_action+0x2af/0x310
>>   handle_softirqs+0xc8/0x2b0
>>   __irq_exit_rcu+0x5f/0x80
>>   common_interrupt+0x81/0xa0
>>   </IRQ>
>>
>> (gdb) l *(tcp_ack+0x62e)
>> 0xffffffff81c8601e is in tcp_ack (/home/greearb/git/linux-6.10.dev.y/include/linux/list.h:195).
>> 190     * This is only for internal list manipulation where we know
>> 191     * the prev/next entries already!
>> 192     */
>> 193    static inline void __list_del(struct list_head * prev, struct list_head * next)
>> 194    {
>> 195        next->prev = prev;
>> 196        WRITE_ONCE(prev->next, next);
>> 197    }
>> 198
>> 199    /*
>> (gdb) l *(tcp_rcv_established+0x168)
>> 0xffffffff81c88b88 is in tcp_rcv_established (/home/greearb/git/linux-6.10.dev.y/net/ipv4/tcp_input.c:6209).
>> 6204
>> 6205        if (!tcp_validate_incoming(sk, skb, th, 1))
>> 6206            return;
>> 6207
>> 6208    step5:
>> 6209        reason = tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT);
>> 6210        if ((int)reason < 0) {
>> 6211            reason = -reason;
>> 6212            goto discard;
>> 6213        }
>> (gdb)
>>
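
For what it's worth, the faulting address fits the __list_del() line that gdb points at. The marked bytes in the Code: line, <48> 89 42 08, decode to "mov %rax,0x8(%rdx)", and RDX is 0 in the register dump, so the store "next->prev = prev" went to NULL + 8, which matches CR2. RAX (prev) is 0 as well, so both pointers of that list node were NULL at the time. The sketch below is only a userspace illustration of the offset arithmetic, not kernel code: the struct mirrors the two-pointer layout of struct list_head, and list_del_store_addr() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in with the same two-pointer layout as the kernel's
 * struct list_head (assumption: built for a 64-bit target). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/*
 * Address that the store "next->prev = prev" writes to for a given value
 * of 'next'. Computed arithmetically, so nothing is dereferenced.
 */
static uintptr_t list_del_store_addr(uintptr_t next)
{
    return next + offsetof(struct list_head, prev);
}
```

With 8-byte pointers, list_del_store_addr(0) evaluates to 8, the address reported in the BUG line. This does not say which list in tcp_ack is affected, only that the node being unlinked had a NULL next pointer.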
>> Thanks,
>> Ben
>>
> 
> 

