netdev - Re: [PATCH net v6] xsk: avoid data corruption on cq descriptor number

Message-ID: <23b56ddb-f5a3-4b2b-bf75-e93aa39ab63f@suse.de>
Date: Wed, 26 Nov 2025 10:15:36 +0100
From: Fernando Fernandez Mancera <fmancera@...e.de>
To: Jason Xing <kerneljasonxing@...il.com>,
 Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: netdev@...r.kernel.org, csmate@....hu, bpf@...r.kernel.org,
 davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, horms@...nel.org, sdf@...ichev.me, hawk@...nel.org,
 daniel@...earbox.net, ast@...nel.org, john.fastabend@...il.com,
 magnus.karlsson@...el.com
Subject: Re: [PATCH net v6] xsk: avoid data corruption on cq descriptor number

On 11/26/25 2:14 AM, Jason Xing wrote:
> On Wed, Nov 26, 2025 at 12:31 AM Maciej Fijalkowski
> <maciej.fijalkowski@...el.com> wrote:
>>
>> On Tue, Nov 25, 2025 at 08:11:37PM +0800, Jason Xing wrote:
>>> On Tue, Nov 25, 2025 at 7:40 PM Fernando Fernandez Mancera
>>> <fmancera@...e.de> wrote:
>>>>
>>>> On 11/25/25 12:41 AM, Jason Xing wrote:
>>>>> On Tue, Nov 25, 2025 at 1:14 AM Fernando Fernandez Mancera
>>>>> <fmancera@...e.de> wrote:
>>>>>>
>>>>>> Since commit 30f241fcf52a ("xsk: Fix immature cq descriptor
>>>>>> production"), the descriptor number is stored in skb control block and
>>>>>> xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
>>>>>> pool's completion queue.
>>>>>>
>>>>>> skb control block shouldn't be used for this purpose as after transmit
>>>>>> xsk doesn't have control over it and other subsystems could use it. This
>>>>>> leads to the following kernel panic due to a NULL pointer dereference.
>>>>>>
>>>>>>    BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>>>    #PF: supervisor read access in kernel mode
>>>>>>    #PF: error_code(0x0000) - not-present page
>>>>>>    PGD 0 P4D 0
>>>>>>    Oops: Oops: 0000 [#1] SMP NOPTI
>>>>>>    CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1
>>>>>>    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
>>>>>>    RIP: 0010:xsk_destruct_skb+0xd0/0x180
>>>>>>    [...]
>>>>>>    Call Trace:
>>>>>>     <IRQ>
>>>>>>     ? napi_complete_done+0x7a/0x1a0
>>>>>>     ip_rcv_core+0x1bb/0x340
>>>>>>     ip_rcv+0x30/0x1f0
>>>>>>     __netif_receive_skb_one_core+0x85/0xa0
>>>>>>     process_backlog+0x87/0x130
>>>>>>     __napi_poll+0x28/0x180
>>>>>>     net_rx_action+0x339/0x420
>>>>>>     handle_softirqs+0xdc/0x320
>>>>>>     ? handle_edge_irq+0x90/0x1e0
>>>>>>     do_softirq.part.0+0x3b/0x60
>>>>>>     </IRQ>
>>>>>>     <TASK>
>>>>>>     __local_bh_enable_ip+0x60/0x70
>>>>>>     __dev_direct_xmit+0x14e/0x1f0
>>>>>>     __xsk_generic_xmit+0x482/0xb70
>>>>>>     ? __remove_hrtimer+0x41/0xa0
>>>>>>     ? __xsk_generic_xmit+0x51/0xb70
>>>>>>     ? _raw_spin_unlock_irqrestore+0xe/0x40
>>>>>>     xsk_sendmsg+0xda/0x1c0
>>>>>>     __sys_sendto+0x1ee/0x200
>>>>>>     __x64_sys_sendto+0x24/0x30
>>>>>>     do_syscall_64+0x84/0x2f0
>>>>>>     ? __pfx_pollwake+0x10/0x10
>>>>>>     ? __rseq_handle_notify_resume+0xad/0x4c0
>>>>>>     ? restore_fpregs_from_fpstate+0x3c/0x90
>>>>>>     ? switch_fpu_return+0x5b/0xe0
>>>>>>     ? do_syscall_64+0x204/0x2f0
>>>>>>     ? do_syscall_64+0x204/0x2f0
>>>>>>     ? do_syscall_64+0x204/0x2f0
>>>>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>     </TASK>
>>>>>>    [...]
>>>>>>    Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>    Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>
>>>>>> Instead use the skb destructor_arg pointer along with pointer tagging.
>>>>>> As pointers are always aligned to 8B, use the bottom bit to indicate
>>>>>> whether this is a single address or an allocated struct containing several
>>>>>> addresses.
>>>>>>
>>>>>> Fixes: 30f241fcf52a ("xsk: Fix immature cq descriptor production")
>>>>>> Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/
>>>>>> Suggested-by: Jakub Kicinski <kuba@...nel.org>
>>>>>> Signed-off-by: Fernando Fernandez Mancera <fmancera@...e.de>
>>>>>
>>>>> Reviewed-by: Jason Xing <kerneljasonxing@...il.com>
>>>>>
>>>>> Could you also post a patch on top of net-next as it has diverged from
>>>>> the net tree?
>>>>>
>>>>
>>>> I think that is handled by maintainers when merging the branches. A
>>>> repost would be wrong because linux-next.git and linux.git will have a
>>>> different variant of the same commit.
>>>
>>> But this patch cannot be applied cleanly in the net-next tree...
>>
>> What we care about here is that it applies to net, as that's the tree
>> this patch was posted to.
> 
> It sounds like I can post my approach without this patch on net-next,
> right? I have no idea how long I should keep waiting :S
> 
> To be clear, what I meant was to ask Fernando to post a new rebased
> patch targeting net-next. If the patch doesn't need to land on
> net-next, I will post it as soon as possible.
> 

My patch landed in the net tree, and the net tree changes will probably
be merged into net-next soon. If there are conflicts when merging the
patch, the maintainers will either ask us or resolve them themselves.

That is my understanding of the workflow.

Thanks,
Fernando.

> Thanks,
> Jason
> 
>>>
>>>>
>>>> Please, let me know if I am wrong here.
>>>
>>> I'm not quite sure either.
>>>
>>> Thanks,
>>> Jason
>>>
>>>>
>>>> Thanks,
>>>> Fernando.
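
For readers unfamiliar with the pointer tagging mentioned in the quoted
commit message, here is a minimal userspace sketch of the idea. It is not
the code from the patch. The names (slot_set_addr, slot_is_addr,
struct addr_list and so on) are hypothetical. The sketch assumes a 64-bit
platform, that a set bottom bit marks a single inline address, and that
the address is shifted left by one bit to make room for the tag. How the
actual patch encodes the address may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Bottom bit set: the slot holds one address, not a pointer. */
#define SLOT_INLINE_TAG ((uintptr_t)1)

/* Hypothetical container for the multi-descriptor case. */
struct addr_list {
	uint32_t num;
	uint64_t addrs[8];
};

/* Store a single address inline. Assumes it fits in 63 bits. */
static void *slot_set_addr(uint64_t addr)
{
	assert(addr <= (UINTPTR_MAX >> 1));
	return (void *)(((uintptr_t)addr << 1) | SLOT_INLINE_TAG);
}

/* Store a pointer to an allocated list. Allocator alignment
 * guarantees the bottom bit is clear, so no tag is needed.
 */
static void *slot_set_list(struct addr_list *list)
{
	assert(((uintptr_t)list & SLOT_INLINE_TAG) == 0);
	return list;
}

/* True if the slot holds an inline address. */
static bool slot_is_addr(const void *slot)
{
	return ((uintptr_t)slot & SLOT_INLINE_TAG) != 0;
}

/* Recover the inline address. Only valid if slot_is_addr(). */
static uint64_t slot_get_addr(const void *slot)
{
	return (uint64_t)((uintptr_t)slot >> 1);
}

/* Recover the list pointer. Only valid if !slot_is_addr(). */
static struct addr_list *slot_get_list(void *slot)
{
	return (struct addr_list *)slot;
}
```

The point of the scheme is that the common single-descriptor case needs
no allocation, and the consumer can tell the two cases apart from the
pointer-sized value alone, without relying on any other skb state.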