Message-ID: <CAKq9yRiy166UpMA1HFiuzs0EMEM_aXbXbaTztbXcJ5CKF4F64w@mail.gmail.com>
Date: Sat, 16 Mar 2024 16:26:19 +0100
From: Daniele Salvatore Albano <d.albano@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Re: [mlx5_core] kernel NULL pointer dereference when sending packets with AF_XDP using the hw checksum
On Sat, 16 Mar 2024 at 05:11, Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On 03/16, Daniele Salvatore Albano wrote:
> > Hey there,
> >
> > Hope this is the right ml, if not sorry in advance.
> >
> > I have been facing a reproducible kernel panic with 6.8.0 and 6.8.1
> > when sending packets with HW checksum calculation enabled over
> > AF_XDP on my Mellanox ConnectX-5.
> >
> > Running xskgen ( https://github.com/fomichev/xskgen ), which I saw
> > mentioned in some patches related to AF_XDP and the hw checksum
> > support. In addition to the minimum parameters to make it work, adding
> > the -m option is enough to trigger the kernel panic.
>
> Now I wonder if I ever tested only -m (without passing a flag to request
> tx timestamp). Maybe you can try to confirm that `xskgen -mC` works?
No, the kernel still panics with `xskgen -mC` and, from the look of it,
the stack trace and the RIP are the same as before.
[ 157.108402] RIP: 0010:mlx5e_free_xdpsq_desc+0x266/0x320 [mlx5_core]
...
[ 157.108827] Call Trace:
[ 157.108841] <TASK>
[ 157.108855] ? show_regs+0x6d/0x80
[ 157.108876] ? __die+0x24/0x80
[ 157.108893] ? page_fault_oops+0x99/0x1b0
[ 157.108916] ? do_user_addr_fault+0x2ee/0x6b0
[ 157.108937] ? exc_page_fault+0x83/0x1b0
[ 157.108958] ? asm_exc_page_fault+0x27/0x30
[ 157.108986] ? mlx5e_free_xdpsq_desc+0x266/0x320 [mlx5_core]
[ 157.109154] mlx5e_poll_xdpsq_cq+0x17c/0x4f0 [mlx5_core]
[ 157.109324] mlx5e_napi_poll+0x45e/0x7b0 [mlx5_core]
[ 157.109470] __napi_poll+0x33/0x200
[ 157.109488] net_rx_action+0x181/0x2e0
[ 157.109502] ? sched_clock_cpu+0x12/0x1e0
[ 157.109524] __do_softirq+0xe1/0x363
[ 157.109544] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 157.109565] run_ksoftirqd+0x37/0x60
[ 157.109582] smpboot_thread_fn+0xe3/0x1e0
[ 157.109600] kthread+0xf2/0x120
[ 157.109616] ? __pfx_kthread+0x10/0x10
[ 157.109632] ret_from_fork+0x47/0x70
[ 157.109648] ? __pfx_kthread+0x10/0x10
[ 157.109663] ret_from_fork_asm+0x1b/0x30
[ 157.109686] </TASK>
> If you can test custom patches, I think the following should fix it:
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index 3cb4dc9bd70e..3d54de168a6d 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -188,6 +188,8 @@ static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl *compl,
>  {
>  	if (!compl)
>  		return;
> +	if (!compl->tx_timestamp)
> +		return;
>
>  	*compl->tx_timestamp = ops->tmo_fill_timestamp(priv);
>  }
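For anyone reading this in the archive, here is a rough userspace sketch of what the extra check guards against, as far as I understand the patch: the completion path writes through the tx_timestamp pointer, which appears to be left NULL when no TX timestamp was requested. The structs below are simplified stand-ins rather than the real kernel definitions, and complete_patched(), fake_fill_timestamp() and the cmpl parameter name are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types (assumption: only the
 * fields relevant to the patch are modelled here). */
struct xsk_tx_metadata_compl {
	uint64_t *tx_timestamp;	/* NULL when no TX timestamp was requested */
};

struct xsk_tx_metadata_ops {
	uint64_t (*tmo_fill_timestamp)(void *priv);
};

/* Hypothetical driver callback returning a placeholder timestamp. */
static uint64_t fake_fill_timestamp(void *priv)
{
	(void)priv;
	return 12345;
}

/* Userspace model of the patched helper.
 * Returns 1 if a timestamp was written, 0 if the call was skipped. */
static int complete_patched(struct xsk_tx_metadata_compl *cmpl,
			    const struct xsk_tx_metadata_ops *ops,
			    void *priv)
{
	assert(ops && ops->tmo_fill_timestamp);

	if (!cmpl)
		return 0;
	/* The check added by the patch: without it, the store below
	 * would dereference a NULL pointer. */
	if (!cmpl->tx_timestamp)
		return 0;

	*cmpl->tx_timestamp = ops->tmo_fill_timestamp(priv);
	return 1;
}
```

In this model, a completion with tx_timestamp set to NULL is skipped instead of being written through, while a completion with a valid pointer still gets its timestamp filled in.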
I just built the same 6.8.1 kernel from Ubuntu mainline with the patch
applied, and it now works with both xskgen and my own code.
Thanks!
Daniele