Message-ID: <ZcRGl758ek_at4Ha@liutao02-mac.local>
Date: Thu, 8 Feb 2024 11:12:23 +0800
From: Tao Liu <taoliu828@....com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: roid@...dia.com, paulb@...dia.com, vladbu@...dia.com,
dchumak@...dia.com, saeedm@...dia.com, taoliu828@....com,
netdev@...r.kernel.org
Subject: Re: Report mlx5_core crash
On 02/07, Cosmin Ratiu wrote:
> On Tue, 2024-02-06 at 15:01 +0800, Tao Liu wrote:
> > On 01/31 , Tao Liu wrote:
> > > Hi Mellanox team,
> > >
> > > We hit a crash in mlx5_core that is similar to commit
> > > de31854ece17 ("net/mlx5e: Fix nullptr on deleting mirroring rule").
> > > The cases are different, though. Our case is:
> > > in_port(...),eth(...) \
> > > actions:set(tunnel(...)),vxlan_sys_4789,set(tunnel(...)),vxlan_sys_4789,...
> > >
> > > BUG: kernel NULL pointer dereference, address: 0000000000000270
> > > RIP: 0010:del_sw_hw_rule+0x29/0x190 [mlx5_core]
>
> Hello,
>
> I'll help you find and fix the problem.
> Your core dump analysis was very useful, but not sufficient to find the
> cause of the crash. Would you mind sharing a set of reproduction steps
> so we can debug this further?
>
> Thank you,
> Cosmin.
Hi Cosmin,
Thanks for your reply.
It's hard to reproduce the crash directly. In our case the rule forwards IP
broadcast traffic to 5 vxlan remotes, and the driver creates 6 mlx5_flow_rule
objects, which cover 5 mlx5_pkt_reformat and 1 counter.
The crash triggers only when two *dr_action pointers in struct
mlx5_pkt_reformat have the same lower 32 bits, which is determined by memory
allocation.
Would it be possible to do some fault injection in a unit test to reproduce it?
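To make the trigger condition concrete, here is a minimal user-space sketch.
It is not driver code: the helper names and the addresses are made up for
illustration, and it only shows what "same lower 32 bits" means for two
distinct 64-bit pointers. How the driver ends up depending on the lower
32 bits is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: keep only the lower 32 bits of an address. */
static uint32_t low32(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffu);
}

/* True when two distinct addresses share the same lower 32 bits. */
static bool low32_collides(uint64_t a, uint64_t b)
{
	return a != b && low32(a) == low32(b);
}

/* Self-check with made-up addresses (not taken from the crash dump). */
static void low32_selfcheck(void)
{
	uint64_t action_a = 0xffff888112345678ULL;
	uint64_t action_b = 0xffff888212345678ULL; /* differs only in the upper half */
	uint64_t action_c = 0xffff888112345700ULL; /* differs in the lower half */

	assert(low32_collides(action_a, action_b));
	assert(!low32_collides(action_a, action_c));
	assert(!low32_collides(action_a, action_a)); /* same pointer is not a collision */
}
```

A fault-injection test could presumably force two allocations to satisfy
low32_collides() instead of waiting for the allocator to produce them by
chance.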
Best regards,
Tao