netdev - Re: [BUG] mlx5_core general protection fault in mlx5_cmd_comp

Message-ID: <CAMGffEkaZUDLfXQXK239Nt-DSxqkZpbC=8zUeubv0pxLuoMcZw@mail.gmail.com>
Date:   Tue, 15 Nov 2022 16:08:45 +0100
From:   Jinpu Wang <jinpu.wang@...os.com>
To:     Moshe Shemesh <moshe@...dia.com>
Cc:     Leon Romanovsky <leon@...nel.org>, netdev <netdev@...r.kernel.org>,
        RDMA mailing list <linux-rdma@...r.kernel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Tariq Toukan <tariqt@...dia.com>,
        Maor Gottlieb <maorg@...dia.com>, Shay Drory <shayd@...dia.com>
Subject: Re: [BUG] mlx5_core general protection fault in mlx5_cmd_comp_handler

On Tue, Nov 15, 2022 at 6:46 AM Jinpu Wang <jinpu.wang@...os.com> wrote:
>
> On Tue, Nov 15, 2022 at 6:15 AM Moshe Shemesh <moshe@...dia.com> wrote:
> >
> >
> > On 11/9/2022 11:51 AM, Jinpu Wang wrote:
> > > On Mon, Oct 17, 2022 at 7:54 AM Jinpu Wang <jinpu.wang@...os.com> wrote:
> > >> On Thu, Oct 13, 2022 at 12:27 PM Leon Romanovsky <leon@...nel.org> wrote:
> > >>> On Thu, Oct 13, 2022 at 10:32:55AM +0200, Jinpu Wang wrote:
> > >>>> On Thu, Oct 13, 2022 at 10:18 AM Leon Romanovsky <leon@...nel.org> wrote:
> > >>>>> On Wed, Oct 12, 2022 at 01:55:55PM +0200, Jinpu Wang wrote:
> > >>>>>> Hi Leon, hi Saeed,
> > >>>>>>
> > >>>>>> We have seen crashes during server shutdown on both kernel 5.10 and
> > >>>>>> kernel 5.15 with GPF in mlx5 mlx5_cmd_comp_handler function.
> > >>>>>>
> > >>>>>> All of the crashes point to
> > >>>>>>
> > >>>>>> 1606                         memcpy(ent->out->first.data, ent->lay->out, sizeof(ent->lay->out));
> > >>>>>>
> > >>>>>> I guess it's a kind of use-after-free of the ent buffer. I tried to reproduce
> > >>>>>> it by repeatedly rebooting the test servers, but no success so far.
> > >>>>> My guess is that the command interface is not flushed, but Moshe and I
> > >>>>> didn't see how it can happen.
> > >>>>>
> > >>>>>    1206         INIT_DELAYED_WORK(&ent->cb_timeout_work, cb_timeout_handler);
> > >>>>>    1207         INIT_WORK(&ent->work, cmd_work_handler);
> > >>>>>    1208         if (page_queue) {
> > >>>>>    1209                 cmd_work_handler(&ent->work);
> > >>>>>    1210         } else if (!queue_work(cmd->wq, &ent->work)) {
> > >>>>>                            ^^^^^^^ this is what is causing the splat
> > >>>>>    1211                 mlx5_core_warn(dev, "failed to queue work\n");
> > >>>>>    1212                 err = -EALREADY;
> > >>>>>    1213                 goto out_free;
> > >>>>>    1214         }
> > >>>>>
> > >>>>> <...>
> > >>>>>> Is this problem known, maybe already fixed?
> > >>>>> I don't see any missing Fixes that exist in 6.0 and don't exist in 5.5.32.
> > >>> Sorry it is 5.15.32
> > >>>
> > >>>>> Is it possible to reproduce this on latest upstream code?
> > >>>> I haven't been able to reproduce it. As mentioned above, I tried to
> > >>>> reproduce it by simply rebooting in a loop, no luck yet.
> > >>>> Do you have suggestions to speed up the reproduction?
> > >>> Maybe try to shutdown during filling command interface.
> > >>> I think that any query command will do the trick.
> > >> Just an update.
> > >> I tried to run "saquery" in a loop in one session and do "modprobe -r
> > >> mlx5_ib && modprobe mlx5_ib" in a loop in another session during the last
> > >> few days, but still no luck.
> > >>>> Once I can reproduce, I can also try with kernel 6.0.
> > >>> It will be great.
> > >>>
> > >>> Thanks
> > >> Thanks!
> > > > Just want to mention, we are seeing more crashes during reboot. All the
> > > > crashes we saw are on servers with
> > > > Intel(R) Xeon(R) Gold 6338 CPUs. We use the same HCA on
> > > > different servers, so I suspect the bug is related to Ice Lake servers.
> > >
> > > In case it matters, here is lspci attached.
> >
> >
> > Please try the following change on 5.15.32, let me know if it solves the
> > failure :
>
> Thank you Moshe, I will test it on affected servers and report back the result.
Hi Moshe,

I've been running the reboot tests on 4 affected machines in parallel
for more than 6 hours, 300+ reboots in total, and I can no longer
reproduce the crash. Without the fix, I was able to reproduce it 2 times
in 20 reboots.
So I think the bug is fixed.
I also did some basic functional tests via RNBD/IPoIB; all look good.
Tested-by: Jack Wang <jinpu.wang@...os.com>
Please provide a formal fix.

Thx!
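
As a rough sanity check on the reboot numbers above (a back-of-the-envelope
sketch only, assuming reboots are independent and taking 2/20 as the crash
rate without the fix, which is a small-sample estimate): the chance of seeing
zero crashes in 300 reboots if the bug were still present would be

```latex
P(\text{0 crashes in 300 reboots}) = \left(1 - \tfrac{2}{20}\right)^{300} = 0.9^{300} \approx 1.9 \times 10^{-14}
```

so a clean run of that length is very unlikely to be luck under those
assumptions.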
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > index e06a6104e91f..d45ca9c52a21 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> > @@ -971,6 +971,7 @@ static void cmd_work_handler(struct work_struct *work)
> >                  cmd_ent_get(ent);
> >          set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
> >
> > +       cmd_ent_get(ent); /* for the _real_ FW event on completion */
> >          /* Skip sending command to fw if internal error */
> >          if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, ent->op)) {
> >                  u8 status = 0;
> > @@ -984,7 +985,6 @@ static void cmd_work_handler(struct work_struct *work)
> >                  return;
> >          }
> >
> > -       cmd_ent_get(ent); /* for the _real_ FW event on completion */
> >          /* ring doorbell after the descriptor is valid */
> >          mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
> >          wmb();
> > @@ -1598,8 +1598,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
> >                                  cmd_ent_put(ent); /* timeout work was canceled */
> >
> >                          if (!forced || /* Real FW completion */
> > -                           pci_channel_offline(dev->pdev) || /* FW is inaccessible */
> > -                           dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
> > +                            mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
> > +                            !opcode_allowed(cmd, ent->op))
> >                                  cmd_ent_put(ent);
> >
> >                          ent->ts2 = ktime_get_ns();
> >
> > > Thx!
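
For readers following the diff above, here is a minimal user-space sketch of
the reference accounting it changes. It models only the lines visible in the
patch: the get that moves ahead of the skip check, and the put condition in
the completion handler. Everything here (struct toy_state, its fields,
toy_final_refcount) is hypothetical and made up for illustration; it ignores
the timeout-work reference, locking, and the rest of the driver, so it shows
the kind of get/put mismatch the patch removes rather than a confirmed root
cause.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; each field is a stand-in for a driver check. */
struct toy_state {
    bool cmd_down;        /* models mlx5_cmd_is_down(dev) */
    bool opcode_allowed;  /* models opcode_allowed(cmd, ent->op) */
    bool fw_unreachable;  /* models the old test: PCI offline or internal error */
};

/*
 * Walk one command through submit -> completion -> submitter release and
 * return the final reference count: 0 is balanced, negative is an extra
 * put, positive is a leaked reference.
 */
static int toy_final_refcount(struct toy_state s, bool patched)
{
    int ref = 1;  /* reference held by the submitter */
    bool skip = s.cmd_down || !s.opcode_allowed;
    bool forced = skip;  /* skipped commands get a forced completion */
    bool drop;

    /*
     * Patched: the "FW completion" reference is taken before the skip check.
     * Unpatched: it is taken only when the command is really sent to FW.
     */
    if (patched || !skip)
        ref++;

    /* Completion handler: decide whether to drop the "FW completion" ref. */
    if (patched)
        drop = !forced || s.cmd_down || !s.opcode_allowed;
    else
        drop = !forced || s.fw_unreachable;
    if (drop)
        ref--;

    ref--;  /* submitter drops its own reference */
    return ref;
}
```

In this model, a skipped command with cmd_down and fw_unreachable both set
ends at -1 without the patch (a put with no matching get) and at 0 with it,
because the get and the put are now guarded by the same condition.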