Message-ID: <51b8abeb-f3de-7a3b-ece0-d5e2fd057bba@nvidia.com>
Date: Tue, 22 Nov 2022 06:31:06 +0200
From: Moshe Shemesh <moshe@...dia.com>
To: Jinpu Wang <jinpu.wang@...os.com>
CC: Leon Romanovsky <leon@...nel.org>, netdev <netdev@...r.kernel.org>,
"RDMA mailing list" <linux-rdma@...r.kernel.org>,
Saeed Mahameed <saeedm@...dia.com>,
Tariq Toukan <tariqt@...dia.com>,
Maor Gottlieb <maorg@...dia.com>, Shay Drory <shayd@...dia.com>
Subject: Re: [BUG] mlx5_core general protection fault in mlx5_cmd_comp_handler
On 11/21/2022 11:11 AM, Jinpu Wang wrote:
> On Tue, Nov 15, 2022 at 5:41 PM Moshe Shemesh <moshe@...dia.com> wrote:
>>
>> On 11/15/2022 5:08 PM, Jinpu Wang wrote:
>>> On Tue, Nov 15, 2022 at 6:46 AM Jinpu Wang <jinpu.wang@...os.com> wrote:
>>>> On Tue, Nov 15, 2022 at 6:15 AM Moshe Shemesh <moshe@...dia.com> wrote:
>>>>> On 11/9/2022 11:51 AM, Jinpu Wang wrote:
>>>>>> On Mon, Oct 17, 2022 at 7:54 AM Jinpu Wang <jinpu.wang@...os.com> wrote:
>>>>>>> On Thu, Oct 13, 2022 at 12:27 PM Leon Romanovsky <leon@...nel.org> wrote:
>>>>>>>> On Thu, Oct 13, 2022 at 10:32:55AM +0200, Jinpu Wang wrote:
>>>>>>>>> On Thu, Oct 13, 2022 at 10:18 AM Leon Romanovsky <leon@...nel.org> wrote:
>>>>>>>>>> On Wed, Oct 12, 2022 at 01:55:55PM +0200, Jinpu Wang wrote:
>>>>>>>>>>> Hi Leon, hi Saeed,
>>>>>>>>>>>
>>>>>>>>>>> We have seen crashes during server shutdown on both kernel 5.10 and
>>>>>>>>>>> kernel 5.15, with a GPF in the mlx5 mlx5_cmd_comp_handler function.
>>>>>>>>>>>
>>>>>>>>>>> All of the crashes point to
>>>>>>>>>>>
>>>>>>>>>>> 1606 memcpy(ent->out->first.data,
>>>>>>>>>>> ent->lay->out, sizeof(ent->lay->out));
>>>>>>>>>>>
>>>>>>>>>>> I guess it's some kind of use-after-free of the ent buffer. I tried to
>>>>>>>>>>> reproduce it by repeatedly rebooting the test servers, but no success so far.
>>>>>>>>>> My guess is that the command interface is not flushed, but Moshe and I
>>>>>>>>>> didn't see how that can happen.
>>>>>>>>>>
>>>>>>>>>> 1206 INIT_DELAYED_WORK(&ent->cb_timeout_work, cb_timeout_handler);
>>>>>>>>>> 1207 INIT_WORK(&ent->work, cmd_work_handler);
>>>>>>>>>> 1208 if (page_queue) {
>>>>>>>>>> 1209 cmd_work_handler(&ent->work);
>>>>>>>>>> 1210 } else if (!queue_work(cmd->wq, &ent->work)) {
>>>>>>>>>> ^^^^^^^ this is what is causing the splat
>>>>>>>>>> 1211 mlx5_core_warn(dev, "failed to queue work\n");
>>>>>>>>>> 1212 err = -EALREADY;
>>>>>>>>>> 1213 goto out_free;
>>>>>>>>>> 1214 }
>>>>>>>>>>
>>>>>>>>>> <...>
>>>>>>>>>>> Is this problem known, maybe already fixed?
>>>>>>>>>> I don't see any missing Fixes that exist in 6.0 and don't exist in 5.5.32.
>>>>>>>> Sorry it is 5.15.32
>>>>>>>>
>>>>>>>>>> Is it possible to reproduce this on latest upstream code?
>>>>>>>>> I haven't been able to reproduce it. As mentioned above, I tried
>>>>>>>>> simply rebooting in a loop, but no luck yet.
>>>>>>>>> Do you have suggestions to speed up the reproduction?
>>>>>>>> Maybe try to shut down while the command interface is being filled.
>>>>>>>> I think that any query command will do the trick.
>>>>>>> Just an update.
>>>>>>> I tried running "saquery" in a loop in one session and "modprobe -r
>>>>>>> mlx5_ib && modprobe mlx5_ib" in a loop in another session over the last
>>>>>>> few days, but still no luck.
>>>>>>>>> Once I can reproduce, I can also try with kernel 6.0.
>>>>>>>> It will be great.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Thanks!
>>>>>> Just want to mention that we are seeing more crashes during reboot. All the
>>>>>> crashes we saw were on servers with the
>>>>>> Intel(R) Xeon(R) Gold 6338 CPU. We use the same HCA on
>>>>>> different servers, so I suspect the bug is related to Ice Lake servers.
>>>>>>
>>>>>> In case it matters, here is lspci attached.
>>>>> Please try the following change on 5.15.32, let me know if it solves the
>>>>> failure :
>>>> Thank you Moshe, I will test it on affected servers and report back the result.
>>> Hi Moshe,
>>>
>>> I've been running the reboot tests on 4 affected machines in parallel
>>> for more than 6 hours, 300+ reboots in total, and I can no longer
>>> reproduce the crash. Without the fix, I was able to reproduce it 2 times
>>> in 20 reboots.
>>> So I think the bug is fixed.
>>
>> Great !
>>
>>> I also did some basic functional tests via RNBD/IPOIB, and all looks good.
>>> Tested-by: Jack Wang <jinpu.wang@...os.com>
>>> Please provide a formal fix.
>>
>> Will do.
> Hi Moshe,
> A gentle ping, when will you send the fix?
>
> Thanks!
Hi, it is part of Saeed's mlx5 fixes patchset.
He sent it a couple of hours ago.
>
>> Thanks!
>>
>>> Thx!
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>>>>> index e06a6104e91f..d45ca9c52a21 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
>>>>> @@ -971,6 +971,7 @@ static void cmd_work_handler(struct work_struct *work)
>>>>> cmd_ent_get(ent);
>>>>> set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
>>>>>
>>>>> + cmd_ent_get(ent); /* for the _real_ FW event on completion */
>>>>> /* Skip sending command to fw if internal error */
>>>>> if (mlx5_cmd_is_down(dev) || !opcode_allowed(&dev->cmd, ent->op)) {
>>>>> u8 status = 0;
>>>>> @@ -984,7 +985,6 @@ static void cmd_work_handler(struct work_struct *work)
>>>>> return;
>>>>> }
>>>>>
>>>>> - cmd_ent_get(ent); /* for the _real_ FW event on completion */
>>>>> /* ring doorbell after the descriptor is valid */
>>>>> mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx);
>>>>> wmb();
>>>>> @@ -1598,8 +1598,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
>>>>> cmd_ent_put(ent); /* timeout work was canceled */
>>>>>
>>>>> if (!forced || /* Real FW completion */
>>>>> - pci_channel_offline(dev->pdev) || /* FW is inaccessible */
>>>>> - dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
>>>>> + mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
>>>>> + !opcode_allowed(cmd, ent->op))
>>>>> cmd_ent_put(ent);
>>>>>
>>>>> ent->ts2 = ktime_get_ns();
>>>>>
>>>>>> Thx!
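
For readers following the thread: the patch quoted above makes the "get" in
cmd_work_handler unconditional and makes the forced-completion "put" use the
same condition as the skip path. Below is a toy user-space model of that
accounting, not driver code. Everything prefixed toy_ is hypothetical. The
model assumes that the skip path completes the entry itself with forced set,
that "command interface is down" covers the two old conditions plus one more
flag, that the timeout-work reference is always taken and dropped, and that
the caller holds one reference of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a command entry: only the reference count is modeled. */
struct toy_ent {
	int refcnt;
};

/* Hypothetical device flags standing in for the conditions in the diff. */
struct toy_dev {
	bool pci_offline;     /* pci_channel_offline(dev->pdev) */
	bool internal_error;  /* dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR */
	bool cmdif_down;      /* assumed extra "interface is down" condition */
	bool opcode_blocked;  /* !opcode_allowed(cmd, ent->op) */
};

/* Assumption: "down" covers both old conditions plus cmdif_down. */
static bool toy_cmd_is_down(const struct toy_dev *d)
{
	return d->pci_offline || d->internal_error || d->cmdif_down;
}

static void toy_put(struct toy_ent *e)
{
	assert(e->refcnt > 0);	/* never drop a reference that is not held */
	e->refcnt--;
}

/* Completion side: mirrors the third hunk of the diff. */
static void toy_comp_handler(struct toy_ent *e, const struct toy_dev *d,
			     bool forced, bool fixed)
{
	bool no_fw_completion;

	toy_put(e);	/* timeout work was canceled (unconditional in this toy) */

	if (fixed)
		no_fw_completion = toy_cmd_is_down(d) || d->opcode_blocked;
	else
		no_fw_completion = d->pci_offline || d->internal_error;

	if (!forced || no_fw_completion)
		toy_put(e);	/* drop the "FW completion" reference */
}

/* Submission side: mirrors the first two hunks; returns the final count. */
static int toy_run_command(const struct toy_dev *d, bool fixed)
{
	struct toy_ent e = { .refcnt = 1 };	/* reference held by the caller */
	bool skip = toy_cmd_is_down(d) || d->opcode_blocked;

	e.refcnt++;	/* assumed to pair with the "timeout work" put */
	if (fixed)
		e.refcnt++;	/* patched: always take the FW reference */

	if (skip) {
		/* Assumption: the skip path completes the entry, forced. */
		toy_comp_handler(&e, d, true, fixed);
		return e.refcnt;
	}

	if (!fixed)
		e.refcnt++;	/* unpatched: FW reference taken only here */
	toy_comp_handler(&e, d, false, fixed);	/* real FW completion */
	return e.refcnt;
}
```

In this model, with only internal_error set, toy_run_command(&d, false)
returns 0: the forced completion drops a reference that the skip path never
took, so the caller's own reference is gone while the caller still uses the
entry. With fixed set to true the same call returns 1, as it does for a
normal command and for a blocked opcode. Whether this is exactly the sequence
behind the reported GPF is not established by the model.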