Message-ID: <ZXDsyLyQYWW4OZN3@x130>
Date: Wed, 6 Dec 2023 13:51:04 -0800
From: Saeed Mahameed <saeed@...nel.org>
To: Shifeng Li <lishifeng@...gfor.com.cn>
Cc: saeedm@...dia.com, leon@...nel.org, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
eranbe@...lanox.com, moshe@...lanox.com, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
dinghui@...gfor.com.cn, lishifeng1992@....com,
Moshe Shemesh <moshe@...dia.com>
Subject: Re: [PATCH net v4] net/mlx5e: Fix a race in command alloc flow
On 02 Dec 00:01, Shifeng Li wrote:
>Fix a cmd->ent use-after-free caused by a race on a command entry.
>Such a race occurs when one of the commands releases its last refcount and
>frees its index and entry while another process running the command flush
>flow takes a refcount on this command entry. The process that handles the
>command flush may see this command as needing to be flushed if the other
>process has allocated an ent->idx but has not yet stored ent in
>cmd->ent_arr in cmd_work_handler(). Fix it by moving the assignment of
>cmd->ent_arr under the cmd->alloc_lock spin lock.
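A minimal userspace sketch of the fixed allocation order, assuming C11 atomics as a stand-in for the kernel spin lock. The struct and function names (cmd_model, model_alloc_index, model_free_index, model_flush_visit, model_lock) are hypothetical and only loosely mirror the identifiers in the message; this is not the mlx5 driver code. It shows the invariant the fix restores: a slot's bit is cleared and its ent_arr[] slot is filled inside one critical section, so a flush holding the same lock only ever sees busy slots together with their owning entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of the command slot allocator (not driver
 * code). A set bit in `bitmask` means the slot is free, matching the
 * diagram where the flush computes vector = ~bitmask to find busy slots. */
#define MAX_REG_CMDS 4
#define ALL_SLOTS ((1UL << MAX_REG_CMDS) - 1)

struct cmd_ent {
	int idx;
};

struct cmd_model {
	atomic_flag alloc_lock;              /* stand-in for the spin lock */
	unsigned long bitmask;               /* 1 = free slot */
	struct cmd_ent *ent_arr[MAX_REG_CMDS];
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void cmd_model_init(struct cmd_model *cmd)
{
	atomic_flag_clear(&cmd->alloc_lock);
	cmd->bitmask = ALL_SLOTS;            /* all slots start free */
	for (int i = 0; i < MAX_REG_CMDS; i++)
		cmd->ent_arr[i] = NULL;
}

/* Claim a free slot AND publish the entry in one critical section.
 * Returns the slot index, or -1 if every slot is busy. */
static int model_alloc_index(struct cmd_model *cmd, struct cmd_ent *ent)
{
	int ret = -1;

	model_lock(&cmd->alloc_lock);
	for (int i = 0; i < MAX_REG_CMDS; i++) {
		if (cmd->bitmask & (1UL << i)) {
			cmd->bitmask &= ~(1UL << i); /* mark busy */
			ent->idx = i;
			cmd->ent_arr[i] = ent;       /* publish under the same lock */
			ret = i;
			break;
		}
	}
	model_unlock(&cmd->alloc_lock);
	return ret;
}

/* Release a slot. The old pointer stays in ent_arr[]; that is harmless
 * here only because the flush skips free slots. */
static void model_free_index(struct cmd_model *cmd, int idx)
{
	model_lock(&cmd->alloc_lock);
	cmd->bitmask |= 1UL << idx;
	model_unlock(&cmd->alloc_lock);
}

/* Flush side: visit every busy slot under the lock and check it holds
 * the entry that owns it. Returns the number of busy slots. */
static int model_flush_visit(struct cmd_model *cmd)
{
	int busy = 0;

	model_lock(&cmd->alloc_lock);
	unsigned long vector = ~cmd->bitmask & ALL_SLOTS;
	for (int i = 0; i < MAX_REG_CMDS; i++) {
		if (vector & (1UL << i)) {
			assert(cmd->ent_arr[i] != NULL && cmd->ent_arr[i]->idx == i);
			busy++;
		}
	}
	model_unlock(&cmd->alloc_lock);
	return busy;
}
```

For example, after two allocations model_flush_visit() reports two busy slots, and after model_free_index(&m, 0) it reports one. The assertion inside the flush never fires, because every busy bit was published together with its entry.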
>
>[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
>[70013.081968]
>[70013.082028] Workqueue: events aer_isr
>[70013.082053] Call Trace:
>[70013.082067] dump_stack+0x8b/0xbb
>[70013.082086] print_address_description+0x6a/0x270
>[70013.082102] kasan_report+0x179/0x2c0
>[70013.082173] mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
>[70013.082267] mlx5_cmd_flush+0x80/0x180 [mlx5_core]
>[70013.082304] mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
>[70013.082338] mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
>[70013.082377] remove_one+0x200/0x2b0 [mlx5_core]
>[70013.082409] pci_device_remove+0xf3/0x280
>[70013.082439] device_release_driver_internal+0x1c3/0x470
>[70013.082453] pci_stop_bus_device+0x109/0x160
>[70013.082468] pci_stop_and_remove_bus_device+0xe/0x20
>[70013.082485] pcie_do_fatal_recovery+0x167/0x550
>[70013.082493] aer_isr+0x7d2/0x960
>[70013.082543] process_one_work+0x65f/0x12d0
>[70013.082556] worker_thread+0x87/0xb50
>[70013.082571] kthread+0x2e9/0x3a0
>[70013.082592] ret_from_fork+0x1f/0x40
>
>The sequence of events that leads to this error is as follows:
>
>             aer_recover_work              |          ent->work
>-------------------------------------------+------------------------------
>aer_recover_work_func                      |
>|- pcie_do_recovery                        |
>  |- report_error_detected                 |
>    |- mlx5_pci_err_detected               |cmd_work_handler
>      |- mlx5_enter_error_state            |  |- cmd_alloc_index
>        |- enter_error_state               |    |- lock cmd->alloc_lock
>          |- mlx5_cmd_flush                |    |- clear_bit
>            |- mlx5_cmd_trigger_completions|    |- unlock cmd->alloc_lock
>              |- lock cmd->alloc_lock      |
>              |- vector = ~dev->cmd.vars.bitmask
>              |- for_each_set_bit          |
>                |- cmd_ent_get(cmd->ent_arr[i]) (UAF)
>              |- unlock cmd->alloc_lock    |  |- cmd->ent_arr[ent->idx]=ent
>
>In cmd_work_handler(), the bit clearing and the cmd->ent_arr[ent->idx]
>assignment are not done together under cmd->alloc_lock, so the flush can
>see a busy slot whose cmd->ent_arr entry still points to a freed entry.
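The interleaving in the diagram can be replayed deterministically in a single thread. This is a hypothetical model, not driver code: the names struct ent, struct cmd, claim_slot, flush_count_stale and replay are illustrative, and instead of freeing memory each entry carries a freed flag, so the use-after-free shows up as the flush picking up a freed entry from a slot whose bit was cleared before its ent_arr[] store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded replay of the diagram (not driver code).
 * A `freed` flag replaces real frees, so a use-after-free shows up as
 * "flush found a freed entry in a busy slot". */
#define MAX_REG_CMDS 4
#define ALL_SLOTS ((1UL << MAX_REG_CMDS) - 1)

struct ent {
	int idx;
	bool freed;
};

struct cmd {
	unsigned long bitmask;               /* 1 = free slot */
	struct ent *ent_arr[MAX_REG_CMDS];
};

/* Claim the lowest free slot (the clear_bit step, done under the lock). */
static int claim_slot(struct cmd *c)
{
	for (int i = 0; i < MAX_REG_CMDS; i++) {
		if (c->bitmask & (1UL << i)) {
			c->bitmask &= ~(1UL << i);
			return i;
		}
	}
	return -1;
}

/* Flush: among busy slots (vector = ~bitmask), count already-freed entries. */
static int flush_count_stale(const struct cmd *c)
{
	unsigned long vector = ~c->bitmask & ALL_SLOTS;
	int stale = 0;

	for (int i = 0; i < MAX_REG_CMDS; i++)
		if ((vector & (1UL << i)) && c->ent_arr[i] != NULL &&
		    c->ent_arr[i]->freed)
			stale++;
	return stale;
}

/* Command A uses a slot and is released, leaving its pointer behind.
 * Command B then claims the same slot, and the flush runs either before
 * (old flow) or after (fixed flow) B's ent_arr[] store. */
static int replay(bool fixed)
{
	struct ent a = { .idx = -1, .freed = false };
	struct ent b = { .idx = -1, .freed = false };
	struct cmd c = { .bitmask = ALL_SLOTS };

	a.idx = claim_slot(&c);
	c.ent_arr[a.idx] = &a;
	c.bitmask |= 1UL << a.idx;   /* A frees its index ... */
	a.freed = true;              /* ... and its entry; the pointer stays */

	b.idx = claim_slot(&c);
	assert(b.idx == a.idx);      /* B reuses A's slot */
	if (fixed)
		c.ent_arr[b.idx] = &b;   /* published in the same critical section */

	int stale = flush_count_stale(&c); /* the flush runs here */

	if (!fixed)
		c.ent_arr[b.idx] = &b;   /* old flow: published after unlock */
	return stale;
}
```

replay(false) returns 1, meaning the flush found A's freed entry in the slot B had just claimed, which is the entry cmd_ent_get() would touch in the diagram. replay(true) returns 0.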
>
>Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
>Reviewed-by: Moshe Shemesh <moshe@...dia.com>
>Signed-off-by: Shifeng Li <lishifeng@...gfor.com.cn>
LGTM,
Applied to net-mlx5.
Thanks,
Saeed.