[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <13d4bbcb-244e-df72-4281-755d4bc51274@intel.com>
Date: Fri, 2 Dec 2022 11:44:10 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Reinette Chatre <reinette.chatre@...el.com>, fenghua.yu@...el.com,
vkoul@...nel.org, dmaengine@...r.kernel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] dmaengine: idxd: Let probe fail when workqueue cannot
be enabled
On 12/2/2022 11:25 AM, Reinette Chatre wrote:
> The workqueue is enabled when the appropriate driver is loaded and
> disabled when the driver is removed. When the driver is removed it
> assumes that the workqueue was enabled successfully and proceeds to
> free allocations made during workqueue enabling.
>
> Failure during workqueue enabling does not prevent the driver from
> being loaded. This is because the error path within drv_enable_wq()
> returns success unless a second failure is encountered
> during the error path. By returning success it is possible to load
> the driver even if the workqueue cannot be enabled and
> allocations that do not exist are attempted to be freed during
> driver remove.
>
> Some examples of problematic flows:
> (a)
>
> idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
> In above flow, if idxd_wq_request_irq() fails then
> idxd_wq_unmap_portal() is called on error exit path, but
> drv_enable_wq() returns 0 because idxd_wq_disable() succeeds. The
> driver is thus loaded successfully.
>
> idxd_dmaengine_drv_remove()->drv_disable_wq()->idxd_wq_unmap_portal()
> Above flow on driver unload triggers the WARN in devm_iounmap() because
> the device resource has already been removed during error path of
> drv_enable_wq().
>
> (b)
>
> idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq():
> In above flow, if idxd_wq_request_irq() fails then
> idxd_wq_init_percpu_ref() is never called to initialize the percpu
> counter, yet the driver loads successfully because drv_enable_wq()
> returns 0.
>
> idxd_dmaengine_drv_remove()->__idxd_wq_quiesce()->percpu_ref_kill():
> Above flow on driver unload triggers a BUG when attempting to drop the
> initial ref of the uninitialized percpu ref:
> BUG: kernel NULL pointer dereference, address: 0000000000000010
>
> Fix the drv_enable_wq() error path by returning the original error that
> indicates failure of workqueue enabling. This ensures that the probe
> fails when an error is encountered and the driver remove paths are only
> attempted when the workqueue was enabled successfully.
>
> Fixes: 1f2bb40337f0 ("dmaengine: idxd: move wq_enable() to device.c")
> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
Reviewed-by: Dave Jiang <dave.jiang@...el.com>
> ---
> drivers/dma/idxd/device.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> index 6f44fa8f78a5..fcd03d29a941 100644
> --- a/drivers/dma/idxd/device.c
> +++ b/drivers/dma/idxd/device.c
> @@ -1391,8 +1391,7 @@ int drv_enable_wq(struct idxd_wq *wq)
> err_irq:
> idxd_wq_unmap_portal(wq);
> err_map_portal:
> - rc = idxd_wq_disable(wq, false);
> - if (rc < 0)
> + if (idxd_wq_disable(wq, false))
> dev_dbg(dev, "wq %s disable failed\n", dev_name(wq_confdev(wq)));
> err:
> return rc;
Powered by blists - more mailing lists