Message-ID: <fa916ae4-1ed3-4f90-8577-3666ff0fe84a@nvidia.com>
Date: Tue, 10 Jun 2025 12:50:57 +0300
From: Patrisious Haddad <phaddad@...dia.com>
To: Arnd Bergmann <arnd@...nel.org>, Leon Romanovsky <leon@...nel.org>,
Jason Gunthorpe <jgg@...pe.ca>
Cc: Arnd Bergmann <arnd@...db.de>, Christian Göttsche
<cgzones@...glemail.com>, Serge Hallyn <serge@...lyn.com>,
Chiara Meiohas <cmeiohas@...dia.com>, Al Viro <viro@...iv.linux.org.uk>,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] RDMA/mlx5: reduce stack usage in mlx5_ib_ufile_hw_cleanup
On 6/10/2025 12:28 PM, Arnd Bergmann wrote:
>
> From: Arnd Bergmann <arnd@...db.de>
>
> This function has an array of eight mlx5_async_cmd structures, which
> often fits on the stack, but depending on the configuration can
> end up blowing the stack frame warning limit:
>
> drivers/infiniband/hw/mlx5/devx.c:2670:6: error: stack frame size (1392) exceeds limit (1280) in 'mlx5_ib_ufile_hw_cleanup' [-Werror,-Wframe-larger-than]
>
> Change this to a dynamic allocation instead. While a kmalloc()
> can theoretically fail, a GFP_KERNEL allocation under a page will
> block until memory has been freed up, so in the worst case, this
> only adds extra time in an already constrained environment.
>
> Fixes: 7c891a4dbcc1 ("RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation")
> Signed-off-by: Arnd Bergmann <arnd@...db.de>
> ---
> drivers/infiniband/hw/mlx5/devx.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 2479da8620ca..c3c0ea219ab7 100644
> --- a/drivers/infiniband/hw/mlx5/devx.c
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -2669,7 +2669,7 @@ static void devx_wait_async_destroy(struct mlx5_async_cmd *cmd)
>
> void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> {
> - struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS];
> + struct mlx5_async_cmd *async_cmd;
Please preserve reverse Christmas tree declaration order.
> struct ib_ucontext *ucontext = ufile->ucontext;
> struct ib_device *device = ucontext->device;
> struct mlx5_ib_dev *dev = to_mdev(device);
> @@ -2678,6 +2678,10 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> int head = 0;
> int tail = 0;
>
> + async_cmd = kcalloc(MAX_ASYNC_CMDS, sizeof(*async_cmd), GFP_KERNEL);
> + if (WARN_ON(!async_cmd))
> + return;
But honestly I'm not sure I like this. The whole point of the original
patch was a performance optimization for the teardown flow, and this
function is called in a loop, not just once. So I'm really not sure how
much the kcalloc() slows it down here, and the allocation failing is a
whole other issue.
I'm thinking out loud here, but theoretically we know the stack frame
limit and the struct size at compile time, so we should be able to add
some kind of compile-time check, "if (stack_frame_size < struct_size)",
that skips this function and maybe prints a warning.
(Since it is purely an optimization, the code will logically continue
correctly without it; but if it does need to run, then let it stay as it
is and require a big enough stack, which most systems today have
anyway.) ?
> +
> list_for_each_entry(uobject, &ufile->uobjects, list) {
> WARN_ON(uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE));
>
> @@ -2713,6 +2717,8 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]);
> head++;
> }
> +
> + kfree(async_cmd);
> }
>
> static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
> --
> 2.39.5
>