Message-ID: <fa916ae4-1ed3-4f90-8577-3666ff0fe84a@nvidia.com>
Date: Tue, 10 Jun 2025 12:50:57 +0300
From: Patrisious Haddad <phaddad@...dia.com>
To: Arnd Bergmann <arnd@...nel.org>, Leon Romanovsky <leon@...nel.org>,
Jason Gunthorpe <jgg@...pe.ca>
Cc: Arnd Bergmann <arnd@...db.de>, Christian Göttsche
<cgzones@...glemail.com>, Serge Hallyn <serge@...lyn.com>,
Chiara Meiohas <cmeiohas@...dia.com>, Al Viro <viro@...iv.linux.org.uk>,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] RDMA/mlx5: reduce stack usage in mlx5_ib_ufile_hw_cleanup
On 6/10/2025 12:28 PM, Arnd Bergmann wrote:
>
> From: Arnd Bergmann <arnd@...db.de>
>
> This function has an array of eight mlx5_async_cmd structures, which
> often fits on the stack, but depending on the configuration can
> end up blowing the stack frame warning limit:
>
> drivers/infiniband/hw/mlx5/devx.c:2670:6: error: stack frame size (1392) exceeds limit (1280) in 'mlx5_ib_ufile_hw_cleanup' [-Werror,-Wframe-larger-than]
>
> Change this to a dynamic allocation instead. While a kmalloc()
> can theoretically fail, a GFP_KERNEL allocation under a page will
> block until memory has been freed up, so in the worst case, this
> only adds extra time in an already constrained environment.
>
> Fixes: 7c891a4dbcc1 ("RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation")
> Signed-off-by: Arnd Bergmann <arnd@...db.de>
> ---
> drivers/infiniband/hw/mlx5/devx.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
> index 2479da8620ca..c3c0ea219ab7 100644
> --- a/drivers/infiniband/hw/mlx5/devx.c
> +++ b/drivers/infiniband/hw/mlx5/devx.c
> @@ -2669,7 +2669,7 @@ static void devx_wait_async_destroy(struct mlx5_async_cmd *cmd)
>
> void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> {
> - struct mlx5_async_cmd async_cmd[MAX_ASYNC_CMDS];
> + struct mlx5_async_cmd *async_cmd;
Please preserve reverse Christmas tree declaration order.
> struct ib_ucontext *ucontext = ufile->ucontext;
> struct ib_device *device = ucontext->device;
> struct mlx5_ib_dev *dev = to_mdev(device);
> @@ -2678,6 +2678,10 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> int head = 0;
> int tail = 0;
>
> + async_cmd = kcalloc(MAX_ASYNC_CMDS, sizeof(*async_cmd), GFP_KERNEL);
> + if (WARN_ON(!async_cmd))
> + return;
But honestly I'm not sure I like this. The whole point of the original
patch was a performance optimization for the teardown flow, and this
function is called in a loop, not just once. So I'm really not sure how
much the kcalloc() slows it down here, and the allocation failing is a
whole other issue.
I'm thinking out loud here, but theoretically we know the stack frame
limit and the struct size at compile time, so we should be able to add
some kind of compile-time check, "if (stack_frame_size < struct_size)",
that skips this function and maybe prints a warning.
(Since it is purely an optimization, the code will logically continue
correctly without it; but if it does need to run, then let it stay as it
is and require a big enough stack, which most systems today have
anyway.) ?
> +
> list_for_each_entry(uobject, &ufile->uobjects, list) {
> WARN_ON(uverbs_try_lock_object(uobject, UVERBS_LOOKUP_WRITE));
>
> @@ -2713,6 +2717,8 @@ void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile)
> devx_wait_async_destroy(&async_cmd[head % MAX_ASYNC_CMDS]);
> head++;
> }
> +
> + kfree(async_cmd);
> }
>
> static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
> --
> 2.39.5
>