Message-ID: <cd5670d7-b2ba-487b-94c4-bb781a9d0bf1@linux.dev>
Date: Sat, 21 Jun 2025 09:09:41 -0700
From: Zhu Yanjun <yanjun.zhu@...ux.dev>
To: Arnd Bergmann <arnd@...db.de>, Arnd Bergmann <arnd@...nel.org>,
Bernard Metzler <bmt@...ich.ibm.com>, Jason Gunthorpe <jgg@...pe.ca>,
Leon Romanovsky <leon@...nel.org>, Nathan Chancellor <nathan@...nel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>,
Potnuri Bharat Teja <bharat@...lsio.com>, Showrya M N <showrya@...lsio.com>,
Eric Biggers <ebiggers@...gle.com>, linux-rdma@...r.kernel.org,
linux-kernel@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH] RDMA/siw: work around clang stack size warning
On 2025/6/21 1:43, Arnd Bergmann wrote:
> On Sat, Jun 21, 2025, at 06:12, Zhu Yanjun wrote:
>> On 2025/6/20 4:43, Arnd Bergmann wrote:
>>
>> Because the array of kvec structures in siw_tx_hdt consumes the majority
>> of the stack space, would it be possible to use kmalloc or a similar
>> dynamic memory allocation function instead of allocating this memory on
>> the stack?
>>
>> Would using kmalloc (or an equivalent) also effectively resolve the
>> stack usage issue?
> Yes, moving the allocation somewhere else (kmalloc, static variable,
> per siw_sge, per siw_wqe) would avoid the high stack usage effectively.
> It's a tradeoff, and I picked the solution that made the most sense
> to me, but there is a good chance another alternative is better here.
>
> The main differences are:
>
> - kmalloc() adds runtime overhead that may be expensive in a
> fast path
>
> - kmalloc() can fail, which adds complexity from error handling.
> Note that small allocations with GFP_KERNEL do not fail but instead
> wait for memory to become available
>
> - If kmalloc() runs into a low-memory situation, it can go through
> writeback, which in turn can use more stack space than the
> on-stack allocation it was replacing
>
> - static allocations bloat the kernel image and require locking that
> may be expensive
>
> - per-object preallocations can be wasteful if a lot of objects
> are created, and can still require locking if the object is used
> from multiple threads
>
> As I wrote, I mainly picked the 'noinline_for_stack' approach
> here since that is how the code is known to work with gcc, so
> there is little risk of my patch causing problems.
>
> Moving both the kvec array and the page array into
> the siw_wqe is likely better here, but I'm not familiar enough
> with the driver to tell whether that is an overall improvement.
Thank you very much. There are several possible solutions to this issue,
and the appropriate one depends on the specific scenario. For the
problem in siw, the noinline_for_stack approach has been selected. In my
opinion, this appears to be more of a workaround than a true fix. While
it does mitigate the issue, the underlying problem in siw still remains.
That said, now that we have a clearer understanding of the problem and
its root cause through discussions and extended analysis, a more robust
and long-term solution should eventually be proposed.
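To make the tradeoff above concrete, here is a minimal userspace sketch of two of the options discussed: keeping the array on the stack in a helper that is not inlined, and allocating it dynamically. It is not the siw code. struct iovec stands in for struct kvec, calloc() stands in for kmalloc(), and MAX_SEGS, SEG_SIZE and all function names are hypothetical. The macro mirrors the kernel's noinline_for_stack, which expands to noinline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec: userspace stand-in for struct kvec */

#define MAX_SEGS 64	/* hypothetical array size, not taken from siw */
#define SEG_SIZE 4096	/* hypothetical segment size */

/* Userspace equivalent of the kernel's noinline_for_stack annotation. */
#define noinline_for_stack __attribute__((noinline))

/* Split buf into segments of at most SEG_SIZE bytes; returns the count. */
static size_t fill_segs(struct iovec *iov, char *buf, size_t len)
{
	size_t n = 0;

	while (len > 0 && n < MAX_SEGS) {
		size_t chunk = len < SEG_SIZE ? len : SEG_SIZE;

		iov[n].iov_base = buf;
		iov[n].iov_len = chunk;
		buf += chunk;
		len -= chunk;
		n++;
	}
	return n;
}

/* Sum the segment lengths, standing in for the actual transmit step. */
static size_t total_len(const struct iovec *iov, size_t n)
{
	size_t total = 0;

	assert(n <= MAX_SEGS);
	for (size_t i = 0; i < n; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Option 1: the array stays on the stack, but the helper is never
 * inlined, so the large frame exists only while this call is running
 * and is not merged into the caller's frame.
 */
static noinline_for_stack size_t send_on_stack(char *buf, size_t len)
{
	struct iovec iov[MAX_SEGS];

	return total_len(iov, fill_segs(iov, buf, len));
}

/*
 * Option 2: dynamic allocation. The stack frame is small, but every
 * call pays for the allocation and needs an error path.
 */
static int send_on_heap(char *buf, size_t len, size_t *out)
{
	struct iovec *iov = calloc(MAX_SEGS, sizeof(*iov));

	if (!iov)
		return -1;
	*out = total_len(iov, fill_segs(iov, buf, len));
	free(iov);
	return 0;
}
```

Both helpers produce the same result for the same buffer. The difference is only where the array lives and whether the caller has to handle an allocation failure.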
Thanks,
Reviewed-by: Zhu Yanjun <yanjun.zhu@...ux.dev>
Zhu Yanjun
>
> A related change I would like to see is to remove the
> kmap_local_page() in this driver and instead make it
> depend on 64BIT or !CONFIG_HIGHMEM, to slowly chip away
> at the code that is highmem aware throughout the kernel.
> I'm not sure if that would also help drop the array
> here.
>
> Arnd
--
Best Regards,
Yanjun.Zhu