Message-Id: <ab668ddb-1ea5-4444-95fc-f31469b4f05e@vates.tech>
Date: Wed, 02 Jul 2025 22:42:16 +0000
From: "Tu Dinh" <ngoc-tu.dinh@...es.tech>
To: Abinash <abinashlalotra@...il.com>
Cc: jgross@...e.com, sstabellini@...nel.org, oleksandr_tyshchenko@...m.com, xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org, "Abinash Singh" <abinashsinghlalotra@...il.com>
Subject: Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy_batch
On 01/07/2025 23:53, Abinash wrote:
> Hi,
>
> Thanks for pointing that out.
>
> I haven’t measured the performance impact yet — my main focus was on
> getting rid of the stack usage warning triggered by LLVM due to
> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
> path, and calling kmalloc() there could definitely slow things down,
> especially under memory pressure.
>
> I’ll run some benchmarks to compare the current approach with the
> dynamic allocation, and also look into alternatives — maybe
> pre-allocating the struct or limiting inlining instead. If you have
> any ideas or suggestions on how best to approach this, I’d be happy to
> hear them.
>
> Do you have any suggestions on how to test the performance?
>
> Best,
> Abinash
>
>
Preallocating may work, but I'd be wary of synchronization if the
preallocated struct is shared.
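For example, a minimal sketch (untested; the batch_cache and batch_lock
members are hypothetical additions to struct gntdev_priv):

	static struct gntdev_copy_batch *
	gntdev_get_batch(struct gntdev_priv *priv)
	{
		struct gntdev_copy_batch *batch;

		/* Grab the cached batch if it is free. */
		spin_lock(&priv->batch_lock);
		batch = priv->batch_cache;
		priv->batch_cache = NULL;
		spin_unlock(&priv->batch_lock);

		/* Fall back to the allocator under contention. */
		if (!batch)
			batch = kmalloc(sizeof(*batch), GFP_KERNEL);
		return batch;
	}

	static void gntdev_put_batch(struct gntdev_priv *priv,
				     struct gntdev_copy_batch *batch)
	{
		/* Return the batch to the one-slot cache, or free it
		 * if the slot is already occupied. */
		spin_lock(&priv->batch_lock);
		if (!priv->batch_cache) {
			priv->batch_cache = batch;
			batch = NULL;
		}
		spin_unlock(&priv->batch_lock);

		kfree(batch);	/* kfree(NULL) is a no-op */
	}

That keeps the common case allocation-free while staying safe if two
threads issue grant-copy ioctls on the same fd at once.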
I'd also look at optimizing status[], which should save quite a few
bytes per batch.
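If I remember the layout right, status[] holds one s16 __user * per op,
i.e. 8 bytes per entry on 64-bit, yet every pointer targets the same
user-provided segment array. A hypothetical (untested) layout could
store the base pointer once plus a 2-byte segment index per op:

	struct gntdev_copy_batch {
		struct gnttab_copy ops[GNTDEV_COPY_BATCH];
		struct page *pages[GNTDEV_COPY_BATCH];
		/* Base of the user's segment array, stored once... */
		struct gntdev_grant_copy_segment __user *segments;
		/* ...plus the owning segment's index per op, instead
		 * of a full s16 __user * per op. */
		u16 seg_idx[GNTDEV_COPY_BATCH];
		unsigned int nr_ops;
		unsigned int nr_pages;
		/* ... */
	};

The status writeback in gntdev_copy() would then do something like
__put_user(batch->ops[i].status,
&batch->segments[batch->seg_idx[i]].status) for each op.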
Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
impact performance.
As for benchmarks, I think you can use iperf and fio with varying packet
sizes/block sizes.
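For example (device names and addresses are just placeholders):

	# network path: run in the backend, then from the guest
	iperf3 -s
	iperf3 -c <backend-ip> -t 30 -l 64K	# sweep -l for packet size

	# block path: run in the guest against a PV disk
	fio --name=gntcopy --filename=/dev/xvdb --direct=1 --rw=randrw \
	    --bs=4k --ioengine=libaio --iodepth=16 --runtime=60 \
	    --time_based --group_reporting	# sweep --bs

Compare throughput and CPU usage before and after the patch.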
> On Mon, 30 Jun 2025 at 16:05, Tu Dinh <ngoc-tu.dinh@...es.tech> wrote:
>>
>> Hi,
>>
>> On 30/06/2025 06:54, Abinash Singh wrote:
>>> While building the kernel with LLVM, a warning was reported due to
>>> excessive stack usage in `gntdev_ioctl`:
>>>
>>> drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl'
>>>
>>> Further analysis revealed that the large stack frame was caused by
>>> `struct gntdev_copy_batch`, which was declared on the stack inside
>>> `gntdev_ioctl_grant_copy()`. Since this function was inlined into
>>> `gntdev_ioctl`, the stack usage was attributed to the latter.
>>>
>>> This patch fixes the issue by dynamically allocating `gntdev_copy_batch`
>>> using `kmalloc()`, which significantly reduces the stack footprint and
>>> eliminates the warning.
>>>
>>> This approach is consistent with similar fixes upstream, such as:
>>>
>>> commit fa26198d30f3 ("iommu/io-pgtable-arm: dynamically allocate selftest device struct")
>>>
>>> Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy")
>>> Signed-off-by: Abinash Singh <abinashsinghlalotra@...il.com>
>>> ---
>>> The stack usage was confirmed using the `-fstack-usage` flag and manual analysis, which showed:
>>>
>>> drivers/xen/gntdev.c:953: gntdev_ioctl_grant_copy.isra 1048 bytes
>>> drivers/xen/gntdev.c:826: gntdev_copy 56 bytes
>>>
>>> Since `gntdev_ioctl` was calling the inlined `gntdev_ioctl_grant_copy`, the total
>>> frame size exceeded 1024 bytes, triggering the warning.
>>>
>>> This patch addresses the warning and keeps stack usage within acceptable limits.
>>> Is this patch fine, or could kmalloc affect performance?
>>> ---
>>
>> Have you measured the performance impact? gntdev_ioctl_grant_copy is
>> called quite often especially by the backend. I'm afraid calling the
>> allocator here may cause even more slowdown than there already is,
>> especially when memory is tight.
>>
>>> drivers/xen/gntdev.c | 24 +++++++++++++++---------
>>> 1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 61faea1f0663..9811f3d7c87c 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -953,15 +953,19 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
>>> static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>> {
>>> struct ioctl_gntdev_grant_copy copy;
>>> - struct gntdev_copy_batch batch;
>>> + struct gntdev_copy_batch *batch;
>>> unsigned int i;
>>> int ret = 0;
>>>
>>> if (copy_from_user(&copy, u, sizeof(copy)))
>>> return -EFAULT;
>>> -
>>> - batch.nr_ops = 0;
>>> - batch.nr_pages = 0;
>>> +
>>> + batch = kmalloc(sizeof(*batch), GFP_KERNEL);
>>> + if (!batch)
>>> + return -ENOMEM;
>>> +
>>> + batch->nr_ops = 0;
>>> + batch->nr_pages = 0;
>>>
>>> for (i = 0; i < copy.count; i++) {
>>> struct gntdev_grant_copy_segment seg;
>>> @@ -971,18 +975,20 @@ static long gntdev_ioctl_grant_copy(struct gntdev_priv *priv, void __user *u)
>>> goto out;
>>> }
>>>
>>> - ret = gntdev_grant_copy_seg(&batch, &seg, &copy.segments[i].status);
>>> + ret = gntdev_grant_copy_seg(batch, &seg, &copy.segments[i].status);
>>> if (ret < 0)
>>> goto out;
>>>
>>> cond_resched();
>>> }
>>> - if (batch.nr_ops)
>>> - ret = gntdev_copy(&batch);
>>> - return ret;
>>> + if (batch->nr_ops)
>>> + ret = gntdev_copy(batch);
>>> + goto free_batch;
>>>
>>> out:
>>> - gntdev_put_pages(&batch);
>>> + gntdev_put_pages(batch);
>>> + free_batch:
>>> + kfree(batch);
>>> return ret;
>>> }
>>>
>>
>>
>>
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech