Message-ID: <f1c68345-a511-46b8-964c-a00bb62274ba@wanadoo.fr>
Date: Thu, 3 Jul 2025 19:35:51 +0200
From: Christophe JAILLET <christophe.jaillet@...adoo.fr>
To: Jürgen Groß <jgross@...e.com>,
Tu Dinh <ngoc-tu.dinh@...es.tech>, Abinash <abinashlalotra@...il.com>
Cc: sstabellini@...nel.org, oleksandr_tyshchenko@...m.com,
xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org,
Abinash Singh <abinashsinghlalotra@...il.com>
Subject: Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically
allocating gntdev_copy_batch
On 03/07/2025 at 07:22, Jürgen Groß wrote:
> On 03.07.25 00:42, Tu Dinh wrote:
>> On 01/07/2025 23:53, Abinash wrote:
>>> Hi,
>>>
>>> Thanks for pointing that out.
>>>
>>> I haven’t measured the performance impact yet — my main focus was on
>>> getting rid of the stack usage warning triggered by LLVM due to
>>> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
>>> path, and calling kmalloc() there could definitely slow things down,
>>> especially under memory pressure.
>>>
>>> I’ll run some benchmarks to compare the current approach with the
>>> dynamic allocation, and also look into alternatives — maybe
>>> pre-allocating the struct or limiting inlining instead. If you have
>>> any ideas or suggestions on how best to approach this, I’d be happy to
>>> hear them.
>>>
>>> Do you have any suggestions on how to test the performance?
>>>
>>> Best,
>>> Abinash
>>>
>>>
>>
>> Preallocating may work but I'd be wary of synchronization if the
>> preallocated struct is shared.
>>
>> I'd look at optimizing status[] which should save quite a few bytes.
>>
>> Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
>> impact performance.
>
> IMHO the most promising way would be to dynamically allocate the struct, but
> don't free it at the end of the ioctl. Instead it could be put into a list
> anchored in struct gntdev_priv, so freeing would be done only at close() time.
>
> Synchronization would be minimal (just for taking a free struct from the list
> or putting it back again), while memory usage would be basically just as
> needed, depending on the number of concurrent threads using the same file
> descriptor for the ioctl.
>
> This approach would even allow to raise GNTDEV_COPY_BATCH, maybe resulting
> even in a gain of performance.
>
> I'll write a patch implementing the allocation scheme.
>
>
> Juergen
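
For concreteness, the free-list scheme described above can be sketched in
userspace C. This is only a rough illustration under stated assumptions, not
the actual patch: struct priv stands in for struct gntdev_priv, pthread
mutexes and malloc()/free() stand in for kernel locking and
kmalloc()/kfree(), and every name here (copy_batch, batch_get(), batch_put(),
priv_release(), COPY_BATCH) is hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define COPY_BATCH 16              /* hypothetical batch size */

struct copy_batch {
    struct copy_batch *next;       /* link used while on the free list */
    int status[COPY_BATCH];        /* stand-in for the real per-op data */
    unsigned int nr_ops;
};

struct priv {                      /* stand-in for struct gntdev_priv */
    pthread_mutex_t batch_lock;    /* protects free_batches only */
    struct copy_batch *free_batches;
};

static void priv_init(struct priv *p)
{
    pthread_mutex_init(&p->batch_lock, NULL);
    p->free_batches = NULL;
}

/* Take a cached batch if one is available, otherwise allocate a new one. */
static struct copy_batch *batch_get(struct priv *p)
{
    struct copy_batch *b;

    pthread_mutex_lock(&p->batch_lock);
    b = p->free_batches;
    if (b)
        p->free_batches = b->next;
    pthread_mutex_unlock(&p->batch_lock);

    if (!b)
        b = malloc(sizeof(*b));    /* may return NULL; caller must check */
    if (b) {
        b->next = NULL;
        b->nr_ops = 0;
    }
    return b;
}

/* Put the batch back on the list instead of freeing it. */
static void batch_put(struct priv *p, struct copy_batch *b)
{
    pthread_mutex_lock(&p->batch_lock);
    b->next = p->free_batches;
    p->free_batches = b;
    pthread_mutex_unlock(&p->batch_lock);
}

/* Called once at close() time: free everything that was cached. */
static unsigned int priv_release(struct priv *p)
{
    unsigned int freed = 0;

    while (p->free_batches) {
        struct copy_batch *b = p->free_batches;

        p->free_batches = b->next;
        free(b);
        freed++;
    }
    pthread_mutex_destroy(&p->batch_lock);
    return freed;
}
```

The lock is held only for the list manipulation, and the number of cached
structs grows only to the peak number of concurrent callers on that file
descriptor.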
It may be overkill, but sometimes we see patterns that try to keep the
best of both worlds. Something like:

    static struct gntdev_copy_batch static_batch;
    static DEFINE_MUTEX(my_mutex);

    static long gntdev_ioctl_grant_copy(...)
    {
        struct gntdev_copy_batch *dynamic_batch = NULL;
        struct gntdev_copy_batch *batch;
        ...

        if (mutex_trylock(&my_mutex)) {
            /*
             * No concurrent access?
             * Use a shared static variable to avoid an allocation
             */
            batch = &static_batch;
        } else {
            /* otherwise, we need some fresh memory */
            dynamic_batch = kmalloc(sizeof(*batch), GFP_KERNEL);
            if (!dynamic_batch)
                return -ENOMEM;

            batch = dynamic_batch;
        }

        /* do stuff with 'batch' */
        ...

    free_batch:
        /* dynamic_batch == NULL means we hold the mutex on static_batch */
        if (!dynamic_batch)
            mutex_unlock(&my_mutex);
        else
            kfree(dynamic_batch);

        return ret;
    }

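
The trylock-or-allocate pattern above can be tried outside the kernel with a
small userspace sketch. It assumes pthread_mutex_trylock() and
malloc()/free() as stand-ins for mutex_trylock() and kmalloc()/kfree(); the
struct layout and the helper names batch_acquire() and batch_release() are
hypothetical and only there to make the example self-contained.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct copy_batch {                /* stand-in for struct gntdev_copy_batch */
    int status[16];
    unsigned int nr_ops;
};

static struct copy_batch static_batch;
static pthread_mutex_t static_batch_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the shared static batch if nobody else is using it, otherwise a
 * freshly allocated one. *is_static tells the caller which release path
 * applies. Returns NULL only if the fallback allocation fails.
 */
static struct copy_batch *batch_acquire(bool *is_static)
{
    /* pthread_mutex_trylock() returns 0 when the lock was taken */
    if (pthread_mutex_trylock(&static_batch_lock) == 0) {
        *is_static = true;
        return &static_batch;
    }

    *is_static = false;
    return malloc(sizeof(struct copy_batch));
}

/* Undo batch_acquire(): drop the lock or free the fallback allocation. */
static void batch_release(struct copy_batch *b, bool is_static)
{
    if (is_static)
        pthread_mutex_unlock(&static_batch_lock);
    else
        free(b);
}
```

The uncontended caller pays only for a trylock, and a concurrent caller falls
back to an allocation rather than waiting on the lock.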
Just my 2c.
CJ