linux-kernel - Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically allocating gntdev_copy


Message-ID: <f1c68345-a511-46b8-964c-a00bb62274ba@wanadoo.fr>
Date: Thu, 3 Jul 2025 19:35:51 +0200
From: Christophe JAILLET <christophe.jaillet@...adoo.fr>
To: Jürgen Groß <jgross@...e.com>,
 Tu Dinh <ngoc-tu.dinh@...es.tech>, Abinash <abinashlalotra@...il.com>
Cc: sstabellini@...nel.org, oleksandr_tyshchenko@...m.com,
 xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org,
 Abinash Singh <abinashsinghlalotra@...il.com>
Subject: Re: [RFC PATCH] xen/gntdev: reduce stack usage by dynamically
 allocating gntdev_copy_batch

On 03/07/2025 at 07:22, Jürgen Groß wrote:
> On 03.07.25 00:42, Tu Dinh wrote:
>> On 01/07/2025 23:53, Abinash wrote:
>>> Hi,
>>>
>>> Thanks for pointing that out.
>>>
>>> I haven’t measured the performance impact yet — my main focus was on
>>> getting rid of the stack usage warning triggered by LLVM due to
>>> inlining. But you're right, gntdev_ioctl_grant_copy() is on a hot
>>> path, and calling kmalloc() there could definitely slow things down,
>>> especially under memory pressure.
>>>
>>> I’ll run some benchmarks to compare the current approach with the
>>> dynamic allocation, and also look into alternatives — maybe
>>> pre-allocating the struct or limiting inlining instead. If you have
>>> any ideas or suggestions on how best to approach this, I’d be happy to
>>> hear them.
>>>
>>> Do you have any suggestions on how to test the performance?
>>>
>>> Best,
>>> Abinash
>>>
>>>
>>
>> Preallocating may work but I'd be wary of synchronization if the
>> preallocated struct is shared.
>>
>> I'd look at optimizing status[] which should save quite a few bytes.
>>
>> Reducing GNTDEV_COPY_BATCH could be a last resort, but that may also
>> impact performance.
> 
> IMHO the most promising way would be to dynamically allocate the struct, but
> don't free it at the end of the ioctl. Instead it could be put into a list
> anchored in struct gntdev_priv, so freeing would be done only at close() time.
> 
> Synchronization would be minimal (just for taking a free struct from the list
> or putting it back again), while memory usage would be basically just as needed,
> depending on the number of concurrent threads using the same file descriptor
> for the ioctl.
> 
> This approach would even allow to raise GNTDEV_COPY_BATCH, maybe resulting even
> in a gain of performance.
> 
> I'll write a patch implementing the allocation scheme.
> 
> 
> Juergen
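
For illustration only, the list-based scheme described in the quoted text above might look roughly like the user-space sketch below. This is not the actual patch: struct copy_batch, struct priv, COPY_BATCH, and the helpers batch_get(), batch_put(), and priv_release() are hypothetical stand-ins, and pthread mutexes with malloc()/free() are used here in place of kernel locking and allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define COPY_BATCH 16	/* placeholder, not the kernel's GNTDEV_COPY_BATCH */

/* Hypothetical stand-in for struct gntdev_copy_batch. */
struct copy_batch {
	struct copy_batch *next;	/* free-list link */
	int status[COPY_BATCH];
	unsigned int nr_ops;
};

/* Hypothetical stand-in for the per-open-file struct gntdev_priv. */
struct priv {
	pthread_mutex_t batch_lock;	/* protects free_batches only */
	struct copy_batch *free_batches;
};

static void priv_init(struct priv *p)
{
	pthread_mutex_init(&p->batch_lock, NULL);
	p->free_batches = NULL;
}

/* Take a cached batch if one is free, otherwise allocate a new one. */
static struct copy_batch *batch_get(struct priv *p)
{
	struct copy_batch *b;

	pthread_mutex_lock(&p->batch_lock);
	b = p->free_batches;
	if (b)
		p->free_batches = b->next;
	pthread_mutex_unlock(&p->batch_lock);

	if (!b)
		b = malloc(sizeof(*b));	/* may return NULL */
	if (b) {
		b->next = NULL;
		b->nr_ops = 0;
	}
	return b;
}

/* Put a batch back on the list instead of freeing it. */
static void batch_put(struct priv *p, struct copy_batch *b)
{
	assert(b != NULL);

	pthread_mutex_lock(&p->batch_lock);
	b->next = p->free_batches;
	p->free_batches = b;
	pthread_mutex_unlock(&p->batch_lock);
}

/* Called once at close() time; returns how many cached batches were freed. */
static unsigned int priv_release(struct priv *p)
{
	unsigned int freed = 0;

	while (p->free_batches) {
		struct copy_batch *b = p->free_batches;

		p->free_batches = b->next;
		free(b);
		freed++;
	}
	pthread_mutex_destroy(&p->batch_lock);
	return freed;
}
```

In this sketch the lock is held only while a batch is unlinked from or linked into the list, and the number of batches that ever get allocated is bounded by the peak number of concurrent callers on the same descriptor.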

It may be overkill, but sometimes we see patterns that try to get the
best of both worlds. Something like:


static struct gntdev_copy_batch static_batch;
static DEFINE_MUTEX(my_mutex);

static long gntdev_ioctl_grant_copy(...)
{
	struct gntdev_copy_batch *dynamic_batch = NULL;
	struct gntdev_copy_batch *batch;

	...

	if (mutex_trylock(&my_mutex)) {
		/*
		 * No concurrent access?
		 * Use a shared static variable to avoid an allocation
		 */
		batch = &static_batch;
	} else {
		/* otherwise, we need some fresh memory */
		dynamic_batch = kmalloc(sizeof(*batch), GFP_KERNEL);
		if (!dynamic_batch)
			return -ENOMEM;

		batch = dynamic_batch;
	}

	/* do stuff with 'batch' */
	...

free_batch:
	/* dynamic_batch is NULL only if we took the mutex above */
	if (!dynamic_batch)
		mutex_unlock(&my_mutex);
	else
		kfree(dynamic_batch);
	return ret;
}


Just my 2c.

CJ