Message-ID: <0f5c1192-090b-4c00-a951-9613289057df@kernel.org>
Date: Fri, 1 Nov 2024 08:06:02 +0100
From: Jiri Slaby <jirislaby@...nel.org>
To: 'Guanjun' <guanjun@...ux.alibaba.com>, corbet@....net, axboe@...nel.dk,
 mst@...hat.com, jasowang@...hat.com, xuanzhuo@...ux.alibaba.com,
 eperezma@...hat.com, vgoyal@...hat.com, stefanha@...hat.com,
 miklos@...redi.hu, tglx@...utronix.de, peterz@...radead.org,
 akpm@...ux-foundation.org, paulmck@...nel.org, thuth@...hat.com,
 rostedt@...dmis.org, bp@...en8.de, xiongwei.song@...driver.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-block@...r.kernel.org, virtualization@...ts.linux.dev,
 linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH RFC v1 1/2] genirq/affinity: add support for limiting
 managed interrupts

Hi,

On 31. 10. 24, 8:46, 'Guanjun' wrote:
> From: Guanjun <guanjun@...ux.alibaba.com>
> 
> Commit c410abbbacb9 ("genirq/affinity: Add is_managed to struct irq_affinity_desc")
> introduced the is_managed bit to struct irq_affinity_desc. Because queue
> interrupts are treated as managed interrupts, in scenarios where a large
> number of devices are present (using massive numbers of MSI-X queue
> interrupts), an excessive number of IRQ matrix bits (about
> num_online_cpus() * nvecs) are reserved during interrupt allocation. This
> subsequently leads to a situation where interrupts for some devices cannot
> be properly allocated.
> 
> Add support for limiting the number of managed interrupts per node per allocation.
> 
> Signed-off-by: Guanjun <guanjun@...ux.alibaba.com>
> ---
>   .../admin-guide/kernel-parameters.txt         |  9 +++
>   block/blk-mq-cpumap.c                         |  2 +-
>   drivers/virtio/virtio_vdpa.c                  |  2 +-
>   fs/fuse/virtio_fs.c                           |  2 +-
>   include/linux/group_cpus.h                    |  2 +-
>   kernel/irq/affinity.c                         | 11 ++--
>   lib/group_cpus.c                              | 55 ++++++++++++++++++-
>   7 files changed, 73 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9b61097a6448..ac80f35d04c9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3238,6 +3238,15 @@
>   			different yeeloong laptops.
>   			Example: machtype=lemote-yeeloong-2f-7inch
>   
> +	managed_irqs_per_node=
> +			[KNL,SMP] Limit the number of managed interrupts
> +			allocated on each node, to prevent the case where
> +			interrupts cannot be properly allocated when a
> +			large number of devices are present. The default
> +			is 0, which means no limit on the number of managed irqs.
> +			Format: integer between 0 and num_possible_cpus() / num_possible_nodes()
> +			Default: 0

Kernel parameters suck. Especially here, you have to guess the right value 
just to boot properly. Could this be auto-tuned instead?

> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -11,6 +11,30 @@
>   
>   #ifdef CONFIG_SMP
>   
> +static unsigned int __read_mostly managed_irqs_per_node;
> +static struct cpumask managed_irqs_cpumsk[MAX_NUMNODES] __cacheline_aligned_in_smp = {

This is quite excessive. On SUSE configs, this is 8192 CPU bits * 1024 
nodes = 1 MiB of static data, for everyone. You have to allocate this 
dynamically instead. See e.g. setup_node_to_cpumask_map().

> +	[0 ... MAX_NUMNODES-1] = {CPU_BITS_ALL}
> +};
> +
> +static int __init irq_managed_setup(char *str)
> +{
> +	int ret;
> +
> +	ret = kstrtouint(str, 10, &managed_irqs_per_node);
> +	if (ret < 0) {
> +		pr_warn("managed_irqs_per_node= cannot parse, ignored\n");

could not be parsed

> +		return 0;
> +	}
> +
> +	if (managed_irqs_per_node * num_possible_nodes() > num_possible_cpus()) {
> +		managed_irqs_per_node = num_possible_cpus() / num_possible_nodes();
> +		pr_warn("managed_irqs_per_node= cannot be larger than %u\n",
> +			managed_irqs_per_node);
> +	}
> +	return 1;
> +}
> +__setup("managed_irqs_per_node=", irq_managed_setup);
> +
>   static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>   				unsigned int cpus_per_grp)
>   {
...
> @@ -332,6 +380,7 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
>   /**
>    * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality
>    * @numgrps: number of groups
> + * @is_managed: if these groups managed by kernel

are managed by the kernel

>    *
>    * Return: cpumask array if successful, NULL otherwise. And each element
>    * includes CPUs assigned to this group

thanks,
-- 
js
suse labs

