Message-ID: <86655b79-a334-9d4a-464f-0fc83d5cf86a@huawei.com>
Date: Mon, 26 Aug 2024 11:10:38 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Marc Zyngier <maz@...nel.org>, Thomas Gleixner <tglx@...utronix.de>
CC: Oliver Upton <oliver.upton@...ux.dev>, James Morse <james.morse@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
"open list:IRQ SUBSYSTEM" <linux-kernel@...r.kernel.org>, "moderated list:ARM
SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
<kvmarm@...ts.linux.dev>, "wanghaibin.wang@...wei.com"
<wanghaibin.wang@...wei.com>, <nizhiqiang1@...wei.com>,
"tangnianyao@...wei.com" <tangnianyao@...wei.com>, <wangzhou1@...ilicon.com>
Subject: Re: [bug report] GICv4.1: multiple vcpus execute vgic_v4_load at the
 same time will greatly increase the time consumption
Hi Marc,
On 2024/8/23 16:49, Marc Zyngier wrote:
> On Thu, 22 Aug 2024 22:20:43 +0100,
> Thomas Gleixner <tglx@...utronix.de> wrote:
>>
>> On Thu, Aug 22 2024 at 13:47, Marc Zyngier wrote:
>>> On Thu, 22 Aug 2024 11:59:50 +0100,
>>> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>>>>> but that will eat a significant portion of your stack if your kernel is
>>>>> configured for a large number of CPUs.
>>>>>
>>>>
>>>> Currently CONFIG_NR_CPUS=4096, so each `struct cpumask` occupies 512 bytes.
>>>
>>> This seems crazy. Why would you build a kernel with something *that*
>>> big, especially considering that you have far fewer than 1k CPUs?
>>
>> That's why CONFIG_CPUMASK_OFFSTACK exists, but that does not help in
>> that context. :)
>>
>>>>> The removal of this global lock is the only option in my opinion.
>>>>> Either the cpumask becomes a stack variable, or it becomes a static
>>>>> per-CPU variable. Both have drawbacks, but they are not a bottleneck
>>>>> anymore.
>>>>
>>>> I also prefer to remove the global lock. Which variable do you think is
>>>> better?
>>>
>>> Given the number of CPUs your system is configured for, there is no
>>> good answer. An on-stack variable is dangerously large, and a per-CPU
>>> cpumask results in 2MB being allocated, which I find insane.
>>
>> Only if there are actually 4096 CPUs enumerated. The per CPU magic is
>> smart enough to limit the damage to the actual number of possible CPUs
>> which are enumerated at boot time. It still will over-allocate due to
>> NR_CPUS being insanely large but on a 4 CPU machine this boils down to
>> 2k of memory waste unless Aaarg64 is stupid enough to allocate for
>> NR_CPUS instead of num_possible_cpus()...
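For reference, here is a minimal stand-alone sketch of the arithmetic behind
these figures (assuming the usual one-bit-per-CPU cpumask layout; the program
and its names are purely illustrative, not kernel code):

/*
 * A cpumask carries one bit per CONFIG_NR_CPUS, so its size is fixed at
 * build time, while per-CPU data is instantiated once per CPU that is
 * actually possible at boot.
 */
#include <stdio.h>

#define NR_CPUS	4096	/* CONFIG_NR_CPUS from the report */

int main(void)
{
	unsigned long mask_bytes = NR_CPUS / 8;	/* 512 bytes per cpumask */

	/* total per-CPU footprint scales with the CPUs possible at boot */
	printf("4096 possible CPUs: %lu KiB\n", mask_bytes * 4096 / 1024);	/* 2048 KiB (2 MiB) */
	printf("   4 possible CPUs: %lu KiB\n", mask_bytes * 4 / 1024);		/*    2 KiB         */
	return 0;
}
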
>
> No difference between arm64 and xyz85.999 here.
>
>>
>> That said, on a real 4k CPU system 2M of memory should be the least of
>> your worries.
>
> Don't underestimate the general level of insanity!
>
>>
>>> You'll have to pick your own poison and convince Thomas of the
>>> validity of your approach.
>>
>> As this is an operation which is really not suitable for on-demand
>> or large stack allocations, the per-CPU approach makes sense.
>
> Right, so let's shoot for that. Kunkun, can you please give the
> following hack a go with your workload?
I tested my workload with the patch below applied, and it solved my problem.
Thank you very much.
Thanks,
Kunkun Jiang
>
> Thanks,
>
> M.
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index dd53298ef1a5..b6aa259ac749 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -224,15 +224,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> struct irq_desc *desc = irq_data_to_desc(data);
> struct irq_chip *chip = irq_data_get_irq_chip(data);
> const struct cpumask *prog_mask;
> + struct cpumask *tmp_mask;
> int ret;
>
> - static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
> - static struct cpumask tmp_mask;
> + static DEFINE_PER_CPU(struct cpumask, __tmp_mask);
>
> if (!chip || !chip->irq_set_affinity)
> return -EINVAL;
>
> - raw_spin_lock(&tmp_mask_lock);
> + tmp_mask = this_cpu_ptr(&__tmp_mask);
> +
> /*
> * If this is a managed interrupt and housekeeping is enabled on
> * it check whether the requested affinity mask intersects with
> @@ -258,11 +259,11 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>
> hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
>
> - cpumask_and(&tmp_mask, mask, hk_mask);
> - if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
> + cpumask_and(tmp_mask, mask, hk_mask);
> + if (!cpumask_intersects(tmp_mask, cpu_online_mask))
> prog_mask = mask;
> else
> - prog_mask = &tmp_mask;
> + prog_mask = tmp_mask;
> } else {
> prog_mask = mask;
> }
> @@ -272,16 +273,14 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> * unless we are being asked to force the affinity (in which
> * case we do as we are told).
> */
> - cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
> - if (!force && !cpumask_empty(&tmp_mask))
> - ret = chip->irq_set_affinity(data, &tmp_mask, force);
> + cpumask_and(tmp_mask, prog_mask, cpu_online_mask);
> + if (!force && !cpumask_empty(tmp_mask))
> + ret = chip->irq_set_affinity(data, tmp_mask, force);
> else if (force)
> ret = chip->irq_set_affinity(data, mask, force);
> else
> ret = -EINVAL;
>
> - raw_spin_unlock(&tmp_mask_lock);
> -
> switch (ret) {
> case IRQ_SET_MASK_OK:
> case IRQ_SET_MASK_OK_DONE:
>
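In case it helps future readers, here is a minimal sketch of the pattern the
patch switches to. The names scratch_mask and build_prog_mask are illustrative
only, not kernel symbols; the key point is that the per-CPU scratch mask needs
no lock as long as the caller cannot migrate or be re-entered while using it,
which irq_do_set_affinity() guarantees by running under the descriptor lock
with interrupts disabled.

/*
 * Sketch of the per-CPU scratch buffer pattern used in the patch above
 * (illustrative names, not the actual kernel change).
 */
#include <linux/percpu.h>
#include <linux/cpumask.h>

static DEFINE_PER_CPU(struct cpumask, scratch_mask);	/* one copy per possible CPU */

/*
 * Caller must not be migratable while the scratch mask is live; in
 * irq_do_set_affinity() this holds because the descriptor lock is held
 * with interrupts disabled.
 */
static const struct cpumask *build_prog_mask(const struct cpumask *req,
					     const struct cpumask *limit)
{
	struct cpumask *tmp = this_cpu_ptr(&scratch_mask);

	cpumask_and(tmp, req, limit);		/* scratch work, no global lock */
	return cpumask_empty(tmp) ? req : tmp;	/* fall back to the raw request */
}
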