Message-ID: <87o75kgspg.ffs@tglx>
Date: Thu, 22 Aug 2024 23:20:43 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Marc Zyngier <maz@...nel.org>, Kunkun Jiang <jiangkunkun@...wei.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>, James Morse
<james.morse@....com>, Suzuki K Poulose <suzuki.poulose@....com>, Zenghui
Yu <yuzenghui@...wei.com>, "open list:IRQ
SUBSYSTEM" <linux-kernel@...r.kernel.org>, "moderated list:ARM SMMU
DRIVERS" <linux-arm-kernel@...ts.infradead.org>, kvmarm@...ts.linux.dev,
"wanghaibin.wang@...wei.com" <wanghaibin.wang@...wei.com>,
nizhiqiang1@...wei.com, "tangnianyao@...wei.com" <tangnianyao@...wei.com>,
wangzhou1@...ilicon.com
Subject: Re: [bug report] GICv4.1: multiple vpus execute vgic_v4_load at the
same time will greatly increase the time consumption
On Thu, Aug 22 2024 at 13:47, Marc Zyngier wrote:
> On Thu, 22 Aug 2024 11:59:50 +0100,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>> > but that will eat a significant portion of your stack if your kernel is
>> > configured for a large number of CPUs.
>> >
>>
>> Currently CONFIG_NR_CPUS=4096, so each `struct cpumask` occupies 512 bytes.
>
> This seems crazy. Why would you build a kernel with something *that*
> big, especially considering that you have a lot less than 1k CPUs?

That's why CONFIG_CPUMASK_OFFSTACK exists, but that does not help in
that context. :)

>> > The removal of this global lock is the only option in my opinion.
>> > Either the cpumask becomes a stack variable, or it becomes a static
>> > per-CPU variable. Both have drawbacks, but they are not a bottleneck
>> > anymore.
>>
>> I also prefer to remove the global lock. Which variable do you think is
>> better?
>
> Given the number of CPUs your system is configured for, there is no
> good answer. An on-stack variable is dangerously large, and a per-CPU
> cpumask results in 2MB being allocated, which I find insane.

Only if there are actually 4096 CPUs enumerated. The per CPU magic is
smart enough to limit the damage to the actual number of possible CPUs
which are enumerated at boot time. It still will over-allocate due to
NR_CPUS being insanely large but on a 4 CPU machine this boils down to
2k of memory waste unless Aaarg64 is stupid enough to allocate for
NR_CPUS instead of num_possible_cpus()...

That said, on a real 4k CPU system 2M of memory should be the least of
your worries.
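
The figures in this exchange can be checked with a few lines of C. This
is a minimal userspace sketch, not kernel code: the helper names
cpumask_bytes() and percpu_cpumask_bytes() are made up for illustration,
and it assumes a cpumask is a bitmap of NR_CPUS bits rounded up to whole
unsigned longs, with one copy allocated per possible CPU.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Bytes needed for a bitmap with one bit per CPU, rounded up to a whole
 * number of unsigned longs. Hypothetical helper, for illustration only.
 */
static inline size_t cpumask_bytes(size_t nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/*
 * Total footprint when one such mask exists per possible CPU.
 * Assumes the allocation scales with the number of possible CPUs,
 * not with NR_CPUS. Hypothetical helper, for illustration only.
 */
static inline size_t percpu_cpumask_bytes(size_t nr_cpus, size_t possible_cpus)
{
	return cpumask_bytes(nr_cpus) * possible_cpus;
}
```

With nr_cpus = 4096 this gives 512 bytes per mask, 2 MiB in total for
4096 possible CPUs, and 2 KiB in total for 4 possible CPUs.
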
> You'll have to pick your own poison and convince Thomas of the
> validity of your approach.

As this is an operation which is really not suitable for on demand
or large stack allocations the per CPU approach makes sense.

Thanks,

        tglx