Message-ID: <87o75kgspg.ffs@tglx>
Date: Thu, 22 Aug 2024 23:20:43 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Marc Zyngier <maz@...nel.org>, Kunkun Jiang <jiangkunkun@...wei.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>, James Morse
 <james.morse@....com>, Suzuki K Poulose <suzuki.poulose@....com>, Zenghui
 Yu <yuzenghui@...wei.com>, "open list:IRQ
 SUBSYSTEM" <linux-kernel@...r.kernel.org>, "moderated list:ARM SMMU
 DRIVERS" <linux-arm-kernel@...ts.infradead.org>, kvmarm@...ts.linux.dev,
 "wanghaibin.wang@...wei.com" <wanghaibin.wang@...wei.com>,
 nizhiqiang1@...wei.com, "tangnianyao@...wei.com" <tangnianyao@...wei.com>,
 wangzhou1@...ilicon.com
Subject: Re: [bug report] GICv4.1: multiple vpus execute vgic_v4_load at the
 same time will greatly increase the time consumption

On Thu, Aug 22 2024 at 13:47, Marc Zyngier wrote:
> On Thu, 22 Aug 2024 11:59:50 +0100,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>> > but that will eat a significant portion of your stack if your kernel is
>> > configured for a large number of CPUs.
>> > 
>> 
>> Currently CONFIG_NR_CPUS=4096, so each `struct cpumask` occupies 512 bytes.
>
> This seems crazy. Why would you build a kernel with something *that*
> big, especially considering that you have a lot less than 1k CPUs?

That's why CONFIG_CPUMASK_OFFSTACK exists, but that does not help in
that context. :)

>> > The removal of this global lock is the only option in my opinion.
>> > Either the cpumask becomes a stack variable, or it becomes a static
>> > per-CPU variable. Both have drawbacks, but they are not a bottleneck
>> > anymore.
>> 
>> I also prefer to remove the global lock. Which variable do you think is
>> better?
>
> Given the number of CPUs your system is configured for, there is no
> good answer. An on-stack variable is dangerously large, and a per-CPU
> cpumask results in 2MB being allocated, which I find insane.

Only if there are actually 4096 CPUs enumerated. The per-CPU magic is
smart enough to limit the damage to the actual number of possible CPUs
which are enumerated at boot time. It will still over-allocate due to
NR_CPUS being insanely large, but on a 4 CPU machine this boils down to
2k of memory waste unless Aaarg64 is stupid enough to allocate for
NR_CPUS instead of num_possible_cpus()...

That said, on a real 4k CPU system 2M of memory should be the least of
your worries.
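
For reference, the numbers in this exchange can be reproduced with some
back-of-the-envelope arithmetic. The sketch below is plain userspace C,
not kernel code. The helper names cpumask_bytes() and
percpu_cpumask_bytes() are made up for illustration, and it assumes that
a per-CPU variable costs exactly one copy per possible CPU, ignoring
per-CPU area layout and alignment.

```c
#include <stddef.h>

#define BITS_PER_BYTE 8

/*
 * Size of a statically sized cpumask: one bit per configured CPU
 * (NR_CPUS), rounded up to a whole number of unsigned longs.
 * Hypothetical helper, for illustration only.
 */
static size_t cpumask_bytes(size_t nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * BITS_PER_BYTE;
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/*
 * Rough cost of one per-CPU cpumask variable: one mask-sized copy for
 * each possible CPU (assumption: no padding or chunk overhead).
 * Hypothetical helper, for illustration only.
 */
static size_t percpu_cpumask_bytes(size_t nr_cpus, size_t possible_cpus)
{
	return cpumask_bytes(nr_cpus) * possible_cpus;
}
```

With NR_CPUS=4096, cpumask_bytes(4096) gives 512 bytes per mask.
percpu_cpumask_bytes(4096, 4096) gives 2097152 bytes (2M) if all 4096
CPUs are possible, while percpu_cpumask_bytes(4096, 4) gives 2048 bytes
(2k) on a 4 CPU machine.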

> You'll have to pick your own poison and convince Thomas of the
> validity of your approach.

As this is an operation which is really not suitable for on-demand
or large stack allocations, the per-CPU approach makes sense.

Thanks,

        tglx
