Date:   Fri, 23 Jun 2023 08:52:34 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     John Johansen <john.johansen@...onical.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
        Aaron Lu <aaron.lu@...el.com>, x86@...nel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [tip: sched/core] sched: Fix performance regression introduced
 by mm_cid

On 2023-06-22 10:33:13 [-0400], Mathieu Desnoyers wrote:
> 
> What was fundamentally wrong with the per-cpu caches before commit
> df323337e50 other than being non-RT friendly? Was the only purpose of
> that commit to reduce the duration of preempt-off critical sections,
> or is there a bigger-picture concern it was taking care of by
> introducing a global pool?

There were memory allocations within preempt-disabled sections
introduced by get_cpu_ptr(). This didn't fly on PREEMPT_RT. After
looking at this on a 2-node, 64-CPU box, I didn't get anywhere near
complete usage of the allocated per-CPU buffers. It looked wasteful.
Based on my testing back then, it looked sufficient to use a global
buffer. 
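
The problematic pattern was roughly this (only a sketch with made-up
names, not the actual pre-df323337e50 apparmor code):

#include <linux/percpu.h>
#include <linux/slab.h>

static DEFINE_PER_CPU(char *, pcpu_buf);

static void do_work(void)
{
        char *buf;

        buf = *get_cpu_ptr(&pcpu_buf); /* implies preempt_disable() */
        if (!buf) {
                /*
                 * Allocating with preemption disabled: kmalloc() with
                 * GFP_KERNEL may sleep, and on PREEMPT_RT even atomic
                 * allocations take sleeping locks.
                 */
                buf = kmalloc(8192, GFP_KERNEL);
                *this_cpu_ptr(&pcpu_buf) = buf;
        }
        /* ... use buf ... */
        put_cpu_ptr(&pcpu_buf);
}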

> Introducing per-cpu memory pools, dealing with migration by giving entries
> back to the right cpu's pool, taking into account the cpu the entry belongs
> to, and using a per-cpu lock-free data structure that allows a
> lock-free push to give back an entry on a remote cpu, should do the
> trick without locking and without long preempt-off critical sections.
> 
> The only downside I see for per-cpu memory pools is a slightly larger memory
> overhead on large multi-core systems. But is that really a concern?

Yes, if the memory is left unused and can't be reclaimed if needed.
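
FWIW, the scheme you describe could look roughly like this (an
untested llist-based sketch, all names made up; entry->cpu is assumed
to be set when the entry is first created for a CPU):

#include <linux/llist.h>
#include <linux/percpu.h>

struct pool_entry {
        struct llist_node node;
        int cpu;                /* home CPU of this entry */
        char buf[8192];
};

static DEFINE_PER_CPU(struct llist_head, buf_pool);

static struct pool_entry *pool_get(void)
{
        struct llist_node *n;

        /*
         * llist_del_first() requires pops on a given list to be
         * serialized (e.g. via a local_lock); omitted here for
         * brevity.
         */
        n = llist_del_first(this_cpu_ptr(&buf_pool));
        return n ? llist_entry(n, struct pool_entry, node) : NULL;
}

static void pool_put(struct pool_entry *e)
{
        /*
         * Lock-free push back to the entry's home CPU, even if the
         * task migrated after pool_get().
         */
        llist_add(&e->node, per_cpu_ptr(&buf_pool, e->cpu));
}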

> What am I missing here?

I added (tried to add) it, and people claimed that the SLUB allocator
got better over the years. Also, on NUMA systems it might be better to
use kmalloc() since the memory is NUMA-local. The allocation is 8KiB
by default. Is there a test case/benchmark with which I could compare
this against kmalloc()?
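
That is, the alternative would simply be (sketch):

        char *buf = kmalloc(8192, GFP_KERNEL); /* typically NUMA-local */

        if (buf) {
                /* ... use buf ... */
                kfree(buf);
        }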

> Thanks,
> 
> Mathieu

Sebastian
