Date:   Wed, 13 Oct 2021 03:44:49 +0000
From:   Hyeonggon Yoo <42.hyeyoo@...il.com>
To:     Christoph Lameter <cl@...two.de>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC] Some questions and an idea on SLUB/SLAB


Hello Christoph, thank you for answering.

On Mon, Oct 11, 2021 at 09:13:52AM +0200, Christoph Lameter wrote:
> On Sat, 9 Oct 2021, Hyeonggon Yoo wrote:
> 
> >  - Is there a reason that SLUB does not implement cache coloring?
> >    it will help utilizing hardware cache. Especially in block layer,
> >    they are literally *squeezing* its performance now.
> 
> Well as Matthew says: The high associativity of caches 

It does not seem useful on either of my machines (4-way and 8-way set associative).

> and the execution
> of other code path seems to make this not useful anymore.
> 
> I am sure you can find a benchmark that shows some benefit. But please
> realize that in real-life the OS must perform work. This means that
> multiple other code paths are executed that affect cache use and placement
> of data in cache lines.
> 

Cache coloring can improve benchmark results, but because the slab allocator
then uses more cache lines, fewer are left for other code paths. Did I get that right?

> 
> >  - In SLAB, do we really need to flush queues every few seconds?
> >    (per cpu queue and shared queue). Flushing alien caches makes
> >    sense, but flushing queues seems to reduce its fastpath.
> >    But yeah, we need to reclaim memory. Can we just defer this?
> 
> The queues are designed to track cache hot objects (See the Bonwick
> paper). After a while the cachelines will be used for other purposes and
> no longer reflect what is in the caches. That is why they need to be
> expired.

I've read the Bonwick paper, but I thought expiring was needed for reclaiming
memory. Maybe I got it wrong. I should read it again.

> 
> 
> >   - I don't like SLAB's per-node cache coloring, because L1 cache
> >     isn't shared between cpus. For now, cpus in same node are sharing
> >     its colour_next - but we can do better.
> 
> This differs based on the cpu architecture in use. SLAB has an ideal model
> of how caches work and keeps objects cache hot based on that. In real life
> the cpu architecture differs from what SLAB thinks how caches operate.
> 

So the point is that, since the cache hierarchy differs by architecture,
assuming that each cpu has its own private cache plus a cache shared among
cpus can be a misfit on some architectures.
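
For reference, the colour cursor discussed above can be modelled roughly as
below. This is only a userspace sketch under my own assumptions, not the code
in mm/slab.c: struct colour_state and next_colour_offset() are hypothetical
names, and the field names merely mirror colour, colour_off and colour_next.
Whether one such cursor is kept per node or per cpu only changes who shares
the struct.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a colour cursor; not the kernel's struct. */
struct colour_state {
	unsigned int colour;      /* number of distinct colours available */
	unsigned int colour_off;  /* byte step between colours (assumed: one cache line) */
	unsigned int colour_next; /* colour to hand out to the next new slab */
};

/*
 * Return the byte offset for the next slab and advance the cursor,
 * wrapping back to colour 0 after the last colour has been used.
 */
static size_t next_colour_offset(struct colour_state *s)
{
	unsigned int c = s->colour_next;

	if (c >= s->colour)
		c = 0;

	s->colour_next = c + 1;
	if (s->colour_next >= s->colour)
		s->colour_next = 0;

	return (size_t)c * s->colour_off;
}
```

With colour = 3 and colour_off = 64, successive calls on one shared cursor
return 0, 64, 128 and then wrap to 0. When cpus in a node share the cursor,
each cpu sees only some of those offsets for the slabs it happens to grow.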

> >     what about splitting some per-cpu variables into kmem_cache_cpu
> >     like SLUB? I think cpu_cache, colour (and colour_next),
> >     alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.
> 
> That would in turn increase memory use and potentially the cache footprint
> of the hot paths.
>

I thought splitting the per-cpu data was needed for coloring, but coloring
isn't useful, so that would be an unnecessary cost.

Thanks,
Hyeonggon.
