Date:   Wed, 19 Jul 2017 18:13:14 -0400
From:   Dennis Zhou <dennisz@...com>
To:     Josef Bacik <josef@...icpanda.com>
CC:     Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
        <kernel-team@...com>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, Dennis Zhou <dennisszhou@...il.com>
Subject: Re: [PATCH 09/10] percpu: replace area map allocator with bitmap
 allocator

Hi Josef,

Thanks for taking a look at my code.

On Wed, Jul 19, 2017 at 07:16:35PM +0000, Josef Bacik wrote:
> 
> Actually I decided I do want to complain about this.  Have you considered making
> chunks statically sized, like slab does?  We could avoid this whole bound_map
> thing completely and save quite a few cycles trying to figure out how big our
> allocation was.  Thanks,

I did consider something along the lines of a slab allocator, but
ultimately utilization and fragmentation were why I decided against it.

Percpu memory works by giving each cpu its own copy of the object. This
lets cpus access and manipulate their copy without incurring cache
coherence traffic. To do this, the percpu allocator creates chunks and
serves each allocation out of them. Because each cpu has its own copy,
there is a high cost to keeping chunks lying around (and to this memory
in general).
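
To make that concrete, here is a minimal sketch of the usual dynamic
percpu usage pattern (illustrative only, not code from this series; the
struct and function names are made up):

	#include <linux/percpu.h>

	struct my_counter {			/* hypothetical example */
		unsigned long hits;
	};

	static struct my_counter __percpu *counters;

	static int example_init(void)
	{
		int cpu;

		/* one copy of struct my_counter per possible cpu */
		counters = alloc_percpu(struct my_counter);
		if (!counters)
			return -ENOMEM;

		/* initialize each cpu's copy (also illustrates per_cpu_ptr) */
		for_each_possible_cpu(cpu)
			per_cpu_ptr(counters, cpu)->hits = 0;
		return 0;
	}

	/* caller is expected to have preemption disabled */
	static void example_hit(void)
	{
		/* touches only this cpu's copy, so no cache-line sharing */
		this_cpu_ptr(counters)->hits++;
	}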

A slab allocator takes the liberty of caching frequently used sizes and
accepting internal fragmentation in exchange for performance.
Unfortunately, the percpu memory allocator does not necessarily know
what is going to get allocated, so it would need to keep many slabs
around to serve every allocation size, which can be quite expensive. In
the worst case, long-lived percpu allocations can keep entire slabs
alive, as there is no way to consolidate once addresses have been handed
out. Additionally, any internal fragmentation caused by ill-fitting
objects is amplified by the number of possible cpus.
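
As a back-of-the-envelope illustration of that amplification (all
numbers here are hypothetical, not measurements from this series): a
40-byte percpu request served from a cached 64-byte size class wastes
24 bytes per cpu, so on a machine with 256 possible cpus that single
allocation wastes 24 * 256 = 6144 bytes, before counting any
mostly-empty slabs pinned by long-lived allocations.

	#include <stdio.h>

	int main(void)
	{
		unsigned int request = 40;	/* bytes actually asked for */
		unsigned int size_class = 64;	/* hypothetical cached slab size */
		unsigned int nr_cpus = 256;	/* possible cpus on a large machine */
		unsigned int waste_per_cpu = size_class - request;

		/* prints: 24 bytes wasted per cpu, 6144 bytes total */
		printf("%u bytes wasted per cpu, %u bytes total\n",
		       waste_per_cpu, waste_per_cpu * nr_cpus);
		return 0;
	}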

Thanks,
Dennis
