Message-ID: <b01c9f31-4336-dfe7-9042-41339ea9cc0d@suse.cz>
Date: Tue, 30 Nov 2021 16:39:28 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: David Laight <David.Laight@...LAB.COM>,
Christoph Lameter <cl@...two.org>
Cc: Rustam Kovhaev <rkovhaev@...il.com>,
"penberg@...nel.org" <penberg@...nel.org>,
"rientjes@...gle.com" <rientjes@...gle.com>,
"iamjoonsoo.kim@....com" <iamjoonsoo.kim@....com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"corbet@....net" <corbet@....net>,
"djwong@...nel.org" <djwong@...nel.org>,
"david@...morbit.com" <david@...morbit.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"dvyukov@...gle.com" <dvyukov@...gle.com>
Subject: Re: [PATCH v4] slob: add size header to all allocations
On 11/30/21 16:21, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 30 November 2021 14:56
>>
>> On 11/23/21 11:18, David Laight wrote:
>> > From: Vlastimil Babka
>> >
>> > Or just a single byte that is the index of the associated free list structure.
>> > For 32bit and for the smaller kmalloc() area it may be reasonable to have
>> > a separate array indexed by the page of the address.
>> >
>> >> > So I guess placement at the beginning cannot be avoided. That in turn runs
>> >> > into trouble with the DMA requirements on some platforms where the
>> >> > beginning of the object has to be cache line aligned.
>> >>
>> >> It's no problem to have the real beginning of the object aligned, and the
>> >> prepended header not.
>> >
>> > I'm not sure that helps.
>> > The header can't share a cache line with the previous item (because it
>> > might be mapped for DMA) so will always take a full cache line.
>>
>> So if this is true, then I think we already have a problem with SLOB today
>> (and AFAICS it's not even due to changes done by my 2019 commit 59bb47985c1d
>> ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), but
>> is older).
>>
>> Let's say we are on arm64 where (AFAICS):
>> ARCH_KMALLOC_MINALIGN = ARCH_DMA_MINALIGN = 128
>> ARCH_SLAB_MINALIGN = 64
>
> Is that valid?
> Isn't SLAB being used to implement kmalloc() so the architecture
> defined alignment must apply?
SLAB is used to implement kmalloc(), yes, but that's an implementation
detail. I assume we provide these DMA guarantees to all kmalloc() users
because we don't know which of them will use the memory for DMA and which
won't. But if somebody creates their own specific SLAB cache, they have to
decide explicitly whether they are going to use DMA with those objects, and
request such alignment if so. If not, we can use a smaller alignment that's
only required by e.g. the CPU.
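To illustrate (just a sketch, the cache names and the 64-byte object size
are made up):

struct kmem_cache *dma_cache, *cpu_cache;

/* objects that may be mapped for DMA: request ARCH_DMA_MINALIGN explicitly */
dma_cache = kmem_cache_create("drv_dma_objs", 64, ARCH_DMA_MINALIGN, 0, NULL);

/* CPU-only objects: align = 0, which AFAICS falls back to ARCH_SLAB_MINALIGN
 * (64 on arm64), so two of these can share a 128B cacheline */
cpu_cache = kmem_cache_create("drv_cpu_objs", 64, 0, 0, NULL);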
>> The point is that ARCH_SLAB_MINALIGN is smaller than ARCH_DMA_MINALIGN.
>>
>> Let's say we call kmalloc(64) and get a completely fresh page.
>> In SLOB, alloc(), or rather __do_kmalloc_node(), will calculate minalign as
>> max(ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN), thus 128.
>> It will call slob_alloc() with size = size + minalign = 64 + 128 = 192, and
>> align = align_offset = 128.
>> Thus the allocation will use 128 bytes for the header and 64 for the object,
>> with both the header and the object aligned to 128 bytes.
>> But the remaining 64 bytes of the second 128-byte block will remain free, as
>> the allocated size is 192 bytes:
>>
>> | 128B header, aligned | 64B object | 64B free | rest also free |
>
> That is horribly wasteful on memory :-)
Yes. I don't know how this historically came to be for SLOB, which was
supposed to minimize memory usage (at the expense of CPU efficiency). But
the point raised in this thread is that if we extend this header to all
kmem_cache_alloc() allocations, to make them freeable with kfree(), we'll
make it a lot worse :/
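For reference, the arithmetic in the kmalloc(64) example above comes from
roughly this in __do_kmalloc_node() (paraphrased from mm/slob.c, not
verbatim):

unsigned int *m;
void *ret;
int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN,	/* 128 */
			     ARCH_SLAB_MINALIGN);	/* 64  */
int align = minalign;

/* power-of-two sizes get natural alignment; for 64 align stays at 128 */
if (is_power_of_2(size))
	align = max_t(unsigned int, minalign, size);

/* 64 + 128 = 192 bytes allocated; header and object both 128B aligned */
m = slob_alloc(size + minalign, gfp, align, node, minalign);
*m = size;			/* size stored in the prepended header */
ret = (void *)m + minalign;	/* object starts 128 bytes into the block */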
>> If there's another kmalloc allocation, the 128-byte alignment due to
>> ARCH_KMALLOC_MINALIGN will prevent it from using these 64 bytes, so that's
>> fine. But if there's a kmem_cache_alloc() from a cache serving <=64B
>> objects, it will be aligned to ARCH_SLAB_MINALIGN and happily use those 64
>> bytes that share the 128-byte block where the previous kmalloc allocation lies.
>
> If the memory returned by kmem_cache_alloc() can be used for DMA then
> ARCH_DMA_MINALIGN has to apply to the returned buffers.
> So, maybe, that cache can't exist?
See above, I assume that cache cannot be used for DMA. But if we are trying
to protect the kmalloc(64) DMA guarantees, then that shouldn't depend on the
guarantees of the second, unrelated allocation?
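To spell out the sharing from the example above (offsets are relative to the
start of the fresh page):

/*
 *   offset   0..127   prepended kmalloc header (128B aligned)
 *   offset 128..191   the kmalloc(64) object, possibly mapped for DMA
 *   offset 192..255   free; a kmem_cache object aligned to only 64B fits here
 *
 * Offsets 128..255 are a single ARCH_DMA_MINALIGN (128B) block, so CPU
 * writes to a kmem_cache object placed at offset 192 would dirty the
 * cacheline that the kmalloc(64) object may be doing DMA into.
 */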
> I'd expect that ARCH_DMA_MINALIGN forces allocations to be a multiple
> of that size.
Doesn't seem so. That would indeed fix the problem, assuming it really is a
problem (yet it seems nobody has reported it occurring in practice).
> More particularly the rest of the area can't be allocated to anything else.
> So it ought to be valid to return the 2nd half of a 128 byte cache line
> provided the first half isn't written while the allocation is active.
As the allocator cannot know when the first half will be used for DMA by
whoever allocated it, we can only assume it can happen at any time, and not
return the 2nd half, ever?
> But that ARCH_KMALLOC_MINALIGN only applies to 'large' items?
> Small items only need aligning to the power of 2 below their size.
> So 8 bytes items only need 8 byte alignment even though a larger
> item might need (say) 64 byte alignment.
But if we never defined such a threshold (which would probably have to be
arch-independent), then we can't start making such assumptions today, as we
don't know which kmalloc() users expect DMA and which don't. It would have to
be a flag or something. And yeah, there is already a __GFP_DMA flag, but it
means something a bit different...
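Purely hypothetical sketch of what such an opt-in could look like (no
__GFP_DMA_ALIGNED flag exists today, the name is made up):

/* HYPOTHETICAL: a caller that intends to DMA into the buffer says so */
buf = kmalloc(64, GFP_KERNEL | __GFP_DMA_ALIGNED);

/* the allocator could then apply ARCH_DMA_MINALIGN only to such callers,
 * and use a smaller, CPU-only alignment for everybody else */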
>
> David
>