lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ylh61CCMkESmurIp@arm.com>
Date:   Thu, 14 Apr 2022 20:49:40 +0100
From:   Catalin Marinas <catalin.marinas@....com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Ard Biesheuvel <ardb@...nel.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of
 ARCH_KMALLOC_MINALIGN

On Wed, Apr 13, 2022 at 09:53:24AM -1000, Linus Torvalds wrote:
> On Tue, Apr 12, 2022 at 10:47 PM Catalin Marinas
> <catalin.marinas@....com> wrote:
> > I agree. There is also an implicit expectation that the DMA API works on
> > kmalloc'ed buffers and that's what ARCH_DMA_MINALIGN is for (and the
> > dynamic arch_kmalloc_minalign() in this series). But the key point is
> > that the driver doesn't need to know the CPU cache topology, coherency,
> > the DMA API and kmalloc() take care of these.
> 
> Honestly, I think it would probably be worth discussing the "kmalloc
> DMA alignment" issues.
> 
> 99.9% of kmalloc users don't want to do DMA.
> 
> And there's actually a fair amount of small kmalloc for random stuff.
> Right now on my laptop, I have
> 
>     kmalloc-8          16907  18432      8  512    1 : ...
> 
> according to slabinfo, so almost 17 _thousand_ allocations of 8 bytes.
> 
> It's all kinds of sad if those allocations need to be 64 bytes in size
> just because of some silly DMA alignment issue, when none of them want
> it.

It's a lot worse, ARCH_KMALLOC_MINALIGN is currently 128 bytes on arm64.
I want to at least get it down to 64 with this series while preserving
the current kmalloc() semantics.

If we know the SoC is fully coherent (a bit tricky with late probed
devices), we could get the alignment down to 8. In the mobile space,
unfortunately, most DMA is non-coherent.

I think it's worth investigating the __dma annotations that Greg
suggested, though I have a suspicion it either is too difficult to track
or we just end up with this annotation everywhere. There are cases where
the memory is allocated outside the driver that knows the DMA needs,
though I guess these are either full page allocations or
kmem_cache_alloc() (e.g. page cache pages, skb).

There's also Ard's suggestion to bounce the (inbound DMA) buffer if not
aligned. That's doable but dma_map_single(), for example, only gets the
size of some random structure/buffer. If the size is below
ARCH_DMA_MINALIGN (or cache_line_size()), the DMA API implementation
would have to retrieve the slab cache, check the real allocation size
and then bounce if necessary.

Irrespective of which option we go for, I think at least part of this
series decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN is still
needed since currently the minalign is used in some compile time
attributes. Even getting the kmalloc() size down to 64 is a significant
improvement over 128.

Subsequently I'd attempt Ard's bouncing idea as a quick workaround and
assess the bouncing overhead on some real platforms. This may be needed
before we track down all places to use dma_kmalloc().

I need to think some more on Greg's __dma annotation, as I said the
allocation may be decoupled from the driver in some cases.

-- 
Catalin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ