linux-kernel - Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <Y0x5rGRcjn996ZOo@arm.com>
Date:   Sun, 16 Oct 2022 22:37:48 +0100
From:   Catalin Marinas <catalin.marinas@....com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Saravana Kannan <saravanak@...gle.com>,
        Isaac Manjarres <isaacmanjarres@...gle.com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Ard Biesheuvel <ardb@...nel.org>,
        Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>, kernel-team@...roid.com
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of
 ARCH_KMALLOC_MINALIGN

On Fri, Oct 14, 2022 at 01:44:25PM -0700, Linus Torvalds wrote:
> On Fri, Oct 14, 2022 at 1:24 PM Saravana Kannan <saravanak@...gle.com> wrote:
> > Agreed. Even allowing a 64-byte kmalloc cache on a system with a
> > 64-byte cacheline size saves quite a bit of memory.
> 
> Well, the *really* trivial thing to do is to just say "if the platform
> is DMA coherent, just allow any size kmalloc cache". And just
> consciously leave the broken garbage behind.

The problem is we don't have a reliable way to tell whether the platform
is DMA-coherent. The CPU IDs don't really say much and in most cases
it's a property of the interconnect/bus and device. We describe the DMA
coherency in DT or ACPI and the latter is somewhat better as it assumes
coherent by default. But for DT, not having a 'dma-coherent' property
means non-coherent DMA (or no DMA at all). We can't even tell whether
the device is going to do any DMA, arch_setup_dma_ops() is called even
for devices like the PMU. We could look into defining new properties
(e.g. "no-dma") and adjust the DTs but we may also have late probed
devices, long after the slab allocator was initialised. A big
'dma-coherent' property on the top node may work but most Android
systems won't benefit from this (your laptop may, I haven't checked).

I think the best bet is still either (a) bounce for small sizes or (b) a
new GFP_NODMA/PACKED/etc. flag for the hot small allocations. (a) is
somehow more universal but lots (most) Android devices are deployed with
no swiotlb buffer as the vendor knows the device needs and don't need
extra buffer waste. Not sure how reliable it would be to trigger another
slab allocation on the dma_map_*() calls for the bounce (it may need to
be GFP_ATOMIC). Option (b) looks more appealing on such systems, though
a lot more churn throughout the kernel.

-- 
Catalin