Message-ID: <76e8def2-ff45-47d3-91ab-96876ea84079@gmail.com>
Date:   Thu, 30 Nov 2023 21:08:23 +0100
From:   Ferry Toth <fntoth@...il.com>
To:     Catalin Marinas <catalin.marinas@....com>,
        Alan Stern <stern@...land.harvard.edu>
Cc:     Christoph Hellwig <hch@....de>,
        Hamza Mahfooz <someguy@...ective-light.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Andrew <travneff@...il.com>,
        Ferry Toth <ferry.toth@...inga.info>,
        Andy Shevchenko <andy.shevchenko@...il.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        iommu@...ts.linux.dev,
        Kernel development list <linux-kernel@...r.kernel.org>,
        USB mailing list <linux-usb@...r.kernel.org>,
        Robin Murphy <robin.murphy@....com>,
        Will Deacon <will@...nel.org>
Subject: Re: Bug in add_dma_entry()'s debugging code

Hi,

On 28-11-2023 at 18:44, Catalin Marinas wrote:
> On Tue, Nov 28, 2023 at 10:18:19AM -0500, Alan Stern wrote:
>> On Tue, Nov 28, 2023 at 02:37:02PM +0100, Christoph Hellwig wrote:
>>> I'd actually go one step back:
>>>
>>>   1) for not cache coherent DMA you can't do overlapping operations inside
>>>      a cache line
>>
>> Rephrasing slightly: You mustn't perform multiple non-cache-coherent DMA
>> operations that touch the same cache line concurrently.  (The word
>> "overlapping" is a little ambiguous in this context.)
> 
> The problem is worse. I'd say you should not perform even a single
> non-cache-coherent DMA (usually from-device or bidirectional) operation
> if the cache line is shared with anything else modifying it. It doesn't
> need to be another DMA operation. But that's more difficult to add to
> the DMA API debug code (maybe something like the bouncing logic in
> dma_kmalloc_needs_bounce()).
> 
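To make sure I understand both points, a hypothetical sketch of what
can go wrong ("foo" names are made up; this assumes a 64-byte cache
line, ARCH_KMALLOC_MINALIGN smaller than that, and error handling
elided):

#include <linux/slab.h>
#include <linux/dma-mapping.h>

struct foo {
	u8	dma_buf[32];	/* device writes here (from-device DMA) */
	u32	counter;	/* CPU updates this; same cache line! */
};

static void foo_demo(struct device *dev)
{
	/* Alan's case: two concurrent mappings landing on one line. */
	void *a = kmalloc(32, GFP_KERNEL);
	void *b = kmalloc(32, GFP_KERNEL);	/* may share a's cache line */
	dma_addr_t da, db;

	da = dma_map_single(dev, a, 32, DMA_FROM_DEVICE);
	db = dma_map_single(dev, b, 32, DMA_FROM_DEVICE);

	/* ... device fills both buffers ... */

	/*
	 * On a non-cache-coherent platform, syncing/unmapping one buffer
	 * invalidates the whole line and can throw away what the device
	 * just wrote into the other buffer, and vice versa.
	 */
	dma_unmap_single(dev, da, 32, DMA_FROM_DEVICE);
	dma_unmap_single(dev, db, 32, DMA_FROM_DEVICE);
	kfree(a);
	kfree(b);
}

Catalin's case is struct foo above: even a single from-device mapping
of dma_buf is unsafe while the CPU keeps writing counter on the same
line.
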
>>> The logical conclusion from that would be that IFF dma-debug is enabled on
>>> any platform we need to set ARCH_DMA_MINALIGN to the cache line size.
> 
> Or just force the kmalloc() min align to cache_line_size(), something
> like:
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 4a658de44ee9..3ece20367636 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -543,6 +543,8 @@ static inline int dma_get_cache_alignment(void)
>   #ifdef ARCH_HAS_DMA_MINALIGN
>   	return ARCH_DMA_MINALIGN;
>   #endif
> +	if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
> +		return cache_line_size();
>   	return 1;
>   }
>   #endif
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 8d431193c273..d0b21d6e9328 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -879,7 +879,7 @@ static unsigned int __kmalloc_minalign(void)
>   	unsigned int minalign = dma_get_cache_alignment();
>   
>   	if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
> -	    is_swiotlb_allocated())
> +	    is_swiotlb_allocated() && !IS_ENABLED(CONFIG_DMA_API_DEBUG))
>   		minalign = ARCH_KMALLOC_MINALIGN;
>   
>   	return max(minalign, arch_slab_minalign());

With the above suggestion ("force the kmalloc() min align to
cache_line_size()") plus Alan's debug code:

root@...a:~# journalctl -k | grep hub
kernel: usbcore: registered new interface driver hub
kernel: hub 1-0:1.0: USB hub found
kernel: usb usb1: hub buffer at 71c7180, status at 71c71c0
kernel: hub 1-0:1.0: 1 port detected
kernel: hub 2-0:1.0: USB hub found
kernel: usb usb2: hub buffer at 71c79c0, status at 71c7a00
kernel: hub 2-0:1.0: 1 port detected
kernel: hub 1-1:1.0: USB hub found
kernel: usb 1-1: hub buffer at 65b36c0, status at 6639340
kernel: hub 1-1:1.0: 7 ports detected

and the stack trace indeed goes away.

IOW the 2 root hub kmalloc() buffers are now also aligned to the cache
line size (buffer and status are 0x40 = 64 bytes apart), even though
these never triggered the stack trace. Strange: for the external hub
(1-1) the status is allocated far away from the buffer; kmalloc
mysteries.

This still hasn't fully landed for me: are we detecting a false alarm
here, given that the 2 DMA operations can never happen on the same
cache line on non-cache-coherent platforms? If so, shouldn't we fix up
the DMA debug code so it doesn't raise the false alarm, instead of
changing the alignment?
Or is this a bona fide warning (for non-cache-coherent platforms)?
Then we should not silence it by force-aligning the allocations, but
issue a more useful WARN (on a cache-coherent platform) saying that
this is not an overlap but a shared cache line. On a non-cache-coherent
platform something stronger than a WARN might be appropriate?
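
For illustration, what I mean (paraphrasing the -EEXIST branch of
add_dma_entry() in kernel/dma/debug.c from memory, so the surrounding
context may be slightly off):

	rc = active_cacheline_insert(entry);
	if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		/* say what is actually wrong instead of "overlapping" */
		err_printk(entry->dev, entry,
			   "mappings share a cache line (not necessarily overlapping); unsafe if non-cache-coherent\n");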

> Also note that to_cacheline_number() in kernel/dma/debug.c only takes
> into account the L1_CACHE_SHIFT. On arm64 for example, cache_line_size()
> returns the maximum line of all the cache levels (and we've seen
> hardware where the L1 is 64-byte, L2 is 128).
> 
>>> BUT: we're actually reducing our dependency on ARCH_DMA_MINALIGN by
>>> moving to bounce buffering unaligned memory for non-coherent
>>> architectures,
>>
>> What's the reason for this?  To allow the minimum allocation size to be
>> smaller than the cache line size?  Does the savings in memory make up
>> for the extra overhead of bounce buffering?
>>
>> Or is this just to allow people to be more careless about how they
>> allocate their DMA buffers (which doesn't seem to make sense)?
> 
> It's the former and it does make a difference with lots of small
> structure or string allocations.
> 
> [...]
>> I get the impression that you would really like to have two different
>> versions of kmalloc() and friends: one for buffers that will be used in
>> DMA (and hence require cache-line alignment) and one for buffers that
>> won't be.
> 
> We've been there for the past 2-3 years (and a few other options). It's
> hard to guess in a generic way because the allocation place may not
> necessarily know how the device is going to access that memory (PIO,
> DMA). The conclusion was that for those cases where the buffer is small,
> we just do a bounce. If it's performance critical, the driver can use a
> kmem_cache_create(SLAB_HWCACHE_ALIGN) and avoid the bouncing.
> 
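
For the archive, I guess that pattern would look something like this
in a driver (hypothetical names):

#include <linux/slab.h>

#define FOO_BUF_SIZE	32

static struct kmem_cache *foo_buf_cache;

static int foo_init(void)
{
	/*
	 * Objects from this cache are cache-line aligned, so small DMA
	 * buffers never share a line with anything else and never need
	 * to bounce.
	 */
	foo_buf_cache = kmem_cache_create("foo_dma_buf", FOO_BUF_SIZE, 0,
					  SLAB_HWCACHE_ALIGN, NULL);
	return foo_buf_cache ? 0 : -ENOMEM;
}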
