Message-ID: <587dc9a8-b974-e222-95b4-93c2a8f2aba2@imgtec.com>
Date:   Tue, 26 Sep 2017 14:57:39 +0100
From:   Matt Redfearn <matt.redfearn@...tec.com>
To:     David Miller <davem@...emloft.net>
CC:     <netdev@...r.kernel.org>, <alexandre.torgue@...com>,
        <peppe.cavallaro@...com>, <linux-kernel@...r.kernel.org>,
        Linux MIPS Mailing List <linux-mips@...ux-mips.org>,
        Paul Burton <paul.burton@...tec.com>,
        James Hogan <james.hogan@...tec.com>
Subject: Re: [PATCH] net: stmmac: Meet alignment requirements for DMA

Hi David,

Thanks for your feedback.


On 23/09/17 02:26, David Miller wrote:
> From: Matt Redfearn <matt.redfearn@...tec.com>
> Date: Fri, 22 Sep 2017 12:13:53 +0100
>
>> According to Documentation/DMA-API.txt:
>>   Warnings:  Memory coherency operates at a granularity called the cache
>>   line width.  In order for memory mapped by this API to operate
>>   correctly, the mapped region must begin exactly on a cache line
>>   boundary and end exactly on one (to prevent two separately mapped
>>   regions from sharing a single cache line).  Since the cache line size
>>   may not be known at compile time, the API will not enforce this
>>   requirement.  Therefore, it is recommended that driver writers who
>>   don't take special care to determine the cache line size at run time
>>   only map virtual regions that begin and end on page boundaries (which
>>   are guaranteed also to be cache line boundaries).
> This is ridiculous.  You're misreading what this document is trying
> to explain.

To be clear, is your interpretation of the documentation that drivers do 
not have to ensure cacheline alignment?

As I read that documentation of the DMA API, it puts the onus on device 
drivers to ensure that they operate on cacheline aligned & sized blocks 
of memory. It states "the mapped region must begin exactly on a cache 
line boundary". We know that skb->data is not cacheline aligned, since 
it is skb_headroom() bytes into the skb buffer, and the value of 
skb_headroom() is not a multiple of the cacheline size. To give an example, on the 
Creator Ci40 platform (a 32-bit MIPS platform), I have the following values:
skb_headroom(skb) = 130 bytes
skb->head = 0x8ed9b800 (32-byte cacheline aligned)
skb->data = 0x8ed9b882 (not cacheline aligned)
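
For illustration, here is a minimal standalone sketch of that 
arithmetic (the 32-byte line size and the addresses are just the 
example values above, not anything queried at run time):

#include <stdint.h>
#include <stdio.h>

#define CACHELINE_SIZE 32	/* D-cache line size in the Ci40 example */

int main(void)
{
	uintptr_t head = 0x8ed9b800;	/* skb->head: cacheline aligned */
	uintptr_t data = head + 130;	/* skb->data = head + skb_headroom() */

	printf("head offset into line: %lu\n",
	       (unsigned long)(head % CACHELINE_SIZE));	/* 0 -> aligned */
	printf("data offset into line: %lu\n",
	       (unsigned long)(data % CACHELINE_SIZE));	/* 2 -> unaligned */
	return 0;
}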

Since the MIPS architecture requires software cache management, 
performing a dma_map_single(TO_DEVICE) will writeback and invalidate the 
cachelines for the required region. To comply with (our interpretation 
of) the DMA API documentation, the MIPS implementation expects a 
cacheline aligned & sized buffer, so that it can writeback a set of 
complete cachelines. Indeed, a recent patch 
(https://patchwork.linux-mips.org/patch/14624/) causes the MIPS DMA 
mapping code to emit warnings if the buffer it is requested to sync is 
not cacheline aligned. This driver, as is, fails that test and causes 
thousands of warnings to be emitted.
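
The check behind those warnings is essentially the following (a sketch 
only, the exact code in that patch may differ; cpu_dcache_line_size() 
is the MIPS helper that returns the D-cache line size at run time):

static void warn_if_not_cacheline_aligned(unsigned long addr, size_t size)
{
	unsigned int lsz = cpu_dcache_line_size();

	/* Both the start and the end of the synced region must fall on
	 * a cacheline boundary, per the DMA-API.txt text quoted above. */
	WARN_ON(!IS_ALIGNED(addr, lsz) || !IS_ALIGNED(addr + size, lsz));
}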

The situation for dma_map_single(FROM_DEVICE) is potentially worse, 
since it will perform a cache invalidate of all lines for the buffer. If 
the buffer is not cacheline aligned, this will throw away any adjacent 
data in the same cacheline. The dma_map implementation has no way to 
know whether data adjacent to the buffer it is asked to sync is valuable 
or not; any such data is simply thrown away by the cache invalidate.
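
As a concrete (entirely hypothetical) illustration of the hazard, 
consider a CPU-owned field that happens to share a cacheline with the 
start of a receive buffer:

struct rx_slot {
	u16 csum_seed;		/* CPU-owned, written before mapping      */
	u8  buf[2048];		/* handed to the device for DMA           */
};

slot->csum_seed = 0xabcd;	/* store may still be dirty in the cache */
dma_addr = dma_map_single(dev, slot->buf, sizeof(slot->buf),
			  DMA_FROM_DEVICE);
/* The map invalidates every cacheline overlapping buf. If buf does not
 * start on a line boundary, the line it shares with csum_seed is
 * invalidated too, and the 0xabcd store is silently lost. */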

If I am somehow misinterpreting the DMA API documentation, and drivers 
are NOT required to ensure that buffers to be synced are cacheline 
aligned, then there is only one way that the MIPS implementation can 
ensure that it writeback / invalidates cachelines containing only the 
requested data buffer, and that would be to use bounce buffers. Clearly 
that would be devastating for performance, as every outgoing packet would 
need to be copied from its unaligned location to a guaranteed-aligned 
location and written back from there. Similarly, incoming packets would 
have to be received into a cacheline aligned location first and then 
copied into the non-cacheline-aligned location supplied by the driver.
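
For the record, that bounce-buffer fallback would look roughly like 
this on the transmit path (a sketch with made-up variable names, not 
actual MIPS code; it relies on kmalloc() on noncoherent MIPS returning 
allocations aligned to at least the cacheline size):

/* Copy the unaligned payload into a freshly allocated, aligned buffer
 * and map/sync that instead of skb->data. */
void *bounce = kmalloc(ALIGN(len, lsz), GFP_ATOMIC);
if (!bounce)
	return -ENOMEM;
memcpy(bounce, skb->data, len);		/* the extra per-packet copy */
dma_addr = dma_map_single(dev, bounce, len, DMA_TO_DEVICE);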

> As long as you use the dma_{map,unmap}_single() and sync to/from
> device interfaces properly, the cacheline issues will be handled properly
> and the CPU and the device will see proper, up-to-date memory contents.

I interpret the DMA API documentation (and the MIPS arch DMA code 
operates this way) as stating that the driver must ensure that buffers 
passed to it are cacheline aligned and cacheline sized, to prevent cache 
management from throwing away adjacent data in the same cacheline.

That is what this patch is for, to change the buffer address passed to 
the DMA API to skb->head, which is (as far as I know) guaranteed to be 
cacheline aligned.
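
In code terms, the patch amounts to something like the following on the 
transmit path (names are illustrative rather than the exact stmmac 
code):

size_t offset = skb_headroom(skb);

/* Map from the (cacheline aligned) start of the buffer, but keep the
 * offset so the descriptor still points at the actual payload. */
dma_addr = dma_map_single(priv->device, skb->head,
			  offset + skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(priv->device, dma_addr))
	goto dma_map_err;
desc->des2 = dma_addr + offset;		/* device sees only the payload */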

>
> It is completely ridiculous to require every driver to stash away two
> sets of pointers for every packet, and to DMA map the headroom of the SKB
> which is wasteful.
>
> I'm not applying this, fix this problem properly, thanks.

An alternative, slightly less invasive, change would be to subtract 
NET_IP_ALIGN from the DMA buffer address, so that the buffer for which a 
sync is requested is cacheline aligned (is that guaranteed?). Would that 
change be more acceptable?
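
That alternative would look something like this (again only a sketch, 
and it relies on the assumption questioned above, namely that 
skb->data - NET_IP_ALIGN falls on a cacheline boundary):

/* Pull the mapped address back by NET_IP_ALIGN so the sync starts at
 * the (assumed) cacheline boundary, then advance the descriptor
 * pointer so the device still DMAs only the payload. */
dma_addr = dma_map_single(priv->device, skb->data - NET_IP_ALIGN,
			  skb->len + NET_IP_ALIGN, DMA_TO_DEVICE);
desc->des2 = dma_addr + NET_IP_ALIGN;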

Thanks,
Matt
