linux-kernel - Re: [PATCH 1/2] dma-mapping: introduce relaxed version of dma sync

Message-ID: <20200818083720.GA9451@infradead.org>
Date:   Tue, 18 Aug 2020 09:37:20 +0100
From:   Christoph Hellwig <hch@...radead.org>
To:     Will Deacon <will@...nel.org>
Cc:     Cho KyongHo <pullip.cho@...sung.com>, joro@...tes.org,
        catalin.marinas@....com, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        m.szyprowski@...sung.com, robin.murphy@....com,
        janghyuck.kim@...sung.com, hyesoo.yu@...sung.com
Subject: Re: [PATCH 1/2] dma-mapping: introduce relaxed version of dma sync

On Tue, Aug 18, 2020 at 09:28:53AM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:43:10PM +0900, Cho KyongHo wrote:
> > Cache maintenance operations on most CPU architectures need a memory
> > barrier after the cache maintenance for DMA devices to view the
> > region of memory correctly. The problem is that a memory barrier is
> > very expensive, and dma_[un]map_sg() and dma_sync_sg_for_{device|cpu}()
> > issue a memory barrier for every single sg entry. On some
> > CPU micro-architectures, a single memory barrier takes more time than
> > a cache clean of 4KiB. It becomes more serious as the number of CPU
> > cores grows.
> 
> Have you got higher-level performance data for this change? It's more likely
> that the DSB is what actually forces the prior cache maintenance to
> complete, so it's important to look at the bigger picture, not just the
> apparent relative cost of these instructions.
> 
> Also, it's a miracle that non-coherent DMA even works, so I'm not sure
> that we should be complicating the implementation like this to try to
> make it "fast".

And without an in-tree user, and not just an important one but one that
actually matters and can show how this is correct, the whole proposal is
a complete nonstarter.
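
For reference, the difference the quoted description is talking about can
be sketched roughly as below. This is only an illustration, not the patch:
struct sg_entry, cache_clean_range(), memory_barrier() and both sync
functions are hypothetical stand-ins that merely count calls, and the
"relaxed" shape assumes the barrier is deferred to a single one after the
loop. It says nothing about whether dropping the per-entry barrier is
correct on real hardware, which is exactly what the replies above ask to
be shown.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scatter-gather entry (not the kernel's
 * struct scatterlist). */
struct sg_entry {
	void *addr;
	size_t len;
};

/* Counters replace the real instructions so the sketch runs anywhere. */
static unsigned long clean_calls;
static unsigned long barrier_calls;

/* Stand-in for per-range cache maintenance (assumption: counts only). */
static void cache_clean_range(void *addr, size_t len)
{
	(void)addr;
	(void)len;
	clean_calls++;
}

/* Stand-in for the expensive barrier, e.g. a DSB on arm64. */
static void memory_barrier(void)
{
	barrier_calls++;
}

/* Behaviour as described in the quoted text: one barrier per sg entry. */
static void sync_sg_strict(struct sg_entry *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++) {
		cache_clean_range(sg[i].addr, sg[i].len);
		memory_barrier();
	}
}

/* Assumed "relaxed" shape: clean every entry, then a single barrier. */
static void sync_sg_relaxed(struct sg_entry *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		cache_clean_range(sg[i].addr, sg[i].len);
	memory_barrier();
}
```

With N entries the strict loop issues N barriers and the relaxed one
issues 1. Whether that saves time overall depends on the point made
above: the barrier may be what actually waits for the prior cache
maintenance to complete.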