Message-ID: <20200818083720.GA9451@infradead.org>
Date: Tue, 18 Aug 2020 09:37:20 +0100
From: Christoph Hellwig <hch@...radead.org>
To: Will Deacon <will@...nel.org>
Cc: Cho KyongHo <pullip.cho@...sung.com>, joro@...tes.org,
catalin.marinas@....com, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
m.szyprowski@...sung.com, robin.murphy@....com,
janghyuck.kim@...sung.com, hyesoo.yu@...sung.com
Subject: Re: [PATCH 1/2] dma-mapping: introduce relaxed version of dma sync
On Tue, Aug 18, 2020 at 09:28:53AM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:43:10PM +0900, Cho KyongHo wrote:
> > On most CPU architectures, cache maintenance operations must be
> > followed by a memory barrier for DMA devices to observe the memory
> > region correctly. The problem is that memory barriers are very
> > expensive, and dma_[un]map_sg() and dma_sync_sg_for_{device|cpu}()
> > issue a memory barrier for every single sg entry. On some CPU
> > micro-architectures, a single memory barrier takes more time than a
> > cache clean of 4KiB. This gets worse as the number of CPU cores
> > grows.
>
> Have you got higher-level performance data for this change? It's more likely
> that the DSB is what actually forces the prior cache maintenance to
> complete, so it's important to look at the bigger picture, not just the
> apparent relative cost of these instructions.
>
> Also, it's a miracle that non-coherent DMA even works, so I'm not sure
> that we should be complicating the implementation like this to try to
> make it "fast".
And without not just an important in-tree user, but one that actually
matters and can show that this is correct, the whole proposal is a
complete non-starter.
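
For anyone skimming the thread: the pattern KyongHo describes above
looks roughly like the sketch below. This is not the actual patch; the
_relaxed cache-maintenance helper and the trailing barrier helper are
hypothetical names, used only to illustrate hoisting the barrier out of
the per-sg-entry loop.

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    static void sync_sg_for_device_sketch(struct device *dev,
                    struct scatterlist *sgl, int nents,
                    enum dma_data_direction dir)
    {
            struct scatterlist *sg;
            int i;

            for_each_sg(sgl, sg, nents, i)
                    /* clean/invalidate the lines, but defer the barrier */
                    arch_sync_dma_for_device_relaxed(sg_phys(sg),
                                                     sg->length, dir);

            /* hypothetical: one barrier orders all maintenance above */
            arch_sync_dma_barrier_for_device(dir);
    }

Whether deferring the barrier this way wins anything end-to-end is
exactly what Will's request for higher-level performance data is about.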