Message-Id: <20251219060452.85288-1-21cnbao@gmail.com>
Date: Fri, 19 Dec 2025 14:04:52 +0800
From: Barry Song <21cnbao@...il.com>
To: catalin.marinas@....com,
    m.szyprowski@...sung.com,
    robin.murphy@....com,
    will@...nel.org
Cc: ada.coupriediaz@....com,
    anshuman.khandual@....com,
    ardb@...nel.org,
    iommu@...ts.linux.dev,
    linux-arm-kernel@...ts.infradead.org,
    linux-kernel@...r.kernel.org,
    maz@...nel.org,
    ryan.roberts@....com,
    surenb@...gle.com,
    v-songbaohua@...o.com,
    zhengtangquan@...o.com
Subject: [PATCH 0/6] dma-mapping: arm64: support batched cache sync
From: Barry Song <v-songbaohua@...o.com>
For reasons unclear, the cover letter was omitted from the
initial posting, despite Gmail indicating it was sent. This
is a resend. Apologies for the noise.
Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.
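
The difference between the two strategies can be sketched in plain C.
This is a userspace illustration only, not the code from this series:
struct sg_entry, cache_clean_nosync() and cache_sync_wait() are
hypothetical stand-ins that merely count calls, where a real arm64
implementation would issue per-line cache maintenance and a completion
barrier such as DSB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry (not the kernel struct). */
struct sg_entry {
	size_t addr;
	size_t len;
};

static unsigned long maint_ops;     /* regions for which maintenance was issued */
static unsigned long barrier_waits; /* completion waits performed */

/* Hypothetical: issue cache maintenance for a region without waiting. */
static void cache_clean_nosync(size_t addr, size_t len)
{
	(void)addr;
	(void)len;
	maint_ops++;
}

/* Hypothetical: wait until all previously issued maintenance completes. */
static void cache_sync_wait(void)
{
	barrier_waits++;
}

/* Per-entry behaviour: every entry pays for its own completion wait. */
static void sync_sg_per_entry(const struct sg_entry *sg, size_t nents)
{
	assert(sg != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++) {
		cache_clean_nosync(sg[i].addr, sg[i].len);
		cache_sync_wait();
	}
}

/* Batched behaviour: issue everything first, then wait once. */
static void sync_sg_batched(const struct sg_entry *sg, size_t nents)
{
	assert(sg != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++)
		cache_clean_nosync(sg[i].addr, sg[i].len);
	if (nents)
		cache_sync_wait();
}
```

In this model, an SG list with nents entries costs nents completion
waits in the per-entry variant and a single wait in the batched
variant, while the amount of cache maintenance issued is unchanged.
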
Tangquan's results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests were performed by
pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB =
2560 sg entries per buffer) for 200 iterations and then averaging the
results.
I also ran this patch set on an RK3588 Rock5B+ board and
observed that millions of DMA sync operations were batched.
Changes since RFC:
* Dropped many #ifdef/#else/#endif blocks, as suggested by Catalin and
  Marek, thanks!
* Added batching for IOVA link/unlink, which is marked as RFC because I
  lack the hardware. This was suggested by Marek, thanks!
RFC link:
https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/
Barry Song (6):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide arch_sync_dma_ batched helpers
  dma-mapping: Allow batched DMA sync operations if supported by the
    arch
  dma-iommu: Allow DMA sync batching for IOVA link/unlink
 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/assembler.h  | 79 +++++++++++++++++++-------
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/mm/cache.S               | 58 +++++++++++++++----
 arch/arm64/mm/dma-mapping.c         | 24 ++++++++
 drivers/iommu/dma-iommu.c           | 12 +++-
 include/linux/dma-map-ops.h         | 22 ++++++++
 kernel/dma/Kconfig                  |  3 +
 kernel/dma/direct.c                 | 28 +++++++---
 kernel/dma/direct.h                 | 86 +++++++++++++++++++++++++----
 10 files changed, 262 insertions(+), 53 deletions(-)
--
2.39.3 (Apple Git-146)