Message-ID: <20251226225254.46197-1-21cnbao@gmail.com>
Date: Sat, 27 Dec 2025 11:52:40 +1300
From: Barry Song <21cnbao@...il.com>
To: catalin.marinas@....com,
m.szyprowski@...sung.com,
robin.murphy@....com,
will@...nel.org,
iommu@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org
Cc: linux-kernel@...r.kernel.org,
xen-devel@...ts.xenproject.org,
Barry Song <baohua@...nel.org>,
Leon Romanovsky <leon@...nel.org>,
Ada Couprie Diaz <ada.coupriediaz@....com>,
Ard Biesheuvel <ardb@...nel.org>,
Marc Zyngier <maz@...nel.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Ryan Roberts <ryan.roberts@....com>,
Suren Baghdasaryan <surenb@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Juergen Gross <jgross@...e.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Tangquan Zheng <zhengtangquan@...o.com>,
Huacai Zhou <zhouhuacai@...o.com>
Subject: [PATCH v2 0/8] dma-mapping: arm64: support batched cache sync
From: Barry Song <baohua@...nel.org>
Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.
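The difference between the two strategies can be sketched as a small user-space counting model. This is not kernel code and not the implementation in this series: every sim_* name and the 64-byte line size are assumptions made for illustration, and the counters merely stand in for real cache maintenance instructions and the completion barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size for this model; real hardware may differ. */
#define SIM_CACHE_LINE 64u

/* Hypothetical stand-in for one scatter-gather entry. */
struct sim_sg_entry {
	size_t len;	/* length of the region in bytes */
};

/* Counters that replace real cache maintenance in this model. */
struct sim_stats {
	unsigned long line_ops;	/* per-line maintenance operations issued */
	unsigned long waits;	/* completion waits (a barrier on real hardware) */
};

/* Issue maintenance for every line of one region, without waiting. */
static void sim_issue_nosync(struct sim_stats *st, const struct sim_sg_entry *e)
{
	assert(st && e);
	st->line_ops += (e->len + SIM_CACHE_LINE - 1) / SIM_CACHE_LINE;
}

/* Wait until all previously issued operations have completed. */
static void sim_wait(struct sim_stats *st)
{
	assert(st);
	st->waits++;
}

/* Per-entry behaviour: wait after each entry's operations. */
static struct sim_stats sim_sync_per_entry(const struct sim_sg_entry *sg,
					   size_t nents)
{
	struct sim_stats st = { 0, 0 };

	for (size_t i = 0; i < nents; i++) {
		sim_issue_nosync(&st, &sg[i]);
		sim_wait(&st);
	}
	return st;
}

/* Batched behaviour: issue all entries' operations, then wait once. */
static struct sim_stats sim_sync_batched(const struct sim_sg_entry *sg,
					 size_t nents)
{
	struct sim_stats st = { 0, 0 };

	for (size_t i = 0; i < nents; i++)
		sim_issue_nosync(&st, &sg[i]);
	if (nents)
		sim_wait(&st);
	return st;
}
```

In this model both variants issue the same number of per-line operations. Only the number of waits changes, from nents to one, which is the saving the batched approach targets.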
Tangquan's results show that batched synchronization can reduce
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests were performed by
pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
sg entries per buffer) for 200 iterations and then averaging the
results.
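For readers who want the benchmark figures as plain numbers, the arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes binary units (10 MB = 10 * 1024 * 1024 bytes, 4 KB = 4096 bytes), which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sg entries when a buffer is split into equal-sized entries. */
static size_t bench_sg_entries(size_t buf_bytes, size_t entry_bytes)
{
	assert(entry_bytes != 0);
	return buf_bytes / entry_bytes;
}

/*
 * Convert a "time reduced by P percent" figure into a speedup factor:
 * new_time = old_time * (1 - P/100), so speedup = 1 / (1 - P/100).
 */
static double bench_speedup(double reduction_pct)
{
	assert(reduction_pct < 100.0);
	return 1.0 / (1.0 - reduction_pct / 100.0);
}
```

Under those assumptions, bench_sg_entries(10 * 1024 * 1024, 4096) gives 2560 entries per buffer, and the reported reductions of 64.61% and 66.60% correspond to speedups of roughly 2.8x for dma_map_sg() and 3.0x for dma_unmap_sg().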
I also ran this patch set on an RK3588 Rock5B+ board and
observed that millions of DMA sync operations were batched.
v2:
* Refine a large amount of arm64 asm code based on feedback from
Robin, thanks!
* Drop batch_add APIs and always use arch_sync_dma_for_* + flush,
even for a single buffer, based on Leon’s suggestion, thanks!
* Refine a large amount of code based on feedback from Leon, thanks!
* Also add batch support for iommu_dma_sync_sg_for_{cpu,device}
v1 link:
https://lore.kernel.org/lkml/20251219053658.84978-1-21cnbao@gmail.com/
v1, diff with RFC:
* Drop a large number of #ifdef/#else/#endif blocks based on feedback
from Catalin and Marek, thanks!
* Also add batched iova link/unlink support, marked as RFC since I lack
the required hardware. This was suggested by Marek, thanks!
RFC link:
https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/
Barry Song (8):
arm64: Provide dcache_by_myline_op_nosync helper
arm64: Provide dcache_clean_poc_nosync helper
arm64: Provide dcache_inval_poc_nosync helper
dma-mapping: Separate DMA sync issuing and completion waiting
dma-mapping: Support batch mode for dma_direct_sync_sg_for_*
dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
dma-iommu: Support DMA sync batch mode for IOVA link and unlink
dma-iommu: Support DMA sync batch mode for iommu_dma_sync_sg_for_{cpu,
device}
arch/arm64/include/asm/assembler.h  | 24 +++++++++---
arch/arm64/include/asm/cache.h      |  6 +++
arch/arm64/include/asm/cacheflush.h |  2 +
arch/arm64/kernel/relocate_kernel.S |  3 +-
arch/arm64/mm/cache.S               | 57 +++++++++++++++++++++++------
arch/arm64/mm/dma-mapping.c         |  4 +-
drivers/iommu/dma-iommu.c           | 35 ++++++++++++++----
drivers/xen/swiotlb-xen.c           | 24 ++++++++----
include/linux/dma-map-ops.h         |  6 +++
kernel/dma/direct.c                 | 23 +++++++++---
kernel/dma/direct.h                 | 21 ++++++++---
kernel/dma/mapping.c                |  6 +--
kernel/dma/swiotlb.c                |  4 +-
13 files changed, 165 insertions(+), 50 deletions(-)
Cc: Leon Romanovsky <leon@...nel.org>
Cc: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>
Cc: Ada Couprie Diaz <ada.coupriediaz@....com>
Cc: Ard Biesheuvel <ardb@...nel.org>
Cc: Marc Zyngier <maz@...nel.org>
Cc: Anshuman Khandual <anshuman.khandual@....com>
Cc: Ryan Roberts <ryan.roberts@....com>
Cc: Suren Baghdasaryan <surenb@...gle.com>
Cc: Robin Murphy <robin.murphy@....com>
Cc: Joerg Roedel <joro@...tes.org>
Cc: Juergen Gross <jgross@...e.com>
Cc: Stefano Stabellini <sstabellini@...nel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
Cc: Tangquan Zheng <zhengtangquan@...o.com>
Cc: Huacai Zhou <zhouhuacai@...o.com>
--
2.43.0