Message-ID: <CAGsJ_4wy6ZPhCvMqODy6W6Why34Lfn5WXf1bUwJh+Qmd2X2rCQ@mail.gmail.com>
Date: Fri, 7 Nov 2025 04:44:24 +0800
From: Barry Song <21cnbao@...il.com>
To: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, 
	Marek Szyprowski <m.szyprowski@...sung.com>, Robin Murphy <robin.murphy@....com>
Cc: Barry Song <v-songbaohua@...o.com>, Ada Couprie Diaz <ada.coupriediaz@....com>, 
	Ard Biesheuvel <ardb@...nel.org>, Marc Zyngier <maz@...nel.org>, 
	Anshuman Khandual <anshuman.khandual@....com>, Ryan Roberts <ryan.roberts@....com>, 
	Suren Baghdasaryan <surenb@...gle.com>, Tangquan Zheng <zhengtangquan@...o.com>, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	iommu@...ts.linux.dev
Subject: Re: [RFC PATCH 0/5] dma-mapping: arm64: support batched cache sync

On Wed, Oct 29, 2025 at 10:31 AM Barry Song <21cnbao@...il.com> wrote:
>
> From: Barry Song <v-songbaohua@...o.com>
>
> Many embedded ARM64 SoCs still lack hardware cache coherency support, which
> causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
>
> For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
> sync APIs perform cache maintenance one entry at a time. After each entry,
> the implementation synchronously waits for the corresponding region’s
> D-cache operations to complete. On architectures like arm64, efficiency can
> be improved by issuing all entries’ operations first and then performing a
> single batched wait for completion.
>
> Tangquan's initial results show that batched synchronization can reduce
> dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
> phone platform (MediaTek Dimensity 9500). The tests were performed by
> pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
> running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
> = 2560 sg entries per buffer) for 200 iterations and then averaging the
> results.
>
> Barry Song (5):
>   arm64: Provide dcache_by_myline_op_nosync helper
>   arm64: Provide dcache_clean_poc_nosync helper
>   arm64: Provide dcache_inval_poc_nosync helper
>   arm64: Provide arch_sync_dma_ batched helpers
>   dma-mapping: Allow batched DMA sync operations if supported by the
>     arch
>
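
For readers skimming the thread, the batching idea in the cover letter can be
illustrated with a small user-space model. This is only a sketch under stated
assumptions, not the code from the patches: struct sg_entry, struct sync_stats,
issue_clean_nosync(), wait_barrier(), sync_per_entry() and sync_batched() are
all hypothetical names, and the counters merely stand in for cache maintenance
by range and for the completion wait (a DSB on arm64).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of these are kernel APIs. */
struct sg_entry {
	size_t addr;
	size_t len;
};

struct sync_stats {
	size_t range_ops;	/* cache maintenance ranges issued */
	size_t barriers;	/* waits for completion */
};

/* Models issuing cache maintenance for one range without waiting. */
static void issue_clean_nosync(struct sync_stats *st, const struct sg_entry *e)
{
	(void)e;
	st->range_ops++;
}

/* Models the wait for all outstanding maintenance to complete. */
static void wait_barrier(struct sync_stats *st)
{
	st->barriers++;
}

/* Current behaviour as described above: wait after every entry. */
static struct sync_stats sync_per_entry(const struct sg_entry *sg, size_t nents)
{
	struct sync_stats st = { 0, 0 };
	size_t i;

	assert(sg != NULL || nents == 0);
	for (i = 0; i < nents; i++) {
		issue_clean_nosync(&st, &sg[i]);
		wait_barrier(&st);
	}
	return st;
}

/* Batched variant: issue every entry first, then wait once. */
static struct sync_stats sync_batched(const struct sg_entry *sg, size_t nents)
{
	struct sync_stats st = { 0, 0 };
	size_t i;

	assert(sg != NULL || nents == 0);
	for (i = 0; i < nents; i++)
		issue_clean_nosync(&st, &sg[i]);
	if (nents > 0)
		wait_barrier(&st);
	return st;
}
```

In this model, a 10 MB buffer split into 4 KB entries gives 2560 waits on the
per-entry path and a single wait on the batched path, with the same number of
range operations issued in both cases. The model says nothing about the
measured timings, which depend on the hardware.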

Hi Catalin, Will, Marek, Robin, and all,
Do you have any feedback on this before I send the formal
patchset (dropping the RFC tag)?

Thanks
Barry
