Message-ID: <d19c559e-c93b-4a4d-9a0d-ec289ed4c2e6@samsung.com>
Date: Wed, 31 Dec 2025 15:43:19 +0100
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Barry Song <21cnbao@...il.com>, Leon Romanovsky <leon@...nel.org>
Cc: catalin.marinas@....com, robin.murphy@....com, will@...nel.org,
	iommu@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, xen-devel@...ts.xenproject.org, Ada Couprie
	Diaz <ada.coupriediaz@....com>, Ard Biesheuvel <ardb@...nel.org>, Marc
	Zyngier <maz@...nel.org>, Anshuman Khandual <anshuman.khandual@....com>,
	Ryan Roberts <ryan.roberts@....com>, Suren Baghdasaryan <surenb@...gle.com>,
	Joerg Roedel <joro@...tes.org>, Juergen Gross <jgross@...e.com>, Stefano
	Stabellini <sstabellini@...nel.org>, Oleksandr Tyshchenko
	<oleksandr_tyshchenko@...m.com>, Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH v2 4/8] dma-mapping: Separate DMA sync issuing and
 completion waiting

On 28.12.2025 22:38, Barry Song wrote:
> On Mon, Dec 29, 2025 at 3:49 AM Leon Romanovsky <leon@...nel.org> wrote:
>> On Sun, Dec 28, 2025 at 10:45:13AM +1300, Barry Song wrote:
>>> On Sun, Dec 28, 2025 at 9:07 AM Leon Romanovsky <leon@...nel.org> wrote:
>>>> On Sat, Dec 27, 2025 at 11:52:44AM +1300, Barry Song wrote:
>>>>> From: Barry Song <baohua@...nel.org>
>>>>>
>>>>> Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
>>>>> always wait for the completion of each DMA buffer. That is,
>>>>> issuing the DMA sync and waiting for completion is done in a
>>>>> single API call.
>>>>>
>>>>> For scatter-gather lists with multiple entries, this means
>>>>> issuing and waiting is repeated for each entry, which can hurt
>>>>> performance. Architectures like ARM64 may be able to issue all
>>>>> DMA sync operations for all entries first and then wait for
>>>>> completion together.
>>>>>
>>>>> To address this, arch_sync_dma_for_* now issues DMA operations in
>>>>> batch, followed by a flush. On ARM64, the flush is implemented
>>>>> using a dsb instruction within arch_sync_dma_flush().
>>>>>
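
(A minimal sketch of the arm64 override described above; the exact body is
not shown in this quote, so it is an assumption based on the commit message
and the six lines the diffstat below adds to arch/arm64/include/asm/cache.h.
dsb() comes from <asm/barrier.h>.)

	/* Hypothetical arm64 definition: wait for completion of all
	 * previously issued cache maintenance operations. */
	#define arch_sync_dma_flush arch_sync_dma_flush
	static inline void arch_sync_dma_flush(void)
	{
		dsb(sy);
	}
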
>>>>> For now, add arch_sync_dma_flush() after each
>>>>> arch_sync_dma_for_*() call. arch_sync_dma_flush() is defined as a
>>>>> no-op on all architectures except arm64, so this patch does not
>>>>> change existing behavior. Subsequent patches will introduce true
>>>>> batching for SG DMA buffers.
>>>>>
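
(A minimal sketch of the batching the later patches introduce, not code
from this patch: issue the sync for every SG entry first, then wait once.
for_each_sg(), sg_phys() and arch_sync_dma_for_device() are existing kernel
interfaces; the wrapper function itself is hypothetical and would need
<linux/scatterlist.h> and <linux/dma-map-ops.h>.)

	static void sync_sg_for_device_batched(struct scatterlist *sgl,
					       int nents,
					       enum dma_data_direction dir)
	{
		struct scatterlist *sg;
		int i;

		/* issue cache maintenance for every entry... */
		for_each_sg(sgl, sg, nents, i)
			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);

		/* ...then wait for completion once (a dsb on arm64) */
		arch_sync_dma_flush();
	}
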
>>>>> Cc: Leon Romanovsky <leon@...nel.org>
>>>>> Cc: Catalin Marinas <catalin.marinas@....com>
>>>>> Cc: Will Deacon <will@...nel.org>
>>>>> Cc: Marek Szyprowski <m.szyprowski@...sung.com>
>>>>> Cc: Robin Murphy <robin.murphy@....com>
>>>>> Cc: Ada Couprie Diaz <ada.coupriediaz@....com>
>>>>> Cc: Ard Biesheuvel <ardb@...nel.org>
>>>>> Cc: Marc Zyngier <maz@...nel.org>
>>>>> Cc: Anshuman Khandual <anshuman.khandual@....com>
>>>>> Cc: Ryan Roberts <ryan.roberts@....com>
>>>>> Cc: Suren Baghdasaryan <surenb@...gle.com>
>>>>> Cc: Joerg Roedel <joro@...tes.org>
>>>>> Cc: Juergen Gross <jgross@...e.com>
>>>>> Cc: Stefano Stabellini <sstabellini@...nel.org>
>>>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
>>>>> Cc: Tangquan Zheng <zhengtangquan@...o.com>
>>>>> Signed-off-by: Barry Song <baohua@...nel.org>
>>>>> ---
>>>>>   arch/arm64/include/asm/cache.h |  6 ++++++
>>>>>   arch/arm64/mm/dma-mapping.c    |  4 ++--
>>>>>   drivers/iommu/dma-iommu.c      | 37 +++++++++++++++++++++++++---------
>>>>>   drivers/xen/swiotlb-xen.c      | 24 ++++++++++++++--------
>>>>>   include/linux/dma-map-ops.h    |  6 ++++++
>>>>>   kernel/dma/direct.c            |  8 ++++++--
>>>>>   kernel/dma/direct.h            |  9 +++++++--
>>>>>   kernel/dma/swiotlb.c           |  4 +++-
>>>>>   8 files changed, 73 insertions(+), 25 deletions(-)
>>>> <...>
>>>>
>>>>> +#ifndef arch_sync_dma_flush
>>>>> +static inline void arch_sync_dma_flush(void)
>>>>> +{
>>>>> +}
>>>>> +#endif
>>>> Over the weekend I realized a useful advantage of the ARCH_HAVE_* config
>>>> options: they make it straightforward to inspect the entire DMA path simply
>>>> by looking at the .config.
>>> I am not quite sure how much this benefits users, as the same
>>> information could also be obtained by grepping for
>>> #define arch_sync_dma_flush in the source code.
>> It differs slightly. Users no longer need to grep the sources or guess whether
>> a given platform uses the arch_sync_dma_flush path. A simple grep for
>> ARCH_HAVE_ in /proc/config.gz provides the answer.
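
(For comparison, the Kconfig-based style being referred to usually looks
like the sketch below; the option name is hypothetical, while existing DMA
options such as ARCH_HAS_SYNC_DMA_FOR_DEVICE in kernel/dma/Kconfig follow
this pattern.)

	# kernel/dma/Kconfig (hypothetical option, selected by arm64)
	config ARCH_HAS_SYNC_DMA_FLUSH
		bool

The common fallback would then key off the config option rather than an
#ifndef on the macro name:

	/* include/linux/dma-map-ops.h (sketch) */
	#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FLUSH
	void arch_sync_dma_flush(void);
	#else
	static inline void arch_sync_dma_flush(void)
	{
	}
	#endif
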
> In any case, it is only two or three lines of code, so I am fine with
> either approach. Perhaps Marek, Robin, and others could weigh in here?

If possible I would suggest following the style already used in the given
code, even if it means a somewhat larger patch.

>>>> Thanks,
>>>> Reviewed-by: Leon Romanovsky <leonro@...dia.com>
>>> Thanks very much, Leon, for reviewing this over the weekend. One thing
>>> you might have missed is that I place arch_sync_dma_flush() after all
>>> arch_sync_dma_for_*() calls, for both single and sg cases. I also
>>> used a Python script to scan the code and verify that every
>>> arch_sync_dma_for_*() is followed by arch_sync_dma_flush(), to ensure
>>> that no call is left out.
>>>
>>> In the subsequent patches, for sg cases, the per-entry flush is
>>> replaced by a single flush of the entire sg. Each sg case has
>>> different characteristics: some are straightforward, while others
>>> can be tricky and involve additional context.
>> I didn't overlook it, and I understand your rationale. However, this is
>> not how kernel patches should be structured. You should not introduce
>> code in patch X and then move it elsewhere in patch X + Y.
> I am not quite convinced by this concern. This patch only
> separates DMA sync issuing from completion waiting, and it
> reflects that the development proceeds step by step.
>
>> Place the code in the correct location from the start. Your patches are
>> small enough to review as is.
> My point is that this patch places the code in the correct locations
> from the start. It splits arch_sync_dma_for_*() into
> arch_sync_dma_for_*() plus arch_sync_dma_flush() everywhere, without
> introducing any functional changes from the outset.
> The subsequent patches clearly show which parts are truly batched.
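
(Concretely, the mechanical transformation at each call site is the
following sketch; the call site itself is hypothetical.)

	/* before: the call both issues the sync and waits for completion */
	arch_sync_dma_for_device(paddr, size, dir);

	/* after: issue, then an explicit completion wait; because
	 * arch_sync_dma_flush() is a no-op everywhere except arm64,
	 * behavior is unchanged by this patch alone */
	arch_sync_dma_for_device(paddr, size, dir);
	arch_sync_dma_flush();
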
>
> That said, I do not have a strong preference here. If you think
> it is better to move some of the straightforward batching code here,
> I can follow that approach. Perhaps I could move patch 5, patch 8,
> and the iommu_dma_iova_unlink_range_slow change from patch 7 here,
> while keeping
>
>    [PATCH 6] dma-mapping: Support batch mode for
>    dma_direct_{map,unmap}_sg
>
> and the IOVA link part from patch 7 as separate patches, since that
> part is not straightforward. The IOVA link changes affect both
> __dma_iova_link() and dma_iova_sync(), which are two separate
> functions and require a deeper understanding of their contexts to
> determine correctness. That part also lacks testing.
>
> Would that be okay with you?

Yes, that will be okay. The changes are easy to understand, so we don't
need to proceed in such small steps.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

