Message-ID: <CAGsJ_4y=yoYZn+_ztdfuOCj_dS-M0h8YWO51AXubPbeR1FH6uQ@mail.gmail.com>
Date: Mon, 29 Dec 2025 10:38:26 +1300
From: Barry Song <21cnbao@...il.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: catalin.marinas@....com, m.szyprowski@...sung.com, robin.murphy@....com,
will@...nel.org, iommu@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, xen-devel@...ts.xenproject.org,
Ada Couprie Diaz <ada.coupriediaz@....com>, Ard Biesheuvel <ardb@...nel.org>, Marc Zyngier <maz@...nel.org>,
Anshuman Khandual <anshuman.khandual@....com>, Ryan Roberts <ryan.roberts@....com>,
Suren Baghdasaryan <surenb@...gle.com>, Joerg Roedel <joro@...tes.org>, Juergen Gross <jgross@...e.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>, Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH v2 4/8] dma-mapping: Separate DMA sync issuing and
completion waiting
On Mon, Dec 29, 2025 at 3:49 AM Leon Romanovsky <leon@...nel.org> wrote:
>
> On Sun, Dec 28, 2025 at 10:45:13AM +1300, Barry Song wrote:
> > On Sun, Dec 28, 2025 at 9:07 AM Leon Romanovsky <leon@...nel.org> wrote:
> > >
> > > On Sat, Dec 27, 2025 at 11:52:44AM +1300, Barry Song wrote:
> > > > From: Barry Song <baohua@...nel.org>
> > > >
> > > > Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
> > > > always wait for the cache maintenance on each DMA buffer to
> > > > complete. That is, issuing the DMA sync and waiting for its
> > > > completion happen in a single API call.
> > > >
> > > > For scatter-gather lists with multiple entries, this means
> > > > issuing and waiting are repeated for each entry, which can hurt
> > > > performance. Architectures like ARM64 may be able to issue all
> > > > DMA sync operations for all entries first and then wait for
> > > > completion together.
> > > >
> > > > To address this, arch_sync_dma_for_* now issues DMA operations in
> > > > batch, followed by a flush. On ARM64, the flush is implemented
> > > > using a dsb instruction within arch_sync_dma_flush().
> > > >
> > > > For now, add arch_sync_dma_flush() after each
> > > > arch_sync_dma_for_*() call. arch_sync_dma_flush() is defined as a
> > > > no-op on all architectures except arm64, so this patch does not
> > > > change existing behavior. Subsequent patches will introduce true
> > > > batching for SG DMA buffers.
> > > >
> > > > Cc: Leon Romanovsky <leon@...nel.org>
> > > > Cc: Catalin Marinas <catalin.marinas@....com>
> > > > Cc: Will Deacon <will@...nel.org>
> > > > Cc: Marek Szyprowski <m.szyprowski@...sung.com>
> > > > Cc: Robin Murphy <robin.murphy@....com>
> > > > Cc: Ada Couprie Diaz <ada.coupriediaz@....com>
> > > > Cc: Ard Biesheuvel <ardb@...nel.org>
> > > > Cc: Marc Zyngier <maz@...nel.org>
> > > > Cc: Anshuman Khandual <anshuman.khandual@....com>
> > > > Cc: Ryan Roberts <ryan.roberts@....com>
> > > > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > > > Cc: Joerg Roedel <joro@...tes.org>
> > > > Cc: Juergen Gross <jgross@...e.com>
> > > > Cc: Stefano Stabellini <sstabellini@...nel.org>
> > > > Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
> > > > Cc: Tangquan Zheng <zhengtangquan@...o.com>
> > > > Signed-off-by: Barry Song <baohua@...nel.org>
> > > > ---
> > > > arch/arm64/include/asm/cache.h | 6 ++++++
> > > > arch/arm64/mm/dma-mapping.c | 4 ++--
> > > > drivers/iommu/dma-iommu.c | 37 +++++++++++++++++++++++++---------
> > > > drivers/xen/swiotlb-xen.c | 24 ++++++++++++++--------
> > > > include/linux/dma-map-ops.h | 6 ++++++
> > > > kernel/dma/direct.c | 8 ++++++--
> > > > kernel/dma/direct.h | 9 +++++++--
> > > > kernel/dma/swiotlb.c | 4 +++-
> > > > 8 files changed, 73 insertions(+), 25 deletions(-)
> > >
> > > <...>
> > >
> > > > +#ifndef arch_sync_dma_flush
> > > > +static inline void arch_sync_dma_flush(void)
> > > > +{
> > > > +}
> > > > +#endif
> > >
> > > Over the weekend I realized a useful advantage of the ARCH_HAVE_* config
> > > options: they make it straightforward to inspect the entire DMA path simply
> > > by looking at the .config.
> >
> > I am not quite sure how much this benefits users, as the same
> > information could also be obtained by grepping for
> > #define arch_sync_dma_flush in the source code.
>
> It differs slightly. Users no longer need to grep around or guess whether
> their platform uses the arch_sync_dma_flush path. A simple grep for
> ARCH_HAVE_ in /proc/config.gz provides the answer.
In any case, it is only two or three lines of code, so I am fine with
either approach. Perhaps Marek, Robin, and the others could weigh in
here?
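For reference, the two detection styles under discussion look roughly
like this. This is only a sketch: the Kconfig symbol name
ARCH_HAS_SYNC_DMA_FLUSH is hypothetical, and the exact barrier domain
on arm64 is illustrative (the patch only says it uses a dsb).

	/* This series: the arch header (arch/arm64/include/asm/cache.h)
	 * provides the macro, and generic code detects it via #ifndef:
	 */
	#define arch_sync_dma_flush arch_sync_dma_flush
	static inline void arch_sync_dma_flush(void)
	{
		dsb(sy);	/* wait for issued cache maintenance to complete */
	}

	/* Leon's suggestion: select a config option from the arch Kconfig,
	 * e.g. "select ARCH_HAS_SYNC_DMA_FLUSH" in arch/arm64/Kconfig, so
	 * the enabled path is visible in .config, and key the generic
	 * fallback off that option instead:
	 */
	#ifndef CONFIG_ARCH_HAS_SYNC_DMA_FLUSH
	static inline void arch_sync_dma_flush(void)
	{
	}
	#endif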
>
> >
> > >
> > > Thanks,
> > > Reviewed-by: Leon Romanovsky <leonro@...dia.com>
> >
> > Thanks very much, Leon, for reviewing this over the weekend. One thing
> > you might have missed is that I placed arch_sync_dma_flush() after all
> > arch_sync_dma_for_*() calls, for both the single and sg cases. I also
> > used a Python script to scan the code and verify that every
> > arch_sync_dma_for_*() call is followed by arch_sync_dma_flush(), so
> > that no call is left out.
> >
> > In the subsequent patches, for sg cases, the per-entry flush is
> > replaced by a single flush of the entire sg. Each sg case has
> > different characteristics: some are straightforward, while others
> > can be tricky and require more surrounding context.
>
> I didn't overlook it, and I understand your rationale. However, this is
> not how kernel patches should be structured. You should not introduce
> code in patch X and then move it elsewhere in patch X + Y.
I am not quite convinced by this concern. This patch only
separates DMA sync issuing from completion waiting, which
reflects the step-by-step nature of the development.
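Concretely, after this patch every call site simply pairs the two
operations (a sketch of the pattern; paddr, size, and dir stand for
whatever the call site already uses):

	arch_sync_dma_for_device(paddr, size, dir);
	arch_sync_dma_flush();	/* no-op on all arches except arm64 */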
>
> Place the code in the correct location from the start. Your patches are
> small enough to review as is.
My point is that this patch does place the code in the correct
locations from the start. It splits arch_sync_dma_for_*() into
arch_sync_dma_for_*() plus arch_sync_dma_flush() everywhere, without
introducing any functional change.
The subsequent patches clearly show which parts are truly batched.
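For an sg case, the batched form in the later patches then looks
roughly like this (a sketch using for_each_sg(); the real call sites
differ in their surrounding context):

	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i)
		arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
	arch_sync_dma_flush();	/* one barrier for the whole list */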
That said, I do not have a strong preference here. If you think it is
better to move some of the straightforward batching code into this
patch, I can follow that approach. Perhaps I could move patch 5,
patch 8, and the iommu_dma_iova_unlink_range_slow change from patch 7
here, while keeping
[PATCH 6] dma-mapping: Support batch mode for
dma_direct_{map,unmap}_sg
and the IOVA link part from patch 7 as separate patches, since that
part is not straightforward. The IOVA link changes touch both
__dma_iova_link() and dma_iova_sync(), which are two separate
functions and require a deeper understanding of their contexts to
judge correctness. That part has also not been tested yet.
Would that be okay with you?
Thanks
Barry