Message-ID: <CAGsJ_4zNTYsMkeBav7z=AKdm5CjB=Y73P1QBzq+9FS--z9t9Cw@mail.gmail.com>
Date: Sat, 22 Feb 2025 19:36:29 +1300
From: Barry Song <21cnbao@...il.com>
To: xiaqinxin <xiaqinxin@...wei.com>
Cc: yangyicong <yangyicong@...wei.com>, "hch@....de" <hch@....de>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, Jonathan Cameron <jonathan.cameron@...wei.com>,
"Zengtao (B)" <prime.zeng@...ilicon.com>, "fanghao (A)" <fanghao11@...wei.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/3] dma-mapping: benchmark: add support for dma_map_sg
On Fri, Feb 21, 2025 at 4:16 PM xiaqinxin <xiaqinxin@...wei.com> wrote:
>
>
>
> -----Original Message-----
> From: Barry Song <21cnbao@...il.com>
> Sent: February 18, 2025, 4:59
> To: xiaqinxin <xiaqinxin@...wei.com>
> Cc: chenxiang66@...ilicon.com; yangyicong <yangyicong@...wei.com>; hch@....de; iommu@...ts.linux.dev; Jonathan Cameron <jonathan.cameron@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; fanghao (A) <fanghao11@...wei.com>; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH 2/3] dma-mapping: benchmark: add support for dma_map_sg
>
> On Wed, Feb 12, 2025 at 3:27 PM Qinxin Xia <xiaqinxin@...wei.com> wrote:
> >
> > Add support for DMA scatter-gather mapping, intended for testing
> > mapping performance. This is done by introducing the dma_sg_map_param
> > structure and related functions, which implement the prepare, map,
> > and unmap operations for scatter-gather mappings. Additionally, the
> > dma_map_benchmark_ops array is updated to include the scatter-gather
> > operations. This commit aims to provide a wider range of mapping
> > performance tests to cover different scenarios.
>
> This benchmark is mainly designed to debug contention issues, such as
> IOMMU TLB flushes or IOMMU driver bottlenecks. I don't fully understand
> how SG or single will impact the evaluation of the IOMMU driver, making
> it unclear if the added complexity is justified.
>
> Can you add some explanation on why single mode is not sufficient for profiling and improving IOMMU drivers?
>
> Hello Barry! 😊
> Currently, the HiSilicon accelerator service uses the dma_map_sg
> interface. We want to evaluate the performance of the entire DMA map
> process (including not only the IOMMU, but also the mapping framework).
> In addition, for a scatterlist, __iommu_map() is executed for each nent,
> which increases the complexity and time overhead of mapping. This
> fragmentation effect is not visible with dma_map_single(), which only
> handles a single contiguous block of memory.
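>
> Roughly, the per-nent cost comes from the loop in the IOMMU DMA path.
> A minimal sketch of that loop (simplified from the upstream
> iommu_map_sg() path; not the exact code, and it ignores the coalescing
> of physically contiguous entries):
>
>         /* each nent takes its own trip through __iommu_map() */
>         for_each_sg(sgl, sg, nents, i) {
>                 ret = __iommu_map(domain, iova + mapped, sg_phys(sg),
>                                   sg->length, prot, GFP_KERNEL);
>                 if (ret)
>                         break;
>                 mapped += sg->length;
>         }
>
> dma_map_single() only ever takes one such trip per call, so this
> per-entry overhead never shows up in single mode.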
>
Thanks!
Please update your editor to ensure it doesn't respond with such long sentences
without line breaks in the future :-)
Can you provide concrete examples or data showing how the newly added mode
improves profiling of the entire DMA map process? For instance, what limitations
exist without this mode? What performance issues cannot be identified without
it?
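
For instance, a before/after comparison from the selftest tool would help
(hypothetical invocation: -t/-s/-g are the existing thread, time and
granule switches; the -m mode flag is illustrative, assuming patch 3/3
wires one up):

    ./dma_map_benchmark -t 8 -s 10 -g 8        # DMA_MAP_SINGLE_MODE baseline
    ./dma_map_benchmark -t 8 -s 10 -g 8 -m 1   # DMA_MAP_SG_MODE, 8 pages per sg list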
> >
> > Signed-off-by: Qinxin Xia <xiaqinxin@...wei.com>
> > ---
> >  include/linux/map_benchmark.h |   1 +
> >  kernel/dma/map_benchmark.c    | 102 ++++++++++++++++++++++++++++++++++
> >  2 files changed, 103 insertions(+)
> >
> > diff --git a/include/linux/map_benchmark.h b/include/linux/map_benchmark.h
> > index 054db02a03a7..a9c1a104ba4f 100644
> > --- a/include/linux/map_benchmark.h
> > +++ b/include/linux/map_benchmark.h
> > @@ -17,6 +17,7 @@
> >
> > enum {
> >         DMA_MAP_SINGLE_MODE,
> > +       DMA_MAP_SG_MODE,
> >         DMA_MAP_MODE_MAX
> > };
> >
> > diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
> > index d8ec0ce058d8..b5828eeb3db7 100644
> > --- a/kernel/dma/map_benchmark.c
> > +++ b/kernel/dma/map_benchmark.c
> > @@ -17,6 +17,7 @@
> > #include <linux/module.h>
> > #include <linux/pci.h>
> > #include <linux/platform_device.h>
> > +#include <linux/scatterlist.h>
> > #include <linux/slab.h>
> > #include <linux/timekeeping.h>
> >
> > @@ -111,8 +112,109 @@ static struct map_benchmark_ops dma_single_map_benchmark_ops = {
> >         .do_unmap = dma_single_map_benchmark_do_unmap,
> > };
> >
> > +struct dma_sg_map_param {
> > +       struct sg_table sgt;
> > +       struct device *dev;
> > +       void **buf;
> > +       u32 npages;
> > +       u32 dma_dir;
> > +};
> > +
> > +static void *dma_sg_map_benchmark_prepare(struct map_benchmark_data *map)
> > +{
> > +       struct scatterlist *sg;
> > +       int i = 0;
> > +
> > +       struct dma_sg_map_param *mparam __free(kfree) = kzalloc(sizeof(*mparam), GFP_KERNEL);
> > +       if (!mparam)
> > +               return NULL;
> > +
> > +       mparam->npages = map->bparam.granule;
> > +       mparam->dma_dir = map->bparam.dma_dir;
> > +       mparam->dev = map->dev;
> > +       mparam->buf = kmalloc_array(mparam->npages, sizeof(*mparam->buf),
> > +                                   GFP_KERNEL);
> > +       if (!mparam->buf) {
> > +               pr_err("dma_map_sg failed alloc mparam buf on %s\n", dev_name(mparam->dev));
> > +               return NULL;
> > +       }
> > +
> > +       if (sg_alloc_table(&mparam->sgt, mparam->npages, GFP_KERNEL)) {
> > +               pr_err("dma_map_sg failed alloc sg table on %s\n", dev_name(mparam->dev));
> > +               goto err_free_buf;
> > +       }
> > +
> > +       /* back each scatterlist entry with one freshly allocated page */
> > +       for_each_sgtable_sg(&mparam->sgt, sg, i) {
> > +               mparam->buf[i] = (void *)__get_free_page(GFP_KERNEL);
> > +               if (!mparam->buf[i]) {
> > +                       pr_err("dma_map_sg failed get free page on %s\n", dev_name(mparam->dev));
> > +                       goto err_free_pages;
> > +               }
> > +
> > +               if (mparam->dma_dir != DMA_FROM_DEVICE)
> > +                       memset(mparam->buf[i], 0x66, PAGE_SIZE);
> > +
> > +               sg_set_buf(sg, mparam->buf[i], PAGE_SIZE);
> > +       }
> > +
> > +       return_ptr(mparam);
> > +
> > +err_free_pages:
> > +       while (i-- > 0)
> > +               free_page((unsigned long)mparam->buf[i]);
> > +
> > +       sg_free_table(&mparam->sgt);
> > +err_free_buf:
> > +       kfree(mparam->buf);
> > +       /* mparam itself is freed automatically via __free(kfree) */
> > +       return NULL;
> > +}
> > +
> > +static void dma_sg_map_benchmark_unprepare(void *arg)
> > +{
> > +       struct dma_sg_map_param *mparam = arg;
> > +       int i;
> > +
> > +       for (i = 0; i < mparam->npages; i++)
> > +               free_page((unsigned long)mparam->buf[i]);
> > +
> > +       sg_free_table(&mparam->sgt);
> > +
> > +       kfree(mparam->buf);
> > +       kfree(mparam);
> > +}
> > +
> > +static int dma_sg_map_benchmark_do_map(void *arg)
> > +{
> > +       struct dma_sg_map_param *mparam = arg;
> > +
> > +       int sg_mapped = dma_map_sg(mparam->dev, mparam->sgt.sgl,
> > +                                  mparam->npages, mparam->dma_dir);
> > +       if (!sg_mapped) {
> > +               pr_err("dma_map_sg failed on %s\n", dev_name(mparam->dev));
> > +               return -ENOMEM;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int dma_sg_map_benchmark_do_unmap(void *arg)
> > +{
> > +       struct dma_sg_map_param *mparam = arg;
> > +
> > +       dma_unmap_sg(mparam->dev, mparam->sgt.sgl, mparam->npages,
> > +                    mparam->dma_dir);
> > +
> > +       return 0;
> > +}
> > +
> > +static struct map_benchmark_ops dma_sg_map_benchmark_ops = {
> > +       .prepare = dma_sg_map_benchmark_prepare,
> > +       .unprepare = dma_sg_map_benchmark_unprepare,
> > +       .do_map = dma_sg_map_benchmark_do_map,
> > +       .do_unmap = dma_sg_map_benchmark_do_unmap,
> > +};
> > +
> > static struct map_benchmark_ops *dma_map_benchmark_ops[DMA_MAP_MODE_MAX] = {
> >         [DMA_MAP_SINGLE_MODE] = &dma_single_map_benchmark_ops,
> > +       [DMA_MAP_SG_MODE] = &dma_sg_map_benchmark_ops,
> > };
> >
> > static int map_benchmark_thread(void *data)
> > --
> > 2.33.0
> >
>
Thanks
Barry