Message-ID: <20240502162530.GA578685@nvidia.com>
Date: Thu, 2 May 2024 13:25:30 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Tanmay Jagdale <tanmay@...vell.com>
Cc: will@...nel.org, robin.murphy@....com, joro@...tes.org,
nicolinc@...dia.com, mshavit@...gle.com, baolu.lu@...ux.intel.com,
thunder.leizhen@...wei.com, set_pte_at@...look.com,
smostafa@...gle.com, sgoutham@...vell.com, gcherian@...vell.com,
jcm@...masters.org, linux-arm-kernel@...ts.infradead.org,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register
mode
On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@...wei.com> that add
> support for SMMU ECMDQ feature.
>
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.
So this is not just measuring invalidation performance but basically
the DMA API performance to do map/unmap operations? What is "batch
size" ?
Does this HW support range invalidation? How many invalidation
commands does each test cycle generate?
>                         Total Requests   Average Requests   Difference
>                                          Per CPU            wrt ECMDQ
> -----------------------------------------------------------------------
> ECMDQ                   239286381        2991079
> CMDQ Batch Size 1       228232187        2852902            -4.62%
> CMDQ Batch Size 32      233465784        2918322            -2.43%
> CMDQ Batch Size 64      231679588        2895994            -3.18%
> CMDQ Batch Size 128     233189030        2914862            -2.55%
> CMDQ Batch Size 256     230965773        2887072            -3.48%
If this is really 5% for a typical DMA API map/unmap cycle then that
seems interesting to me.
If it is 5% for just the invalidation command then it is harder to
say.
I'd suggest presenting your results in terms of latency for a DMA
API map/unmap cycle, and showing how the results scale as you add more
threads. Do even 2 threads start to show a 4-5% gain?
Also, I'm wondering how ATS would interact; I think we have a
head-of-line blocking issue with ATS. Allowing ATS to progress
concurrently with unrelated parallel invalidations may also be
interesting, especially for multi-SVA workloads.
Jason