linux-kernel - Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240122175700.GB1455070@nvidia.com>
Date: Mon, 22 Jan 2024 13:57:00 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: "will@...nel.org" <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	"joro@...tes.org" <joro@...tes.org>,
	"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
	Alistair Popple <apopple@...dia.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>
Subject: Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in
 struct io_pgtable_cfg

On Mon, Jan 22, 2024 at 09:24:08AM -0800, Nicolin Chen wrote:
> > Or do we need to measure at boot time invalidation performance and set
> > a threshold that way?
> 
> I see. We can run an invalidation at default max_tlbi_ops to
> get its delay in msec or usec, and then set as the threshold
> "xx ms" in the idea one.
> 
> > Also, it seems to me that SVA use cases and, say, DMA API cases are
> > somewhat different where we may be willing to wait longer for DMA API.
> 
> Hmm, the lockup that my patch fixed was for an SVA case that
> doesn't seem to involve DMA API:
> https://lore.kernel.org/linux-iommu/20230901203904.4073-1-nicolinc@nvidia.com/
> 
> And the other lockup fix for a non-SVA case from Zhang doesn't
> seem to involve DMA API either:
> https://lore.kernel.org/linux-iommu/e74ea905-d107-4202-97ca-c2c509e7aa1e@huawei.com/
> 
> Maybe we can treat DMA API a bit different. But I am not sure
> about the justification of leaving it to wait longer. Mind
> elaborating?

Well, there are two issues.. The first is the soft lockup, that should
just be reliably prevented. The timer, for instance, is a reasonable
stab at making that universally safe.

Then there is the issue of just raw invalidation performance, where
SVA particularly is linked to the mm and the longer invalidation takes
the slower the apps will be. We don't have any idea where future DMA
might hit the cache, so it is hard to know if all invalidation is not
the right thing..

DMA api is often lazy and the active DMA is a bit more predictable, so
perhaps there is more merit in spending more time to narrow the
invalidation.

The other case was vfio unmap for VM tear down, which ideally would
use whole ASID invalidation.

If your issue is softlockup, not performance, then that should be
prevented strongly. Broadly speaking if SVA is pushing too high an
invalidation workload then we need to agressively trim it, and do so
dynamically. Certainly we should not have a tunable that has to be set
right to avoid soft lockup.

A tunable to improve performance, perhaps, but not to achieve basic
correctness.

Maybe it is really just a simple thing - compute how many invalidation
commands are needed, if they don't all fit in the current queue space,
then do an invalidate all instead?

Jason