lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250827184923.GC2206304@nvidia.com>
Date: Wed, 27 Aug 2025 15:49:23 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: will@...nel.org, robin.murphy@....com, joro@...tes.org,
	jean-philippe@...aro.org, miko.lenczewski@....com,
	balbirs@...dia.com, peterz@...radead.org, smostafa@...gle.com,
	kevin.tian@...el.com, praan@...gle.com, zhangzekun11@...wei.com,
	linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
	linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH rfcv1 7/8] iommu/arm-smmu-v3: Add arm_smmu_invs based
 arm_smmu_domain_inv_range()

On Wed, Aug 13, 2025 at 06:25:38PM -0700, Nicolin Chen wrote:
> +void arm_smmu_domain_inv_range(struct arm_smmu_domain *smmu_domain,
> +			       unsigned long iova, size_t size,
> +			       unsigned int granule, bool leaf)
> +{
> +	struct arm_smmu_cmdq_batch cmds = {};
> +	struct arm_smmu_invs *invs;
> +	bool retried = false;
> +	size_t i;
> +
> +	/*
> +	 * An invalidation request must follow some IOPTE change and then load
> +	 * the invalidation array In the meantime, a domain attachment mutates
> +	 * the array and then stores an STE/CD asking SMMU HW to acquire those
> +	 * changed IOPTEs. In other word, these two are interdependent and can
> +	 * race.
> +	 *
> +	 * In a race, the RCU design (with its underlying memory barriers) can
> +	 * ensure the invalidations array to always get updated before loaded.
> +	 *
> +	 * smp_mb() is used here, paired with the smp_mb() following the array
> +	 * update in a concurrent attach, to ensure:
> +	 *  - HW sees the new IOPTEs if it walks after STE installation
> +	 *  - Invalidation thread sees the updated array with the new ASID.
> +	 *
> +	 *  [CPU0]                        | [CPU1]
> +	 *                                |
> +	 *  change IOPTEs and TLB flush:  |
> +	 *  arm_smmu_domain_inv_range() { | arm_smmu_install_new_domain_invs {
> +	 *    ...                         |   rcu_assign_pointer(new_invs);
> +	 *    smp_mb(); // ensure IOPTEs  |   smp_mb(); // ensure new_invs
> +	 *    ...                         |   kfree_rcu(old_invs, rcu);
> +	 *    // load invalidation array  | }
> +	 *    invs = rcu_dereference();   | arm_smmu_install_ste_for_dev {
> +	 *                                |   STE = TTB0 // read new IOPTEs
> +	 */
> +	smp_mb();
> +
> +	rcu_read_lock();
> +again:
> +	invs = rcu_dereference(smmu_domain->invs);
> +
> +	/* A concurrent attachment might have changed the array. Do a respin */
> +	if (unlikely(!read_trylock(&invs->rwlock)))
> +		goto again;
> +	/* Only one retry. Otherwise, it would soft lockup on an empty array */
> +	if (!retried && unlikely(!invs->num_invs)) {
> +		read_unlock(&invs->rwlock);
> +		retried = true;
> +		goto again;
> +	}

This has missed the point, it was to not get the unless we have
ATS. Something like this:


	rcu_read_lock();

	while (true) {
		invs = rcu_dereference(smmu_domain->invs);

		/*
		 * Avoid locking unless ATS is being used. No ATS invalidate can
		 * be going on after a domain is detached.
		 */
		locked = false;
		if (invs->has_ats || READ_ONCE(invs->old)) {
			read_lock(&invs->rwlock);
			if (invs->old) {
				read_unlock(&invs->rwlock);
				continue;
			}
			locked = true;
		}
		break;
	}

	num_invs = READ_ONCE(num_invs);
	for (i = 0; i != num_invs; i++) {

> +		case INV_TYPE_S2_VMID_S1_CLEAR:
> +			/* CMDQ_OP_TLBI_S12_VMALL already flushed S1 entries */
> +			if (arm_smmu_inv_size_too_big(cur->smmu, size, granule))
> +				continue;
> +			/* Just a single CMDQ_OP_TLBI_NH_ALL, no batching */
> +			cmd.tlbi.vmid = cur->id;
> +			arm_smmu_cmdq_issue_cmd_with_sync(cur->smmu, &cmd);

This should just stick it in the batch, the batch was already flushed:

	/* The batch for S2 TLBI must be done before nested S1 ASIDs */
	if (next->type == INV_TYPE_S2_VMID_S1_CLEAR)
		return true;

That needs a fixup too:

	/* The batch for S2 TLBI must be done before nested S1 ASIDs */
	if (cur->type != INV_TYPE_S2_VMID_S1_CLEAR &&
	    next->type == INV_TYPE_S2_VMID_S1_CLEAR)
		return true;

It makes sense to bundle all the NH_ALL into one batch if there is more
than one.

But this arm_smmu_invs_end_batch() no longer works right since the

		if (refcount_read(&cur->users) == 0)
			continue;

Was added..

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ