Message-ID: <20250827184923.GC2206304@nvidia.com>
Date: Wed, 27 Aug 2025 15:49:23 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: will@...nel.org, robin.murphy@....com, joro@...tes.org,
jean-philippe@...aro.org, miko.lenczewski@....com,
balbirs@...dia.com, peterz@...radead.org, smostafa@...gle.com,
kevin.tian@...el.com, praan@...gle.com, zhangzekun11@...wei.com,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH rfcv1 7/8] iommu/arm-smmu-v3: Add arm_smmu_invs based
arm_smmu_domain_inv_range()
On Wed, Aug 13, 2025 at 06:25:38PM -0700, Nicolin Chen wrote:
> +void arm_smmu_domain_inv_range(struct arm_smmu_domain *smmu_domain,
> + unsigned long iova, size_t size,
> + unsigned int granule, bool leaf)
> +{
> + struct arm_smmu_cmdq_batch cmds = {};
> + struct arm_smmu_invs *invs;
> + bool retried = false;
> + size_t i;
> +
> + /*
> + * An invalidation request must follow some IOPTE change and then load
> + * the invalidation array. In the meantime, a domain attachment mutates
> + * the array and then stores an STE/CD asking SMMU HW to acquire those
> + * changed IOPTEs. In other words, these two are interdependent and can
> + * race.
> + *
> + * In a race, the RCU design (with its underlying memory barriers) can
> + * ensure the invalidation array is always updated before it is loaded.
> + *
> + * smp_mb() is used here, paired with the smp_mb() following the array
> + * update in a concurrent attach, to ensure:
> + * - HW sees the new IOPTEs if it walks after STE installation
> + * - Invalidation thread sees the updated array with the new ASID.
> + *
> + * [CPU0] | [CPU1]
> + * |
> + * change IOPTEs and TLB flush: |
> + * arm_smmu_domain_inv_range() { | arm_smmu_install_new_domain_invs {
> + * ... | rcu_assign_pointer(new_invs);
> + * smp_mb(); // ensure IOPTEs | smp_mb(); // ensure new_invs
> + * ... | kfree_rcu(old_invs, rcu);
> + * // load invalidation array | }
> + * invs = rcu_dereference(); | arm_smmu_install_ste_for_dev {
> + * | STE = TTB0 // read new IOPTEs
> + */
> + smp_mb();
> +
> + rcu_read_lock();
> +again:
> + invs = rcu_dereference(smmu_domain->invs);
> +
> + /* A concurrent attachment might have changed the array. Do a respin */
> + if (unlikely(!read_trylock(&invs->rwlock)))
> + goto again;
> + /* Only one retry. Otherwise, it would soft lockup on an empty array */
> + if (!retried && unlikely(!invs->num_invs)) {
> + read_unlock(&invs->rwlock);
> + retried = true;
> + goto again;
> + }
This has missed the point; it was to not take the lock unless we have
ATS. Something like this:
rcu_read_lock();
while (true) {
        invs = rcu_dereference(smmu_domain->invs);

        /*
         * Avoid locking unless ATS is being used. No ATS invalidate can
         * be going on after a domain is detached.
         */
        locked = false;
        if (invs->has_ats || READ_ONCE(invs->old)) {
                read_lock(&invs->rwlock);
                if (invs->old) {
                        read_unlock(&invs->rwlock);
                        continue;
                }
                locked = true;
        }
        break;
}

num_invs = READ_ONCE(invs->num_invs);
for (i = 0; i != num_invs; i++) {
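The tail then just pairs the unlocks (rough sketch only, the per-entry
work inside the loop stays as in your patch):

        ... per-entry invalidation as in the patch ...
}

if (locked)
        read_unlock(&invs->rwlock);
rcu_read_unlock();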
> + case INV_TYPE_S2_VMID_S1_CLEAR:
> + /* CMDQ_OP_TLBI_S12_VMALL already flushed S1 entries */
> + if (arm_smmu_inv_size_too_big(cur->smmu, size, granule))
> + continue;
> + /* Just a single CMDQ_OP_TLBI_NH_ALL, no batching */
> + cmd.tlbi.vmid = cur->id;
> + arm_smmu_cmdq_issue_cmd_with_sync(cur->smmu, &cmd);
This should just stick it in the batch; the batch was already flushed:
/* The batch for S2 TLBI must be done before nested S1 ASIDs */
if (next->type == INV_TYPE_S2_VMID_S1_CLEAR)
        return true;
That needs a fixup too:
/* The batch for S2 TLBI must be done before nested S1 ASIDs */
if (cur->type != INV_TYPE_S2_VMID_S1_CLEAR &&
    next->type == INV_TYPE_S2_VMID_S1_CLEAR)
        return true;
It makes sense to bundle all the NH_ALL into one batch if there is more
than one.
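i.e. something like this for the case (rough sketch; I'm assuming the
same arm_smmu_cmdq_batch_add() helper the other cases use, adjust to
whatever this series calls it):

case INV_TYPE_S2_VMID_S1_CLEAR:
        /* CMDQ_OP_TLBI_S12_VMALL already flushed S1 entries */
        if (arm_smmu_inv_size_too_big(cur->smmu, size, granule))
                continue;
        /* Queue the NH_ALL like everything else, one flush at the end */
        cmd.tlbi.vmid = cur->id;
        arm_smmu_cmdq_batch_add(cur->smmu, &cmds, &cmd);
        break;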
But this arm_smmu_invs_end_batch() no longer works right since the

if (refcount_read(&cur->users) == 0)
        continue;

was added.
Jason