Message-ID: <20240501001758.GZ941030@nvidia.com>
Date: Tue, 30 Apr 2024 21:17:58 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: will@...nel.org, robin.murphy@....com, joro@...tes.org,
	thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
	linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH v6 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned
 VINTF

On Tue, Apr 30, 2024 at 11:58:44AM -0700, Nicolin Chen wrote:

> > Has to push everything, across all the iterations of add/submit,
> > onto the same CMDQ, otherwise the SYNC won't properly flush?
> 
> ECMDQ seems to have such a limitation, but VCMDQs can get away
> with it, as the HW can insert a SYNC into a queue that doesn't
> end with one.

That seems like a strange thing to do in HW, but I recall you
mentioned it once before. Still, I'm not sure there is any merit in
relying on it?

> > But each arm_smmu_cmdq_issue_cmdlist() call does its own queue
> > lookup. Yes, they probably return the same Q since we are probably
> > on the same CPU, but it seems logically wrong (and slower!) to
> > organize it like this.
> > 
> > I would expect the Q to be selected when the struct
> > arm_smmu_cmdq_batch is allocated on the stack, and be the same for the
> > entire batch operation. Not only do we spend less time computing
> > the Q to use, we also get a built-in guarantee that every command
> > will be on the same Q as the fencing SYNC.
> 
> This seems helpful for ECMDQ. The current version disables
> preemption, which feels costly to me.

Oh, yes, it definitely should work like this then!
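
To be concrete, the idea is roughly the below (a sketch only:
arm_smmu_cmdq_batch_init() is a hypothetical helper, and
required_cmds is the enum from the diff quoted below):

	struct arm_smmu_cmdq_batch {
		u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
		struct arm_smmu_cmdq *cmdq;	/* chosen once per batch */
		int num;
	};

	static void arm_smmu_cmdq_batch_init(struct arm_smmu_device *smmu,
					     struct arm_smmu_cmdq_batch *cmds,
					     enum required_cmds required)
	{
		cmds->num = 0;
		/* One queue decision per batch; the fencing SYNC uses it too */
		cmds->cmdq = arm_smmu_get_cmdq(smmu, required);
	}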

> > Something sort of like this as another patch?
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 268da20baa4e9c..d8c9597878315a 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -357,11 +357,22 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> >  	return 0;
> >  }
> >  
> > -static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu,
> > -					       u64 *cmds, int n)
> > +enum required_cmds {
> > +	CMDS_ALL,
> > +	/*
> > +	 * Commands will be one of:
> > +	 *  CMDQ_OP_ATC_INV, CMDQ_OP_TLBI_EL2_VA, CMDQ_OP_TLBI_NH_VA,
> > +	 *  CMDQ_OP_TLBI_EL2_ASID, CMDQ_OP_TLBI_NH_ASID, CMDQ_OP_TLBI_S2_IPA,
> > +	 *  CMDQ_OP_TLBI_S12_VMALL, CMDQ_OP_SYNC
> > +	 */
> > +	CMDS_INVALIDATION,
> > +};
> 
> Hmm, guest-owned VCMDQs don't support EL2 commands. So it feels
> somewhat complicated to decouple them further in the callers of
> arm_smmu_cmdq_batch_add(). And I am not sure there is a use case
> for a guest issuing CMDQ_OP_TLBI_S2_IPA/CMDQ_OP_TLBI_S12_VMALL
> either, though the HW surprisingly supports those two.

This is the full set of commands that could be issued, but each is
gated on the feature bits. The ones VCMDQ doesn't support will not
be issued because of those feature bits. You could test and enforce
this when probing the ECMDQ parts.
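
For instance, the existing S1 invalidation path already picks the
opcode from the feature bits along these lines (paraphrasing the
current arm_smmu_tlb_inv_range_domain() logic):

	cmd.opcode = smmu->features & ARM_SMMU_FEAT_E2H ?
		     CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;

So a VCMDQ that cannot take EL2 commands is safe so long as
ARM_SMMU_FEAT_E2H is not in the active feature set.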

> Perhaps we could just scan the first command in the batch, trusting
> that no one will covertly sneak different commands into it?

Yes, with the current design that does seem to work, but it also
feels a bit obtuse.
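
What I understand you mean is something like this (hypothetical
sketch, classifying the whole batch by the opcode of its first
entry):

	switch (FIELD_GET(CMDQ_0_OP, cmds->cmds[0])) {
	case CMDQ_OP_ATC_INV:
	case CMDQ_OP_TLBI_NH_VA:
		/* ... an invalidation batch, route it to the VCMDQ ... */
		break;
	default:
		/* ... anything else goes to the main CMDQ ... */
		break;
	}

That works only as long as callers never mix command classes in a
batch, which nothing actually enforces.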

> Otherwise, there has to be a get_supported_cmdq callback so the
> batch or its callers can avoid adding unsupported commands in the
> first place.

If you feel strongly about it, the invalidation commands could be
split into S1/S2/S1_VM groupings that align with the feature bits,
and the grouping could be passed down from one step above. But I
don't think that complexity is really needed; it is better to deal
with it through the feature mechanism.

If high-speed invalidation is supported, then the invalidation queue
must support all the invalidation commands used by the SMMU's active
feature set; otherwise, do not enable the invalidation queue. That
does make logical sense.
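
In probe terms that could be as simple as the below (illustrative
only: tegra241_vcmdq_check_features() is a made-up name, and the
E2H test is just one example of such a gate):

	static int tegra241_vcmdq_check_features(struct arm_smmu_device *smmu)
	{
		/*
		 * Guest-owned VCMDQs cannot take EL2 TLBI commands, which
		 * the driver emits whenever ARM_SMMU_FEAT_E2H is active.
		 * Refuse to enable the VCMDQ then; the main CMDQ still
		 * handles everything.
		 */
		if (smmu->features & ARM_SMMU_FEAT_E2H)
			return -ENODEV;
		return 0;
	}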

Jason
