Message-ID: <20240417122926.GP3637727@nvidia.com>
Date: Wed, 17 Apr 2024 09:29:26 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, "will@...nel.org" <will@...nel.org>,
"robin.murphy@....com" <robin.murphy@....com>,
"joro@...tes.org" <joro@...tes.org>,
"thierry.reding@...il.com" <thierry.reding@...il.com>,
"vdumpa@...dia.com" <vdumpa@...dia.com>,
"jonathanh@...dia.com" <jonathanh@...dia.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
Jerry Snitselaar <jsnitsel@...hat.com>
Subject: Re: [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2)
On Wed, Apr 17, 2024 at 09:45:34AM +0000, Shameerali Kolothum Thodi wrote:
> Just to add to that: one idea could be that, when ECMDQs are
> detected, they are used for issuing a limited set of commands (like
> stage 1 TLBIs) while the normal cmdq is used for the rest. Since we
> use stage 1 both for the host and for guest nested cases, and TLBIs
> are the bottleneck in most cases, I think this should give
> performance benefits.
There are definitely options to look at to improve the performance
here.
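
For example, the split described above could look roughly like the
sketch below. This is purely a hypothetical illustration: the
arm_smmu_select_queue() helper and the ecmdq_enabled/ecmdqs fields on
arm_smmu_device are made up here; only the CMDQ_OP_* opcodes are the
existing ones.

	/*
	 * Hypothetical sketch: ecmdq_enabled and ecmdqs do not exist
	 * in the current driver. Preemption handling is omitted.
	 */
	static struct arm_smmu_queue *
	arm_smmu_select_queue(struct arm_smmu_device *smmu, u64 opcode)
	{
		switch (opcode) {
		case CMDQ_OP_TLBI_NH_VA:
		case CMDQ_OP_TLBI_NH_ASID:
			/* Stage 1 TLBIs go to an ECMDQ when detected */
			if (smmu->ecmdq_enabled)
				return this_cpu_ptr(smmu->ecmdqs);
			fallthrough;
		default:
			/* Everything else stays on the shared cmdq */
			return &smmu->cmdq.q;
		}
	}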
IMHO the design of the ECMDQ largely seems to expect one queue per
CPU, and then we move to a lock-less design where each CPU uses its
own private per-CPU queue. In this case a VMM calling the kernel to do
invalidation would often naturally use a thread originating on a pCPU
bound to a vCPU which is substantially exclusive to the VM.
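
As a very rough sketch of that direction (again hypothetical: struct
arm_smmu_ecmdq and the per-CPU smmu->ecmdqs field are invented here,
and the full-queue/wrap checks and the trailing CMD_SYNC are omitted),
a per-CPU submission path could skip the cmdq lock entirely because
only the local CPU ever advances its producer index:

	static void arm_smmu_ecmdq_issue_cmd(struct arm_smmu_device *smmu,
					     u64 *cmd)
	{
		struct arm_smmu_ecmdq *ecmdq;

		/* Stay on this CPU so its private queue is used exclusively */
		ecmdq = get_cpu_ptr(smmu->ecmdqs);

		/* Only this CPU writes prod, so no cmdq lock is taken */
		queue_write(Q_ENT(&ecmdq->q, ecmdq->q.llq.prod), cmd,
			    CMDQ_ENT_DWORDS);
		ecmdq->q.llq.prod = queue_inc_prod_n(&ecmdq->q.llq, 1);
		writel_relaxed(ecmdq->q.llq.prod, ecmdq->q.prod_reg);

		put_cpu_ptr(smmu->ecmdqs);
	}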
Jason