Message-ID: <20240417122926.GP3637727@nvidia.com>
Date: Wed, 17 Apr 2024 09:29:26 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, "will@...nel.org" <will@...nel.org>,
"robin.murphy@....com" <robin.murphy@....com>,
"joro@...tes.org" <joro@...tes.org>,
"thierry.reding@...il.com" <thierry.reding@...il.com>,
"vdumpa@...dia.com" <vdumpa@...dia.com>,
"jonathanh@...dia.com" <jonathanh@...dia.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>,
Jerry Snitselaar <jsnitsel@...hat.com>
Subject: Re: [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2)
On Wed, Apr 17, 2024 at 09:45:34AM +0000, Shameerali Kolothum Thodi wrote:
> Just to add to that: one idea could be that, when ECMDQs are
> detected, they are used for issuing a limited set of commands (like
> stage 1 TLBIs) while the normal cmdq is used for the rest. Since we
> use stage 1 both for the host and for guest nested cases, and TLBIs
> are the bottleneck in most cases, I think this should give
> performance benefits.
There are definitely options to look at to improve the performance
here.
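
For example, the split described above could look roughly like the
sketch below. This is purely a hypothetical illustration: the
arm_smmu_select_queue() helper and the ecmdq_enabled/ecmdqs fields on
arm_smmu_device are made up here; only the CMDQ_OP_* opcodes are the
existing ones.

	/*
	 * Hypothetical sketch: ecmdq_enabled and ecmdqs do not exist
	 * in the current driver. Preemption handling is omitted.
	 */
	static struct arm_smmu_queue *
	arm_smmu_select_queue(struct arm_smmu_device *smmu, u64 opcode)
	{
		switch (opcode) {
		case CMDQ_OP_TLBI_NH_VA:
		case CMDQ_OP_TLBI_NH_ASID:
			/* Stage 1 TLBIs go to an ECMDQ when detected */
			if (smmu->ecmdq_enabled)
				return this_cpu_ptr(smmu->ecmdqs);
			fallthrough;
		default:
			/* Everything else stays on the shared cmdq */
			return &smmu->cmdq.q;
		}
	}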
IMHO the design of the ECMDQ largely seems to expect one queue per
CPU, and then we move to a lock-less design where each CPU uses its
own private per-CPU queue. In this case a VMM calling the kernel to do
invalidation would often naturally use a thread originating on a pCPU
bound to a vCPU which is substantially exclusive to the VM.
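
As a very rough sketch of that direction (again hypothetical: struct
arm_smmu_ecmdq and the per-CPU smmu->ecmdqs field are invented here,
and the full-queue/wrap checks and the trailing CMD_SYNC are omitted),
a per-CPU submission path could skip the cmdq lock entirely because
only the local CPU ever advances its producer index:

	static void arm_smmu_ecmdq_issue_cmd(struct arm_smmu_device *smmu,
					     u64 *cmd)
	{
		struct arm_smmu_ecmdq *ecmdq;

		/* Stay on this CPU so its private queue is used exclusively */
		ecmdq = get_cpu_ptr(smmu->ecmdqs);

		/* Only this CPU writes prod, so no cmdq lock is taken */
		queue_write(Q_ENT(&ecmdq->q, ecmdq->q.llq.prod), cmd,
			    CMDQ_ENT_DWORDS);
		ecmdq->q.llq.prod = queue_inc_prod_n(&ecmdq->q.llq, 1);
		writel_relaxed(ecmdq->q.llq.prod, ecmdq->q.prod_reg);

		put_cpu_ptr(smmu->ecmdqs);
	}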
Jason