Message-ID: <20220111162027.3brb7ga3vgtvv6th@oracle.com>
Date: Tue, 11 Jan 2022 11:20:27 -0500
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Alexander Duyck <alexanderduyck@...com>,
Alex Williamson <alex.williamson@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Ben Segall <bsegall@...gle.com>,
Cornelia Huck <cohuck@...hat.com>,
Dan Williams <dan.j.williams@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Ingo Molnar <mingo@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Josh Triplett <josh@...htriplett.org>,
Michal Hocko <mhocko@...e.com>, Nico Pache <npache@...hat.com>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Peter Zijlstra <peterz@...radead.org>,
Steffen Klassert <steffen.klassert@...unet.com>,
Steve Sistare <steven.sistare@...cle.com>,
Tejun Heo <tj@...nel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-mm@...ck.org, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org
Subject: Re: [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning

On Mon, Jan 10, 2022 at 08:17:51PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 10, 2022 at 05:27:25PM -0500, Daniel Jordan wrote:
>
> > > > Pinning itself, the only thing being optimized, improves 8.5x in that
> > > > experiment, bringing the time from 1.8 seconds to 0.2 seconds. That's a
> > > > significant savings, IMHO.
> > >
> > > And here is where I suspect we'd get similar results from folios,
> > > based on the unpin performance uplift we already saw.
> > >
> > > As long as PUP doesn't have to COW, its work is largely proportional to
> > > the number of struct pages it processes, so we should be expecting an
> > > upper limit of 512x gains on the PUP alone with foliation.
> > >
> > > This is in line with what we saw with the prior unpin work.
> >
> > "in line with what we saw" Not following. The unpin work had two
> > optimizations, I think, 4.5x and 3.5x which together give 16x. Why is
> > that in line with the potential gains from pup?
>
> It is the same basic issue: doing extra work, dirtying extra memory...

Ok, gotcha.
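
Spelling out the arithmetic behind those figures (assuming the 512x comes
from 2 MiB folios over 4 KiB base pages, which isn't stated explicitly
upthread):

    2 MiB folio / 4 KiB base page = 512 struct pages per folio
        -> at most 512x less per-page work in PUP with foliation

    unpin work: 4.5x * 3.5x = 15.75x, i.e. the ~16x seen there
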
> I don't know of other users that use such huge memory sizes where this
> would matter, besides a VMM...

Right, all the VMMs out there that use vfio.

> > My assumption going into this series was that multithreading VFIO page
> > pinning in the kernel was a viable way forward given the positive
> > feedback I got from the VFIO maintainer last time I posted this, which
> > was admittedly a while ago, and I've since been focused on the other
> > parts of this series rather than what's been happening in the mm lately.
> > Anyway, your arguments are reasonable, so I'll go take a look at some of
> > these optimizations and see where I get.
>
> Well, it is not *unreasonable*, it just doesn't seem compelling to me
> yet.
>
> Especially since we are not anywhere close to the limit of single-threaded
> performance. Aside from GUP, the whole way we transfer the physical pages
> into the iommu is just begging for optimizations, e.g. Matthew's struct
> phyr needs to be an input and output at the iommu layer to make this code
> really happy.

/nods/ There are other ways forward. As I say, I'll take a look.
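
For readers who haven't followed the phyr discussion: struct phyr is, at this
point, only a proposal from Matthew, not merged code, so the sketch below is
purely illustrative (the type and field names are my guesses at the shape of
the idea, not an existing kernel API):

    #include <linux/types.h>    /* phys_addr_t, size_t */

    /*
     * Hypothetical sketch of a "phyr": one physically contiguous range,
     * described by its start address and length rather than by an array
     * of struct page pointers.
     */
    struct phyr {
            phys_addr_t     addr;   /* start of the contiguous range */
            size_t          len;    /* length of the range in bytes */
    };

The idea being discussed, as I understand it, is that pin_user_pages() could
produce a short array of these and the IOMMU map path could consume it
directly, so neither side has to walk one struct page at a time for large,
physically contiguous pinnings.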