Message-ID: <BYAPR21MB1270E9923364F065D70086E1BFFC9@BYAPR21MB1270.namprd21.prod.outlook.com>
Date: Fri, 29 Apr 2022 15:47:17 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
CC: "wei.liu@...nel.org" <wei.liu@...nel.org>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
"robh@...nel.org" <robh@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
Jake Oshins <jakeo@...rosoft.com>
Subject: RE: [PATCH] PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time
> From: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
> Sent: Friday, April 29, 2022 3:15 AM
> To: Dexuan Cui <decui@...rosoft.com>
> Cc: wei.liu@...nel.org; KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; Stephen Hemminger <sthemmin@...rosoft.com>;
> bhelgaas@...gle.com; linux-hyperv@...r.kernel.org;
> linux-pci@...r.kernel.org; linux-kernel@...r.kernel.org; Michael Kelley (LINUX)
> <mikelley@...rosoft.com>; robh@...nel.org; kw@...ux.com; Jake Oshins
> <jakeo@...rosoft.com>
> Subject: Re: [PATCH] PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce
> VM boot time
>
> On Tue, Apr 19, 2022 at 03:00:07PM -0700, Dexuan Cui wrote:
> > A VM on Azure can have 14 GPUs, and each GPU may have a huge MMIO BAR,
> > e.g. 128 GB. Currently the boot time of such a VM can be 4+ minutes, and
> > most of the time is used by the host to unmap/map the vBAR from/to pBAR
> > when the VM clears and sets the PCI_COMMAND_MEMORY bit: each unmap/map
> > operation for a 128 GB BAR needs about 1.8 seconds, and the pci-hyperv
> > driver and the Linux PCI subsystem flip the PCI_COMMAND_MEMORY bit
> > eight times (see pci_setup_device() -> pci_read_bases() and
> > pci_std_update_resource()), increasing the boot time by 1.8 * 8 = 14.4
> > seconds per GPU, i.e. 14.4 * 14 = 201.6 seconds in total.
> >
> > Fix the slowness by not turning on the PCI_COMMAND_MEMORY in pci-hyperv.c,
> > so the bit stays in the off state before the PCI device driver calls
> > pci_enable_device(): when the bit is off, pci_read_bases() and
> > pci_std_update_resource() don't cause Hyper-V to unmap/map the vBARs.
> > With this change, the boot time of such a VM is reduced by
> > 1.8 * (8-1) * 14 = 176.4 seconds.
>
> I believe you need to clarify this commit message. It took me a while
> to understand what you are really doing.
>
> What this patch is doing is bootstrapping PCI devices with command
> memory clear because there is no need to have it set (?) in the first
> place and because, if it is set, the PCI core layer needs to toggle it
> on and off in order to, e.g., size BAR regions, which causes the slowdown
> you are reporting.
>
> I assume, given the above, that there is strictly no need to have
> devices with command memory set at kernel startup handover and if
> there was it would not be set in the PCI Hyper-V host controller
> driver (because that's what you are _removing_).
>
> I think this should not be merged as a fix and I'd be careful
> about possible regressions before sending it to stable kernels,
> if you wish to do so.
>
> It is fine by me to go via the Hyper-V tree even though I don't
> see why that's better, unless you want to send it as a fix and
> I think you should not.
>
> You can add my tag but the commit log should be rewritten and
> you should add a Link: to the discussion thread.
>
> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
Thanks, Lorenzo! I'll post v2 with the commit message revised.
It's OK with me to have this patch go through the hyperv-next branch
rather than hyperv-fixes.
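
For reference, the boot-time arithmetic in the quoted commit message can be
reproduced with a minimal Python sketch. The figures (1.8 seconds per
unmap/map, eight flips, 14 GPUs) come from the commit message; the constant
and function names are hypothetical, and the "one remaining flip" value is
only inferred from the (8-1) term in the message, not measured here.

```python
# Figures from the quoted commit message; all names are illustrative only.
MAP_CYCLE_MS = 1800   # ~1.8 s for one unmap/map of a 128 GB BAR, in milliseconds
FLIPS_BEFORE = 8      # PCI_COMMAND_MEMORY flips per GPU before the patch
FLIPS_AFTER = 1       # assumption: inferred from the (8-1) term in the message
GPUS = 14             # GPUs per VM in the reported configuration


def bar_remap_cost_ms(flips, gpus=GPUS, cycle_ms=MAP_CYCLE_MS):
    """Total time spent on host unmap/map operations, in milliseconds."""
    return cycle_ms * flips * gpus


# Integer milliseconds avoid floating-point rounding in the totals.
before_ms = bar_remap_cost_ms(FLIPS_BEFORE)
after_ms = bar_remap_cost_ms(FLIPS_AFTER)
saved_ms = before_ms - after_ms

print(f"before patch: {before_ms / 1000:.1f} s")
print(f"after patch:  {after_ms / 1000:.1f} s")
print(f"saved:        {saved_ms / 1000:.1f} s")
```

The model treats every flip as one full unmap/map cycle of equal cost, which
is the same simplification the commit message makes.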