Message-ID: <CAOSf1CGLM7wpN3=RwU+osy46jX9iWTi3TWDNyaCnBFOTyBiKpg@mail.gmail.com>
Date: Tue, 15 Nov 2022 23:49:13 +1100
From: "Oliver O'Halloran" <oohall@...il.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)"
<longpeng2@...wei.com>, bhelgaas@...gle.com,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
jianjay.zhou@...wei.com, zhuangshengen@...wei.com,
arei.gonglei@...wei.com, yechuan@...wei.com,
huangzhichao@...wei.com, xiehong@...wei.com
Subject: Re: [RFC 0/4] pci/sriov: support VFs dynamic addition

On Tue, Nov 15, 2022 at 7:32 PM Leon Romanovsky <leon@...nel.org> wrote:
>
> On Tue, Nov 15, 2022 at 12:50:34PM +1100, Oliver O'Halloran wrote:
> > On Tue, Nov 15, 2022 at 1:27 AM Leon Romanovsky <leon@...nel.org> wrote:
> > >
> > > *snip*
> > >
> > > Anyway, I'm aware of big cloud providers who are pretty happy with live
> > > migration in production.
> >
> > I could see someone sufficiently cloudbrained deciding that rebooting
> > the hypervisor is fine provided the downtime doesn't violate any
> > customer uptime SLAs. Personally I'd only be brave enough to do that
> > for a HV hosting internal services which I know are behind a load
> > balancer, but apparently there are people at Huawei far braver than I.
>
> My main point in this discussion is that the Huawei team doesn't
> actually provide any meaningful justification for why it is a great
> idea to add a new sysfs file.

All their arguments seem to be based on trying to reduce the
time-to-VMs when a hypervisor is kexec()ed, which is a pretty
reasonable justification IMO. I do have some reservations about the
numbers they're claiming, since 250ms for initializing the struct
pci_devs for the VFs seems excessive. Unfortunately, I don't have any
hardware that supports 2048 VFs on hand, so I can't verify that claim.
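
If anyone with such a device wants to check, a rough way to measure it
from userspace is to time the (synchronous) write to the PF's
sriov_numvfs attribute. A minimal sketch follows; the PF address and VF
count are placeholders, and the measured time also includes VF driver
probe, so treat it as an upper bound on the struct pci_dev setup cost:

/*
 * Hypothetical timing sketch: measure how long enabling N VFs takes by
 * timing the write to sriov_numvfs.  The PF address and VF count below
 * are placeholders for whatever device is actually being tested.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/sys/bus/pci/devices/0000:03:00.0/sriov_numvfs";
        const char *numvfs = "2048\n";
        struct timespec start, end;
        long ms;
        int fd;

        fd = open(path, O_WRONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &start);
        if (write(fd, numvfs, strlen(numvfs)) < 0)
                perror("write");
        clock_gettime(CLOCK_MONOTONIC, &end);
        close(fd);

        ms = (end.tv_sec - start.tv_sec) * 1000 +
             (end.tv_nsec - start.tv_nsec) / 1000000;
        printf("enabling VFs took %ld ms\n", ms);
        return 0;
}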

> They use HPC as an argument, but in that world you won't see many VMs
> on one server, as it is important to provide separate MSI-X vectors
> and CPUs to each VM.

I don't think HPC has come up in this thread, but assuming it has: in
the cloud "HPC" usually means "it has timesliced access to GPUs".
Having 2k VMs sharing one or more GPUs on a single system isn't
necessarily advisable, but if we assume only a subset of those VMs
will actually need access to a GPU at any given time, it's sort of
reasonable.

> They ask us for an optimization (do not add device hierarchy for
> existing HW) that doesn't exist in the kernel.
>
> I would say that they are trying to meld the SIOV architecture of
> subfunctions (SFs) into the PCI and SR-IOV world.

I don't know what asks you're referring to, but they're not present in
this thread. I'm going to give Longpeng the benefit of the doubt and
assume that this series is an attempt to fix a problem he's facing
with actual hardware that exists today. To say they should have
implemented the device with SIOV (proprietary to Intel until March
this year) rather than SR-IOV (standardised by PCI-SIG over a decade
ago) is not terribly helpful to anyone. Additionally, SIOV exists
largely to solve a problem that's only an issue because Intel decided
that all PCI devices should exist within a single PCI domain. If you
don't have that problem, SIOV is a lot less compelling.