lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210114171232.GB944463@unreal>
Date:   Thu, 14 Jan 2021 19:12:32 +0200
From:   Leon Romanovsky <leon@...nel.org>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        linux-pci <linux-pci@...r.kernel.org>,
        linux-rdma@...r.kernel.org, Netdev <netdev@...r.kernel.org>,
        Don Dutile <ddutile@...hat.com>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH mlx5-next v1 1/5] PCI: Add sysfs callback to allow MSI-X
 table size change of SR-IOV VFs

On Thu, Jan 14, 2021 at 08:51:22AM -0800, Alexander Duyck wrote:
> On Wed, Jan 13, 2021 at 11:16 PM Leon Romanovsky <leon@...nel.org> wrote:
> >
> > On Wed, Jan 13, 2021 at 12:00:00PM -0800, Alexander Duyck wrote:
> > > On Tue, Jan 12, 2021 at 10:09 PM Leon Romanovsky <leon@...nel.org> wrote:
> > > >
> > > > On Tue, Jan 12, 2021 at 01:59:51PM -0800, Alexander Duyck wrote:
> > > > > On Mon, Jan 11, 2021 at 10:39 PM Leon Romanovsky <leon@...nel.org> wrote:
> > > > > >
> > > > > > On Mon, Jan 11, 2021 at 11:30:33AM -0800, Alexander Duyck wrote:
> > > > > > > On Sun, Jan 10, 2021 at 7:12 AM Leon Romanovsky <leon@...nel.org> wrote:
> > > > > > > >
> > > > > > > > From: Leon Romanovsky <leonro@...dia.com>
> > > > > > > >
>
> <snip>
>
> > >
> > > > >
> > > > > Also I am not a big fan of the VF groping around looking for a PF
> > > > > interface as it means the interface will likely be exposed in the
> > > > > guest as well, but it just won't work.
> > > >
> > > > If you are referring to VF exposed to the VM, so in this case VF must be
> > > > bound too vfio driver, or any other driver, and won't allow MSI-X change.
> > > > If you are referring to PF exposed to the VM, it is very unlikely scenario
> > > > in real world and reserved for braves among us. Even in this case, the
> > > > set MSI-X won't work, because PF will be connected to the hypervisor driver
> > > > that doesn't support set_msix.
> > > >
> > > > So both cases are handled.
> > >
> > > I get that they are handled. However I am not a huge fan of the sysfs
> > > attributes for one device being dependent on another device. When you
> > > have to start searching for another device it just makes things messy.
> >
> > This is pretty common way, nothing new here.
>
> This is how writable fields within the device are handled. I am pretty
> sure this is the first sysfs entry that is providing a workaround via
> a device firmware to make the field editable that wasn't intended to
> be.
>
> So if in the future I define a device that has an MMIO register that
> allows me to edit configuration space should I just tie it into the
> same framework? That is kind of where I am going with my objection to
> this. It just seems like you are adding a backdoor to allow editing
> read-only configuration options.

I have no objections as long as your new sysfs will make sense and will
be acceptable by PCI community.

>
> > >
> > > > >
> > > > > > >
> > > > > > > If you are calling this on the VFs then it doesn't really make any
> > > > > > > sense anyway since the VF is not a "VF PCI dev representor" and
> > > > > > > shouldn't be treated as such. In my opinion if we are going to be
> > > > > > > doing per-port resource limiting that is something that might make
> > > > > > > more sense as a part of the devlink configuration for the VF since the
> > > > > > > actual change won't be visible to an assigned device.
> > > > > >
> > > > > > https://lore.kernel.org/linux-pci/20210112061535.GB4678@unreal/
> > > > >
> > > > > So the question I would have is if we are spawning the VFs and
> > > > > expecting them to have different configs or the same configuration?
> > > >
> > > > By default, they have same configuration.
> > > >
> > > > > I'm assuming in your case you are looking for a different
> > > > > configuration per port. Do I have that correct?
> > > >
> > > > No, per-VF as represents one device in the PCI world. For example, mlx5
> > > > can have more than one physical port.
> > >
> > > Sorry, I meant per virtual function, not per port.
> >
> > Yes, PCI spec is clear, MSI-X vector count is per-device and in our case
> > it means per-VF.
>
> I think you overlooked the part about it being "read-only". It isn't
> really meant to be changed and that is what this patch set is
> providing.

We doesn't allow writes to this field, but setting "hint" to the HW on
the proper resource provisioning.

>
> > >
> > > > >
> > > > > Where this gets ugly is that SR-IOV assumes a certain uniformity per
> > > > > VF so doing a per-VF custom limitation gets ugly pretty quick.
> > > >
> > > > I don't find any support for this "uniformity" claim in the PCI spec.
> > >
> > > I am referring to the PCI configuration space. Each VF ends up with
> > > some fixed amount of MMIO resources per function. So typically when
> > > you spawn VFs we had things setup so that all you do is say how many
> > > you want.
> > >
> > > > > I wonder if it would make more sense if we are going this route to just
> > > > > define a device-tree like schema that could be fed in to enable VFs
> > > > > instead of just using echo X > sriov_numvfs and then trying to fix
> > > > > things afterwards. Then you could define this and other features that
> > > > > I am sure you would need in the future via json-schema like is done in
> > > > > device-tree and write it once enabling the set of VFs that you need.
> > > >
> > > > Sorry, but this is overkill, it won't give us much and it doesn't fit
> > > > the VF usage model at all.
> > > >
> > > > Right now, all heavy users of SR-IOV are creating many VFs up to the maximum.
> > > > They do it with autoprobe disabled, because it is too time consuming to wait
> > > > till all VFs probe themselves and unbind them later.
> > > >
> > > > After that, they wait for incoming request to provision VM on VF, they set MAC
> > > > address, change MSI-X according to VM properties and bind that VF to new VM.
> > > >
> > > > So MSI-X change is done after VFs were created.
> > >
> > > So if I understand correctly based on your comments below you are
> > > dynamically changing the VF's MSI-X configuration space then?
> >
> > I'm changing "Table Size" from "7.7.2.2 Message Control Register for
> > MSI-X (Offset 02h)" and nothing more.
> >
> > If you do raw PCI read before and after, only this field will be changed.
>
> I would hope there is much more going on. Otherwise the VF hardware
> will be exploitable by a malicious driver in the guest since you could
> read/write to registers beyond the table and see some result. I am
> assuming the firmware doesn't allow triggering of any interrupts
> beyond the ones defined as being in the table.

You are talking about internal FW implementation, which is of course
secure. I'm talking about PCI config space.

Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ