linux-kernel - Re: [PATCH 7/8] rust: pci: add physfn(), to return PF device for VF device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DEFZP148A2NK.29G6N1V1SXT8T@kernel.org>
Date: Sun, 23 Nov 2025 23:07:47 +1300
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Leon Romanovsky" <leon@...nel.org>
Cc: "Jason Gunthorpe" <jgg@...pe.ca>, "Peter Colberg" <pcolberg@...hat.com>,
 "Bjorn Helgaas" <bhelgaas@...gle.com>,
 Krzysztof Wilczyński <kwilczynski@...nel.org>, "Miguel
 Ojeda" <ojeda@...nel.org>, "Alex Gaynor" <alex.gaynor@...il.com>, "Boqun
 Feng" <boqun.feng@...il.com>, "Gary Guo" <gary@...yguo.net>,
 Björn Roy Baron <bjorn3_gh@...tonmail.com>, "Benno Lossin"
 <lossin@...nel.org>, "Andreas Hindborg" <a.hindborg@...nel.org>, "Alice
 Ryhl" <aliceryhl@...gle.com>, "Trevor Gross" <tmgross@...ch.edu>, "Abdiel
 Janulgue" <abdiel.janulgue@...il.com>, "Daniel Almeida"
 <daniel.almeida@...labora.com>, "Robin Murphy" <robin.murphy@....com>,
 "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>, "Dave Ertman"
 <david.m.ertman@...el.com>, "Ira Weiny" <ira.weiny@...el.com>,
 <linux-pci@...r.kernel.org>, <rust-for-linux@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>, "Alexandre Courbot" <acourbot@...dia.com>,
 "Alistair Popple" <apopple@...dia.com>, "Joel Fernandes"
 <joelagnelf@...dia.com>, "John Hubbard" <jhubbard@...dia.com>, "Zhi Wang"
 <zhiw@...dia.com>
Subject: Re: [PATCH 7/8] rust: pci: add physfn(), to return PF device for VF
 device

On Sun Nov 23, 2025 at 7:34 PM NZDT, Leon Romanovsky wrote:
> On Sun, Nov 23, 2025 at 11:26:52AM +1300, Danilo Krummrich wrote:
>> On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote:
>> > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote:
>> >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote:
>> >> > So far I haven't heard a convincing reason for not providing this guarantee. The
>> >> > only reason not to guarantee this I have heard is that some PF drivers only
>> >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is
>> >> > no strong reason to do so.
>> >> 
>> >> I know some people have work flows around this. I think they are
>> >> wrong. When we "fixed" mlx5 a while back there was some pushback and
>> >> some weird things did stop working.
>> >
>> > I don't think that this is correct. The main use case for keeping VFs,
>> > while unloading PF driver is to allow reuse this PF for VFIO too. It is
>> > very natural and common flow for "real" SR-IOV devices which present all
>> > PF/VFs as truly separate devices.
>> 
>> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for
>> VFIO"? What do you do with the PF after driver unload instead? Load another
>> driver? If so, why separate ones?
>
> One of the main use cases for SR-IOV is to provide to VM users/customers
> devices with performance and security promises as physical ones. In this
> case, the VMs are created through PF and not bound to any driver. Once
> customer/user requests VM, that VF is bound to vfio-pci driver and
> attached to that VM.
>
> In many cases, PF is unbound too from its original driver and attached
> to some other VM. It allows for these VM providers to maximize
> utilization of their SR-IOV devices.
>
> At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language).
> "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability
> and is accessible to an SR-PCIM, a VI, or an SI."

Hm, that's possible, but do we have cases of this in practice where we bind and
unbind the same PF multiple times, pass it to different VMs, etc.?

In any case, what we can always do is what I propose in [1], i.e. add a bool to
struct pci_driver that indicates whether the VFs should be unbound when the PF
is unbound, such that we can uphold the guarantee of VF bound implies PF bound
at least conditionally.

[1] https://lore.kernel.org/all/DEFL4TG0WX1C.2GLH4417EPU3V@kernel.org/

>> >> So while I support this goal, I don't know if enough people will
>> >> agree..
>> >
>> > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo
>> > talked to get rationale for this new behaviour.
>> 
>> Not sure what you mean, Bjorn is CC'd on the whole series and hence also this
>> conversation.
>> 
>> Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s)
>> are bound stems from the corresponding beneficial lifecycle implications for the
>> driver model (some more on this below).
>> 
>> >> > With this, the above code will be correct and a driver can use the generic
>> >> > infrastructure to:
>> >> > 
>> >> >   1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>>
>> >> >   2) Grab the driver's device private data from the returned Device<Bound>
>> >> > 
>> >> > Note that 2) (i.e. accessing the driver's device private data with
>> >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is
>> >> > only implemented for Device<Bound>) and that the returned type actually matches
>> >> > the type of the object that has been stored.
>> >> 
>> >> This is what the core helper does, with the twist that it "validates"
>> >> the PF driver is working right by checking its driver binding..
>> >> 
>> >> > I suggest to have a look at [2] for an example with how this turns out with the
>> >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary
>> >> > driver calling into its parent, except that the auxiliary bus already guarantees
>> >> > that the parent is bound when the child is bound.
>> >> 
>> >> Aux bus is a little different because it should be used in a way where
>> >> there are stronger guarantees about what the parent is. Ie the aux bus
>> >> names should be unique and limited to the type of parent.
>> >
>> > Right, aux bus is completely unrelated to real physical PCI bus.
>> 
>> Of course the auxiliary and the PCI bus are fundamentally different busses,
>> *but* they also have commonalities: the driver lifecycle requirements drivers
>> have to deal with are the same.
>> 
>> For instance, if you call from an auxiliary driver into the parent driver to let
>> the parent driver do some work on behalf, the auxiliary driver has to guarantee
>> that the parent device is guranteed to remain bound for the duration of the
>> call, otherwise the parent driver can't call dev_get_drvdata() without risking a
>> UAF, since a concurrent unbind might free the parent driver's device private
>> data.
>> 
>> The same is true for PCI when a VF driver calls into the PF driver to do some
>> work on behalf. The VF driver has to make sure that the PF driver remains bound
>> for the whole duration of the call for the exact same reasons.
>
> And this section is not entirely correct. It is one of the possible
> usages of PF. In many devices, PF and its VFs are independent and don't
> communicate through kernel at all. There is no need for VF to have PF be
> bounded to its original driver.

Of course -- similarly there might be cases where an auxiliary driver might not
cummunicate with its parent, but that's not the case I'm referring to above.

>
>> 
>> I.e. it is a common driver lifecycle pattern for interactions between drivers in
>> general.
>
> And it is not true as well. In case you need to interact between
> devices/drivers, it is up to the caller to ensure that this driver isn't
> unloaded.

You're mixing two things here. The driver model lifecycle requires that if
driver A calls into driver B - where B accesses its device private data - that B
is bound for the full duration of the call.

This is true for any such interaction, the question of who is responsible to
uphold this guarantee, e.g. the bus layer or the driver itself is orthogonal.

Whenever possible, it's best to let the bus layer uphold the guarantee though,
since it's one pitfall less a driver can fall into.

>> 
>> As for Rust specifically, we can actually model such driver lifecycle patterns
>> with the type system and with that do a lot of the driver lifecycle checks (that
>> drivers love to ignore :) even at compile time, such that your code doesn't even
>> compile when you violate the driver lifecycle rules.
>> 
>> For the auxiliary bus that's already the case and I'd love to get to this point
>> with PF <-> VF interactions (in Rust) as well.
>
> It is absolutely correct thing to do for auxbus. It is very questionable for SR-IOV.

At least conditionally (as proposed above), it's an improvement for cases where
there is PF <-> VF interactions, i.e. why let drivers take care if the bus can
already do it for them.