[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210321125818.GM2356281@nvidia.com>
Date: Sun, 21 Mar 2021 09:58:18 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Christoph Hellwig <hch@....de>,
Max Gurtovoy <mgurtovoy@...dia.com>,
Alexey Kardashevskiy <aik@...abs.ru>, cohuck@...hat.com,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
liranl@...dia.com, oren@...dia.com, tzahio@...dia.com,
leonro@...dia.com, yarong@...dia.com, aviadye@...dia.com,
shahafs@...dia.com, artemp@...dia.com, kwankhede@...dia.com,
ACurrid@...dia.com, cjia@...dia.com, yishaih@...dia.com,
mjrosato@...ux.ibm.com
Subject: Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor
vfio_pci drivers
On Fri, Mar 19, 2021 at 10:40:28PM -0600, Alex Williamson wrote:
> > Well, today we don't, but Max here adds id_table's to the special
> > devices and a MODULE_DEVICE_TABLE would come too if we do the flavours
> > thing below.
>
> I think the id_tables are the wrong approach for IGD and NVLink
> variants.
I really disagree with this. Checking for some random bits in firmware
and assuming that every device made forever into the future works with
this check is not a good way to do compatibility. Christoph made the
same point.
We have good processes to maintain id tables, I don't see this as a
problem.
> > As-is driver_override seems dangerous as overriding the matching table
> > could surely allow root userspace to crash the machine. In situations
> > with trusted boot/signed modules this shouldn't be.
>
> When we're dealing with meta-drivers that can bind to anything, we
> shouldn't rely on the match, but should instead verify the driver is
> appropriate in the probe callback. Even without driver_override,
> there's the new_id mechanism. Either method allows the root user to
> break driver binding. Greg has previously stated something to the
> effect that users get to keep all the pieces when they break something
> by manipulating driver binding.
Yes, but that is a view where root is allowed to break the kernel, we
now have this optional other world where that is not allowed and root
access to lots of dangerous things are now disabled.
new_id and driver_override should probably be in that disable list
too..
> > While that might not seem too bad with these simple drivers, at least
> > the mlx5 migration driver will have a large dependency tree and pull
> > in lots of other modules. Even Max's sample from v1 pulls in mlx5_core.ko
> > and a bunch of other stuff in its orbit.
>
> Luckily the mlx5 driver doesn't need to be covered by compatibility
> support, so we don't need to set a softdep for it and the module could
> be named such that a wildcard driver_override of vfio_pci* shouldn't
> logically include that driver. Users can manually create their own
> modprobe.d softdep entry if they'd like to include it. Otherwise
> userspace would need to know to bind to it specifically.
But now you are giving up on the whole point, which was to
automatically load the correct specific module without special admin
involvement!
> > This is why I want to try for fine grained autoloading first. It
> > really is the elegant solution if we can work it out.
>
> I just don't see how we create a manageable change to userspace.
I'm not sure I understand. Even if we add a new sysfs to set some
flavour then that is a pretty trivial change for userspace to move
from driver_override?
> > I don't think we should over-focus on these two firmware triggered
> > examples. I looked at the Intel GPU driver and it already only reads
> > the firmware thing for certain PCI ID's, we can absolutely generate a
> > narrow match table for it. Same is true for the NVIDIA GPU.
>
> I'm not sure we can make this assertion, both only care about the type
> of device and existence of associated firmware tables.
Well, I read through the Intel GPU driver and this is how I felt it
works. It doesn't even check the firmware bit unless certain PCI IDs
are matched first.
For NVIDIA GPU Max checked internally and we saw it looks very much
like how Intel GPU works. Only some PCI IDs trigger checking on the
feature the firmware thing is linked to.
My point is: the actual *drivers* consuming these firmware features do
*not* blindly match every PCI device and check for the firmware
bit. They all have narrow matches and further only try to use the
firmware thing for some subset of PCI IDs that the entire driver
supports.
Given that the actual drivers work this way there is no technical
reason vfio-pci can't do this as well.
We don't have to change them of course, they can stay as is if people
feel really strongly.
> > Even so, I'm not *so* worried about "over matching" - if IGD or the
> > nvidia stuff load on a wide set of devices then they can just not
> > enable their extended stuff. It wastes some kernel memory, but it is
> > OK.
>
> I'd rather they bind to the base vfio-pci driver if their extended
> features are not available.
Sure it would be nice, but functionally it is no different.
> > And if some driver *really* gets stuck here the true answer is to
> > improve the driver core match capability.
> >
> > > devices in the deny-list and non-endpoint devices. Many drivers
> > > clearly place implicit trust in their id_table, others don't. In the
> > > case of meta drivers, I think it's fair to make use of the latter
> > > approach.
> >
> > Well, AFAIK, the driver core doesn't have a 'try probe, if it fails
> > then try another driver' approach. One device, one driver. Am I
> > missing something?
>
> If the driver probe callback fails, really_probe() returns 0 with the
> comment:
>
> /*
> * Ignore errors returned by ->probe so that the next driver can try
> * its luck.
> */
> ret = 0;
>
> That allows bus_for_each_drv() to continue to iterate.
Er, but we have no reliable way to order drivers in the list so this
still assumes the system has exactly one driver match (even if some of
the match is now in code).
It won't work with a "universal" driver without more changes.
(and I couldn't find out why Cornelia added this long ago, or how or
even if it actually ended up being used)
> > I also think the softdep/implicit loading/ordering will not be
> > welcomed, it feels weird to me.
>
> AFAICT, it works within the existing driver-core, it's largely an
> extension to pci-core driver_override support to enable wildcard
> matching, ideally along with adding the same for all buses that support
> driver_override. Thanks,
It is the implicit ordering of module loading that is trouble.
Regards,
Jason
Powered by blists - more mailing lists