[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250617154957.67144f0a.alex.williamson@redhat.com>
Date: Tue, 17 Jun 2025 15:49:57 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Daniel Dadap <ddadap@...dia.com>
Cc: Mario Limonciello <superm1@...nel.org>, Bjorn Helgaas
<bhelgaas@...gle.com>, Alex Deucher <alexander.deucher@....com>, Christian
König <christian.koenig@....com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Lukas Wunner
<lukas@...ner.de>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann
<tzimmermann@...e.de>, David Woodhouse <dwmw2@...radead.org>, Lu Baolu
<baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>, Will Deacon
<will@...nel.org>, Robin Murphy <robin.murphy@....com>, Jaroslav Kysela
<perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>, "open list:DRM DRIVERS"
<dri-devel@...ts.freedesktop.org>, open list
<linux-kernel@...r.kernel.org>, "open list:INTEL IOMMU (VT-d)"
<iommu@...ts.linux.dev>, "open list:PCI SUBSYSTEM"
<linux-pci@...r.kernel.org>, "open list:VFIO DRIVER" <kvm@...r.kernel.org>,
"open list:SOUND" <linux-sound@...r.kernel.org>, Mario Limonciello
<mario.limonciello@....com>
Subject: Re: [PATCH v2 6/6] vgaarb: Look at all PCI display devices in VGA
arbiter
On Tue, 17 Jun 2025 15:56:26 -0500
Daniel Dadap <ddadap@...dia.com> wrote:
> On Tue, Jun 17, 2025 at 03:15:35PM -0500, Mario Limonciello wrote:
> >
> >
> > On 6/17/25 2:20 PM, Daniel Dadap wrote:
> > > On Tue, Jun 17, 2025 at 12:59:10PM -0500, Mario Limonciello wrote:
> > > > From: Mario Limonciello <mario.limonciello@....com>
> > > >
> > > > On a mobile system with an AMD integrated GPU + NVIDIA discrete GPU the
> > > > AMD GPU is not being selected by some desktop environments for any
> > > > rendering tasks. This is because neither GPU is being treated as
> > > > "boot_vga" but that is what some environments use to select a GPU [1].
> > > >
> > > > The VGA arbiter driver only looks at devices that report as PCI display
> > > > VGA class devices. Neither GPU on the system is a PCI display VGA class
> > > > device:
> > > >
> > > > c5:00.0 3D controller: NVIDIA Corporation Device 2db9 (rev a1)
> > > > c6:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 150e (rev d1)
> > > >
> > > > If the GPUs were looked at the vga_is_firmware_default() function actually
> > > > does do a good job at recognizing the case from the device used for the
> > > > firmware framebuffer.
> > > >
> > > > Modify the VGA arbiter code and matching sysfs file entries to examine all
> > > > PCI display class devices. The existing logic stays the same.
> > > >
> > > > This will cause all GPUs to gain a `boot_vga` file, but the correct device
> > > > (AMD GPU in this case) will now show `1` and the incorrect device shows `0`.
> > > > Userspace then picks the right device as well.
> > > >
> > > > Link: https://github.com/robherring/libpciaccess/commit/b2838fb61c3542f107014b285cbda097acae1e12 [1]
> > > > Suggested-by: Daniel Dadap <ddadap@...dia.com>
> > > > Acked-by: Thomas Zimmermann <tzimmermann@...e.de>
> > > > Signed-off-by: Mario Limonciello <mario.limonciello@....com>
> > > > ---
> > > > drivers/pci/pci-sysfs.c | 2 +-
> > > > drivers/pci/vgaarb.c | 8 ++++----
> > > > 2 files changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > > index 268c69daa4d57..c314ee1b3f9ac 100644
> > > > --- a/drivers/pci/pci-sysfs.c
> > > > +++ b/drivers/pci/pci-sysfs.c
> > > > @@ -1707,7 +1707,7 @@ static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
> > > > struct device *dev = kobj_to_dev(kobj);
> > > > struct pci_dev *pdev = to_pci_dev(dev);
> > > > - if (a == &dev_attr_boot_vga.attr && pci_is_vga(pdev))
> > > > + if (a == &dev_attr_boot_vga.attr && pci_is_display(pdev))
> > > > return a->mode;
> > >
> > > I can't help but worry about userspace clients that might be checking for
> > > the presence of the boot_vga sysfs file but don't check its contents.
> >
> > Wouldn't those clients "already" be broken by such an assumption?
> > We know today that there are systems with two VGA devices in them too.
> >
>
> Yes, for systems with multiple VGA devices, which is an uncommon case. I
> think that on systems with one VGA device and one 3D device, which is
> probably the most common case, this change might break such clients.
FWIW, this is exactly the opposite of what I'd expect is the common
case. IME, an integrated GPU and discrete GPU, or multiple discrete
GPUs are all VGA devices.
> > I'd think those should have both GPUs exporting a file and one having a 0
> > the other 1.
>
> Yeah, agreed. I'd consider it a userspace bug if the client only tests for
> the presence of the file but doesn't look at its contents, but it's still
> preferable not to break (hypothetical, buggy) clients unnecessarily. One
> could make a philosophical argument that "boot_vga" should really mean VGA
> subclass, as the name implies, but even so I think that, in lieu of a new
> interface to report what the desktop environments are actually trying to
> test for (which nobody uses yet because it doesn't exist), exposing the
> boot_vga file for a non-VGA GPU in the special case of there being zero
> VGA GPUs on the system is a reasonable and practical compromise to allow
> existing code to work on the zero-VGA systems.
>
> I think it ultimately comes down to a semantic argument about what "VGA"
> is really supposed to mean here. There's the real, honest-to-goodness VGA
> interface with INT 10h and VBE, and then there's the common de facto sort
> of shorthand convention (commonly but not universally followed) where VGA
> means it can drive displays and 3D means it can't. It used to be the case
> (at least on x86) that display controllers which could drive real display
> hardware were always VGA-compatible, and display controllers were not VGA
> compatible could never drive real display hardware, which I think is how
> that convention originated, but on UEFI systems with no CSM support, it's
> not necessarily true any more. However, there's so much existing software
> out there that conflates VGA-ness with display-ness that some controllers
> with no actual VGA support get listed with the VGA controller subclass to
> avoid breaking such software.
>
> If you go by the language of the definitions for the subclasses of PCI
> base class 03h, it seems pretty clear that the VGA subclass is supposed
> to mean actually compatible with real honest-to-goodness VGA. So those
> non-VGA devices that pretend to be VGA for software compatibility aren't
> following the spec. I'd be willing to wager that the system in question
> is being accurate when it says that it has no VGA controllers. It is
> arguably a userspace bug that these desktop environments are testing for
> "VGA" when they really probably mean something else, but it will probably
> take some time to hunt down everything that's relying on boot_vga for
> possibly wrong reasons, and I think the pragmatic option is to lie about
> it until we have a better way to test for whatever the desktops really
> want to know, and that better way is widely used. But it would be nice to
> limit the lying to cases where it unbreaks things if we can.
I don't know if you have wiggle room with boot_vga specifically, I
generally take it at face value that it's a VGA device and imo seems
inconsistent to suggest otherwise. I do note however that there's
really no philosophical discussion related to the VGA arbiter, it is
managing devices and routing among them according to the strict PCI
definition of VGA.
Elsewhere in the kernel we can see that vga_default_device() is being
used for strictly VGA related things, the VGA shadow ROM and legacy VGA
resource aperture resolution for instance. It's unfortunate that the
x86 video_is_primary_device() relies on it, but that seems like a
growing pain of introducing non-VGA displays on a largely legacy
encumbered architecture and should be addressed.
Note that it should probably be considered whether VGA_ARB_MAX_GPUS
needs a new default value if all display adapters were to be included.
Thanks,
Alex
Powered by blists - more mailing lists