lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aFHWejvqNpGv-3UI@ddadap-lakeline.nvidia.com>
Date: Tue, 17 Jun 2025 15:56:26 -0500
From: Daniel Dadap <ddadap@...dia.com>
To: Mario Limonciello <superm1@...nel.org>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Alex Deucher <alexander.deucher@....com>,
	Christian König <christian.koenig@....com>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Lukas Wunner <lukas@...ner.de>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Woodhouse <dwmw2@...radead.org>,
	Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
	Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
	Alex Williamson <alex.williamson@...hat.com>,
	Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>,
	"open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
	open list <linux-kernel@...r.kernel.org>,
	"open list:INTEL IOMMU (VT-d)" <iommu@...ts.linux.dev>,
	"open list:PCI SUBSYSTEM" <linux-pci@...r.kernel.org>,
	"open list:VFIO DRIVER" <kvm@...r.kernel.org>,
	"open list:SOUND" <linux-sound@...r.kernel.org>,
	Mario Limonciello <mario.limonciello@....com>
Subject: Re: [PATCH v2 6/6] vgaarb: Look at all PCI display devices in VGA
 arbiter

On Tue, Jun 17, 2025 at 03:15:35PM -0500, Mario Limonciello wrote:
> 
> 
> On 6/17/25 2:20 PM, Daniel Dadap wrote:
> > On Tue, Jun 17, 2025 at 12:59:10PM -0500, Mario Limonciello wrote:
> > > From: Mario Limonciello <mario.limonciello@....com>
> > > 
> > > On a mobile system with an AMD integrated GPU + NVIDIA discrete GPU the
> > > AMD GPU is not being selected by some desktop environments for any
> > > rendering tasks. This is because neither GPU is being treated as
> > > "boot_vga" but that is what some environments use to select a GPU [1].
> > > 
> > > The VGA arbiter driver only looks at devices that report as PCI display
> > > VGA class devices. Neither GPU on the system is a PCI display VGA class
> > > device:
> > > 
> > > c5:00.0 3D controller: NVIDIA Corporation Device 2db9 (rev a1)
> > > c6:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 150e (rev d1)
> > > 
> > > If the GPUs were looked at the vga_is_firmware_default() function actually
> > > does do a good job at recognizing the case from the device used for the
> > > firmware framebuffer.
> > > 
> > > Modify the VGA arbiter code and matching sysfs file entries to examine all
> > > PCI display class devices. The existing logic stays the same.
> > > 
> > > This will cause all GPUs to gain a `boot_vga` file, but the correct device
> > > (AMD GPU in this case) will now show `1` and the incorrect device shows `0`.
> > > Userspace then picks the right device as well.
> > > 
> > > Link: https://github.com/robherring/libpciaccess/commit/b2838fb61c3542f107014b285cbda097acae1e12 [1]
> > > Suggested-by: Daniel Dadap <ddadap@...dia.com>
> > > Acked-by: Thomas Zimmermann <tzimmermann@...e.de>
> > > Signed-off-by: Mario Limonciello <mario.limonciello@....com>
> > > ---
> > >   drivers/pci/pci-sysfs.c | 2 +-
> > >   drivers/pci/vgaarb.c    | 8 ++++----
> > >   2 files changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index 268c69daa4d57..c314ee1b3f9ac 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -1707,7 +1707,7 @@ static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
> > >   	struct device *dev = kobj_to_dev(kobj);
> > >   	struct pci_dev *pdev = to_pci_dev(dev);
> > > -	if (a == &dev_attr_boot_vga.attr && pci_is_vga(pdev))
> > > +	if (a == &dev_attr_boot_vga.attr && pci_is_display(pdev))
> > >   		return a->mode;
> > 
> > I can't help but worry about userspace clients that might be checking for
> > the presence of the boot_vga sysfs file but don't check its contents.
> 
> Wouldn't those clients "already" be broken by such an assumption?
> We know today that there are systems with two VGA devices in them too.
>

Yes, for systems with multiple VGA devices, which is an uncommon case. I
think that on systems with one VGA device and one 3D device, which is
probably the most common case, this change might break such clients.
 
> I'd think those should have both GPUs exporting a file and one having a 0
> the other 1.

Yeah, agreed. I'd consider it a userspace bug if the client only tests for
the presence of the file but doesn't look at its contents, but it's still
preferable not to break (hypothetical, buggy) clients unnecessarily. One
could make a philosophical argument that "boot_vga" should really mean VGA
subclass, as the name implies, but even so I think that, in lieu of a new
interface to report what the desktop environments are actually trying to
test for (which nobody uses yet because it doesn't exist), exposing the
boot_vga file for a non-VGA GPU in the special case of there being zero
VGA GPUs on the system is a reasonable and practical compromise to allow
existing code to work on the zero-VGA systems.

I think it ultimately comes down to a semantic argument about what "VGA"
is really supposed to mean here. There's the real, honest-to-goodness VGA
interface with INT 10h and VBE, and then there's the common de facto sort
of shorthand convention (commonly but not universally followed) where VGA
means it can drive displays and 3D means it can't. It used to be the case
(at least on x86) that display controllers which could drive real display
hardware were always VGA-compatible, and display controllers were not VGA
compatible could never drive real display hardware, which I think is how
that convention originated, but on UEFI systems with no CSM support, it's
not necessarily true any more. However, there's so much existing software
out there that conflates VGA-ness with display-ness that some controllers
with no actual VGA support get listed with the VGA controller subclass to
avoid breaking such software.

If you go by the language of the definitions for the subclasses of PCI
base class 03h, it seems pretty clear that the VGA subclass is supposed
to mean actually compatible with real honest-to-goodness VGA. So those
non-VGA devices that pretend to be VGA for software compatibility aren't
following the spec. I'd be willing to wager that the system in question
is being accurate when it says that it has no VGA controllers. It is
arguably a userspace bug that these desktop environments are testing for
"VGA" when they really probably mean something else, but it will probably
take some time to hunt down everything that's relying on boot_vga for
possibly wrong reasons, and I think the pragmatic option is to lie about
it until we have a better way to test for whatever the desktops really
want to know, and that better way is widely used. But it would be nice to
limit the lying to cases where it unbreaks things if we can.

> 
> > I
> > understand that it's the intention to expose the file for non-VGA display
> > controllers in the case where none of the display controllers are of the
> > VGA subclass, but one of them is the boot console device and should be
> > considered "VGA" for the purposes of the overloaded meaning of "VGA", but
> > if it isn't too much trouble to minimize the change to UAPI here, I'd be
> > more comfortable with only exposing this file for devices that really are
> > VGA and/or the firmware default.
> > 
> > Maybe something like making the condition:
> > 
> > if (a == &dev_attr_boot_vga.attr) {
> > 	if (pci_is_vga(pdev) ||
> > 	    (pci_is_display(pdev) && vga_default_device() == pdev))
> > 		return a->mode;
> > }
> > 
> > (maybe we don't even need the pci_is_display() check at that point? I
> > feel more comfortable leaving it in, though)
> 
> I suppose it depends upon call order whether the above works or not.
> 
> I'm not sure 'off hand' right now.
> 
> > > I'd expect that to do something like (assuming two-GPU hybrid system):
> > 
> > * Systems with one VGA controller and one 3D controller:
> >    * VGA controller gets boot_vga file, contents are "1"
> >    * 3D controller does not get boot_vga file
> > * Systems with no VGA controllers and two 3D controllers:
> >    * 3D controller driving the console gets boot_vga file: "1"
> >    * 3D controller not driving the console does not get boot_vga file
> > * Systems with two VGA controllers and no 3D controllers:
> >    * VGA controller driving the console gets boot_vga file: "1"
> >    * VGA controller not driving the console gets boot_vga file: "0"
> > 
> > i.e., the behavior would only be visibly different in the case with two
> > 3D controllers, like the one targeted by this patch. You and I have seen
> > the two VGA controller case in the wild, so we know it exists.
> 
> Yeah I wish we had some more data from that reporter right now to
> potentially support a proposal that would help their system too.
> 
> This patch as it is today will only help case 1 and 2.

I don't think case 1 needs any help: the behavior I describe above is what
I expect the existing behavior to be. However if you expose boot_vga files
for all display controllers, the behavior for case 1 (which I expect to be
the common case) will be different after that change: both controllers get
a boot_vga file with different contents, versus only the VGA controller
having a boot_vga file previously.

> 
> > The one 3D
> > and one VGA controller case is what I'd expect to be the common one, and
> > hopefully this will have the same behavior before and after this change
> > regardless of whether a muxed system defaults to dGPU (like hybrid Mac
> > notebooks) or iGPU (like other hybrid systems I'm accustomed to).
> > 
> > >   	return 0;
> > > diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
> > > index 78748e8d2dbae..63216e5787d73 100644
> > > --- a/drivers/pci/vgaarb.c
> > > +++ b/drivers/pci/vgaarb.c
> > > @@ -1499,8 +1499,8 @@ static int pci_notify(struct notifier_block *nb, unsigned long action,
> > >   	vgaarb_dbg(dev, "%s\n", __func__);
> > > -	/* Only deal with VGA class devices */
> > > -	if (!pci_is_vga(pdev))
> > > +	/* Only deal with PCI display class devices */
> > > +	if (!pci_is_display(pdev))
> > >   		return 0;
> > >   	/*
> > > @@ -1546,12 +1546,12 @@ static int __init vga_arb_device_init(void)
> > >   	bus_register_notifier(&pci_bus_type, &pci_notifier);
> > > -	/* Add all VGA class PCI devices by default */
> > > +	/* Add all PCI display class devices by default */
> > >   	pdev = NULL;
> > >   	while ((pdev =
> > >   		pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
> > >   			       PCI_ANY_ID, pdev)) != NULL) {
> > > -		if (pci_is_vga(pdev))
> > > +		if (pci_is_display(pdev))
> > >   			vga_arbiter_add_pci_device(pdev);
> > >   	}
> > > -- 
> > > 2.43.0
> > > 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ