[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <88EB3F90-128B-41D9-B57E-BF0EFC427252@nvidia.com>
Date: Tue, 17 Jun 2025 23:01:36 +0000
From: Daniel Dadap <ddadap@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: Mario Limonciello <superm1@...nel.org>, Bjorn Helgaas
<bhelgaas@...gle.com>, Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Lukas Wunner
<lukas@...ner.de>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, Robin Murphy
<robin.murphy@....com>, Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai
<tiwai@...e.com>, "open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
open list <linux-kernel@...r.kernel.org>, "open list:INTEL IOMMU"
<iommu@...ts.linux.dev>, "open list:PCI SUBSYSTEM"
<linux-pci@...r.kernel.org>, "open list:VFIO DRIVER" <kvm@...r.kernel.org>,
"open list:SOUND" <linux-sound@...r.kernel.org>, Mario Limonciello
<mario.limonciello@....com>
Subject: Re: [PATCH v2 6/6] vgaarb: Look at all PCI display devices in VGA
arbiter
> On Jun 17, 2025, at 16:50, Alex Williamson <alex.williamson@...hat.com> wrote:
>
> On Tue, 17 Jun 2025 15:56:26 -0500
> Daniel Dadap <ddadap@...dia.com> wrote:
>
>>> On Tue, Jun 17, 2025 at 03:15:35PM -0500, Mario Limonciello wrote:
>>>
>>>
>>> On 6/17/25 2:20 PM, Daniel Dadap wrote:
>>>> On Tue, Jun 17, 2025 at 12:59:10PM -0500, Mario Limonciello wrote:
>>>>> From: Mario Limonciello <mario.limonciello@....com>
>>>>>
>>>>> On a mobile system with an AMD integrated GPU + NVIDIA discrete GPU the
>>>>> AMD GPU is not being selected by some desktop environments for any
>>>>> rendering tasks. This is because neither GPU is being treated as
>>>>> "boot_vga" but that is what some environments use to select a GPU [1].
>>>>>
>>>>> The VGA arbiter driver only looks at devices that report as PCI display
>>>>> VGA class devices. Neither GPU on the system is a PCI display VGA class
>>>>> device:
>>>>>
>>>>> c5:00.0 3D controller: NVIDIA Corporation Device 2db9 (rev a1)
>>>>> c6:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 150e (rev d1)
>>>>>
>>>>> If the GPUs were looked at the vga_is_firmware_default() function actually
>>>>> does do a good job at recognizing the case from the device used for the
>>>>> firmware framebuffer.
>>>>>
>>>>> Modify the VGA arbiter code and matching sysfs file entries to examine all
>>>>> PCI display class devices. The existing logic stays the same.
>>>>>
>>>>> This will cause all GPUs to gain a `boot_vga` file, but the correct device
>>>>> (AMD GPU in this case) will now show `1` and the incorrect device shows `0`.
>>>>> Userspace then picks the right device as well.
>>>>>
>>>>> Link: https://github.com/robherring/libpciaccess/commit/b2838fb61c3542f107014b285cbda097acae1e12 [1]
>>>>> Suggested-by: Daniel Dadap <ddadap@...dia.com>
>>>>> Acked-by: Thomas Zimmermann <tzimmermann@...e.de>
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
>>>>> ---
>>>>> drivers/pci/pci-sysfs.c | 2 +-
>>>>> drivers/pci/vgaarb.c | 8 ++++----
>>>>> 2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>>>> index 268c69daa4d57..c314ee1b3f9ac 100644
>>>>> --- a/drivers/pci/pci-sysfs.c
>>>>> +++ b/drivers/pci/pci-sysfs.c
>>>>> @@ -1707,7 +1707,7 @@ static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
>>>>> struct device *dev = kobj_to_dev(kobj);
>>>>> struct pci_dev *pdev = to_pci_dev(dev);
>>>>> - if (a == &dev_attr_boot_vga.attr && pci_is_vga(pdev))
>>>>> + if (a == &dev_attr_boot_vga.attr && pci_is_display(pdev))
>>>>> return a->mode;
>>>>
>>>> I can't help but worry about userspace clients that might be checking for
>>>> the presence of the boot_vga sysfs file but don't check its contents.
>>>
>>> Wouldn't those clients "already" be broken by such an assumption?
>>> We know today that there are systems with two VGA devices in them too.
>>>
>>
>> Yes, for systems with multiple VGA devices, which is an uncommon case. I
>> think that on systems with one VGA device and one 3D device, which is
>> probably the most common case, this change might break such clients.
>
> FWIW, this is exactly the opposite of what I'd expect is the common
> case. IME, an integrated GPU and discrete GPU, or multiple discrete
> GPUs are all VGA devices.
>
Yeah, I guess I should clarify the context. On a desktop system with an integrated GPU and one or more add-in-board discrete GPUs, I’d expect everything to be a VGA controller. On a hybrid notebook system with one iGPU and one dGPU, I’d expect either both to be VGA controllers, or the boot device to be VGA and the he secondary to be 3D, with the latter being much more common in my observation on newer systems. On a UEFI system with CSM, only the boot device needs to support VGA, so there’s no reason for additional GPUs beyond the boot device to also support it.
The system Mario is talking about here, where both iGPU and dGPU are 3D controllers, is most likely a UEFI-only system with no CSM, where it’s not necessary for any of the GPUs to support VGA. I have seen a few notebook systems that ditch the CSM in recent days. Such systems will probably only become more common. I have not yet seen any CSM-less desktop systems, probably because manufacturers want to support people plugging in their old MBR hard drives, but in theory they should be possible and there is very likely already such a system out there that I just haven’t seen.
>>> I'd think those should have both GPUs exporting a file and one having a 0
>>> the other 1.
>>
>> Yeah, agreed. I'd consider it a userspace bug if the client only tests for
>> the presence of the file but doesn't look at its contents, but it's still
>> preferable not to break (hypothetical, buggy) clients unnecessarily. One
>> could make a philosophical argument that "boot_vga" should really mean VGA
>> subclass, as the name implies, but even so I think that, in lieu of a new
>> interface to report what the desktop environments are actually trying to
>> test for (which nobody uses yet because it doesn't exist), exposing the
>> boot_vga file for a non-VGA GPU in the special case of there being zero
>> VGA GPUs on the system is a reasonable and practical compromise to allow
>> existing code to work on the zero-VGA systems.
>>
>> I think it ultimately comes down to a semantic argument about what "VGA"
>> is really supposed to mean here. There's the real, honest-to-goodness VGA
>> interface with INT 10h and VBE, and then there's the common de facto sort
>> of shorthand convention (commonly but not universally followed) where VGA
>> means it can drive displays and 3D means it can't. It used to be the case
>> (at least on x86) that display controllers which could drive real display
>> hardware were always VGA-compatible, and display controllers were not VGA
>> compatible could never drive real display hardware, which I think is how
>> that convention originated, but on UEFI systems with no CSM support, it's
>> not necessarily true any more. However, there's so much existing software
>> out there that conflates VGA-ness with display-ness that some controllers
>> with no actual VGA support get listed with the VGA controller subclass to
>> avoid breaking such software.
>>
>> If you go by the language of the definitions for the subclasses of PCI
>> base class 03h, it seems pretty clear that the VGA subclass is supposed
>> to mean actually compatible with real honest-to-goodness VGA. So those
>> non-VGA devices that pretend to be VGA for software compatibility aren't
>> following the spec. I'd be willing to wager that the system in question
>> is being accurate when it says that it has no VGA controllers. It is
>> arguably a userspace bug that these desktop environments are testing for
>> "VGA" when they really probably mean something else, but it will probably
>> take some time to hunt down everything that's relying on boot_vga for
>> possibly wrong reasons, and I think the pragmatic option is to lie about
>> it until we have a better way to test for whatever the desktops really
>> want to know, and that better way is widely used. But it would be nice to
>> limit the lying to cases where it unbreaks things if we can.
>
> I don't know if you have wiggle room with boot_vga specifically, I
> generally take it at face value that it's a VGA device and imo seems
> inconsistent to suggest otherwise. I do note however that there's
> really no philosophical discussion related to the VGA arbiter, it is
> managing devices and routing among them according to the strict PCI
> definition of VGA.
>
> Elsewhere in the kernel we can see that vga_default_device() is being
> used for strictly VGA related things, the VGA shadow ROM and legacy VGA
> resource aperture resolution for instance. It's unfortunate that the
> x86 video_is_primary_device() relies on it, but that seems like a
> growing pain of introducing non-VGA displays on a largely legacy
> encumbered architecture and should be addressed.
>
> Note that it should probably be considered whether VGA_ARB_MAX_GPUS
> needs a new default value if all display adapters were to be included.
> Thanks,
>
> Alex
>
Powered by blists - more mailing lists