lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230906132904.4e49e269.alex.williamson@redhat.com>
Date:   Wed, 6 Sep 2023 13:29:04 -0600
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Sui Jingfeng <sui.jingfeng@...ux.dev>
Cc:     Sui Jingfeng <suijingfeng@...ngson.cn>,
        nouveau@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        amd-gfx@...ts.freedesktop.org,
        Thomas Zimmermann <tzimmermann@...e.de>,
        linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
        "Koenig, Christian" <Christian.Koenig@....com>
Subject: Re: [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to
 select the primary video adapter at boot time

On Wed, 6 Sep 2023 11:51:59 +0800
Sui Jingfeng <sui.jingfeng@...ux.dev> wrote:

> Hi,
> 
> 
> On 2023/9/5 22:52, Alex Williamson wrote:
> > On Tue,  5 Sep 2023 03:57:15 +0800
> > Sui Jingfeng <sui.jingfeng@...ux.dev> wrote:
> >  
> >> From: Sui Jingfeng <suijingfeng@...ngson.cn>
> >>
> >> On a machine with multiple GPUs, a Linux user has no control over which
> >> one is primary at boot time. This series tries to solve above mentioned
> >> problem by introduced the ->be_primary() function stub. The specific
> >> device drivers can provide an implementation to hook up with this stub by
> >> calling the vga_client_register() function.
> >>
> >> Once the driver bound the device successfully, VGAARB will call back to
> >> the device driver. To query if the device drivers want to be primary or
> >> not. Device drivers can just pass NULL if have no such needs.
> >>
> >> Please note that:
> >>
> >> 1) The ARM64, Loongarch, Mips servers have a lot PCIe slot, and I would
> >>     like to mount at least three video cards.
> >>
> >> 2) Typically, those non-86 machines don't have a good UEFI firmware
> >>     support, which doesn't support select primary GPU as firmware stage.
> >>     Even on x86, there are old UEFI firmwares which already made undesired
> >>     decision for you.
> >>
> >> 3) This series is attempt to solve the remain problems at the driver level,
> >>     while another series[1] of me is target to solve the majority of the
> >>     problems at device level.
> >>
> >> Tested (limited) on x86 with four video card mounted, Intel UHD Graphics
> >> 630 is the default boot VGA, successfully override by ast2400 with
> >> ast.modeset=10 append at the kernel cmd line.
> >>
> >> $ lspci | grep VGA
> >>
> >>   00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]  
> > In all my previous experiments with VGA routing and IGD I found that
> > IGD can't actually release VGA routing and Intel confirmed the hardware
> > doesn't have the ability to do so.  
> 
> Which model of the IGD you are using? even for the IGD in Atom D2550,
> the legacy 128KB VGA memory range can be tuned to be mapped to IGD
> or to the DMI Interface. See the 1.7.3.2 section of the N2000 datasheet[1].

I believe it's the VGA I/O that can't be disabled, there's no means to
do so other than the I/O enable bit in the command register and iirc
the driver depends on this for other features.  The history of this is
pretty old, but here are some links:

https://lore.kernel.org/all/1376486637.31494.19.camel@ul30vt.home/
https://bbs.archlinux.org/viewtopic.php?pid=1400212#p1400212
https://lore.kernel.org/all/20130815223917.27890.28003.stgit@bling.home/
https://lore.kernel.org/all/20130824144701.23370.42110.stgit@bling.home/
https://lore.kernel.org/all/20140509201655.2849.97478.stgit@bling.home/

I think the issue was that i915 doesn't claim to the VGA arbiter to be
controlling legacy VGA ranges, but in fact the hardware does claim
those ranges.  We can "fix" i915 to report that VGA MMIO space is
owned and can be controlled, but then Xorg likely sees multiple VGA
arbiter clients and disables DRI because it wants to mmap VGA MMIO
space.

Therefore unless something has changed in the past 10yrs, i915 owns but
does not advertise ownership of the VGA address spaces and therefore
the arbiter can't and doesn't know to change VGA routing to enable a
"be_primary" path to another device.
 
> If a specific model of Intel has a bug in the VGA routing hardware logic unit,
> I would like to ignore it. Or switch to the UEFI firmware on such hardware.

That's a convenient and impractical approach.  I expect all Intel HD
graphics has this issue.  Unknown for Xe.

> It is the hardware engineer's responsibility, I will not worry about it.

We often need to deal with broken hardware in the kernel.

> Thanks for you tell this.
> 
> [1] https://www.intel.com/content/dam/doc/datasheet/atom-d2000-n2000-vol-2-datasheet.pdf
> 
> 
> >   It will always be primary from a
> > VGA routing perspective.  Was this actually tested with non-UEFI?  
> 
> 
> As you already said, the generous Intel already have confirmed that the hardware defect.
> So probably this is a good chance to switch to UEFI to solve the problem. Then, no
> testing for legacy is needed.

Then why are we hacking on VGA arbitration in this series at all?

> > I suspect it might only work in UEFI mode where we probably don't
> > actually have a dependency on VGA routing.  This is essentially why
> > vfio requires UEFI ROMs when assigning GPUs to VMs, VGA routing is too
> > broken to use on Intel systems with IGD.  Thanks,  
> 
> Thanks for you tell me this.
> 
> To be honest, I have only tested my patch on machines with UEFI firmware.
> Since UEFI because the main stream, but if this patch is really useful for
> majority machine, I'm satisfied. The results is not too bad.

This looks like a pretty significant scoping issue if you're proposing
changes to the VGA arbiter which specifically handles the routing of
legacy VGA address spaces but are not willing to commit to testing
legacy configurations.  Thanks,

Alex

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ