linux-kernel - Re: [PATCHv2] vgaarb: Add module param to allow for choosing the boot VGA device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <93acb310-ede4-cd9d-e470-2375971a451@absolutedigital.net>
Date:   Tue, 5 Jul 2022 16:42:17 -0400 (EDT)
From:   Cal Peake <cp@...olutedigital.net>
To:     Alex Williamson <alex.williamson@...hat.com>
cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Huacai Chen <chenhuacai@...nel.org>, linux-pci@...r.kernel.org,
        Cornelia Huck <cohuck@...hat.com>, kvm@...r.kernel.org
Subject: Re: [PATCHv2] vgaarb: Add module param to allow for choosing the
 boot VGA device

On Tue, 5 Jul 2022, Alex Williamson wrote:

> > > +	ret = sscanf(input, "%x:%x.%x", &bus, &dev, &func);
> > > +	if (ret != 3) {
> > > +		pr_warn("Improperly formatted PCI ID: %s\n", input);
> > > +		return;
> > > +	}
> 
> See pci_dev_str_match()

Hi Alex, thanks for the feedback. I'll add this if we wind up going with 
some version of my patch.

> > > +	if (boot_vga && boot_vga->is_chosen_one)
> > > +		return false;
> > > +
> > > +	if (bootdev_id == PCI_DEVID(pdev->bus->number, pdev->devfn)) {
> > > +		vgadev->is_chosen_one = true;
> > > +		return true;
> > > +	}
> 
> This seems too simplistic, for example PCI code determines whether the
> ROM is a shadow ROM at 0xc0000 based on whether it's the
> vga_default_device() where that default device is set in
> vga_arbiter_add_pci_device() based on the value returned by
> this vga_is_boot_device() function.  A user wishing to specify the boot
> VGA device doesn't magically make that device's ROM shadowed into this
> location.
> 

I think I understand what you're saying. We're not telling the system what 
the boot device is, it's telling us?

> I also don't see how this actually enables VGA routing to the user
> selected device, where we generally expect the boot device already has
> this enabled.
> 
> Furthermore, what's the initialization state of the selected device, if
> it has not had its option ROM executed, is it necessarily in a state to
> accept VGA commands?  If we're changing the default VGA device, are we
> fully uncoupling from any firmware notions of the console device?
> Thanks,

Unfortunately, I'm not the best qualified to answer these questions. My 
understanding is mostly surface-level until I start digging into the code.

I think the answer to most of them though might be that the UEFI firmware
initializes both cards.

During POST, I do get output on both GPUs. One gets the static BIOS text 
(Copyright AMI etc.) -- this is the one selected as boot device -- and the 
other gets the POST-code counting up.

Once the firmware hands off to the bootloader, whichever GPU has the 
active display (both GPUs go to the same display, the input source gets 
switched depending on whether I'm using the host or the VM) is where 
the bootloader output is.

When the bootloader hands off to the kernel, the boot device chosen by the 
firmware gets the kernel output. If that's the host GPU, then everything 
is fine.

If that's the VM GPU, then it gets the kernel output until the vfio-pci 
driver loads and then all output stops. Back on the host GPU, the screen 
is black until the X server spawns[1] but I get no VTs.

With my patch, telling the arbiter that the host GPU is always the boot 
device results in everything just working.

With all that said, if you feel this isn't the right way to go, do you 
have any thoughts on what would be a better path to try?

Thanks,

-- 
Cal Peake

[1] I said in a previous email that this only happened when I set 
VGA_ARB_MAX_GPUS=1, but after doing some more testing just now, it seems I 
was wrong and the X server was just taking longer than expected to load.