lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABXGCsMBWwRFRA+EJKF0v6BwZ+uTQHr4Yn9E9_iYgZ6KRbwsJQ@mail.gmail.com>
Date: Fri, 15 Dec 2023 16:45:47 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: Christian König <ckoenig.leichtzumerken@...il.com>
Cc: amd-gfx list <amd-gfx@...ts.freedesktop.org>, 
	dri-devel <dri-devel@...ts.freedesktop.org>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	"Deucher, Alexander" <Alexander.Deucher@....com>
Subject: Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal
 error during GPU init"

On Tue, Feb 28, 2023 at 5:43 PM Christian König
<ckoenig.leichtzumerken@...il.com> wrote:
>
> The point is it doesn't need to talk to the amdgpu hardware. What it
> does is that it talks to the good old VGA/VESA emulation and that just
> happens to be still enabled by the BIOS/GRUB.
>
> And that VGA/VESA emulation doesn't need any BAR or whatever to keep the
> hw running in the state where it was initialized before the kernel
> started. The kernel just grabs the addresses where it needs to write the
> display data and keeps going with that.
>
> But when a hw specific driver wants to load this is the first thing
> which gets disabled because we need to load new firmware. And with the
> BARs disabled this can't be re-enabled without rebooting the system.
>
> > My suggestion is that if
> > amdgpu fails to talk to the hardware, then let another suitable driver
> > do it. I attached a system log when I apply "pci=nocrs" with
> > "modprobe.blacklist=amdgpu" for showing that graphics work right in
> > this case.
> > To do this, does the Linux module loading mechanism need to be refined?
>
> That's actually working as expected. The real problem is that the BIOS
> on that system is so broken that we can't access the hw correctly.
>
> What we could to do is to check the BARs very early on and refuse to
> load when they are disable. The problem with this approach is that there
> are systems where it is normal that the BARs are disable until the
> driver loads and get enabled during the hardware initialization process.
>
> What you might want to look into is to find a quirk for the BIOS to
> properly enable the nvme controller.
>

That's interesting. I noticed that now amdgpu could work even with
parameter [pci=nocrs] on 6.7.0-0.rc4 and higher kernels.
It means BARs became available?
I attached here the kerner log and lspci. What's changed?

-- 
Best Regards,
Mike Gavrilov.

Download attachment "dmesg-nvme-down-2.zip" of type "application/zip" (46571 bytes)

Download attachment "lspci.zip" of type "application/zip" (2710 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ