Message-ID: <20210130020411.GZ29348@merlins.org>
Date:   Fri, 29 Jan 2021 18:04:11 -0800
From:   Marc MERLIN <marc_nouveau@...lins.org>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     nouveau@...ts.freedesktop.org,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PCI <linux-pci@...r.kernel.org>
Subject: Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu
 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)

On Fri, Jan 29, 2021 at 03:20:32PM -0600, Bjorn Helgaas wrote:
> > For comparison, the intel iwlwifi driver is very clear about which
> > firmware it's trying to load, whether loading failed, and exactly which
> > firmware you need to find on the internet (filename)
> 
> I guess you're referring to this in iwl_request_firmware()?
> 
>   IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 
 
Yes :)

> How can we fix this in nouveau so we don't have to debug this again?
> I don't really know how firmware loading works, but "git grep -A5
> request_firmware drivers/gpu/drm/nouveau/" shows that we generally
> print something when request_firmware() fails.

Well, have a look at https://pastebin.com/dX19aCpj
do you see any warning whatsoever?

> But I didn't notice those messages in your logs, so I'm probably
> barking up the wrong tree.

You're not. It seems that newer kernels are a bit better:
[  189.304662] nouveau 0000:01:00.0: pmu: firmware unavailable
[  189.312455] nouveau 0000:01:00.0: disp: destroy running...
[  189.316552] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  189.320326] nouveau 0000:01:00.0: disp ctor failed, -12
[  189.324214] nouveau: probe of 0000:01:00.0 failed with error -12

So it probably got better, but that message was only displayed after the
2mn hang, which no longer happens once the firmware is installed.

Any developer with the right hardware can probably reproduce this
easily by removing the firmware and looking at the boot messages.

At the very least, it should print something clearer, such as "driver
will not function properly", and a URL pointing to where one can get the
firmware would be awesome.
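
To make the request concrete, here is a rough userspace sketch of the kind
of message meant above. It is not nouveau code: format_fw_error() is a
hypothetical helper, the device string and firmware filename a caller passes
in are placeholders, and the URL is simply the one from the iwlwifi message
quoted earlier. A real driver would use the kernel's own logging helpers
instead of snprintf().

```c
#include <stdio.h>
#include <stddef.h>

/* URL taken from the iwlwifi message quoted earlier in this thread. */
#define FW_REPO_URL \
    "git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git"

/*
 * Hypothetical helper (not an existing nouveau or kernel function).
 * Formats a firmware-load failure so that it names the missing file,
 * the error code, the consequence, and where to look for the firmware.
 * Returns what snprintf() returns: the length the full message needs.
 */
static int format_fw_error(char *buf, size_t len, const char *dev,
                           const char *fw_name, int err)
{
    return snprintf(buf, len,
                    "%s: failed to load firmware %s (error %d); "
                    "driver will not function properly; check %s",
                    dev, fw_name, err, FW_REPO_URL);
}
```

Printed once, at the moment the load fails and before any long timeout, a
line like that would have pointed straight at the missing firmware.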

> So maybe the wakeups are related to having vs not having the nouveau
> firmware?  I'm still curious about that, and it smells like a bug to
> me, but probably something to do with nouveau where I have no hope of
> debugging it.
 
Right. Honestly, given the time I've lost with this, and now that it
seems gone with the firmware, I'm happy to leave well enough alone :)

I'm not sure how you are involved with the driver, but are you able to
help improve the dmesg output?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
