Message-ID: <CALjAwxiytz=FUy4Fu8j-hOa2BKXpYL0ZyjMHyOGRE0OdsfKDkA@mail.gmail.com>
Date: Wed, 18 Sep 2024 21:29:19 +0100
From: Sitsofe Wheeler <sitsofe@...il.com>
To: Alex Deucher <alexander.deucher@....com>, 
	Christian König <christian.koenig@....com>, 
	Xinhui Pan <Xinhui.Pan@....com>, David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>, 
	amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org, 
	linux-kernel@...r.kernel.org, Hans de Goede <hdegoede@...hat.com>
Subject: Re: Kernel hang when amdgpu driver is loaded on old radeon card

(CC'ing Hans de Goede, who recently wrote a blog post
(https://hansdegoede.dreamwidth.org/28552.html) describing what sounds
like the same issue I'm seeing.)

On Sun, 15 Sept 2024 at 21:30, Sitsofe Wheeler <sitsofe@...il.com> wrote:
>
> Hello,
>
> (Apologies if I have CC'd the wrong people/places - I just went by
> what get_maintainer.pl -f drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> said)
>
> I recently upgraded from Ubuntu 20.04 (5.15.0-119.129~20.04.1-generic
> kernel) to Ubuntu 24.04 (6.8.0-44-generic kernel) and found that, while
> booting, the kernel hangs for around 15 seconds just before the amdgpu
> driver is loaded:
>
> [    4.459519] radeon 0000:01:05.0: [drm] Cannot find any crtc or sizes
> [    4.460118] probe of 0000:01:05.0 returned 0 after 902266 usecs
> [    4.460184] initcall radeon_module_init+0x0/0xff0 [radeon] returned 0 after 902473 usecs
> [    4.465797] calling  drm_buddy_module_init+0x0/0xff0 [drm_buddy] @ 122
> [    4.465853] initcall drm_buddy_module_init+0x0/0xff0 [drm_buddy] returned 0 after 29 usecs
> [    4.469419] radeon 0000:01:05.0: [drm] Cannot find any crtc or sizes
> [    4.473831] calling  drm_sched_fence_slab_init+0x0/0xff0 [gpu_sched] @ 122
> [    4.473892] initcall drm_sched_fence_slab_init+0x0/0xff0 [gpu_sched] returned 0 after 31 usecs
> [   18.724442] calling  amdgpu_init+0x0/0xff0 [amdgpu] @ 122
> [   18.726303] [drm] amdgpu kernel modesetting enabled.
> [   18.726576] amdgpu: Virtual CRAT table created for CPU
> [   18.726609] amdgpu: Topology: Add CPU node
> [   18.726787] initcall amdgpu_init+0x0/0xff0 [amdgpu] returned 0 after 528 usecs
>
> I've checked and the problem still exists in 6.11.0-061100rc7-generic
> (which is close to vanilla upstream).
>
> The graphics card I have is:
> 01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS880M [Mobility Radeon HD 4225/4250] (prog-if 00 [VGA controller])
> 01:05.0 0300: 1002:9712 (prog-if 00 [VGA controller])
> Subsystem: 103c:1609
>
> At first I thought the problem was related to the change
> https://github.com/torvalds/linux/commit/eb4fd29afd4aa1c98d882800ceeee7d1f5262803
> ("drm/amdgpu: bind to any 0x1002 PCI diplay [sic] class device") which
> now means my card is claimed by two drivers (radeon and amdgpu). That
> change complicated things because:
> - The amdgpu module and its dependencies remain permanently present (which
>   never used to happen)
> - It took some time for me to realise that the amdgpu driver hadn't suddenly
>   grown the ability to support this old card :-) There is a nice table on
>   https://www.x.org/wiki/RadeonFeature/#decoderringforengineeringvsmarketingnames
>   that shows it is part of the R600 family and
>   https://www.x.org/wiki/RadeonFeature/#featurematrixforfreeradeondrivers shows
>   that R600 is only supported by the radeon driver.
>
> However, testing a 5.16.20-051620-generic kernel showed that while the
> amdgpu module is loaded, there is no 15-second hang... So far my
> testing has the following results:
> - 5.16.20-051620-generic - amdgpu loaded, no hang
> - 5.18.19-051819-generic - amdgpu loaded, no hang
> - 6.0.0-060000-generic - amdgpu loaded, hang
> - 6.2.0-060200-generic - amdgpu loaded, hang
> - 6.8.0-44-generic - amdgpu loaded, hang
> - 6.11.0-061100rc7-generic - amdgpu loaded, hang
>
> To work around the problem I've taken to blacklisting amdgpu in
> /etc/modprobe.d/ which makes the hang disappear.
>
> Does anyone else see this issue? Is there something better than my
> current workaround? What do other drivers that want to bind to such a
> large set of devices do? Further, while I'm already using
> initcall_debug, is there any other kernel boot parameter to make
> what's happening more visible?
>
> --
> Sitsofe
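
The stall in the quoted log shows up only as a jump in the timestamps between two consecutive lines. As a rough aid for spotting such jumps in a longer initcall_debug capture, here is a minimal Python sketch. The names find_gaps and SAMPLE and the 1-second threshold are illustrative assumptions, not part of any kernel tooling; the sample lines are copied from the log above with the wrapped lines rejoined.

```python
import re

# Matches the "[ seconds.micros] message" prefix printed in the kernel log.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def find_gaps(lines, threshold=1.0):
    """Return (gap_seconds, previous_message, next_message) for every pair of
    consecutive timestamped lines more than `threshold` seconds apart.

    `threshold` is an arbitrary illustrative default, not a kernel setting.
    """
    gaps = []
    prev_time = None
    prev_msg = None
    for line in lines:
        match = TIMESTAMP_RE.match(line.strip())
        if not match:
            continue  # wrapped continuation line or a line with no timestamp
        now = float(match.group(1))
        msg = match.group(2)
        if prev_time is not None and now - prev_time > threshold:
            gaps.append((now - prev_time, prev_msg, msg))
        prev_time, prev_msg = now, msg
    return gaps


# Lines taken from the log quoted in this message.
SAMPLE = """\
[    4.473831] calling  drm_sched_fence_slab_init+0x0/0xff0 [gpu_sched] @ 122
[    4.473892] initcall drm_sched_fence_slab_init+0x0/0xff0 [gpu_sched] returned 0 after 31 usecs
[   18.724442] calling  amdgpu_init+0x0/0xff0 [amdgpu] @ 122
[   18.726303] [drm] amdgpu kernel modesetting enabled.
"""

if __name__ == "__main__":
    for gap, before, after in find_gaps(SAMPLE.splitlines()):
        print(f"{gap:.3f}s between:\n  {before}\n  {after}")
```

On the sample lines this flags a single gap of roughly 14.25 seconds, ending at the "calling  amdgpu_init" line. It only locates where the time went missing; it says nothing about what the kernel was doing during the gap.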



-- 
Sitsofe
