Message-ID: <CADnq5_MbwMv1Hr6+N-SLK9WtGCyzsRquaPZa0JxreL5ssuoHMw@mail.gmail.com>
Date: Fri, 30 May 2025 16:25:56 -0400
From: Alex Deucher <alexdeucher@...il.com>
To: Ozgur Kara <ozgur@...sey.org>
Cc: Durmuş <dozaltay@...il.com>, 
	Christian König <christian.koenig@....com>, 
	David Airlie <airlied@...il.com>, Tao Zhou <tao.zhou1@....com>, Yan Zhen <yanzhen@...o.com>, 
	Greg KH <gregkh@...uxfoundation.org>, Alper Nebi Yasak <alpernebiyasak@...il.com>, 
	linux-kernel@...r.kernel.org, Alex Deucher <alexander.deucher@....com>, 
	amd-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org
Subject: Re: Regression: RX 470 fails to boot with amdgpu.dpm=1 on kernel 6.7+

On Thu, May 22, 2025 at 8:39 AM Ozgur Kara <ozgur@...sey.org> wrote:
>
> On Thu, 22 May 2025 at 15:15, Durmuş <dozaltay@...il.com> wrote:
> >
> > I'm using dual monitors. I disconnected the HDMI to test with a single
> > screen, but the result was the same. I also swapped the HDMI ports,
> > but the issue still persisted.
> > I'm not using DisplayPort — in fact, it's a bit weird: I convert VGA
> > to HDMI and connect it to the graphics card. I'm not an expert of
> > course, but since there were no issues on the LTS kernel and the
> > problems started with kernels after 6.7, it made me think it might be
> > a kernel issue.
> > If needed, I’ll set dpm=0 when I install Linux again (I don't know
> > when) and test it.
> > If I remember correctly, when I added amdgpu.dc=0 to GRUB, nothing
> > changed — the system still froze after GRUB.
> >
>
> Hello,
>
> I suspect this is related to a recent patch rather than a kernel bug, so
> I will add Aurabindo, because you may be affected by commit
> cfb2d41831ee.
> First of all, is there any chance you can revert this commit and test the
> kernel?
>
> $ git revert cfb2d41831ee

That patch has been reverted (it's included in my -fixes PR this
week), but we are in the middle of the merge window so it may take a
bit for the revert to land and make its way back to stable.

Alex
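
Whether the revert has reached a given tree can be checked by searching the
commit log for the line that git revert writes by default. This is a minimal
sketch, assuming the revert kept that default message. The helper name
has_revert_of and the v6.14..HEAD range in the usage comment are illustrative
assumptions, not taken from this thread.

```shell
#!/bin/sh
# Succeed if the log text on stdin contains the default "git revert" line
# for the given (possibly abbreviated) commit hash.
# $1 = commit hash prefix, e.g. cfb2d41831ee
has_revert_of() {
    grep -q "This reverts commit $1"
}

# Usage inside a kernel git checkout (commented out because it needs a
# repository; the revision range is only an example):
#   git log --format=%B v6.14..HEAD -- drivers/gpu/drm/amd \
#       | has_revert_of cfb2d41831ee \
#       && echo "revert present" || echo "revert not found"
```

A revert committed with a hand-written message would not be found this way,
so a negative result is only a hint.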

>
> After that commit, dmcub ring calls became much more frequent and some
> power states became unstable. I'm not an expert, but these problems
> usually have to do with things like dmcub firmware, power gating
> (gfxoff), or post-reset ring buffer access.
> Maybe with this commit the vmin/vmax update call is now made much
> more frequently, which may cause dmcub to not synchronize properly,
> some power states to become unstable, or the firmware to crash.
>
> We might need to look at the contents of
> /sys/module/amdgpu/parameters/force_vmin_vmax_update, but the high
> frequency of vmin/vmax calls might be what is causing the error.
>
> So I added Aurabindo Pillai; I should have added you after 3 different
> bug reports.
>
> Regards
>
> Ozgur
>
>
> > On Thu, May 22, 2025 at 3:05 PM Ozgur Kara <ozgur@...sey.org> wrote:
> > >
> > > On Thu, 22 May 2025 at 14:58, Durmuş <dozaltay@...il.com> wrote:
> > > >
> > > > Hey, thanks for the reply, but I don't use Linux anymore, so I can't
> > > > provide any logs or test it further. Also, FYI, this bug has been
> > > > around since kernel v6.7. If I install Linux again soon, I'll try to
> > > > test it. Could you please advise what I should do about amdgpu.dpm?
> > > > Should it stay at 0 or be set to 1? When I try booting with 1, the PC
> > > > freezes right after the grub screen. I've used Linux for 2-3 months
> > > > but still don’t really know how to debug these kinds of errors
> > > > properly. Thanks!
> > > >
> > >
> > > Hello,
> > >
> > > No problem. Maybe we should talk about this separately, since the
> > > kernel lists are busy with development patches and are not very
> > > suitable for this kind of discussion.
> > > We can also see it as a problem with the kernel, GPUs, or AMD, and
> > > with the large amount of firmware and drivers involved.
> > >
> > > If it is hardware based, especially GPU related, the kernel does not
> > > fully intervene at this point.
> > > The system can be booted with amdgpu.dpm=0, but this is not a proper
> > > fix, and you did a very good job reporting it.
> > > Adding amdgpu.dc=0 may disable the display core, but this
> > > prevents you from getting 144 Hz.
> > >
> > > We should make sure that the correct firmware is present under
> > > /lib/firmware/amdgpu.
> > > Did you use DisplayPort, and did you get 144 Hz output?
> > >
> > > $ journalctl -b -1
> > > will show the log from the previous boot, and
> > > $ glxinfo | grep OpenGL
> > > may also reveal the problem or error.
> > >
> > > So kernel developers and AMD developers should look into this issue,
> > > but I think it is most likely a firmware issue on the AMD side, not
> > > on the kernel side.
> > >
> > > Regards
> > >
> > > Ozgur
> > >
> > > > On Thu, May 22, 2025 at 2:52 PM Ozgur Kara <ozgur@...sey.org> wrote:
> > > > >
> > > > > On Thu, 22 May 2025 at 14:27, Durmuş <dozaltay@...il.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > Did you get a message from the kernel in dmesg, for example an error like this?
> > > > >
> > > > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1106268
> > > > >
> > > > > The dmesg command will give you output; journalctl output or
> > > > > Mesa (glxinfo) output might also be sufficient, because we need
> > > > > to know which upstream component is affected.
> > > > > And thanks for the report.
> > > > >
> > > > > Note: because there are two similar errors, I added the relevant
> > > > > upstream maintainers.
> > > > >
> > > > > Regards
> > > > >
> > > > > Ozgur
> > > > >
> > > > > > I'm experiencing a critical issue on my system with an AMD RX 470 GPU.
> > > > > > When booting with recent kernel versions (6.7.x or newer), the system
> > > > > > fails to boot properly unless I explicitly disable Dynamic Power
> > > > > > Management (DPM) via the `amdgpu.dpm=0` kernel parameter.
> > > > > >
> > > > > > When DPM is enabled (`amdgpu.dpm=1` or omitted, since it's the
> > > > > > default), the system either freezes during early boot or fails to
> > > > > > initialize the display. However, using the LTS kernel (6.6.x),
> > > > > > everything works as expected with DPM enabled.
> > > > > >
> > > > > > This seems to be a regression introduced in kernel 6.7 or later, and
> > > > > > it specifically affects older GCN4 (Polaris) GPUs like the RX 470.
> > > > > > Disabling DPM allows the system to boot, but significantly reduces GPU
> > > > > > performance.
> > > > > >
> > > > > > Things I’ve tried:
> > > > > > - Confirmed that the latest `linux-firmware` is installed.
> > > > > > - Verified correct firmware files exist under `/lib/firmware/amdgpu/`.
> > > > > > - Tested multiple kernels (mainline and LTS).
> > > > > > - Using Mesa with ACO (Radeon open driver stack).
> > > > > > - System boots fine with LTS kernel (6.6.x) + DPM enabled.
> > > > > >
> > > > > > System info:
> > > > > > - GPU: AMD RX 470 (GCN 4 / Polaris)
> > > > > > - Distro: Arch Linux
> > > > > > - Kernel (working): linux-lts 6.6.x
> > > > > > - Kernel (broken): 6.7.x and newer (currently tested on 6.14.6)
> > > > > >
> > > > > > Thanks in advance,
> > > > > > Durmus Ozaltay
> > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >
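
The parameters discussed in this thread (amdgpu.dpm, amdgpu.dc) are passed on
the kernel command line. As a rough sketch, and not something any poster in
the thread ran, the following POSIX shell helper reports the value a parameter
has in a given command line string. The function name cmdline_param is made up
for this illustration. It prints the last occurrence of the parameter and
returns non-zero when the parameter is absent.

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter.
# $1 = parameter name (e.g. amdgpu.dpm), $2 = command line string
# Prints the value of the last occurrence; returns 1 if it is not present.
cmdline_param() {
    name=$1 value= found=1
    for word in $2; do
        case $word in
            "$name"=*) value=${word#*=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
    return "$found"
}

# On a running Linux system, inspect the command line of the current boot.
if [ -r /proc/cmdline ]; then
    dpm=$(cmdline_param amdgpu.dpm "$(cat /proc/cmdline)") \
        || dpm="(not set, driver default applies)"
    echo "amdgpu.dpm: $dpm"
fi
```

Running the same helper with amdgpu.dc as the first argument covers the other
parameter that was tried in the thread.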
