Date:   Mon, 20 Nov 2023 12:31:38 -0500
From:   Alex Deucher <alexdeucher@...il.com>
To:     Christian König <christian.koenig@....com>
Cc:     Christian König <ckoenig.leichtzumerken@...il.com>,
        Dave Airlie <airlied@...il.com>,
        Linux regressions mailing list <regressions@...ts.linux.dev>,
        linux-kernel@...r.kernel.org,
        "amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
        Luben Tuikov <luben.tuikov@....com>,
        dri-devel@...ts.freedesktop.org, Phillip Susi <phill@...susis.net>,
        Alex Deucher <alexander.deucher@....com>
Subject: Re: Radeon regression in 6.6 kernel

On Mon, Nov 20, 2023 at 11:24 AM Christian König
<christian.koenig@....com> wrote:
>
> On 20.11.23 at 17:08, Alex Deucher wrote:
> > On Mon, Nov 20, 2023 at 10:57 AM Christian König
> > <ckoenig.leichtzumerken@...il.com> wrote:
> >> On 19.11.23 at 07:47, Dave Airlie wrote:
> >>>> On 12.11.23 01:46, Phillip Susi wrote:
> >>>>> I had been testing some things on a post 6.6-rc5 kernel for a week or
> >>>>> two and then when I pulled to a post 6.6 release kernel, I found that
> >>>>> system suspend was broken.  It seems that the radeon driver failed to
> >>>>> suspend, leaving the display dead, the wayland display server hung, and
> >>>>> the system still running.  I have been trying to bisect it for the last
> >>>>> few days and have only been able to narrow it down to the following 3
> >>>>> commits:
> >>>>>
> >>>>> There are only 'skip'ped commits left to test.
> >>>>> The first bad commit could be any of:
> >>>>> 56e449603f0ac580700621a356d35d5716a62ce5
> >>>>> c07bf1636f0005f9eb7956404490672286ea59d3
> >>>>> b70438004a14f4d0f9890b3297cd66248728546c
> >>>>> We cannot bisect more!
> >>>> Hmm, not a single reply from the amdgpu folks. Wondering how we can
> >>>> encourage them to look into this.
> >>>>
> >>>> Phillip, reporting issues by mail should still work, but you might have
> >>>> more luck here, as that's where the amdgpu developers afaics prefer to track bugs:
> >>>> https://gitlab.freedesktop.org/drm/amd/-/issues
> >>>>
> >>>> When you file an issue there, please mention it here.
> >>>>
> >>>> Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
> >>>> comes out later today) or 6.6.2-rc1 improve things.
> >>> It would also be good to test if reverting any of these is possible or not.
> >> Well none of the commits mentioned can affect radeon in any way. Radeon
> >> simply doesn't use the scheduler.
> >>
> >> My suspicion is that the user is actually using amdgpu instead of
> >> radeon. The switch potentially occurred accidentally, for example by
> >> compiling amdgpu support for SI/CIK.
> >>
> >> Those amdgpu problems for older ASICs have already been worked on and
> >> should be fixed by now.
> > In this case it's a navi23 (so radeon in the marketing sense).
>
> Thanks, couldn't find that in the mail thread.
>
> In that case those are the already known problems with the scheduler
> changes, aren't they?

Yes.  Those changes went into 6.7 though, not 6.6 AFAIK.  Maybe I'm
misunderstanding what the original report was actually testing.  If it
was 6.7, then try reverting:
56e449603f0ac580700621a356d35d5716a62ce5
b70438004a14f4d0f9890b3297cd66248728546c

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>> File the gitlab issue and we should poke amd a bit more to take a look.
> >>>
> >>> Dave.
>
