Message-ID: <CACO55tuqAH5Zt+X9pjLFZ-RcFgxpgjpqmrAHPvm4=fb_DMBHyw@mail.gmail.com>
Date: Thu, 1 Jun 2023 20:10:26 +0200
From: Karol Herbst <kherbst@...hat.com>
To: "Limonciello, Mario" <Mario.Limonciello@....com>
Cc: Nick Hastings <nicholaschastings@...il.com>,
Lyude Paul <lyude@...hat.com>, Lukas Wunner <lukas@...ner.de>,
Salvatore Bonaccorso <carnil@...ian.org>,
"1036530@...s.debian.org" <1036530@...s.debian.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"regressions@...ts.linux.dev" <regressions@...ts.linux.dev>
Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI string"?
(was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
On Thu, Jun 1, 2023 at 7:21 PM Limonciello, Mario
<Mario.Limonciello@....com> wrote:
>
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: Karol Herbst <kherbst@...hat.com>
> > Sent: Thursday, June 1, 2023 12:19 PM
> > To: Limonciello, Mario <Mario.Limonciello@....com>
> > Cc: Nick Hastings <nicholaschastings@...il.com>; Lyude Paul
> > <lyude@...hat.com>; Lukas Wunner <lukas@...ner.de>; Salvatore
> > Bonaccorso <carnil@...ian.org>; 1036530@...s.debian.org; Rafael J.
> > Wysocki <rafael@...nel.org>; Len Brown <lenb@...nel.org>;
> > linux-acpi@...r.kernel.org; linux-kernel@...r.kernel.org;
> > regressions@...ts.linux.dev
> > Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
> > string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
> >
> > On Thu, Jun 1, 2023 at 6:54 PM Limonciello, Mario
> > <Mario.Limonciello@....com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Karol Herbst <kherbst@...hat.com>
> > > > Sent: Thursday, June 1, 2023 11:33 AM
> > > > To: Limonciello, Mario <Mario.Limonciello@....com>
> > > > Cc: Nick Hastings <nicholaschastings@...il.com>; Lyude Paul
> > > > <lyude@...hat.com>; Lukas Wunner <lukas@...ner.de>; Salvatore
> > > > Bonaccorso <carnil@...ian.org>; 1036530@...s.debian.org; Rafael J.
> > > > Wysocki <rafael@...nel.org>; Len Brown <lenb@...nel.org>;
> > > > linux-acpi@...r.kernel.org; linux-kernel@...r.kernel.org;
> > > > regressions@...ts.linux.dev
> > > > Subject: Re: Regression from "ACPI: OSI: Remove Linux-Dell-Video _OSI
> > > > string"? (was: Re: Bug#1036530: linux-signed-amd64: Hard lock up of system)
> > > >
> > > > On Thu, Jun 1, 2023 at 6:18 PM Limonciello, Mario
> > > > <mario.limonciello@....com> wrote:
> > > > >
> > > > > +Lyude, Lukas, Karol
> > > > >
> > > > > On 5/31/2023 6:40 PM, Nick Hastings wrote:
> > > > > > Hi,
> > > > > >
> > > > > > * Nick Hastings <nicholaschastings@...il.com> [230530 16:01]:
> > > > > >> * Mario Limonciello <mario.limonciello@....com> [230530 13:00]:
> > > > > > <snip>
> > > > > >>> As you're actually loading nouveau, can you please try
> > > > > >>> nouveau.runpm=0 on the kernel command line?
> > > > > >> I'm not intentionally loading it. This machine also has Intel graphics
> > > > > >> which is what I prefer. Checking my
> > > > > >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf
> > > > > >> I see:
> > > > > >>
> > > > > >> blacklist nvidia
> > > > > >> blacklist nvidia-drm
> > > > > >> blacklist nvidia-modeset
> > > > > >> blacklist nvidia-uvm
> > > > > >> blacklist ipmi_msghandler
> > > > > >> blacklist ipmi_devintf
> > > > > >>
> > > > > >> So I thought I had blacklisted it but it seems I did not. Since I do not
> > > > > >> want to use it maybe it is better to check if the lock up occurs with
> > > > > >> nouveau blacklisted. I will try that now.
> > > > > > I blacklisted nouveau and booted into a 6.1 kernel:
> > > > > > % uname -a
> > > > > > Linux xps 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
> > > > > >
> > > > > > It has been running without problems for nearly two days now:
> > > > > > % uptime
> > > > > > 08:34:48 up 1 day, 16:22, 2 users, load average: 1.33, 1.26, 1.27
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Nick.
> > > > >
> > > > > Thanks, that makes a lot more sense now.
> > > > >
> > > > > Nick, can you please test if nouveau works with runtime PM in the
> > > > > latest 6.4-rc?
> > > > >
> > > > > If it works in 6.4-rc, there are probably nouveau commits that need
> > > > > to be backported to 6.1 LTS.
> > > > >
> > > > > If it's still broken in 6.4-rc, I believe you should file a bug:
> > > > >
> > > > > https://gitlab.freedesktop.org/drm/nouveau/
> > > > >
> > > > >
> > > > > Lyude, Lukas, Karol
> > > > >
> > > > > This thread is in relation to this commit:
> > > > >
> > > > > 24867516f06d ("ACPI: OSI: Remove Linux-Dell-Video _OSI string")
> > > > >
> > > > > Nick has found that runtime PM is *not* working for nouveau.
> > > > >
> > > >
> > > > Keep in mind we have a list of PCIe controllers where we apply a
> > > > workaround:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_drm.c?h=v6.4-rc4#n682
> > > >
> > > > And I suspect there might be one or two more IDs we'll have to add
> > > > there. Do we have any logs?
> > >
> > > There are some archived on the distro bug. Search this page for "journalctl.log.gz":
> > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036530
> > >
> >
> > Interesting... It seems to be the same controller used here. I wonder
> > if the PCI topology is different or if the workaround is applied at
> > all.
>
> I didn't see the message about the workaround being applied in that log,
> so I guess a PCI topology difference is a likely suspect.
>
Yeah, but I also couldn't see a log with the usual nouveau messages,
so it's kinda weird.
Anyway, the output of `lspci -tvnn` would help.
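A minimal sketch of how that topology could be checked, under stated assumptions: it assumes the nouveau workaround linked above keys off the GPU's immediate upstream bridge (`pci_upstream_bridge()`) and that, at v6.4-rc4, its list held a single Intel bridge, 8086:1901. Both points are my reading of nouveau_drm.c and should be verified against the linked source. Instead of parsing `lspci -tvnn` text, the script walks sysfs, where /sys/bus/pci/devices/<address> resolves to a path whose parent directory is the upstream bridge. `QUIRK_BRIDGES`, `upstream_bridge()`, `check()` and `main()` are hypothetical helper names, not nouveau or kernel APIs.

```python
import os
import re
import sys

# sysfs PCI device directories are named domain:bus:device.function,
# e.g. "0000:01:00.0"; root bus directories look like "pci0000:00".
PCI_ADDR = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

# ASSUMPTION: bridges on the nouveau runpm workaround list, as I read
# nouveau_drm.c at v6.4-rc4. Verify against the source before relying on it.
QUIRK_BRIDGES = {(0x8086, 0x1901)}

NVIDIA_VENDOR = 0x10DE


def read_id(dev_dir, name):
    """Read a sysfs ID file such as 'vendor' or 'device' (content like '0x10de')."""
    with open(os.path.join(dev_dir, name)) as f:
        return int(f.read().strip(), 16)


def upstream_bridge(dev_dir):
    """Return the sysfs directory of the bridge directly above dev_dir, or None."""
    parent = os.path.dirname(os.path.realpath(dev_dir))
    if PCI_ADDR.match(os.path.basename(parent)):
        return parent
    return None  # device sits directly on a root bus


def check(dev_dir):
    """Return (bridge address, (vendor, device), on_quirk_list) or None."""
    bridge = upstream_bridge(dev_dir)
    if bridge is None:
        return None
    ids = (read_id(bridge, "vendor"), read_id(bridge, "device"))
    return os.path.basename(bridge), ids, ids in QUIRK_BRIDGES


def main(root="/sys/bus/pci/devices"):
    for name in sorted(os.listdir(root)):
        dev = os.path.join(root, name)
        if read_id(dev, "vendor") != NVIDIA_VENDOR:
            continue
        result = check(dev)
        if result is None:
            print(f"{name}: no PCI bridge above this device")
            continue
        bridge, (ven, did), listed = result
        print(f"{name}: upstream bridge {bridge} [{ven:04x}:{did:04x}]"
              f" {'is' if listed else 'is NOT'} on the quirk list")


if __name__ == "__main__":
    main(*sys.argv[1:])
```

Run it with python3 on the affected machine (optionally passing another devices directory as the only argument). Each NVIDIA function is reported with the vendor:device ID of the bridge above it, which can then be compared with the `lspci -tvnn` tree and with the list in nouveau_drm.c.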
> >
> > But yeah, I'd kinda love for somebody with better knowledge on all of
> > this to figure out what exactly is going wrong, but every time this
> > gets investigated Intel says "our hardware has no bugs", the ACPI
> > folks dig for months and find nothing and I end up figuring out some
> > weirdo workaround I don't understand. And apparently also nobody is
> > able to hand out docs explaining in detail how that runtime
> > suspend/resume stuff is supposed to work.
> >
> > I have a Dell XPS 9560 where the added workaround in nouveau fixed the
> > problem and I know it's fixed on a bunch of other systems. So if
> > anybody is willing to publish docs and/or actually debug it with
> > domain knowledge, please go ahead.
> >
> > > > And could anybody test whether adding the
> > > > controller in play here resolves the problem?
> > > >
> > > > > If you recall, we did 24867516f06d because 5775b843a619 was
> > > > > supposed to have fixed it.
> > > > >
> > >
>