lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZvpTRLxSkcqn03Fk@mail.gmail.com>
Date: Mon, 30 Sep 2024 10:29:08 +0300
From: Maxim Mikityanskiy <maxtram95@...il.com>
To: Takashi Iwai <tiwai@...e.de>
Cc: Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>,
	Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
	Kai Vehmanen <kai.vehmanen@...ux.intel.com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Peter Ujfalusi <peter.ujfalusi@...ux.intel.com>,
	Chaitanya Kumar Borah <chaitanya.kumar.borah@...el.com>,
	Rui Salvaterra <rsalvaterra@...il.com>,
	Sui Jingfeng <suijingfeng@...ngson.cn>,
	Bjorn Helgaas <bhelgaas@...gle.com>, Peter Wu <peter@...ensteyn.nl>,
	Lukas Wunner <lukas@...ner.de>, linux-sound@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ALSA: hda: intel: Fix Optimus when GPU has no sound

On Mon, 30 Sep 2024 at 08:43:46 +0200, Takashi Iwai wrote:
> On Sat, 21 Sep 2024 10:10:37 +0200,
> Maxim Mikityanskiy wrote:
> > 
> > On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote:
> > > On Sat, 07 Sep 2024 19:24:41 +0200,
> > > Maxim Mikityanskiy wrote:
> > > > 
> > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote:
> > > > > On Sat, 07 Sep 2024 15:45:41 +0200,
> > > > > Maxim Mikityanskiy wrote:
> > > > > > 
> > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote:
> > > > > > > On Fri, 06 Sep 2024 20:05:06 +0200,
> > > > > > > Maxim Mikityanskiy wrote:
> > > > > > > > 
> > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote:
> > > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200,
> > > > > > > > > Maxim Mikityanskiy wrote:
> > > > > > > > > > 
> > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on
> > > > > > > > 
> > > > > > > > Spotted a typo: s/520M/540M/
> > > > > > > > 
> > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules
> > > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends
> > > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the
> > > > > > > > > > runtime PM point of view, the device remains active, because the PCI
> > > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents
> > > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs
> > > > > > > > > > power management for the video and audio devices).
> > > > > > > > > > 
> > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free().
> > > > > > > > > > 
> > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller")
> > > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@...il.com>
> > > > > > > > > 
> > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to
> > > > > > > > > driver_denylist[] in hda_intel.c)?  Is the device also in the
> > > > > > > > > runtime-suspended state?
> > > > > > > > 
> > > > > > > > I added the following:
> > > > > > > > 
> > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) },
> > > > > > > > 
> > > > > > > > The probe was apparently skipped (the device is not attached to a
> > > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to
> > > > > > > > DynOff.
> > > > > > > 
> > > > > > > OK, that's good.
> > > > > > > 
> > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M
> > > > > > > > entirely, as there might be other laptops with this GPU that have sound,
> > > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs.
> > > > > > > 
> > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the
> > > > > > > above, so that it's picked up only for yours.
> > > > > > 
> > > > > > Where can I get it?
> > > > > > 
> > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor
> > > > > > 0x0000
> > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device
> > > > > > 0x0000
> > > > > 
> > > > > Ouch, Lenovo didn't set it right.
> > > > > Alternatively we may introduce a deny list based on DMI.  Hmm...
> > > > 
> > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would
> > > > also allow handling alternative GPUs on this laptop model.
> > > > 
> > > > But could you explain what's wrong with a generic approach that I
> > > > suggest (probe_continue failed => mark as suspended)? It doesn't require
> > > > any lists. Any drawbacks?
> > > 
> > > As you noticed, it will leave the device bound with driver, i.e. this
> > > looks as if it were operative.  It means that the sound driver is
> > > still responsible for suspend/resume or whatever action even though
> > > it's totally useless.  In that sense, it makes more sense to give it
> > > away at the early stage before actually binding it, and the deny list
> > > is exactly for that.
> > 
> > Thanks for your feedback!
> 
> Sorry for the late reply, as I've been off in the last weeks.
> 
> > Do you happen to know whether there is a way for a driver to unbind
> > itself after probe() returned success? I've never seen anything like
> > this, but it would be a perfect solution: no lists and no driver bound
> > after the delayed initialization fails.
> 
> I don't think it's possible for now (at least in a clean way).
> It should be a call of device_driver_detach(), but it's no exported
> API.
> 
> > > Apart from that, the suggested change itself can be applied
> > > independently from the deny list update, too.  It'd be good for other
> > > similar devices, too.
> > 
> > I was trying to avoid lists, because it's a maintainance nightmare (the
> > lists are never complete, there might be false positives, etc.), and
> > we'd need to add another type of a list for DMI quirks. Moreover, adding
> > both my change and a list entry has a drawback that similarly broken
> > devices will use different workaround code (i.e. less robust), and the
> > ones that use the generic fallback will never get on the list, because
> > no one will report it.
> > 
> > As a maintainer, what's your preferred approach?
> > 
> > 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know
> > whether it's possible).
> 
> So it's no easy way.
> 
> > 2. Add a DMI deny list to hda_intel.
> > 
> > 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent
> > HDA if it's not enabled by BIOS on this laptop model).
> 
> I'm afraid that it may break other systems; IIRC, the power up was
> needed explicitly even if it's disabled on BIOS.

I'm not suggesting full removal of quirk_nvidia_hda. If we add a DMI
check to disable quirk_nvidia_hda on this particular laptop model, it
can't break other systems, can it?

Anyways, adding a DMI check to hda_intel or quirk_nvidia_hda seems the
same level of efforts, so I can proceed with the former. It's the same
power consumption either way, right?

I'm also thinking of adding a module parameter to block probing of the
DGPU audio. Back in the days, there were plenty of similar Lenovo
laptops, which might also suffer from the same issue.

> > 4. Driver becomes a stub no failure in azx_probe_continue() (this
> > patch).
> 
> It's OK, but slightly risky.  It's kept bound, hence any unexpected
> thing might happen if the code is changed without noticing it.
> 
> > 5. Some combo of 4 and 2/3?
> 
> Not safe.
> 
> So, from the safety POV, I'd rather take the DMI deny list approach.
> 
> 
> thanks,
> 
> Takashi
> 
> > > Though, a slight concern is the sequence of runtime PM you applied in
> > > the patch.  Is it a standard idiom to perform pm_runtime_disable(),
> > > set_suspended() and enable()?
> > 
> > I'm not an expert in PM. Before I wrote this code, I was meditating on
> > the docs for a few hours, and what I understood was:
> > 
> > 1. As we are disabling the device (below in the same function), we need
> > set_suspended() to just update the state.
> > 
> > 2. set_suspended() must be called with PM disabled (see the comment
> > above the function).
> > 
> > 3. Disable and enable must come in pairs, because there is refcounting.
> > 
> > I've seen set_suspended() after disable() on device remove (e.g.,
> > rt9120_remove()). For hda_intel, assuming that disable/enable are
> > already paired, adding a new disable without a matching enable would
> > break refcounting. Hence such a sequence.
> > 
> > I'd love to hear an opinion from someone more experienced in PM.
> > 
> > > Also, azx_free() is the common
> > > destructor, hence it's called also at the regular driver unbinding.
> > > I'm not entirely sure whether it's OK to call there also at
> > > unbinding.
> > 
> > I don't have hardware to test this flow, but theoretically it looks OK
> > to me: we set the suspended state as we disable the device.
> > 
> > > 
> > > thanks,
> > > 
> > > Takashi
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > Takashi
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Is it not the right place?
> > > > > > 
> > > > > > > 
> > > > > > > Takashi
> > > > > > > 
> > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda,
> > > > > > > > although I don't know whether it can be done without recompiling the
> > > > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus.
> > > > > > > > 
> > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it
> > > > > > > > stays in DynPwr if the screen is on.)
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > thanks,
> > > > > > > > > 
> > > > > > > > > Takashi
> > > > > > > > > 
> > > > > > > > > > ---
> > > > > > > > > >  sound/pci/hda/hda_intel.c | 14 +++++++++++++-
> > > > > > > > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > > > > > > > > index b79020adce63..65fcb92e11c7 100644
> > > > > > > > > > --- a/sound/pci/hda/hda_intel.c
> > > > > > > > > > +++ b/sound/pci/hda/hda_intel.c
> > > > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip)
> > > > > > > > > >  	if (use_vga_switcheroo(hda)) {
> > > > > > > > > >  		if (chip->disabled && hda->probe_continued)
> > > > > > > > > >  			snd_hda_unlock_devices(&chip->bus);
> > > > > > > > > > -		if (hda->vga_switcheroo_registered)
> > > > > > > > > > +		if (hda->vga_switcheroo_registered) {
> > > > > > > > > >  			vga_switcheroo_unregister_client(chip->pci);
> > > > > > > > > > +
> > > > > > > > > > +			/* Some GPUs don't have sound, and azx_first_init fails,
> > > > > > > > > > +			 * leaving the device probed but non-functional. As long
> > > > > > > > > > +			 * as it's probed, the PCI subsystem keeps its runtime
> > > > > > > > > > +			 * PM status as active. Force it to suspended (as we
> > > > > > > > > > +			 * actually stop the chip) to allow GPU to suspend via
> > > > > > > > > > +			 * vga_switcheroo.
> > > > > > > > > > +			 */
> > > > > > > > > > +			pm_runtime_disable(&pci->dev);
> > > > > > > > > > +			pm_runtime_set_suspended(&pci->dev);
> > > > > > > > > > +			pm_runtime_enable(&pci->dev);
> > > > > > > > > > +		}
> > > > > > > > > >  	}
> > > > > > > > > >  
> > > > > > > > > >  	if (bus->chip_init) {
> > > > > > > > > > -- 
> > > > > > > > > > 2.46.0
> > > > > > > > > > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ