lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <s5ha9tvyuio.wl%tiwai@suse.de>
Date:	Mon, 03 Dec 2012 17:23:27 +0100
From:	Takashi Iwai <tiwai@...e.de>
To:	Daniel J Blueman <daniel@...ra.org>
Cc:	Seth Forshee <seth.forshee@...onical.com>,
	Dave Airlie <airlied@...ux.ie>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: switcheroo registration vs switching race...

At Mon, 3 Dec 2012 23:08:28 +0800,
Daniel J Blueman wrote:
> 
> On 3 December 2012 22:40, Takashi Iwai <tiwai@...e.de> wrote:
> > At Mon, 3 Dec 2012 22:25:52 +0800,
> > Daniel J Blueman wrote:
> >>
> >> On 3 December 2012 19:17, Takashi Iwai <tiwai@...e.de> wrote:
> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> >> > Takashi Iwai wrote:
> >> >>
> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> >> >> Daniel J Blueman wrote:
> >> >> >
> >> >> > Hi Seth, Dave, Takashi,
> >> >> >
> >> >> > If I power down the unused discrete GPU before lightdm starts by
> >> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> >> >> > commands [2].
> >> >> >
> >> >> > Adding some debug, I see that the registered audio devices are put
> >> >> > into D3 before the GPU is, but it turns out that the discrete (and
> >> >> > internal) GPU's HDA controller gets registered a bit later, so the
> >> >> > list is empty. The symptom is since the HDA driver it's talking to
> >> >> > hardware which is now in D3.
> >> >> >
> >> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
> >> >> > controller, but perhaps this should be solved at a higher level in the
> >> >> > vgaswitcheroo code; what do you think?
> >> >>
> >> >> Maybe it's a side effect for the recent effort to fix another race in
> >> >> the probe.  A part of them problem is that the registration is done at
> >> >> the very last of probing.
> >> >>
> >> >> Instead of delaying the registration, how about the patch below?
> >> >
> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
> >> > least...
> >>
> >> Ping ack; I was trying to find time to understand another race that
> >> occurs with GPU probing after switching, but is separate from the
> >> situation before switching, here.
> >>
> >> In the context of writing the switch, it looks like struct azx isn't
> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
> >> azx_codec_create?
> >
> > It was allocated, but it wasn't assigned properly in pci drvdata.
> >
> > Below is the revised patch.  Just moved pci_set_drvdata() before
> > register_vga_switcheroo().  Could you retest with it?
> 
> Superb; this addresses the oops.

OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.

> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> spurious response 0x0:0x0, last cmd=0x170500":
> http://quora.org/2012/hda-switch-spurious.txt

Hm, it's not clear who triggers these messages.  I'll try to check the
code paths.

> Presumably this implies the read of the ring-buffer pointer returned
> 0xffffffff, so the HDA driver understands the pointer to have wrapped
> and processes the 191 unwritten entries?

Good point.  Actually there is one bug that looks obviously wrong
(writing 32bit value to CORBWP).  Maybe it has been working just
because writing CORBRP doesn't influence except for the reset bit.

Reading CORBWP as a byte is OK, but this could be better in a word so
that we can check 0xffff as invalid.

A test patch is below.  Hopefully this improves the situation...


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..ce990db 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -805,13 +805,18 @@ static int azx_corb_send_cmd(struct hda_bus *bus, u32 val)
 	spin_lock_irq(&chip->reg_lock);
 
 	/* add command to corb */
-	wp = azx_readb(chip, CORBWP);
+	wp = azx_readw(chip, CORBWP);
+	if (wp == 0xffff) {
+		/* something wrong, controller likely turned to D3 */
+		spin_unlock_irq(&chip->reg_lock);
+		return -1;
+	}
 	wp++;
 	wp %= ICH6_MAX_CORB_ENTRIES;
 
 	chip->rirb.cmds[addr]++;
 	chip->corb.buf[wp] = cpu_to_le32(val);
-	azx_writel(chip, CORBWP, wp);
+	azx_writew(chip, CORBWP, wp);
 
 	spin_unlock_irq(&chip->reg_lock);
 
@@ -827,7 +832,12 @@ static void azx_update_rirb(struct azx *chip)
 	unsigned int addr;
 	u32 res, res_ex;
 
-	wp = azx_readb(chip, RIRBWP);
+	wp = azx_readw(chip, RIRBWP);
+	if (wp == 0xffff) {
+		/* something wrong, controller likely turned to D3 */
+		return;
+	}
+
 	if (wp == chip->rirb.wp)
 		return;
 	chip->rirb.wp = wp;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ