lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMVG2suWOAXJYWJb+QrLH2KV3CJw+Ut6fRRuy0U_F6-9oV9YyA@mail.gmail.com>
Date:	Mon, 3 Dec 2012 23:08:28 +0800
From:	Daniel J Blueman <daniel@...ra.org>
To:	Takashi Iwai <tiwai@...e.de>
Cc:	Seth Forshee <seth.forshee@...onical.com>,
	Dave Airlie <airlied@...ux.ie>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: switcheroo registration vs switching race...

On 3 December 2012 22:40, Takashi Iwai <tiwai@...e.de> wrote:
> At Mon, 3 Dec 2012 22:25:52 +0800,
> Daniel J Blueman wrote:
>>
>> On 3 December 2012 19:17, Takashi Iwai <tiwai@...e.de> wrote:
>> > At Wed, 28 Nov 2012 09:45:39 +0100,
>> > Takashi Iwai wrote:
>> >>
>> >> At Wed, 28 Nov 2012 11:45:07 +0800,
>> >> Daniel J Blueman wrote:
>> >> >
>> >> > Hi Seth, Dave, Takashi,
>> >> >
>> >> > If I power down the unused discrete GPU before lightdm starts by
>> >> > fiddling with the sysfs file [1] in the upstart script, I see a race
>> >> > manifesting as the discrete GPU's HDA controller timing out to
>> >> > commands [2].
>> >> >
>> >> > Adding some debug, I see that the registered audio devices are put
>> >> > into D3 before the GPU is, but it turns out that the discrete (and
>> >> > internal) GPU's HDA controller gets registered a bit later, so the
>> >> > list is empty. The symptom is since the HDA driver it's talking to
>> >> > hardware which is now in D3.
>> >> >
>> >> > We could add a mutex to nouveau to allow us to wait for the DGPU HDA
>> >> > controller, but perhaps this should be solved at a higher level in the
>> >> > vgaswitcheroo code; what do you think?
>> >>
>> >> Maybe it's a side effect for the recent effort to fix another race in
>> >> the probe.  A part of them problem is that the registration is done at
>> >> the very last of probing.
>> >>
>> >> Instead of delaying the registration, how about the patch below?
>> >
>> > Ping.  If this really works, I'd like to queue it for 3.8 merge, at
>> > least...
>>
>> Ping ack; I was trying to find time to understand another race that
>> occurs with GPU probing after switching, but is separate from the
>> situation before switching, here.
>>
>> In the context of writing the switch, it looks like struct azx isn't
>> allocated by the time azx_vs_set_state accesses it [1,2]; racing with
>> azx_codec_create?
>
> It was allocated, but it wasn't assigned properly in pci drvdata.
>
> Below is the revised patch.  Just moved pci_set_drvdata() before
> register_vga_switcheroo().  Could you retest with it?

Superb; this addresses the oops.

~1 second after the DGPU is put into D3, I still often see "hda-intel:
spurious response 0x0:0x0, last cmd=0x170500":
http://quora.org/2012/hda-switch-spurious.txt

Presumably this implies the read of the ring-buffer pointer returned
0xffffffff, so the HDA driver understands the pointer to have wrapped
and processes the 191 unwritten entries?

Daniel
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ