Date:	Fri, 12 Oct 2012 23:08:14 +0800
From:	Daniel J Blueman <daniel@...ra.org>
To:	Takashi Iwai <tiwai@...e.de>
Cc:	Dave Airlie <airlied@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	alsa-devel@...a-project.org
Subject: Re: [3.6-rc7] switcheroo race with Intel HDA...

On 10 October 2012 20:34, Takashi Iwai <tiwai@...e.de> wrote:
> At Tue, 9 Oct 2012 22:26:40 +0800,
> Daniel J Blueman wrote:
>> On 9 October 2012 21:04, Takashi Iwai <tiwai@...e.de> wrote:
>> > At Tue, 9 Oct 2012 19:23:56 +0800,
>> > Daniel J Blueman wrote:
>> >> On 9 October 2012 18:07, Takashi Iwai <tiwai@...e.de> wrote:
>> >> > At Tue, 09 Oct 2012 12:04:08 +0200,
>> >> > Takashi Iwai wrote:
>> >> >> At Tue, 9 Oct 2012 00:34:09 +0800,
>> >> >> Daniel J Blueman wrote:
>> >> >> > On 8 October 2012 20:58, Takashi Iwai <tiwai@...e.de> wrote:
>> >> >> > > At Tue, 25 Sep 2012 13:20:05 +0800,
>> >> >> > > Daniel J Blueman wrote:
>> >> >> > >> On my Macbook with a discrete Nvidia GPU, there is a race between
>> >> >> > >> selecting the integrated GPU and putting the discrete GPU into D3 [1],
>> >> >> > >> reliably causing a kernel oops [2].
>> >> >> > >>
>> >> >> > >> Introducing a delay of ~1s between the calls prevents this. When the
>> >> >> > >> second 'OFF' write path executes, it looks like struct azx at
>> >> >> > >> card->private_data hasn't been allocated yet [3], so there is
>> >> >> > >> likely some locking missing.
>> >> >> > >
>> >> >> > > It's rather pci_get_drvdata() returning NULL (i.e. card is NULL, thus
>> >> >> > > card->private_data causes Oops).  Could you check the patch like below
>> >> >> > > and see whether you get a kernel warning (but no Oops) or the problem
>> >> >> > > gets fixed by shifting the assignment of pci drvdata?
>> >> >> > [...]
>> >> >> >
>> >> >> > Good patching. Calling pci_set_drvdata later prevents the oops in HDA,
>> >> >> > though we see unexpected 0x0 responses in the response ring buffer
>> >> >> > [1], which we don't see when there's a >~1.5s delay between IGD and
>> >> >> > OFF.
>> >> >>
>> >> >> If the previous patch fixed it, it means that the switching occurred
>> >> >> while the device was being probed.  Maybe a better approach is to
>> >> >> register the VGA switcheroo after the proper initialization.
>> >> >>
>> >> >> The patch below is a revised one.  Please give it a try.
>> >> >
>> >> > Also, it's not clear which card spews the spurious response.
>> >> > Apply the patch below in addition.
>> >> [...]
>> >>
>> >> hda-intel: 0000:01:00.1: spurious response 0x0:0x0, last cmd=0x1f0004
>> >> $ lspci -s :1:0.1
>> >> 01:00.1 Audio device: NVIDIA Corporation Device 0e1b (rev ff)
>> >>
>> >> It's the NVIDIA device which presumably hasn't completed its
>> >> transition to D3 at the time the OFF is executed.
>> >
>> > OK, then could you try the patch below on the top of previous two
>> > patches?
>>
>> The first IGD switcheroo command fails to switch to the integrated GPU:
>>
>> # cat /sys/kernel/debug/vgaswitcheroo/switch
>> 0:DIS:+:Pwr:0000:01:00.0
>> 1:IGD: :Pwr:0000:00:02.0
>> 2:DIS-Audio: :Pwr:0000:01:00.1
>> # echo IGD >/sys/kernel/debug/vgaswitcheroo/switch
>> vga_switcheroo: client 1 refused switch
>>
>> I also instrumented snd_hda_lock_devices, but none of the failure
>> paths are being taken, which would leave inconsistent state, as the
>> return value isn't checked.
>
> Hm, right, the return value of snd_hda_lock_devices() isn't checked,
> but I don't understand how this results like above.
> Basically switching is protected by mutex in vga_switcheroo.c, so the
> whole operation in the client side should be serialized.
>
> Anyway, try the patch below cleanly, and see at which point the
> spurious response message comes up.
[...]
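
The ordering problem discussed above can be modelled in userspace. This is
only a minimal sketch under stated assumptions, not the patch from this
thread (which is elided above): struct chip, struct device, struct switcher
and every function name below are hypothetical stand-ins, and drvdata merely
mimics what pci_set_drvdata()/pci_get_drvdata() store. It shows a client
that is registered before its private data exists, a handler that refuses
the request instead of dereferencing NULL, and the alternative of
registering only after initialization.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct azx or struct pci_dev. */
struct chip     { int init_done; int powered; };
struct device   { void *drvdata; };        /* models the PCI drvdata slot  */
struct switcher { struct device *client; };/* models a registered client   */

/* Power-state handler as a switcher might call it. */
int client_set_state(struct device *dev, int on)
{
    struct chip *chip = dev->drvdata;

    /* Still probing: refuse rather than dereference a NULL pointer. */
    if (chip == NULL || !chip->init_done)
        return -ENODEV;

    chip->powered = on;
    return 0;
}

/* The switcher forwards a request to its client, if one is registered. */
int switcher_request(struct switcher *sw, int on)
{
    if (sw->client == NULL)
        return -ENOENT;
    return client_set_state(sw->client, on);
}

/* Racy ordering, step 1: the client becomes visible before init is done. */
void probe_register_early(struct switcher *sw, struct device *dev)
{
    sw->client = dev;
}

/* Racy ordering, step 2: init completes and drvdata is published. */
void probe_finish(struct device *dev, struct chip *chip)
{
    chip->init_done = 1;
    dev->drvdata = chip;
}

/* Safer ordering: finish init first, register with the switcher last. */
void probe_register_late(struct switcher *sw, struct device *dev,
                         struct chip *chip)
{
    probe_finish(dev, chip);
    sw->client = dev;
}
```

With the early ordering, a request arriving between the two probe steps gets
-ENODEV in this model; with the late ordering that window does not exist,
because the switcher cannot see the client until its data is in place.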

The patch _does_ address the issue. A recent update to my Macbook
firmware broke i915 switching, which was misleading. However, with the
stock kernel I can still reproduce the oops even though the IGD switch
doesn't complete, and with the patch applied I consistently can't [1],
so the patch is good.

Tested-by: Daniel J Blueman <daniel@...ra.org>

Thanks Takashi!
  Daniel

--- [1]

snd_hda_intel 0000:00:1b.0: enabling device (0000 -> 0002)
snd_hda_intel 0000:00:1b.0: irq 54 for MSI/MSI-X
XXX 0000:00:1b.0: azx_codec_create entered
vga_switcheroo: enabled
XXX 0000:00:1b.0: azx_codec_create done
input: HDA Intel PCH Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input9
snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
hda_intel: Disabling MSI
hda-intel: 0000:01:00.1: Handle VGA-switcheroo audio client
XXX 0000:01:00.1: azx_codec_create entered
XXX 0000:01:00.1: azx_codec_create done
input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input14
vga_switcheroo: client 1 refused switch
i915: switched off
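
To check from a script whether a switch actually took effect, the lines of
/sys/kernel/debug/vgaswitcheroo/switch quoted earlier in this mail can be
parsed. The following is a rough sketch only: the field layout
(id:name:active-mark:power:PCI address) is inferred from the three lines
shown above, struct vgasr_client and parse_switch_line() are hypothetical
names, and treating any power field other than "Pwr" as powered off is an
assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of the switch file. */
struct vgasr_client {
    int  id;
    char name[16];   /* e.g. "IGD", "DIS", "DIS-Audio" */
    int  active;     /* 1 if the line carries the '+' mark */
    int  powered;    /* 1 if the power field reads "Pwr" */
    char pci[32];    /* e.g. "0000:01:00.0" */
};

/* Parse one line; returns 0 on success, -1 if the line does not match. */
int parse_switch_line(const char *line, struct vgasr_client *c)
{
    char mark;
    char pwr[8];

    /* %c does not skip whitespace, so an inactive client yields ' '. */
    if (sscanf(line, "%d:%15[^:]:%c:%7[^:]:%31s",
               &c->id, c->name, &mark, pwr, c->pci) != 5)
        return -1;

    c->active  = (mark == '+');
    c->powered = (strcmp(pwr, "Pwr") == 0);
    return 0;
}
```

Reading the file before and after writing IGD to it and comparing which
entry has active set would show whether the request was honoured or
refused.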
-- 
Daniel J Blueman