Date:	Fri, 25 May 2012 20:41:39 +0200
From:	Jörg-Volker Peetz <jvpeetz@....de>
To:	Takashi Iwai <tiwai@...e.de>
CC:	Tejun Heo <tj@...nel.org>, Fengguang Wu <fengguang.wu@...el.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Linux 3.4 released

Takashi Iwai wrote, on 05/25/12 18:06:
> At Fri, 25 May 2012 17:33:11 +0200,
> Jörg-Volker Peetz wrote:
>>
>> Hello,
>>
>> Takashi Iwai wrote, on 05/25/12 09:25:
>>> At Wed, 23 May 2012 13:26:57 -0700,
>>> Tejun Heo wrote:
>>>>
>>>> Cc'ing Takashi.  Hi!
>>>
>>> Also Cc'ed Fengguang, who worked on ELD stuff.
>>>
>>>> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
>>>>> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
>>>>> (null), fn=hdmi_repoll_eld
>>>>
>>>> So, we have the winner.
>>>>
>>>> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
>>>> workqueue code to dereference a %NULL pointer.  It *looks* like something
>>>> is corrupting the work item while it's queued.  It could be a
>>>> workqueue bug but I don't think that's likely - the code has been
>>>> stable for quite some time now.  I glanced through the code and
>>>> nothing stands out.  Does something ring a bell?
>>>
>>> I also don't know of this problem.  My initial thought was that the
>>> work struct placed right after sink_eld in struct hdmi_spec_per_pin is
>>> overwritten wrongly by reading some ELD data.  But I failed to spot
>>> the bug...
>>>
>>> Reading back through the thread, the problem seems triggered via usb
>>> video cam.  I wonder how this is connected to the HDMI audio.
>>>
>>> To get things straight: does this bug happen even without HDMI, DP or
>>> DVI cable plugged, i.e. only with the laptop without connecting to the
>>> external digital output?
>>>
>> Yes, it happens without any HDMI cable plugged in. The notebook is only connected to
>> an ethernet cable and the power cable. I'll append /var/log/dmesg, it also
>> contains the kernel command line with "radeon.audio=1".
>>
>> The computer has two graphics chips:
>> ATI Mobility Radeon HD 4200 integrated graphics (non-free firmware R600_rlc.bin)
>> ATI Mobility Radeon HD 5470 graphics (512MB) (non-free firmware CEDAR_*.bin)
>> During booting, the discrete GPU is switched off using vga switcheroo:
>>
>> $ mount -t debugfs none /sys/kernel/debug
>> $ echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch
> 
> This explains the codec stall, at least.  Disabling the D-GPU also
> disables the HD-audio controller.  Once it's disabled, even
> accessing the PCI may trigger an Oops.  It's a known problem.
> 
> The support of vga-switcheroo for HD-audio was recently added, and I
> sent a pull request to Linus today.  Try the latest Linus tree and
> pull sound git tree hda-switcheroo tag onto it:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/hda-switcheroo
> 

I will try that and report the result. Is it ok if I use the patch of Tejun on
top of this in order to avoid a freeze?

> I'm not sure whether this is related to the workq Oops, though.
> At least, you can try without disabling D-GPU to check whether you see
> the same workq problem.
>
Simply switching on the discrete GPU with

$ echo -n ON > /sys/kernel/debug/vgaswitcheroo/switch

after it has been switched off results in the same oops and the output of
alsa-info.sh differs only in a few lines (see the attached diff-file).
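
To confirm which GPU is powered before and after such a switch, the same
debugfs file can be read back. The following is a minimal sketch, not taken
from the thread: the helper name switcheroo_states is hypothetical, and it
assumes each line of the switch file has the colon-separated form
"index:ID:active:power:PCI address", which may differ between kernel versions.

```shell
# Hypothetical helper: print "<ID> <power state>" for every GPU line on stdin.
# Assumes the line layout "index:ID:active:power:PCI address".
switcheroo_states() {
    while IFS=: read -r _idx id _active pwr _rest; do
        # Skip empty or malformed lines that carry no ID field.
        if [ -n "$id" ]; then
            printf '%s %s\n' "$id" "$pwr"
        fi
    done
}
```

With debugfs mounted, it would be run as root like this:

$ switcheroo_states < /sys/kernel/debug/vgaswitcheroo/switch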

> 
>> For the sound kernel module the following options are set in
>> /etc/modprobe.d/alsa-base.conf:
>>
>> options snd-hda-intel model=hp-dv7-4000 enable_msi=1
>>
>>>
>>>>> (without line-break).
>>>>>
>>>>> By the way, don't know if this is related, I have a phenomenon with a spurious
>>>>> interrupt with every linux version I've used before on this notebook. Half a
>>>>> minute after starting the system the computer produces approx. 220 lines like
>>>>>
>>>>> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
>>>>>
>>>>> Now with 3.4.0, I see an additional message right before (the minute before) the
>>>>> "XXX ..." line:
>>>>>
>>>>> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
>>>>> last cmd=0x003f0900
>>>>
>>>> These too seem to be for you, Takashi. :)
>>>
>>> This essentially means the codec communication got stalled.  This is a
>>> bad sign.  It often happens with a wrong HD-audio verb, but also often
>>> with a bad IRQ, or whatever.
>>>
>>> I'd need alsa-info.sh output (run with --no-upload option) for further
>>> analysis.
>>>
>>>
>>> thanks,
>>>
>>> Takashi
>>
>> My first try to run the alsa-info.sh script with the plain 3.4 kernel produced
>> the same kernel oops freezing the notebook (and /tmp is mounted on tmpfs).
>> Therefore I applied the patch from Tejun to produce a usable output.
>> I attach it also. As you will notice, it contains the line beginning with "XXX"
>> due to Tejun's patch.
> 
> Get alsa-info.sh without disabling D-GPU if you run it on a 3.4 or
> earlier kernel.
> 
For the case without mounting debugfs and, thus, with both GPUs active, the output
of alsa-info.sh is also attached. It doesn't trigger the oops, and the viewer for
the built-in USB camera also works without triggering the oops.
> 
> thanks,
> 
> Takashi
-- 
Best regards,
Jörg-Volker.


View attachment "alsa-info.txt-ddis-switched-on.diff" of type "text/x-diff" (1015 bytes)

View attachment "alsa-info.txt-both-gpus" of type "text/plain" (23860 bytes)
