Message-ID: <s5hfwao5jpe.wl%tiwai@suse.de>
Date:	Fri, 25 May 2012 18:06:21 +0200
From:	Takashi Iwai <tiwai@...e.de>
To:	Jörg-Volker Peetz <jvpeetz@....de>
Cc:	Tejun Heo <tj@...nel.org>, Fengguang Wu <fengguang.wu@...el.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Linux 3.4 released

At Fri, 25 May 2012 17:33:11 +0200,
Jörg-Volker Peetz wrote:
> 
> Hello,
> 
> Takashi Iwai wrote, on 05/25/12 09:25:
> > At Wed, 23 May 2012 13:26:57 -0700,
> > Tejun Heo wrote:
> >>
> >> Cc'ing Takashi.  Hi!
> > 
> > Also Cc'ed Fengguang, who worked on ELD stuff.
> > 
> >> On Wed, May 23, 2012 at 09:56:36PM +0200, Jörg-Volker Peetz wrote:
> >>> May 23 21:32:33 hostname kernel: XXX delayed_work_timer_fn: cwq
> >>> (null), fn=hdmi_repoll_eld
> >>
> >> So, we have the winner.
> >>
> >> Takashi, sound/pci/hda/patch_hdmi.c::hdmi_repoll_eld() is causing
> >> workqueue code dereference %NULL pointer.  It *looks* like something
> >> is corrupting the work item while it's queued.  It could be a
> >> workqueue bug but I don't think that's likely - the code has been
> >> stable for quite some time now.  I glanced through the code and
> >> nothing stands out.  Does something ring a bell?
> > 
> > I also don't know of this problem.  My initial thought was that the
> > work struct placed right after sink_eld in struct hdmi_spec_per_pin is
> > overwritten wrongly by reading some ELD data.  But I failed to spot
> > out the bug...
> > 
> > Reading back through the thread, the problem seems triggered via usb
> > video cam.  I wonder how this is connected to the HDMI audio.
> > 
> > To get things straight: does this bug happen even without HDMI, DP or
> > DVI cable plugged, i.e. only with the laptop without connecting to the
> > external digital output?
> > 
> yes it happens without any HDMI cable plugged. The notebook is only connected to
> an ethernet cable and the power cable. I'll append /var/log/dmesg, it also
> contains the kernel command line with "radeon.audio=1".
> 
> The computer has two graphic chips:
> ATI Mobility Radeon HD 4200 integrated graphics (non-free firmware R600_rlc.bin)
> ATI Mobility Radeon HD 5470 graphic (512MB) (non-free firmware CEDAR_*.bin)
> During booting, the discrete GPU is switched off using vga switcheroo:
> 
> $ mount -t debugfs none /sys/kernel/debug
> $ echo -n OFF > /sys/kernel/debug/vgaswitcheroo/switch

This explains the codec stall, at least.  Disabling the D-GPU also
disables the HD-audio controller.  Once it's disabled, even
accessing the PCI device may trigger an Oops.  It's a known problem.

Support for vga-switcheroo in HD-audio was added recently, and I
sent a pull request to Linus today.  Try the latest Linus tree and
pull the hda-switcheroo tag of the sound git tree onto it:
  git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/hda-switcheroo
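
A minimal sketch of that pull, assuming you already have a clone of
Linus's tree somewhere on disk.  The function name and the example path
are only illustrative; the URL and tag are the ones given above.

```shell
# URL and tag of the sound tree, as given above.
SOUND_URL="git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git"
SOUND_TAG="tags/hda-switcheroo"

# Merge the hda-switcheroo tag into the kernel tree at the given path.
# The path argument is a placeholder for wherever your clone lives.
pull_hda_switcheroo() {
    tree="${1:?usage: pull_hda_switcheroo /path/to/linux}"
    # Run git inside the tree in a subshell so the caller's cwd is untouched.
    (cd "$tree" && git pull "$SOUND_URL" "$SOUND_TAG")
}

# Example (not run here): pull_hda_switcheroo "$HOME/src/linux"
```

After the pull, rebuild and boot the resulting kernel as usual.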

I'm not sure whether this is related to the workq Oops, though.
At least, you can try without disabling the D-GPU to check whether you
see the same workq problem.
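
To confirm whether the D-GPU is really left on during such a test, the
switcheroo state can be read back from debugfs.  This is only a sketch:
it assumes debugfs is mounted at /sys/kernel/debug as in your commands,
and that each line of the switch file has the colon-separated form
"1:DIS: :Off:0000:01:00.0" (index, GPU type, active flag, power state,
PCI address).  The function name is made up for illustration.

```shell
# Print the type and power state of each GPU known to vga_switcheroo.
# An alternative file may be passed as the first argument (useful for
# testing); the default path assumes debugfs is mounted as shown above.
switcheroo_state() {
    switch_file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    if [ ! -r "$switch_file" ]; then
        echo "cannot read $switch_file (debugfs not mounted, or not root?)" >&2
        return 1
    fi
    # Field 2 is the GPU type (IGD/DIS), field 4 the power state (Pwr/Off).
    awk -F: '{ print $2 " " $4 }' "$switch_file"
}
```

Calling switcheroo_state as root prints one line per GPU; a "DIS Off"
line means the discrete GPU, and with it the HD-audio controller, is
powered down.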


> For the sound kernel module the following options are set in
> /etc/modprobe.d/alsa-base.conf:
> 
> options snd-hda-intel model=hp-dv7-4000 enable_msi=1
> 
> > 
> >>> (without line-break).
> >>>
> >>> By the way, don't know if this is related, I have a phenomenon with a spurious
> >>> interrupt with every linux version I've used before on this notebook. Half a
> >>> minute after starting the system the computer produces approx. 220 lines like
> >>>
> >>> ... kernel: hda-intel: spurious response 0x0:0x0, last cmd=0x170503
> >>>
> >>> Now with 3.4.0, I see an additional message right before (the minute before) the
> >>> "XXX ..." line:
> >>>
> >>> ...kernel: hda_intel: azx_get_response timeout, switching to single_cmd mode:
> >>> last cmd=0x003f0900
> >>
> >> These too seem to be for you, Takashi. :)
> > 
> > This means essentially the codec communication got stalled.  This is a
> > bad signal.  It happens often with a wrong HD-audio verb, but often
> > with a bad IRQ, whatever.
> > 
> > I'd need alsa-info.sh output (run with --no-upload option) for further
> > analysis.
> > 
> > 
> > thanks,
> > 
> > Takashi
> 
> My first try to run the alsa-info.sh script with the plain 3.4 kernel produced
> the same kernel oops freezing the notebook (and /tmp is mounted on tmpfs).
> Therefore I applied the patch from Tejun to produce a usable output.
> I attach it also. As you will notice, it contains the line beginning with "XXX"
> due to Tejun's patch.

Please collect the alsa-info.sh output without disabling the D-GPU if
you run it on a 3.4 or earlier kernel.


thanks,

Takashi