Date:	Tue, 29 Nov 2011 19:35:41 +0100
From:	Lucas Stach <dev@...xeye.de>
To:	Borislav Petkov <bp@...en8.de>,
	Francois Romieu <romieu@...zoreil.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Netdev <netdev@...r.kernel.org>
Subject: Lockups with r8169 driver (Was: Hangs with Linux 3.2.0-rc3)

Hi Boris,

On Tuesday, 29.11.2011, at 16:43 +0100, Borislav Petkov wrote:
> On Tue, Nov 29, 2011 at 09:06:55AM +0100, Borislav Petkov wrote:
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x9b/0xa6()
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] Hardware name: GA-970A-UD3
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] Watchdog detected hard LOCKUP on cpu 0
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] Modules linked in: tcp_lp ppdev parport_pc lp parport fuse nfs fscache auth_rpcgss nfs_acl ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat iptable_mangle tun lockd bridge stp llc bluetooth rfkill it87 adt7475 hwmon_vid ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ir_lirc_codec lirc_dev ir_mce_kbd_decoder snd_timer snd ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ata_generic pata_acpi edac_core serio_raw soundcore joydev ir_rc5_decoder pata_atiixp ir_nec_decoder r8169 mii mceusb rc_core virtio_net edac_mce_amd sp5100_tco i2c_piix4 xhci_hcd fam15h_power pcspkr snd_page_alloc microcode k10temp virtio_ring virtio kvm_amd kvm uinput sunrpc ipv6 usb_storage firewire_ohci uas firewire_core crc_itu_t nouveau ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core mxm_wmi video wmi [last unloaded: scsi_wait_scan]
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] Pid: 0, comm: swapper Not tainted 3.2.0-rc3+ #39
> > > Nov 29 00:22:04 tellur kernel: [13936.370598] Call Trace:
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  <NMI>  [<ffffffff81050c5a>] warn_slowpath_common+0x83/0x9b
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff81050d15>] warn_slowpath_fmt+0x46/0x48
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff81015204>] ? native_sched_clock+0x34/0x36
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff810a7abb>] watchdog_overflow_callback+0x9b/0xa6
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff810d28c5>] __perf_event_overflow+0x100/0x17f
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff8107055b>] ? local_clock+0x27/0x2f
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff810d0cb6>] ? perf_event_update_userpage+0xf/0xa3
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff810d2f2e>] perf_event_overflow+0x14/0x16
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff8101c22a>] x86_pmu_handle_irq+0xbe/0xf9
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff814aa5ee>] perf_event_nmi_handler+0x19/0x1b
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff814a9f77>] nmi_handle+0x42/0x67
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff814aa028>] do_nmi+0x8c/0x26f
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffff814a9830>] nmi+0x20/0x30
> > > Nov 29 00:22:04 tellur kernel: [13936.370598]  [<ffffffffa023bf55>] ? rtl8169_interrupt+0x268/0x2a4 [r8169]
> 
> Btw,
> 
> this looks like the box hanged itself after getting an net IRQ over your
> r8169 which reminds me of this other issue being debugged on lkml in
> conjunction with r8169:
> 
> http://marc.info/?l=linux-kernel&m=132225246211817&w=2

Thanks for the pointer, this helped a lot.

It seems the lockup is triggered by the r8169 driver. I can also
reproduce it with rc2. With the AMD IOMMU enabled, I get the following
in dmesg just before the lockup:
"AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0018 address=0x0000000000003000 flags=0x0050]"
where device=03:00.0 is my r8169 Ethernet adapter. So it seems the
hardware tries to DMA to a location it is not supposed to access.
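
For anyone who wants to watch for the same event on their own box, here
is a minimal sketch that pulls the fields out of such a dmesg line. The
regular expression is an assumption based only on the single line quoted
above (the format may differ on other kernel versions), and the function
name parse_io_page_fault is made up for illustration. It does not try to
interpret the flags value.

```python
import re

# Pattern derived from the one AMD-Vi line quoted above; the exact field
# layout on other kernel versions is an assumption.
EVENT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_io_page_fault(line):
    """Return the fields of an IO_PAGE_FAULT line, or None if no match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),            # PCI bus:device.function
        "domain": int(match.group("domain"), 16),   # IOMMU domain id
        "address": int(match.group("address"), 16), # faulting DMA address
        "flags": int(match.group("flags"), 16),     # raw flags, not decoded
    }


line = (
    "AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0018 "
    "address=0x0000000000003000 flags=0x0050]"
)
event = parse_io_page_fault(line)
```

The device field can then be compared against the bus address that lspci
reports for the adapter (03:00.0 here).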

Dropping Andreas Herrmann from Cc, as this is clearly not
Bulldozer-related. Adding netdev and the r8169 maintainer instead. For
completeness I will reattach the whole crashdump and lspci -vv for the
Ethernet device.

I will try the last patch mentioned in the linked thread and report
back.

Thanks,
Lucas

> 
> Can you try reproducing the issue without having network traffic. Also,
> according to the thread above, 3.1 kernel is also affected so it could
> make sense for you go back to 3.0 and see whether it happens with it
> too.
> 
> HTH.
> 


View attachment "crash" of type "text/plain" (44488 bytes)

View attachment "lspci" of type "text/plain" (3257 bytes)
