netdev - [e1000e] BUG triggered when triggering LED blinking


Message-ID: <20101109083954.GB11829@mail.eitzenberger.org>
Date:	Tue, 9 Nov 2010 09:39:54 +0100
From:	Holger Eitzenberger <holger@...zenberger.org>
To:	e1000-devel@...ts.sourceforge.net
Cc:	netdev@...r.kernel.org
Subject: [e1000e] BUG triggered when triggering LED blinking

Hi,

Using e1000e driver version 1.2.10 on kernel version 2.6.32.24, I
sporadically see the kernel hit a BUG() at the moment 'ethtool -p eth0 3'
returns.

The network hardware is four 'Intel Corporation 82583V Gigabit Network
Connection' devices (0x8086:0x150c) on an Atom N450.

kernel BUG at kernel/workqueue.c:287!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1:1.1/input/input2/event2/dev
Modules linked in: nls_utf8 isofs edd ide_cd_mod sr_mod cdrom sg sd_mod
pata_acpi ata_generic usb_storage ppdev ide_pci_generic ata_piix libata
evdev rtc_cmos uhci_hcd parport_pc scsi_mod i2c_i801 ehci_hcd rtc_core
rtc_lib e1000e parport ftdi_sio usbhid usbserial

Pid: 8, comm: events/1 Not tainted (2.6.32.24-62.gce8dff6-ai #1) To Be Filled By O.E.M.
EIP: 0060:[<c102ed4f>] EFLAGS: 00010206 CPU: 1
EIP is at worker_thread+0xc5/0x144
EAX: c1c052e0 EBX: c1d052e0 ECX: f70cac1c EDX: f70cac1c
ESI: f81c75b0 EDI: f70cac18 EBP: f705e3b0 ESP: f7093f90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/1 (pid: 8, ti=f7092000 task=f705e3b0 task.ti=f7092000)
Stack:
 f705e5a0 c1d052ec c1d052e4 00000000 f705e3b0 c103151e f7093fa8 f7093fa8
<0> f7061f68 c1d052e0 c102ec8a 00000000 c103132b 00000000 00000000 00000000
<0> f7093fd0 f7093fd0 c10312ca 00000000 00000000 c100329f f7061f5c 00000000
Call Trace:
 [<c103151e>] ? autoremove_wake_function+0x0/0x2d
 [<c102ec8a>] ? worker_thread+0x0/0x144
 [<c103132b>] ? kthread+0x61/0x66
 [<c10312ca>] ? kthread+0x0/0x66
 [<c100329f>] ? kernel_thread_helper+0x7/0x10
Code: e9 85 00 00 00 8d 79 fc 8b 77 0c 89 7b 18 8b 11 8b 41 04 89 42 04 89 10 89 09 89 49 04 f0 fe 03 fb 8b 41 fc 83 e0 fc 39 c3 74 04 <0f> 0b eb fe f0 80 61 fc fe 89 f8 ff d6 89 e0 25 00 e0 ff ff 8b
EIP: [<c102ed4f>] worker_thread+0xc5/0x144 SS:ESP 0068:f7093f90
---[ end trace e297b781eb382c2f ]---

The full trace is attached; it may make things clearer.

After taking a look, I think this may be caused by e1000_phys_id()
initializing adapter->led_blink_task again on every call, possibly while
led_blink_task is still running:

	if ((hw->phy.type == e1000_phy_ife) ||
	    (hw->mac.type == e1000_pchlan) ||
	    (hw->mac.type == e1000_82574)) {
		INIT_WORK(&adapter->led_blink_task, e1000e_led_blink_task);
		if (!adapter->blink_timer.function) {

I can't reproduce it after moving the INIT_WORK() inside the following
if block, but I'm not sure whether this catches all races in there.  In
particular, relying on msleep_interruptible() may be too optimistic,
because it may not actually wait long enough.  Someone with more
knowledge of the driver should take a look.
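
To illustrate the suspected double initialization, here is a minimal
user-space sketch.  It is not driver or kernel code: struct mock_work,
mock_init_work(), mock_queue_work() and the "initialized" flag are
hypothetical stand-ins, and the model simply assumes that initializing
a work item resets its queue bookkeeping.  Under that assumption,
initializing on every call wipes the state of an item that is still
queued, while initializing only once leaves it intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item; NOT the kernel's work_struct. */
struct mock_work {
	bool pending;                      /* set while the item is queued */
	const void *owner;                 /* queue currently holding the item */
	void (*func)(struct mock_work *);  /* handler to run */
};

/* Assumed behaviour: initialization resets all queue bookkeeping. */
static void mock_init_work(struct mock_work *w,
			   void (*func)(struct mock_work *))
{
	w->pending = false;
	w->owner = NULL;
	w->func = func;
}

/* Queue the item unless it is already pending. */
static bool mock_queue_work(const void *queue, struct mock_work *w)
{
	assert(w->func != NULL);  /* must be initialized before queueing */
	if (w->pending)
		return false;
	w->pending = true;
	w->owner = queue;
	return true;
}

/* Consistency check a worker could make before running the item. */
static bool mock_state_ok(const void *queue, const struct mock_work *w)
{
	return w->pending && w->owner == queue;
}

static void blink_task(struct mock_work *w) { (void)w; }

/* Unguarded pattern: the item is initialized on every call. */
static bool unguarded_keeps_state(void)
{
	int queue = 0;
	struct mock_work w = { false, NULL, NULL };

	mock_init_work(&w, blink_task);   /* first call */
	mock_queue_work(&queue, &w);
	mock_init_work(&w, blink_task);   /* second call, item still queued */
	return mock_state_ok(&queue, &w); /* bookkeeping was wiped */
}

/* Guarded pattern: the item is initialized only on the first call. */
static bool guarded_keeps_state(void)
{
	int queue = 0;
	struct mock_work w = { false, NULL, NULL };
	bool initialized = false;         /* stand-in for the driver's guard */

	for (int call = 0; call < 2; call++) {
		if (!initialized) {
			mock_init_work(&w, blink_task);
			initialized = true;
		}
		mock_queue_work(&queue, &w);  /* second attempt is a no-op */
	}
	return mock_state_ok(&queue, &w);
}
```

In this model unguarded_keeps_state() returns false and
guarded_keeps_state() returns true.  The sketch says nothing about the
remaining races around the timer and msleep_interruptible().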

I've attached a proposed fix for the double initialization, please check.

 /holger

View attachment "putty.log" of type "text/plain" (4366 bytes)

View attachment "e1000e-fix.diff" of type "text/x-diff" (709 bytes)