lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.01.0911101343590.31845@localhost.localdomain>
Date:	Tue, 10 Nov 2009 14:02:22 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
cc:	Thomas Gleixner <tglx@...utronix.de>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	pm list <linux-pm@...ts.linux-foundation.org>,
	Greg KH <gregkh@...e.de>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Tejun Heo <tj@...nel.org>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on
 resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related
 to preempt_count leakage in keventd)



On Tue, 10 Nov 2009, Rafael J. Wysocki wrote:
> > 
> > So somebody is trying to remove a list entry that was already free'd.
> 
> The entry is being removed by run_workqueue(), so I wonder what the scenario is.

If the workqueue code runs on the wrogn CPU, any random crap can easily 
happen. If a workqueue entry gets queued on the wrong CPU, people who try 
to unqueue it may not find it again, and will thus fail to remove it even 
though it's still listed.

And the code really is pretty subtle. queue_delayed_work_on(), for 
example, uses raw_smp_processor_id() to basically pick a target CPU for 
the work ("whatever the CPU happens to be now"). But does that have to 
match the CPU that the timeout will trigger on? I dunno. I've walked 
through the code many times, but every time I end up forgetting all the 
subtleties a few days later.

The workqueue code is very fragile in many ways. I'm adding Oleg and Tejun 
to the Cc, because Oleg is definitely one of the workqueue locking 
masters, and Tejun has worked on it to make it way more robust, so they 
may have ideas.

Oleg, Tejun? 

> It appears that the problem is not reproducible with init=/bin/bash and the
> minimal set of modules loaded, so perhaps one of the (modular) drivers
> removes a work item from keventd_wq in a racy way.

It's not necessarily that they remove a work item racily - maybe there is 
something that frees a work-item without removing it at all (or, equally 
possible: maybe it does remove it, but has a racy re-attach that means 
that it gets added back to the workqueue again and then freed).

> In the meantime I got another trace, this time with a slab corruption involved.
> Note that it crashed in exactly the same place as previously.

I'm leaving your crash log appended for the new cc's, and I would not be 
at all surprised to hear that the slab corruption is related. The whole 
6b6b6b6b pattern does imply a use-after-free on the workqueue, after all, 
and slab corruption could easily be another result.

Sadly, the 'size-512' thing doesn't really say much. There's a hint that 
it may be sound-related, because the second redzone report (the firxt 
kzalloc) has the data being

 - 16 bytes of zero
 - followed by 16 bytes of "PCM Playback Vol"

and the module information there also says [snd]. But because it's a 
size-512 thing (ie kmalloc/kfree), it may be that the sound driver was 
just the next user of the invalid 512-byte block, and wasn't the _cause_ 
of the problem.

So I don't know, but it might be worthwhile to see if the problem goes 
away if you disable sound. Just to narrow down what has changed.

		Linus
---
> [  210.450676] usb 1-2: GSM modem (1-port) converter now attached to ttyUSB0
> [  210.450725] option 1-2:1.1: GSM modem (1-port) converter detected
> [  210.450997] usb 1-2: GSM modem (1-port) converter now attached to ttyUSB1
> [  210.451044] option 1-2:1.2: GSM modem (1-port) converter detected
> [  210.451422] usb 1-2: GSM modem (1-port) converter now attached to ttyUSB2
> [  210.451470] option 1-2:1.3: GSM modem (1-port) converter detected
> [  210.451746] usb 1-2: GSM modem (1-port) converter now attached to ttyUSB3
> [  210.452092] PM: Finishing wakeup.
> [  210.452095] Restarting tasks ... 
> [  210.452293] usb 5-2: USB disconnect, address 2
> [  210.470519] done.
> [  210.845547] Slab corruption: size-512 start=ffff88007f1182b0, len=512
> [  210.845718] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> [  210.845880] Last user: [<ffffffff810c8bdb>](kfree+0x7c/0x88)
> [  210.846142] 050: 6b 6b 6b 6b 6b 6b 6b 6b 38 e7 60 01 00 88 ff ff
> [  210.847114] Prev obj: start=ffff88007f118098, len=512
> [  210.847275] Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> [  210.847436] Last user: [<ffffffffa01ffce0>](kzalloc+0xf/0x11 [snd])
> [  210.847724] 000: 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
> [  210.850735] 010: 50 43 4d 20 50 6c 61 79 62 61 63 6b 20 56 6f 6c
> [  210.851844] Next obj: start=ffff88007f1184c8, len=512
> [  210.852220] Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> [  210.852384] Last user: [<ffffffff8106a4c8>](kzalloc+0xf/0x11)
> [  210.852655] 000: 17 ea 3b 81 ff ff ff ff 00 00 00 00 00 00 00 00
> [  210.853981] 010: 24 01 00 00 00 00 00 00 0e a9 06 81 ff ff ff ff
> [  210.887295] e1000e 0000:01:00.0: irq 27 for MSI/MSI-X
> [  210.940224] e1000e 0000:01:00.0: irq 27 for MSI/MSI-X
> [  210.941776] ADDRCONF(NETDEV_UP): eth0: link is not ready
> [  211.061316] general protection fault: 0000 [#1] PREEMPT SMP 
> [  211.061626] last sysfs file: /sys/class/pcmcia_socket/pcmcia_socket0/card_insert
> [  211.061891] CPU 0 
> [  211.062090] Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit af_packet bnep sco rfcomm l2cap crc16 snd_pcm_oss snd_mixer_oss snd_seq binfmt_misc snd_seq_device ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 cpufreq_conservative ip_tables cpufreq_ondemand cpufreq_userspace cpufreq_powersave ip6table_filter acpi_cpufreq ip6_tables x_tables freq_table ipv6 microcode fuse loop sr_mod cdrom dm_mod btusb snd_hda_codec_realtek bluetooth arc4 snd_hda_intel ecb snd_hda_codec pcmcia snd_hwdep iwlcore snd_pcm mac80211 toshiba_acpi joydev snd_timer yenta_socket usbhid option video cfg80211 backlight rtc_cmos snd rsrc_nonstatic psmouse usb_storage hid output usbserial rtc_core iTCO_wdt soundcore battery pcmcia_core ac rfkill led_class button intel_agp rtc_lib crc_itu_t serio_raw e1000e iTCO_vendor_support snd_page_alloc uinput s!
 g ehci_hcd uhci_hcd sd_mod crc_t10dif usbcore ext3 jbd fan ahci libata thermal processor [last unloaded: iwlagn]
> [  211.064780] Pid: 9, comm: events/0 Not tainted 2.6.32-rc6-tst #169 PORTEGE R500
> [  211.064780] RIP: 0010:[<ffffffff81054c07>]  [<ffffffff81054c07>] worker_thread+0x15b/0x22a
> [  211.064780] RSP: 0018:ffff88007f0d9e40  EFLAGS: 00010046
> [  211.064780] RAX: ffff88007f118308 RBX: ffff88007f09bd48 RCX: 6b6b6b6b6b6b6b6b
> [  211.064780] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000000 RDI: ffff880001613d00
> [  211.064780] RBP: ffff88007f0d9ee0 R08: ffff88007f0d9cf0 R09: ffff88007f0d9e02
> [  211.064780] R10: ffff880001613d00 R11: 0000000000000000 R12: ffff88007f118300
> [  211.064780] R13: ffff880001613d00 R14: ffff88007f0b9140 R15: ffff88007f0b9140
> [  211.064780] FS:  0000000000000000(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
> [  211.064780] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  211.064780] CR2: 00007f7678984660 CR3: 000000007d3ce000 CR4: 00000000000006f0
> [  211.064780] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  211.064780] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  211.064780] Process events/0 (pid: 9, threadinfo ffff88007f0d8000, task ffff88007f0b9140)
> [  211.064780] Stack:
> [  211.064780]  000000000000c918 ffff88007f0b9578 ffff88007f0d9fd8 ffff88007f0b9140
> [  211.064780] <0> ffff880001613d08 ffff88007f0b9140 ffff880001613d18 6b6b6b6b6b6b6b6b
> [  211.064780] <0> 0000000000000000 ffff88007f0b9140 ffffffff81058289 ffff88007f0d9e98
> [  211.064780] Call Trace:
> [  211.064780]  [<ffffffff81058289>] ? autoremove_wake_function+0x0/0x38
> [  211.064780]  [<ffffffff81054aac>] ? worker_thread+0x0/0x22a
> [  211.064780]  [<ffffffff81058062>] kthread+0x69/0x71
> [  211.064780]  [<ffffffff8100c16a>] child_rip+0xa/0x20
> [  211.064780]  [<ffffffff81057ff9>] ? kthread+0x0/0x71
> [  211.064780]  [<ffffffff8100c160>] ? child_rip+0x0/0x20
> [  211.064780] Code: 74 12 4c 89 e6 4c 89 f7 ff 13 48 83 c3 08 48 83 3b 00 eb ec e8 3d ef ff ff 49 8b 45 08 4d 89 65 30 4c 89 ef 48 8b 08 48 8b 50 08 <48> 89 51 08 48 89 0a 48 89 40 08 48 89 00 e8 f6 11 24 00 49 8b 
> [  211.064780] RIP  [<ffffffff81054c07>] worker_thread+0x15b/0x22a
> [  211.064780]  RSP <ffff88007f0d9e40>
> [  211.064780] ---[ end trace 586a2f2b96b909e0 ]---
> [  211.064780] note: events/0[9] exited with preempt_count 1
> 
> Rafael
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ