lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20111129100727.GD2675@tiehlicka.suse.cz>
Date:	Tue, 29 Nov 2011 11:07:27 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Stanislaw Gruszka <sgruszka@...hat.com>,
	"John W. Linville" <linville@...driver.com>,
	linux-wireless@...r.kernel.org
Subject: [3.2-rc3] 100% CPU usage while in del_timer_sync from
 iwl3945_rs_free_sta

[I am not sure whether this is ieee80211 or iwl3945 issue so put both
maintainers into loop]

Hi,
I have just noticed that my CPU0 is hogged by del_timer_sync called from
iwl3945_rs_free_sta and it doesn't seem to be able to do any progress.
Two consequent sysrq+t show that:

kworker/u:23    R running      0  2190      2 0x00000000
 cdc39dec c046e940 00000000 c06cfa40 c06cfa40 eebbdf1c 00003f38 00000000
 00000000 f4df3200 e22e8000 eef8c800 00003f38 cdc39da8 c0156944 00000000
 f62029c0 cdc39de4 c014d8bd 00000000 eebbc346 00003f38 00000046 f62025c0
Call Trace:
 [<c046e940>] ? __schedule+0x86a/0x893
 [<c0156944>] ? tick_program_event+0x24/0x29
 [<c014d8bd>] ? hrtimer_interrupt+0x120/0x1b2
 [<c012a7dd>] ? get_parent_ip+0xb/0x31
 [<c0472b69>] ? add_preempt_count+0x95/0x98
 [<c01161d2>] ? smp_apic_timer_interrupt+0x6b/0x7e
 [<c0470a0e>] ? apic_timer_interrupt+0x2a/0x30
 [<c013d8ff>] ? lock_timer_base.isra.29+0x14/0x41
 [<c013d948>] ? try_to_del_timer_sync+0x1c/0xb9
 [<c013da16>] ? del_timer_sync+0x31/0x3c
 [<f813a431>] ? iwl3945_rs_free_sta+0x10/0x12 [iwl3945]
 [<f81c2f24>] ? __sta_info_free.isra.24+0x25/0x38 [mac80211]
 [<f81c3cb4>] ? __sta_info_destroy+0x239/0x260 [mac80211]
 [<f81c404c>] ? sta_info_flush+0x43/0x6e [mac80211]
 [<f81c92e3>] ? ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
 [<f81c9478>] ? ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
 [<c031136f>] ? dev_printk+0x2b/0x2d
 [<f81cbb42>] ? ieee80211_sta_work+0x113/0x126 [mac80211]
 [<f81ce0f9>] ? ieee80211_iface_work+0x23e/0x250 [mac80211]
 [<c0146673>] ? process_one_work+0x19b/0x2e6
 [<f81cdebb>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
 [<c0146ba1>] ? worker_thread+0x136/0x1ee
 [<c0146a6b>] ? manage_workers.isra.23+0x14f/0x14f
 [<c0149eb9>] ? kthread+0x67/0x6c
 [<c0149e52>] ? kthread_worker_fn+0x119/0x119
 [<c04755b6>] ? kernel_thread_helper+0x6/0x10

and

kworker/u:23    R running      0  2190      2 0x00000000
 00000000 0001103c 00000000 00000000 cdc39d7c c0115c2f cdc39d98 c0155b4c
 00000001 f62025c0 f62025c0 9f8a8400 00003f62 cdc39da8 c0156944 00000000
 f62029c0 cdc39de4 c014d8bd 00000000 9f4d7f93 00003f62 00000046 f62025c0
Call Trace:
 [<c0115c2f>] ? lapic_next_event+0x1b/0x1f
 [<c0155b4c>] ? clockevents_program_event+0xea/0x108
 [<c0156944>] ? tick_program_event+0x24/0x29
 [<c014d8bd>] ? hrtimer_interrupt+0x120/0x1b2
 [<c012a7dd>] ? get_parent_ip+0xb/0x31
 [<c0472ac7>] ? sub_preempt_count+0x81/0x8e
 [<c0138913>] ? irq_exit+0x90/0x92
 [<c01161d7>] ? smp_apic_timer_interrupt+0x70/0x7e
 [<c0470a0e>] ? apic_timer_interrupt+0x2a/0x30
 [<c013d8fb>] ? lock_timer_base.isra.29+0x10/0x41
 [<c013d948>] ? try_to_del_timer_sync+0x1c/0xb9
 [<c013da16>] ? del_timer_sync+0x31/0x3c
 [<f813a431>] ? iwl3945_rs_free_sta+0x10/0x12 [iwl3945]
 [<f81c2f24>] ? __sta_info_free.isra.24+0x25/0x38 [mac80211]
 [<f81c3cb4>] ? __sta_info_destroy+0x239/0x260 [mac80211]
 [<f81c404c>] ? sta_info_flush+0x43/0x6e [mac80211]
 [<f81c92e3>] ? ieee80211_set_disassoc+0x1a3/0x1d7 [mac80211]
 [<f81c9478>] ? ieee80211_sta_connection_lost+0x31/0x8a [mac80211]
 [<c031136f>] ? dev_printk+0x2b/0x2d
 [<f81cbb42>] ? ieee80211_sta_work+0x113/0x126 [mac80211]
 [<f81ce0f9>] ? ieee80211_iface_work+0x23e/0x250 [mac80211]
 [<c0146673>] ? process_one_work+0x19b/0x2e6
 [<f81cdebb>] ? ieee80211_netdev_select_queue+0x14/0x14 [mac80211]
 [<c0146ba1>] ? worker_thread+0x136/0x1ee
 [<c0146a6b>] ? manage_workers.isra.23+0x14f/0x14f
 [<c0149eb9>] ? kthread+0x67/0x6c
 [<c0149e52>] ? kthread_worker_fn+0x119/0x119
 [<c04755b6>] ? kernel_thread_helper+0x6/0x10

Looks we are stuck in lock_timer_base loop. I am not familiar with the
code but either base is NULL or we are racing with timer->base changes.
I guess that the first one sounds more probable. The timer in question
is &rs_sta->rate_scale_flush.

The disassoc came after resume from RAM when the network was gone (I
have moved to the office from home). I didn't see this issue with 3.1
kernel and early 3.20rc* were too unstable on my laptop so I had to skip
them.

config is attached.

Let me know if you need any further information.

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

Download attachment "config.gz" of type "application/octet-stream" (17157 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ