Message-ID: <6bffcb0e0608060959m164436baj9c4c602496e87f5d@mail.gmail.com>
Date:	Sun, 6 Aug 2006 18:59:54 +0200
From:	"Michal Piotrowski" <michal.k.k.piotrowski@...il.com>
To:	"Dave Jones" <davej@...hat.com>,
	"Linus Torvalds" <torvalds@...l.org>,
	"Michal Piotrowski" <michal.k.k.piotrowski@...il.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug()

Hi Dave,

On 05/08/06, Dave Jones <davej@...hat.com> wrote:
> On Fri, Aug 04, 2006 at 10:49:47PM -0400, Dave Jones wrote:
>
> This trace now makes a lot more sense to me.
>
>  > CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
>  >  [<c0104edc>] show_trace_log_lvl+0x58/0x152
>  >  [<c01054c2>] show_trace+0xd/0x10
>  >  [<c01055db>] dump_stack+0x19/0x1b
>  >  [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
>  >  [<c029fbae>] store_scaling_governor+0x142/0x1a3
>  >  [<c029f1a5>] store+0x37/0x48
>  >  [<c01a6561>] sysfs_write_file+0xab/0xd1
>  >  [<c016f99f>] vfs_write+0xab/0x157
>  >  [<c016ffe4>] sys_write+0x3b/0x60
>  >  [<c0103db9>] sysenter_past_esp+0x56/0x8d
>  > cpuspeed acquired cpu_bitmask_lock
>  >
>  > CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
>  >  [<c0104edc>] show_trace_log_lvl+0x58/0x152
>  >  [<c01054c2>] show_trace+0xd/0x10
>  >  [<c01055db>] dump_stack+0x19/0x1b
>  >  [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
>  >  [<c0132f3c>] __create_workqueue+0x52/0x122
>  >  [<f901234b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
>  >  [<c029f7b6>] __cpufreq_governor+0x57/0xd8
>  >  [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
>  >  [<c029fbc5>] store_scaling_governor+0x159/0x1a3
>  >  [<c029f1a5>] store+0x37/0x48
>  >  [<c01a6561>] sysfs_write_file+0xab/0xd1
>  >  [<c016f99f>] vfs_write+0xab/0x157
>  >  [<c016ffe4>] sys_write+0x3b/0x60
>  >  [<c0103db9>] sysenter_past_esp+0x56/0x8d
>  > Lukewarm IQ detected in hotplug locking
>  > BUG: warning at kernel/cpu.c:46/lock_cpu_hotplug()
>
> So when we write to sysfs to set the governor, we end up in store_scaling_governor()
> which takes the hotplug lock, and then calls off into the governor to let it
> do its thing. Part of ondemand's "thing" is to create a workqueue.
> Unfortunately, __create_workqueue also takes the hotplug lock.
>
> Creating a variant of __create_workqueue that doesn't take the lock
> seems really nasty.
>
> We could remove the locking from store_scaling_governor() and make the governors
> themselves have to do the locking, but I'm not sure that's entirely safe.
>
> We could do something really disgusting like ...
>
>         unlock_cpu_hotplug()
>         ...
>         create_workqueue()
>         ...
>         lock_cpu_hotplug()
>
> in ondemand, which opens up a tiny race window, but as ugly as it is,
> looks to be the best solution of the bunch right now.
>
> Comments?
>
> The really sad part is this is completely unrelated to the original bug reported
> in this thread, which shows just how widespread this braindamage is.
> Michal's traces don't really scream anything obvious to me.
> (Though given it took me 4 hours to decode my own traces above, this is no
> real sign of how big a problem this might be).
>
> Michal, could you apply this diff: http://lkml.org/lkml/diff/2006/8/4/381/1
> (change the '120' to '60' first), and send me the debug spew that you get?
> You'll have to wait until a minute of uptime has passed. Oh, and edit
> include/linux/jiffies.h to change INITIAL_JIFFIES to '0'.
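The nesting Dave describes can be modelled in userspace. This is only a
rough sketch under stated assumptions: every model_* name and the depth
counter are hypothetical and are not the kernel's code. They merely stand
in for store_scaling_governor(), the ondemand start path and
__create_workqueue(), to show why the second acquisition trips the warning
and why the unlock/create/lock ordering avoids it.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>

struct hotplug_lock {
    int depth;     /* how many times the lock is currently held */
    int warnings;  /* nested acquisitions observed */
};

/* Take the lock; count a warning if the caller already holds it. */
static void model_lock(struct hotplug_lock *l)
{
    if (l->depth > 0)
        l->warnings++;
    l->depth++;
}

/* Release the lock; releasing an unheld lock is a bug in this model. */
static void model_unlock(struct hotplug_lock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/* Stands in for __create_workqueue(), which takes the lock itself. */
static void model_create_workqueue(struct hotplug_lock *l)
{
    model_lock(l);
    model_unlock(l);
}

/*
 * Stands in for the governor start path. With drop_first set, the lock
 * is released around workqueue creation, as in the workaround above.
 */
static void model_governor_start(struct hotplug_lock *l, int drop_first)
{
    if (drop_first)
        model_unlock(l);      /* race window opens here */
    model_create_workqueue(l);
    if (drop_first)
        model_lock(l);        /* race window closes here */
}

/* Stands in for store_scaling_governor(); returns the warning count. */
static int model_store_governor(int drop_first)
{
    struct hotplug_lock l = { 0, 0 };

    model_lock(&l);
    model_governor_start(&l, drop_first);
    model_unlock(&l);
    return l.warnings;
}
```

Calling model_store_governor(0) follows the path in the trace and records
one nested acquisition; model_store_governor(1) follows the workaround and
records none, at the cost of the window in which the lock is not held.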

I hope that this one will help.

BUG: using smp_processor_id() in preemptible [00000001] code: cpuspeed/1433
caller is lock_cpu_hotplug+0x25/0xc5
 [<fd95918f>] store_speed+0x36/0x9b [cpufreq_userspace]
 [<c029fa95>] store+0x37/0x48
 [<c0104007>] show_trace_log_lvl+0x58/0x159
 [<c0104765>] show_trace+0xd/0x10
 [<c010482d>] dump_stack+0x19/0x1b
 [<c01fc842>] debug_smp_processor_id+0x96/0xac
 [<c01a8197>] sysfs_write_file+0xa6/0xcc
 [<c013e0db>] lock_cpu_hotplug+0x25/0xc5
 [<fd95918f>] store_speed+0x36/0x9b [cpufreq_userspace]
 [<c029fa95>] store+0x37/0x48
 [<c01a8197>] sysfs_write_file+0xa6/0xcc
 [<c0171577>] vfs_write+0xcd/0x179
 [<c0171c20>] sys_write+0x3b/0x71
 [<c010318d>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c0104765>] show_trace+0xd/0x10
 [<c010482d>] dump_stack+0x19/0x1b
 [<c01fc842>] debug_smp_processor_id+0x96/0xac
 [<c013e0db>] lock_cpu_hotplug+0x25/0xc5
 [<fd95918f>] store_speed+0x36/0x9b [cpufreq_userspace]
 [<c029fa95>] store+0x37/0x48
 [<c01a8197>]  [<c0171577>] vfs_write+0xcd/0x179

It's from 2.6.18-rc3-mm2.

Config and dmesg log -> http://www.stardust.webpages.pl/files/mm/2.6.18-rc3-mm2/

>
>                 Dave
>
> --
> http://www.codemonkey.org.uk
>

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)