lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <28c262361001110739l2147f9e1nf353dc3b8d72347f@mail.gmail.com>
Date:	Tue, 12 Jan 2010 00:39:48 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Valdis.Kletnieks@...edu
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Renninger <trenn@...e.de>,
	Michal Schmidt <mschmidt@...hat.com>,
	Dave Jones <davej@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: mmotm 2010-01-06-14-34 - crash in CPUfreq

I have experienced same problem in 2.6.33-rc3-mm1.
My machine can't boot.

I try to find this patch but failed.
So I try to understand this problem from below changelog.

I guess it was from assigning the policy inside the lock.
lock_policy_rwsem_write uses policy_cpu but it was not initialized
at that time.

so we meet BUG_ON(policy_cpu == -1);
Quick fix is that return -1 if we meet (policy_cpu == -1) in
lock_policy_rwsem_##mode but it's very ugly.

Totally I don't understand cpufreq's code.
I hope cpufreq guys solve this problem perfectly.

I attach my config

barrios@...rios-desktop:~/mmotm$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz
stepping	: 6
cpu MHz		: 1600.000
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips	: 3732.84
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

....

On Sat, Jan 9, 2010 at 7:21 AM,  <Valdis.Kletnieks@...edu> wrote:
> On Wed, 06 Jan 2010 14:34:36 PST, akpm@...ux-foundation.org said:
>> The mm-of-the-moment snapshot 2010-01-06-14-34 has been uploaded to
>>
>>    http://userweb.kernel.org/~akpm/mmotm/
>
> Seeing a 100% reproducible crash, bisected down to this commit:
> (Saw crash in -mmotm1210, but haven't had time/etc to bisect till now)
>
> 46aeb7430f79cb4d03e17fedd6399884ab3aa697 is the first bad commit
> commit 46aeb7430f79cb4d03e17fedd6399884ab3aa697
> Author: Thomas Renninger <trenn@...e.de>
> Date:   Mon Dec 14 11:44:11 2009 +0100
>
>    [CPUFREQ] Fix race in cpufreq_update_policy()
>
>    An update can come in asynchronous, e.g. through an acpi notify event/irq.
>    If the policy is assigned before the locking and lock_policy_rwsem_write is
>    held by onlining/offlining CPU code, the info may be stale
>    (and the policy freed) when the locked code gets entered.
>
>    Fix that by assigning the policy inside the lock.
>    Michal added some cleanups to properly exit on failure cases -> thanks.
>
>    Signed-off-by: Thomas Renninger <trenn@...e.de>
>    CC: Michal Schmidt <mschmidt@...hat.com>
>    Signed-off-by: Dave Jones <davej@...hat.com>
>
> Here's the crash:
>
> [    4.840569] iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)
> [    4.840832] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
> [    4.840963] iTCO_vendor_support: vendor-support=0
> [    4.841404] device-mapper: ioctl: 4.16.0-ioctl (2009-11-05) initialised: dm-devel@...hat.com
> [    4.841551] EDAC MC: Ver: 2.1.0 Jan  8 2010
> [    4.842049] ------------[ cut here ]------------
> [    4.842160] kernel BUG at drivers/cpufreq/cpufreq.c:88!
> [    4.842271] invalid opcode: 0000 [#1] PREEMPT SMP
> [    4.842643] last sysfs file:
> [    4.842750] CPU 1
> [    4.842941] Pid: 1, comm: swapper Not tainted 2.6.32-08667-g46aeb74 #15 0X564R/Latitude E6500
> [    4.843031] RIP: 0010:[<ffffffff813942f4>]  [<ffffffff813942f4>] lock_policy_rwsem_write+0x43/0xad
> [    4.843126] RSP: 0000:ffff88011f87fd40  EFLAGS: 00010202
> [    4.843126] RAX: 00000000000104b0 RBX: 0000000000000001 RCX: 000000000000003e
> [    4.843126] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff81ae1f88
> [    4.843126] RBP: ffff88011f87fd60 R08: 0000000000000000 R09: ffffffff81b04810
> [    4.843126] R10: ffffffff8103b7fa R11: ffffffff8184f00d R12: 00000000ffffffff
> [    4.843126] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> [    4.843126] FS:  0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
> [    4.843126] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    4.843126] CR2: 00000034478a8440 CR3: 0000000001a1c000 CR4: 00000000000406e0
> [    4.843126] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    4.843126] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    4.843126] Process swapper (pid: 1, threadinfo ffff88011f87e000, task ffff88011f87c040)
> [    4.843126] Stack:
> [    4.843126]  ffff88011f87c040 0000000000000000 0000000000000000 00000000ffffffea
> [    4.843126] <0> ffff88011f87feb0 ffffffff8139558c ffff88011f87fd90 ffffffff810641a8
> [    4.843126] <0> 0000000000000000 0000000000000000 ffff88011f87fe50 ffffffff81527c4c
> [    4.843126] Call Trace:
> [    4.843126]  [<ffffffff8139558c>] cpufreq_update_policy+0x20/0xfc
> [    4.843126]  [<ffffffff810641a8>] ? debug_mutex_free_waiter+0x2e/0x6a
> [    4.843126]  [<ffffffff81527c4c>] ? __mutex_lock_common+0x58b/0x5aa
> [    4.843126]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.843126]  [<ffffffff815291b0>] ? _raw_spin_unlock_irqrestore+0x72/0x80
> [    4.843126]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.843126]  [<ffffffff81527478>] ? __mutex_unlock_slowpath+0x136/0x17b
> [    4.843126]  [<ffffffff81066494>] ? trace_hardirqs_on_caller+0x118/0x13c
> [    4.843126]  [<ffffffff810664c5>] ? trace_hardirqs_on+0xd/0xf
> [    4.843126]  [<ffffffff815274ac>] ? __mutex_unlock_slowpath+0x16a/0x17b
> [    4.843126]  [<ffffffff815274c6>] ? mutex_unlock+0x9/0xb
> [    4.843126]  [<ffffffff8103b80c>] ? cpu_maps_update_done+0x10/0x12
> [    4.843126]  [<ffffffff815113c7>] ? register_cpu_notifier+0x28/0x32
> [    4.843126]  [<ffffffff81b48568>] cpufreq_stats_init+0x85/0xad
> [    4.843126]  [<ffffffff81b484e3>] ? cpufreq_stats_init+0x0/0xad
> [    4.843126]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [    4.843126]  [<ffffffff81b21667>] kernel_init+0x15f/0x1b5
> [    4.843126]  [<ffffffff810031d4>] kernel_thread_helper+0x4/0x10
> [    4.843126]  [<ffffffff81529ac0>] ? restore_args+0x0/0x30
> [    4.843126]  [<ffffffff81b21508>] ? kernel_init+0x0/0x1b5
> [    4.843126]  [<ffffffff810031d0>] ? kernel_thread_helper+0x0/0x10
> [    4.843126] Code: 88 1f ae 81 53 31 db 48 83 ec 08 48 8b 14 d5 60 46 b0 81 44 8b 24 10 41 83 fc ff 0f 94 c3 31 d2 89 de e8 f4 42 d0 ff 85 db 74 04 <0f> 0b eb fe 4d 63 e4 48 c7 c3 c0 04 01 00 48 89 df 4a 03 3c e5
> [    4.843126] RIP  [<ffffffff813942f4>] lock_policy_rwsem_write+0x43/0xad
> [    4.843126]  RSP <ffff88011f87fd40>
> [    4.862279] ---[ end trace b837c2c8cb05be90 ]---
> [    4.862402] Kernel panic - not syncing: Attempted to kill init!
> [    4.862523] Pid: 1, comm: swapper Tainted: G      D    2.6.32-08667-g46aeb74 #15
> [    4.862669] Call Trace:
> [    4.862792]  [<ffffffff81526031>] panic+0x7f/0x13a
> [    4.862919]  [<ffffffff81066392>] ? trace_hardirqs_on_caller+0x16/0x13c
> [    4.863049]  [<ffffffff8103db1a>] do_exit+0xd7/0x8d6
> [    4.863161]  [<ffffffff8103a328>] ? spin_unlock_irqrestore+0x9/0xb
> [    4.863288]  [<ffffffff8103af22>] ? kmsg_dump+0x136/0x150
> [    4.863402]  [<ffffffff8152a904>] oops_end+0x89/0x8e
> [    4.863520]  [<ffffffff81005dd1>] die+0x55/0x5e
> [    4.863631]  [<ffffffff8152a324>] do_trap+0x11c/0x12b
> [    4.863752]  [<ffffffff81003ac9>] do_invalid_op+0x8f/0x98
> [    4.863865]  [<ffffffff813942f4>] ? lock_policy_rwsem_write+0x43/0xad
> [    4.863980]  [<ffffffff8152899f>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [    4.864104]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.864218]  [<ffffffff81529af0>] ? irq_return+0x0/0x2
> [    4.864365]  [<ffffffff81003055>] invalid_op+0x15/0x20
> [    4.864478]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.864597]  [<ffffffff813942f4>] ? lock_policy_rwsem_write+0x43/0xad
> [    4.864711]  [<ffffffff8139558c>] cpufreq_update_policy+0x20/0xfc
> [    4.864825]  [<ffffffff810641a8>] ? debug_mutex_free_waiter+0x2e/0x6a
> [    4.864944]  [<ffffffff81527c4c>] ? __mutex_lock_common+0x58b/0x5aa
> [    4.865066]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.865185]  [<ffffffff815291b0>] ? _raw_spin_unlock_irqrestore+0x72/0x80
> [    4.865311]  [<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [    4.865434]  [<ffffffff81527478>] ? __mutex_unlock_slowpath+0x136/0x17b
> [    4.865548]  [<ffffffff81066494>] ? trace_hardirqs_on_caller+0x118/0x13c
> [    4.865662]  [<ffffffff810664c5>] ? trace_hardirqs_on+0xd/0xf
> [    4.865782]  [<ffffffff815274ac>] ? __mutex_unlock_slowpath+0x16a/0x17b
> [    4.865897]  [<ffffffff815274c6>] ? mutex_unlock+0x9/0xb
> [    4.866037]  [<ffffffff8103b80c>] ? cpu_maps_update_done+0x10/0x12
> [    4.866153]  [<ffffffff815113c7>] ? register_cpu_notifier+0x28/0x32
> [    4.866278]  [<ffffffff81b48568>] cpufreq_stats_init+0x85/0xad
> [    4.866391]  [<ffffffff81b484e3>] ? cpufreq_stats_init+0x0/0xad
> [    4.866505]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [    4.866624]  [<ffffffff81b21667>] kernel_init+0x15f/0x1b5
> [    4.866737]  [<ffffffff810031d4>] kernel_thread_helper+0x4/0x10
> [    4.866860]  [<ffffffff81529ac0>] ? restore_args+0x0/0x30
> [    4.866971]  [<ffffffff81b21508>] ? kernel_init+0x0/0x1b5
> [    4.867099]  [<ffffffff810031d0>] ? kernel_thread_helper+0x0/0x10
>
>



-- 
Kind regards,
Minchan Kim

Download attachment "config.barrios" of type "application/octet-stream" (81056 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ